Genomes of these 25 species will be sequenced to mark 25th anniversary of Wellcome Trust Sanger Institute
Red squirrel, golden eagle and fen raft spider among those selected
The struggling red squirrel, the iconic golden eagle, the intriguing fen raft spider and the invasive giant hogweed are among 25 species that we will soon better understand thanks to a project led by the Wellcome Trust Sanger Institute.
Thousands of schoolchildren and members of the public around the world participated in an online vote to select the final five species for the 25 Genomes Project.
Now the genome – the complete set of genetic information – of each species will be sequenced, meaning scientists will unravel all the DNA instructions that enable the organisms to grow and develop.
The project marks the 25th anniversary of the institute in Hinxton, which is a world-renowned leader in the field of genomics.
Founded in 1993 by Prof Sir John Sulston as part of the Human Genome Project, the Sanger Institute made the largest single contribution to the gold standard sequence of the first human genome, published in 2003, by sequencing eight of the 23 pairs of human chromosomes.
Professor Sir Mike Stratton, director of the Wellcome Trust Sanger Institute, said: “Twenty-five years ago the field of genomics was a budding idea and its implications only dreamed of.
“Today the reality of genomics and biodata is that it is transforming our understanding, diagnosis and treatment of diseases, ranging from cancer and heart disease to malaria and infections.
“The science and technology that is driving this era of discovery is accelerating our understanding of the human body, but also of the world around us.
“This project has come after many thoughtful conversations around the world with regard to how many of the species on our planet could be sequenced in the coming decades – in principle, all of them.
“We are embarking on our contribution to sequencing all life on Earth.”
The Cambridge Independent has reported how the genomes of Anopheles mosquitoes from locations across Africa had been sequenced by the Sanger Institute and an international consortium of research organisations to help uncover why some are becoming resistant to insecticides used in the fight against deadly malaria.
The study found millions of variations in the genomes of the mosquitoes, which helped to explain why insecticides targeting specific genes might prove ineffective.
The 25 Genomes Project could help explain some of the mysteries surrounding the selected species, which are grouped into five categories:
■ flourishing – species on the up in the UK;
■ floundering – endangered and declining species;
■ dangerous – invasive and harmful species;
■ iconic – quintessentially British species that we all recognise; and
■ cryptic – species that are out of sight or indistinguishable from others based on looks alone.
Dr Julia Wilson, associate director of the Wellcome Trust Sanger Institute, said: “Through sequencing these 25 genomes, scientists will gain a better understanding of UK species, how they arrived here, their evolution, and how different species are adapting to a changing environment.
“The results could reveal hidden truths in these species, and will enable the scientific community to understand how our world is constantly changing and evolving around us.
“We want to celebrate the 25th anniversary of the Sanger Institute in a special ‘Sanger’ way, and I am excited to see how the 25 Genomes Project unfolds.”
Among the mysteries that the project could solve is why some brown trout migrate to the open ocean while others do not.
It might reveal more about the magnetoreceptors in robins’ eyes that allow them to ‘see’ the Earth’s magnetic field.
And it could help explain why our red squirrels are vulnerable to the squirrelpox virus, while grey squirrels – which were introduced to the UK – can carry and spread the virus to reds without becoming ill.
It is hoped the work might indicate how UK species are responding to environmental pressures.
The results of all the studies will be made publicly available, providing information for future studies and aiding conservation and understanding of the species.
Key reference genomes of many species have already been sequenced, such as those of the mouse, zebrafish, pig and gorilla.
Researchers have also learnt much from reference genomes of infectious diseases and bacteria, including salmonella, MRSA, chlamydia and malaria.
The institute is partnering with other industry leaders – including Pacific Biosciences of California, 10x Genomics and Illumina – to sequence the genomes.
It will employ PacBio long-read sequencing technology from Pacific Biosciences to generate high-quality genomes.
Ken Skeldon, head of public engagement at the Wellcome Genome Campus, said: “Giving the public the opportunity to choose which species have their genomes sequenced in the 25 Genomes Project has brought new perspectives to the project.
“We are delighted to see that so many people and schoolchildren across the UK and beyond have actively engaged in online chats with the scientists and voted for the final five species.”
The voting for these took place over five weeks on the ‘I’m a Scientist, Get me out of here’ website, produced by Mangorolla CIC, and 42 UK species were in the running, each of them championed by a scientist.
Students and the public cast more than 4,000 votes and took part in more than 100 online live chats, asking species champions 500 questions about the species and DNA sequencing.
Tim Littlewood, head of life sciences at the Natural History Museum, said: “The Natural History Museum is proud to be collaborating with the Sanger Institute to celebrate their 25th birthday and also to celebrate the advances that molecular techniques such as genome sequencing can bring to the study of UK wildlife.
“The 80 million specimens we care for, from around the world, hold a wealth of genetic information that enables us to conduct innovative research, addressing global challenges.
“A focus on UK biodiversity with cutting-edge technology is particularly welcome.”
The 25 species that will have their genomes sequenced are:
Flourishing species:
• Grey squirrel
• Ringlet butterfly
• Roesel’s bush-cricket
• Oxford ragwort
Floundering species:
• Red squirrel
• Water vole
• Turtle dove
• Northern February red stonefly
Dangerous species:
• Giant hogweed
• Indian balsam
• King scallop, also known as the great scallop or coquille Saint-Jacques
• New Zealand flatworm
Iconic species:
• Golden eagle
• Blackberry
• European robin
• Red mason bee
Cryptic species:
• Brown trout
• Common pipistrelle bat
• Carrington’s featherwort
• Summer truffle
Five species chosen by the public:
• Common starfish
• Fen raft spider
• Lesser spotted catshark
• Asian hornet
• Eurasian otter
A brief history of sequencing
The building blocks of DNA are called nucleotides.
Sequencing involves working out the order of these nucleotides, or bases, in DNA.
Each nucleotide carries one of four bases: adenine (A), thymine (T), cytosine (C) or guanine (G).
A human genome, a unique sequence of DNA, is more than three billion letters long – and is found in almost every cell of the body.
The Human Genome Project, completed in 2003, revealed this genetic blueprint.
It followed 13 years of work, but the foundation for it was laid much earlier.
Augustinian monk Gregor Mendel is described as the father of modern genetics. He experimented between 1856 and 1863 with crossing pea plant varieties in his monastery garden, indicating how traits could be passed down from parents – by what we now know are genes. His work on plant hybridisation was presented in 1865.
In 1869, Swiss researcher Friedrich Miescher isolated ‘nuclein’ – DNA with associated proteins – from cell nuclei, although its significance lay hidden for many years. He and other scientists continued to believe it was proteins that passed on traits from parents to children.
In 1952, University of Cambridge graduate Rosalind Franklin and her PhD student Raymond Gosling, working at King’s College London, produced Photograph 51 using X-ray crystallography. Its pattern indicated the helical shape of DNA.
In 1953, in Cambridge, Francis Crick and James Watson famously discovered the double helix structure of DNA – announcing it in the Eagle pub. They revealed how DNA is composed of two strands of nucleotides coiled around each other, linked by hydrogen bonds and running in opposite directions.
In their model, they showed how an A on one strand is always paired with T on the other, while C is always paired with G. They suggested this structure allowed each strand to be used to reconstruct the other – which helped explain the passing on of hereditary information between generations.
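The pairing rule can be illustrated in a few lines of Python. This is only a sketch for readers who like to see the idea in code: the function name and the example sequence are invented for illustration and do not come from any sequencing software.

```python
# Illustrative sketch: rebuild one DNA strand from the other using base pairing.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, written in its own reading direction."""
    # Pair every base with its partner, walking the strand backwards
    # because the two strands run in opposite directions.
    return "".join(PAIRS[base] for base in reversed(strand))


original = "ATGGCA"  # made-up example sequence
partner = complementary_strand(original)
print(partner)  # TGCCAT

# Applying the rule to the partner gives back the original strand,
# which is why either strand is enough to reconstruct the other.
print(complementary_strand(partner) == original)  # True
```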
Marshall Nirenberg then unlocked the genetic code for protein synthesis in 1961.
It was in 1977 that Frederick Sanger, at the MRC Laboratory of Molecular Biology in Cambridge, developed a rapid DNA sequencing technique, known now as the Sanger method, to determine the order of nucleotides in a DNA strand.
In it, enzymes are used to synthesise, or create, short pieces of DNA. The reaction is brought to a stop by adding a ‘terminating’ base to the stretch of DNA being synthesised. Sanger used modified bases that lack one oxygen atom, so the next base has nothing to attach to.
Radioactive markers are used to tag terminating bases so they can be identified. The DNA fragments, of different lengths, are separated by how rapidly they move through a gel matrix when an electric field is applied – electrophoresis.
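The logic of reading a sequence from such fragments can be mimicked in a short Python sketch. It is a toy model under simplifying assumptions: it pretends that one fragment of every possible length is produced and that each fragment's final, tagged base can be read perfectly. The function names and the example sequence are invented for illustration.

```python
import random


def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """One fragment per possible stopping point: (length, tagged final base)."""
    return [
        (length, new_strand[length - 1])
        for length in range(1, len(new_strand) + 1)
    ]


def read_gel(fragments: list[tuple[int, str]]) -> str:
    """Recover the sequence from the fragments.

    Shorter fragments move through the gel faster, so ordering the fragments
    by length and reading each tagged end base spells out the strand.
    """
    return "".join(base for _, base in sorted(fragments))


new_strand = "GATTACA"  # made-up strand being synthesised
fragments = terminated_fragments(new_strand)
random.shuffle(fragments)  # in the tube, the fragments are all mixed together

recovered = read_gel(fragments)
print(recovered == new_strand)  # True
```

In the real method the separation is physical rather than a sort, but the principle is the same: fragment length gives the position and the tag gives the base.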
Sanger is one of the few scientists to have been awarded two Nobel prizes. The first came in 1958 for the sequencing of proteins, especially insulin, and the second, in 1980, for the sequencing of DNA.
Today’s sequencing machines have greatly increased the speed at which the process can be completed.
In the 100,000 Genomes Project – sequencing the genomes of 100,000 people – Illumina’s machines are being used. These can sequence one human genome in about a day, although the analysis takes much longer.
The analysis involves comparing the person’s DNA sequence with a reference genome. Every person will have millions of variants, or differences, between their genome and this reference. Most will be harmless, but some could be the cause of disease. Bioinformaticians use an array of tools to whittle down the millions of variants to the handful that could be harmful.
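In miniature, the comparison might look like the Python sketch below. It is a heavy simplification: it assumes the two sequences are already lined up and of equal length, it only spots single-letter substitutions, and the sequences and the list of ‘flagged’ variants are entirely made up. Real pipelines also have to align the raw reads and handle inserted or deleted letters.

```python
def find_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, sample base) wherever the two differ."""
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes aligned sequences of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base)
        in enumerate(zip(reference, sample), start=1)  # positions count from 1
        if ref_base != sample_base
    ]


reference = "ACGTACGTAC"  # made-up reference sequence
sample = "ACGAACGTTC"     # made-up sequence from one person

variants = find_variants(reference, sample)
print(variants)  # [(4, 'T', 'A'), (9, 'A', 'T')]

# Hypothetical list of variants previously linked to disease.
flagged = {(9, "A", "T")}

# Whittle the differences down to those worth a closer look.
candidates = [variant for variant in variants if variant in flagged]
print(candidates)  # [(9, 'A', 'T')]
```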