Fundación General CSIC

Lychnos

Notebooks of the Fundación General CSIC. Digital Edition




Research on Threatened Species

TONI GABALDON

Universitat Pompeu Fabra

The Iberian lynx joins the genomic age

This article describes how the Centro Nacional de Análisis Genómico (CNAG) has applied Illumina technology, one of the most widely used today, to sequence the genome of the Iberian lynx. Like other genome sequencing strategies, Illumina sequencing starts with the massive random fragmentation of the genomic material.

Introduction
Although Candiles does not realise it, he is about to write an important page in the history of his species, in millions of combinations of just four letters: A, C, G, and T. This male Iberian lynx (Lynx pardinus), from the Sierra Morena population, was the individual selected to produce the reference genome for the species (Figure 1). Applying the latest technologies for deciphering an organism’s genetic material to a tiny sample allows us to open a window onto the past and present of this species. This window will also be a valuable tool in ensuring the future of the Iberian lynx, the world’s most threatened cat. The aims and general features of the Iberian lynx sequencing project were described in a previous article in this journal (Godoy J.A. (2010) Lychnos 3:02.1). This article looks at the general sequencing strategy and at the comparative analysis of the genome.

Biology has undergone a transformation in recent years, driven by new mass sequencing technologies. These techniques, based on determining in parallel the sequences of millions of short DNA fragments, allow a full genome to be deciphered at previously unimaginable speed and cost. Whereas the first draft of the human genome involved hundreds of researchers over several years and cost around three billion dollars, nowadays a human genome can be sequenced by just a handful of researchers in a few months at a cost of a few thousand dollars. Part of this reduction in cost and effort stems from the fact that a reference human genome is already available, which makes it easier to reconstruct that of a new individual. Nevertheless, it is undeniable that today’s technology has made determining a full genome a much more affordable goal. Thanks to this progress, genomic information is now available for thousands of species, although most of the organisms sequenced so far are bacteria (around 3,000) and fungi (around 200), since their genomes are much smaller and they are of considerable clinical or industrial interest. However, the number of complex organisms whose full genome sequence is known has begun to increase considerably. We currently have some 30 plant and 60 vertebrate genomes, and there are ambitious international initiatives to sequence not just one but thousands of species, such as the Genome10K project, which aims to sequence 10,000 vertebrate species, and the i5K initiative, which aims to do the same for 5,000 insects and other arthropods.



Figure 1. Candiles, the specimen sequenced. / Photo: María José Pérez.


Although clinical and industrial uses remain the main reasons for sequencing a species, conservation biology has not been forgotten in the genomic era. Genome projects have begun to turn their attention to endangered species, despite their lack of clinical or industrial interest. The sequencing of the giant panda (Ailuropoda melanoleuca) by the Beijing Genomics Institute (BGI) in 2009 was a milestone in the genomics of endangered species. Obtaining the complete genome of a protected species is expected to bring multiple benefits. Firstly, having the genome sequence represents a huge leap forward in our understanding of the species’ biology, as it reveals the functions encoded in its genome. Analysis of the panda’s genome, for example, revealed an absence of genes for breaking down cellulose, indicating the importance of gut bacteria for its herbivorous diet. The full genome of an endangered species also offers powerful tools for studying population dynamics, demographic structure and aspects of special relevance, such as the response to infectious diseases. An individual’s genome contains information about past events and enables some of the risks facing populations to be assessed, such as a lack of genetic diversity or the accumulation of harmful genetic variants. In the case of the panda, the genome revealed a high level of genetic diversity, twice that found in humans, which holds out hope for the species’ potential to recover.


Strategy for sequencing and annotating the genome

The Centro Nacional de Análisis Genómico [National Genomic Analysis Centre] (CNAG) has used Illumina technology, one of today’s most commonly used next-generation sequencing techniques, to sequence the genome of the Iberian lynx. Like other genome sequencing strategies, Illumina sequencing begins with massive random fragmentation of the genomic material. Fragments of the appropriate size are selected and prepared for sequencing. The sequences are then read automatically in a sequencer (Figure 2), a complex instrument able to perform millions of chemical reactions and record their readings in parallel. In Illumina sequencing, this parallelism is achieved by attaching and amplifying the fragments on a solid surface, so that each sequencing reaction, and its reading, takes place simultaneously at a fixed location. The sequence is deciphered by “sequencing by synthesis”: nucleotides are read as they are incorporated while the DNA molecule is copied by an enzymatic replication process. This relies on fluorescently labelled nucleotides that reversibly halt replication after each addition, so that the genome is read “nucleotide by nucleotide.” The real advantage of this technology is that millions of fragments are read in parallel in a single run. However, there is a limit to the length of sequence that can be read reliably, currently around 100 nucleotides.
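To make the cycle-by-cycle logic concrete, the following toy simulation follows a few clusters through successive sequencing cycles. It is a sketch under simplifying assumptions, not a model of the real instrument: each template is written in the order in which the polymerase copies it, every cycle adds exactly one complementary nucleotide without errors, and the colours in the hypothetical `DYE` table are placeholders. The `max_cycles` parameter stands for the read-length limit mentioned above.

```python
# Base pairing used when the polymerase copies the template.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical nucleotide-to-dye mapping; real instruments use their own chemistry.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def sequence_by_synthesis(template, max_cycles=100):
    """Return the bases incorporated opposite `template` (one per cycle) and the colours imaged.

    `template` is assumed to be written in the order in which the polymerase copies it.
    """
    read, images = [], []
    for cycle, base in enumerate(template):
        if cycle == max_cycles:
            break  # reads are limited by the number of cycles that can be read reliably
        incorporated = COMPLEMENT[base]   # the terminator allows only one addition per cycle
        images.append(DYE[incorporated])  # the camera records this cluster's colour
        read.append(incorporated)         # dye and terminator are then removed for the next cycle
    return "".join(read), images


# All clusters are processed in the same cycles; here they are simply looped over.
clusters = ["TACGGA", "GGCATT", "ATATCG"]  # hypothetical, very short templates
reads = [sequence_by_synthesis(t, max_cycles=4)[0] for t in clusters]
print(reads)  # ['ATGC', 'CCGT', 'TATA']
```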



Figure 2. Illumina sequencers at CNAG. / Photo: CNAG


The drawback of reading short fragments is that it makes reconstructing the linear sequence of the genome more difficult. In the absence of a reference genome, this is done by “assembling” the reads, that is, reconstructing the order of the fragments by detecting overlaps between their sequences. The process is analogous to reconstructing Don Quixote from random fragments of text. For example, the overlaps between the fragments “In a village of La Mancha,” “La Mancha, the name of which,” and “the name of which I have no desire to call to mind” would allow us to reconstruct the first line of the first chapter. Because the genome is written with just four letters (the four bases that make up DNA) and some regions are highly repetitive, assembly is much more complex than this example suggests, and genome assembly remains an area of intensive research. One fruitful strategy for assembly, despite the limits on read length, is to sequence both ends of each fragment. This yields two related reads (obtained from the same fragment) a known distance apart. For the Iberian lynx genome, paired reads of 100 nucleotides were obtained in this way from two insert sizes: a short one of around 500 nucleotides and a long one of around four thousand nucleotides. Combining a total of three billion of these reads enabled a provisional assembly with high resolution (coverage): each nucleotide was read, on average, a hundred times (100x). However, the assembly is still very fragmented, as is typical of this type of sequencing technology, and consists of thousands of fragments of differing lengths (from 2,000 to 100,000 bases).
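The Don Quixote analogy can be turned into a small overlap-based assembler. The sketch below uses a simple greedy strategy (repeatedly merge the two fragments with the longest suffix-to-prefix overlap); it is chosen for brevity as an illustration and is not the assembler used for the lynx genome, which must also cope with sequencing errors, repeats, both DNA strands and paired reads. The function names and the `min_len` threshold are hypothetical.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if none found)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap inside `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start  # earliest match gives the longest overlap
        start += 1


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the ordered pair of fragments with the largest overlap."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best_len, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)
        if best_pair is None:
            break  # no overlaps left: the remaining pieces stay as separate contigs
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_len:])
    return contigs


fragments = [
    "In a village of La Mancha",
    "La Mancha, the name of which",
    "the name of which I have no desire to call to mind",
]
print(greedy_assemble(fragments))
```

Greedy merging works here because every overlap is unique; in a repetitive genome the same short overlap can occur in many places, which is one reason why assemblies built from short reads end up fragmented.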
The assembled sequence totals 2.7 billion bases, very close to the genome size estimated for the Eurasian lynx, its closest relative (2.86 billion), and preliminary results suggest that it contains most of the non-repetitive regions and the vast majority of genes. Even so, work is under way to increase the contiguity of the assembly through additional strategies. One consists of sequencing random regions of the genome previously cloned in a vector. These cloned regions, of approximately 40,000 base pairs, are being used in two complementary ways. First, sequencing their ends yields paired reads as described above, but this time separated by substantially larger distances. Second, fully sequencing a few of these clones at a time allows large contiguous sequences of up to forty thousand bases to be reconstructed without the complications caused by similar sequences elsewhere in the genome or by the two potentially different versions of each region (one inherited from the father and the other from the mother). Another strategy is random sequencing with a different technology, pyrosequencing, which produces fragments of up to four hundred base pairs. Finally, work is in progress on an assisted assembly, using the cat genome as a guide.
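The average coverage follows from simple arithmetic: the total number of sequenced bases divided by the genome size. A minimal sketch, assuming that each of the three billion readings is a single 100-nucleotide read and that the genome is about 2.7 billion bases long:

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is read: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


coverage = mean_coverage(num_reads=3_000_000_000, read_length=100, genome_size=2_700_000_000)
print(f"{coverage:.0f}x")  # 111x
```

Under these assumptions the result, about 111x, is consistent with the roughly 100x quoted above; if each reading were instead counted as a pair of reads, the estimate would double.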





Assembling the genome is only the first step. Once an assembly of sufficiently high quality has been obtained, the next step is to “annotate” it. This means delimiting which regions of the genome correspond to genes or other functional elements and, as far as possible, assigning each of them a function. This complex process combines intrinsic features associated with functional sequences (de novo prediction) with similarities to genes already annotated in other genomes (prediction by homology). In addition, the lynx genome project uses mass sequencing of transcripts, the RNA molecules copied in the cell from functional regions of the genome. This “transcriptome” is extremely useful for genome annotation, as it makes it possible to delimit genes precisely and to discover how their different coding elements (exons) are combined, while also revealing which genes are expressed in different tissues. Finally, a possible function will be assigned to the predicted genes by comparing them with sequences annotated in other organisms. In this way, the lynx genome map will become a useful tool for exploring the genetic basis of any of the species’ phenotypic traits.
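As a toy illustration of the intrinsic signals used in de novo prediction, the sketch below scans one DNA strand for open reading frames: an ATG start codon followed, in the same frame, by a stop codon. This is a deliberate simplification: mammalian genes are split into exons and introns, so real gene predictors rely on statistical models of splice sites, codon usage and other features, combined with homology and transcript evidence. The `find_orfs` function and its `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs (end exclusive) of ATG...stop ORFs on the given strand only."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:  # codons before the stop
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop before the end of the sequence
            i += 3
    return sorted(orfs)


seq = "CCATGAAATTTGGGTAACC"  # hypothetical sequence
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])  # 2 17 ATGAAATTTGGGTAA
```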


The genome of the Iberian lynx and its feline relatives
The lynx is by no means the first member of the cat family to join the club of organisms whose genomes have been completely sequenced. Since 2007 we have had the sequence of a domestic cat (Felis catus): an Abyssinian named Cinnamon. The project, funded by the US National Institutes of Health (NIH), was given priority because of the cat’s importance as a pet, but also because of its use as a model of human disease, since cats can suffer from hereditary diseases analogous to those affecting humans. Other projects in progress aim to sequence the genomes of a number of cat species. A couple of years ago, the Beijing Genomics Institute (BGI) announced that, in collaboration with the San Francisco Zoo and other institutions, it intended to sequence several big cats, including the tiger (Panthera tigris), lion (Panthera leo) and leopard (Panthera pardus). This project, called the Big Cat Genomics Initiative, aims to sequence not only single specimens of each species but also representative individuals of subspecies such as the Asiatic lion, and even hybrids between lions and tigers. Other cat species will undoubtedly continue to be sequenced, and it is very likely that in a few years’ time we will have reference genomes for each of the 37 living cat species (Figure 3). As with the human genome, having a reference genome makes sequencing additional individuals easier (and cheaper), because a lower resolution can be used: the new sequence can be compared with the reference genome rather than assembled from scratch. In fact, the Iberian lynx sequencing project will not be limited to producing a reference genome from a single individual; ten additional genomes will also be sequenced at lower resolution (four from the Doñana population and six from Sierra Morena).
These specimens were selected using genetic markers to avoid including closely related individuals, and founders of the captive breeding population were favoured, because more information is available about them and they are less likely to die prematurely. Finally, a specimen of Eurasian lynx (Lynx lynx) has also been sequenced at low resolution, allowing the two related species to be compared.
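The advantage of a reference genome can be sketched in code: instead of assembling the reads, each read is placed on the reference and its differences are recorded. The toy example below indexes the reference by k-mers, seeds each read with non-overlapping exact k-mer matches (if a read has at most m mismatches, at least one of m + 1 non-overlapping seeds must match exactly), verifies the placement and reports differences seen in several reads. It is a simplified, hypothetical stand-in for dedicated read aligners and variant callers: it ignores the reverse strand, insertions and deletions, base qualities and the two copies of each chromosome, and the sequences are made up.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Place a read via non-overlapping exact seeds; return (start, differences) or None."""
    tried = set()
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if start in tried or start < 0 or start + len(read) > len(reference):
                continue
            tried.add(start)
            window = reference[start:start + len(read)]
            diffs = [(start + i, r, q) for i, (r, q) in enumerate(zip(window, read)) if r != q]
            if len(diffs) <= max_mismatches:
                return start, diffs
    return None


def call_snps(reads, reference, k=4, max_mismatches=1, min_support=2):
    """Report (position, reference base, read base) seen in at least `min_support` reads."""
    index = build_index(reference, k)
    support = Counter()
    for read in reads:
        hit = map_read(read, reference, index, k, max_mismatches)
        if hit is not None:
            support.update(hit[1])
    return sorted(snp for snp, n in support.items() if n >= min_support)


reference = "GATTACAGATCCGTAAGCTT"    # hypothetical reference fragment
reads = ["ACAGAGCCGT", "GAGCCGTAAG"]  # hypothetical reads sharing one difference
print(call_snps(reads, reference))    # [(9, 'T', 'G')]
```

Both reads carry a G where the reference has a T at position 9 (counting from 0), so the difference is reported once, supported by two reads.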

Having sequences from related species, and from individuals of the same species, is extremely useful in genomic analysis, as it makes it possible to trace in great detail the genomic changes that have occurred in the recent history of a species. These changes may be point mutations, which alter the identity of a single nucleotide and occasionally the sequence of the protein it encodes. But they may also involve larger-scale changes, such as duplications, translocations or losses of fairly large regions of the genome. The lynx genome analysis project aims to study these events over the course of the species’ evolution, in order to understand which evolutionary processes have shaped its genome and so obtain valuable information for its conservation. This comparative analysis will take place at several levels: within the species, by studying genomic variation between populations, and between different cat species and other mammals. At the population level, regions showing greater or lesser diversity between populations will be studied to obtain molecular markers. These can be used to optimise breeding in the captive population or the release of specimens in different areas, to help conserve the species’ genetic diversity and to minimise problems of inbreeding. At the level of variation between species, the aim is to study changes in sequence (mutations) or in copy number that have occurred recently in the lynx lineage, as this will help us understand how the species has adapted to the particularities of its environment. For example, detecting proteins whose sequence has changed more or less than expected could indicate particular forms of selective pressure. Likewise, we hope to obtain information about possible past epidemics and the molecular basis of hereditary disorders that affect lynx populations.
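One common way of asking whether a protein has changed “more or less than expected” is to compare the rate of nonsynonymous differences (which change the amino acid) with that of synonymous differences (which do not). The sketch below computes these proportions, pN and pS, between two aligned coding sequences in the spirit of the counting method of Nei and Gojobori. It is a simplification for illustration, not the project’s pipeline: codons that differ at more than one position are skipped, stop codons are assumed to be absent from the input, and no correction for multiple substitutions is applied. A pN/pS ratio well above 1 suggests positive selection, and one well below 1 purifying selection.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Number of synonymous sites: silent fraction of the 3 possible changes at each position."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:  # changes to a stop count as nonsynonymous
                    sites += 1 / 3
    return sites


def pn_ps(seq1, seq2):
    """Proportions of nonsynonymous (pN) and synonymous (pS) differences per site."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, equal-length coding sequences")
    codons = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    syn_sites = sum(synonymous_sites(a) + synonymous_sites(b) for a, b in codons) / 2
    nonsyn_sites = len(seq1) - syn_sites
    syn_diffs = nonsyn_diffs = 0
    for a, b in codons:
        if sum(x != y for x, y in zip(a, b)) != 1:
            continue  # identical codons, or multi-position changes (not handled here)
        if CODE[a] == CODE[b]:
            syn_diffs += 1
        else:
            nonsyn_diffs += 1
    p_s = syn_diffs / syn_sites if syn_sites else float("nan")
    return nonsyn_diffs / nonsyn_sites, p_s


# Hypothetical sequences: AAA (Lys) -> GAA (Glu) is a single nonsynonymous change.
print(pn_ps("ATGAAACCC", "ATGGAACCC"))
```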
Finally, on a broader level, we aim to conduct a phylogenomic analysis with cats and other sequenced mammals, consisting of reconstructing the evolutionary history of each of the genes encoded in the lynx genome. This analysis, the first of its kind to be conducted on a mammal genome, will make it possible to resolve with high precision the correspondence between related genes in different organisms (a kind of “who’s who” for each genome), and to trace the history of duplications, losses and changes in evolutionary rate in each gene family over the course of each lineage. It is anticipated that characterising the Iberian lynx genome will allow us to learn more about its biology, better understand its evolutionary history and, ultimately, contribute to the overarching goal of the species’ recovery.
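The phylogenomic approach described above infers gene correspondences from reconstructed gene trees. A much simpler and widely used shortcut, shown here only to illustrate what a gene “who’s who” looks like, is the reciprocal best hit: two genes are paired if each is the other’s highest-scoring match in the other genome. The gene names and similarity scores below are hypothetical.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A.

    scores_ab[a][b] is a similarity score (e.g. an alignment score) of gene a against gene b.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores; fc_g2 and fc_g3 stand for a gene duplicated in the cat lineage.
lynx_vs_cat = {"lp_g1": {"fc_g1": 950, "fc_g2": 300}, "lp_g2": {"fc_g2": 800, "fc_g3": 780}}
cat_vs_lynx = {"fc_g1": {"lp_g1": 940}, "fc_g2": {"lp_g2": 790, "lp_g1": 310}, "fc_g3": {"lp_g2": 770}}
print(reciprocal_best_hits(lynx_vs_cat, cat_vs_lynx))
# [('lp_g1', 'fc_g1'), ('lp_g2', 'fc_g2')]
```

The hypothetical cat gene `fc_g3` is left unpaired even though, if it arose from a duplication in the cat lineage, it would be as much an ortholog of `lp_g2` as `fc_g2` is. One-to-many relationships of this kind are what tree-based phylogenomic analysis is designed to resolve.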

Profile: Toni Gabaldon

Toni Gabaldon has an honours degree in Biology from the University of Valencia (1996) and a doctorate in Medical Science from the University of Nijmegen, the Netherlands (2005). Since 2008 he has led the Comparative Genomics group at the Centre for Genomic Regulation (CRG) in Barcelona, and since 2009 he has been an associate lecturer at the Universitat Pompeu Fabra, where he teaches bioinformatics and molecular evolution.

His research focuses on the relationship between genotype and phenotype, studied through comparative and evolutionary analysis of the genome sequences of a wide range of organisms. To this end he applies large-scale computational evolutionary analysis to a variety of questions, such as the evolution of cellular organelles or the molecular basis of genetic and infectious diseases. He has also worked on, and on occasion co-led, various international genome sequencing projects.

He has published over 60 papers in international scientific journals, including Science, Nature and PLoS Biology. He is currently the secretary of the Sociedad Española de Biología Evolutiva [Spanish Society of Evolutionary Biology].


Published in No. 09


  • © Fundación General CSIC.
    All rights reserved.
  • Lychnos. ISSN: 2171-6463 (Spanish print edition),
    2172-0207 (English print edition), 2174-5102 (online edition)