Genetic Researchers Sequence Genome of Loblolly Pine

Mar 21, 2014 by News Staff

A large multinational team of scientists led by Prof David Neale from the University of California, Davis, has sequenced the entire genome of the loblolly pine (Pinus taeda). With an estimated 22 billion base pairs, it is the largest genome sequenced to date and the most complete conifer genome sequence ever published.

Loblolly pines in Mississippi, USA. Image credit: Woodlot / CC BY-SA 3.0.


Loblolly pine is the most commercially important tree species in the United States and the source of most American paper products.

The tree is also being developed as a feedstock for biofuel.

The genome sequence will help scientists breed improved varieties of the loblolly pine. It also provides a better understanding of the evolution and diversity of plants.

“It’s a huge genome. But the challenge isn’t just collecting all the sequence data. The problem is assembling that sequence into order,” said Prof Neale, who is the senior author of two papers published in the journal Genetics and the first author of a paper published in the journal Genome Biology.

To tackle the enormous size of the loblolly pine’s genome, which until recently has been an obstacle to sequencing efforts, Prof Neale and his colleagues used a new method that can speed up genome assembly by compressing the raw sequence data 100-fold.

Modern genome sequencing methods make it relatively easy to read the individual letters in DNA, but only in short fragments.

In the case of the loblolly pine, 16 billion separate fragments had to be fitted back together, a computational puzzle called genome assembly.

“We were able to assemble the human genome, but that was close to the limit of our ability; seven times bigger was just too much,” said co-author Prof Steven Salzberg from Johns Hopkins University.

The key to the solution was a new method for pre-processing the gargantuan pile of sequence data so that it could all fit within the working memory of a single supercomputer.

The method compiles many overlapping fragments of sequence into much larger chunks, then throws away all the redundant information. Eliminating the redundancies leaves the computer with 100 times less sequence data to deal with.
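The idea of merging overlapping fragments and discarding redundant ones can be illustrated with a toy example. The Python sketch below is not the team's software, and the article does not describe their algorithm in detail; it is a simple greedy overlap merge, chosen here only for illustration. The function names, the `min_len` threshold and the 24-letter test sequence are all assumptions made up for this example.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def drop_contained(reads: list[str]) -> list[str]:
    """Discard exact duplicates and reads that lie wholly inside a longer read."""
    kept: list[str] = []
    # Longest first, so a shorter read is always checked against longer ones.
    for read in sorted(set(reads), key=lambda r: (-len(r), r)):
        if not any(read in longer for longer in kept):
            kept.append(read)
    return kept


def greedy_merge(reads: list[str], min_len: int = 4) -> list[str]:
    """Repeatedly join the pair of chunks with the largest overlap.

    A toy stand-in for the pre-processing idea described in the article;
    real assemblers use far more scalable data structures.
    """
    chunks = drop_contained(reads)
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(chunks):
            for j, b in enumerate(chunks):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return chunks  # nothing left to merge
        merged = chunks[best_i] + chunks[best_j][best_n:]
        rest = [c for k, c in enumerate(chunks) if k not in (best_i, best_j)]
        # The merged chunk may now fully cover other chunks; drop those too.
        chunks = drop_contained(rest + [merged])


# Hypothetical 24-letter "genome" read as overlapping 8-letter fragments,
# with a few fragments duplicated to mimic redundant sequence data.
genome = "ACGTTGCAAGCTTAGGCATCGGAT"
reads = [genome[i:i + 8] for i in range(0, 17, 2)]
reads += reads[:3]

chunks = greedy_merge(reads, min_len=4)
bases_in = sum(len(r) for r in reads)
bases_out = sum(len(c) for c in chunks)
print(f"{len(reads)} reads ({bases_in} bases) -> "
      f"{len(chunks)} chunk(s) ({bases_out} bases)")
```

In this toy case the saving comes entirely from collapsing duplicated and overlapping letters, which is the same kind of redundancy the article says was eliminated, though on a vastly smaller scale. The quadratic pairwise comparison used here would not be practical for billions of fragments.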

This approach allowed the scientists to assemble a much more complete genome sequence than the draft assemblies of two other conifer species reported last year.

“The size of the pieces of consecutive sequence that we assembled are orders of magnitude larger than what’s been previously published. This will enable the loblolly to serve as a high-quality reference genome that considerably speeds along future conifer genome projects,” Prof Neale said.

The new sequence confirmed that the loblolly genome is so large because it is crammed full of invasive DNA elements that copied themselves around the genome. About 82 percent of the genome is made up of these and other repetitive fragments of sequence.

The genome also revealed the location of genes that may be involved in fighting off pathogens, which will help scientists understand more about disease resistance in pines.

______

Neale DB et al. 2014. Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biology 15: R59; doi: 10.1186/gb-2014-15-3-r59

Wegrzyn JL et al. 2014. Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation. Genetics 196 (3): 891-909; doi: 10.1534/genetics.113.159996

Zimin A et al. 2014. Sequencing and Assembly of the 22-Gb Loblolly Pine Genome. Genetics 196 (3): 875-890; doi: 10.1534/genetics.113.159715
