Scientists with the Human Pangenome Reference Consortium have released a new high-quality collection of reference human genome sequences that includes genomes of 47 people, with the goal of increasing that number to 350 by mid-2024.

The human pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. Image credit: National Human Genome Research Institute.
The original reference human genome sequence is nearly 20 years old and has been regularly updated as technology advances and researchers fix errors and discover more regions of the human genome.
However, it is fundamentally limited in how well it represents the diversity of the human species: it is assembled from the DNA of only about 20 people, and most of its sequence comes from a single person.
“Everyone has a unique genome, so using a single reference genome sequence for every person can lead to inequities in genomic analyses,” said Dr. Adam Phillippy, a researcher with the National Human Genome Research Institute at the National Institutes of Health.
“For example, predicting a genetic disease might not work as well for someone whose genome is more different from the reference genome.”
The current reference human genome sequence has gaps where information is missing, especially in regions that are highly repetitive and hard to sequence.
Recent technological advances such as long-read DNA sequencing, which reads longer stretches of the DNA at a time, helped researchers fill in those gaps to create the first complete human genome sequence.
This complete human genome sequence, released in April 2022 by the Telomere-to-Telomere (T2T) Consortium, is incorporated into the current pangenome reference.
Using advanced computational techniques to align the various genome sequences, the Human Pangenome Reference Consortium constructed a new human pangenome reference with each assembly in the pangenome covering more than 99% of the expected sequence with more than 99% accuracy.
It also builds upon the previous reference genome sequence, adding over 100 million new bases.
While the previous reference genome sequence was single and linear, the new pangenome represents many different versions of the human genome sequence at the same time.
This gives researchers more options when using the pangenome to analyze other human genome sequences.
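To make the difference between a linear reference and a pangenome more concrete, here is a deliberately simplified toy sketch in Python. It is not the consortium's data format or software: the class name `PangenomeGraph`, the node names and the short DNA strings are all hypothetical. Real pangenome graphs are distributed in formats such as GFA and searched with dedicated aligners rather than the plain substring matching used here. The idea is only that shared stretches of DNA are stored once as nodes, and each haplotype is a path through those nodes, so a sequence carried by some people but missing from the linear reference is still represented.

```python
from dataclasses import dataclass, field


@dataclass
class PangenomeGraph:
    """Toy sequence graph (hypothetical): nodes hold DNA segments,
    paths list the node IDs that spell out each haplotype."""
    nodes: dict[str, str] = field(default_factory=dict)
    paths: dict[str, list[str]] = field(default_factory=dict)

    def add_node(self, node_id: str, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_path(self, name: str, node_ids: list[str]) -> None:
        # Every step of a path must refer to an existing node.
        missing = [n for n in node_ids if n not in self.nodes]
        if missing:
            raise KeyError(f"unknown nodes: {missing}")
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        """Concatenate node sequences along a named path."""
        return "".join(self.nodes[n] for n in self.paths[name])

    def haplotypes_containing(self, query: str) -> list[str]:
        """Names of paths whose spelled sequence contains the query.
        (Exact substring search stands in for real read alignment.)"""
        return [name for name in self.paths if query in self.spell(name)]


g = PangenomeGraph()
g.add_node("a", "ACGTAC")
g.add_node("ins", "GGGTTT")  # hypothetical insertion carried by only some haplotypes
g.add_node("b", "TTGACA")
g.add_path("linear_ref", ["a", "b"])      # the single linear reference
g.add_path("hap1", ["a", "ins", "b"])     # a haplotype with the insertion
g.add_path("hap2", ["a", "b"])            # a haplotype without it

print(g.spell("linear_ref"))               # ACGTACTTGACA
print(g.haplotypes_containing("ACGGGT"))   # ['hap1']
```

In this toy example, the query `ACGGGT` spans the junction between node `a` and the insertion, so it matches no part of the linear reference but is found on `hap1`. That is a miniature version of why a graph that holds many haplotypes at once can reveal variants a single linear sequence misses.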
“By using the pangenome reference, we can more accurately identify larger genomic variants called structural variants,” said Mobin Asri, a Ph.D. student at the University of California, Santa Cruz.
“We are able to find variants that were not identified using previous methods that depend on linear reference sequences.”
Structural variants can involve thousands of bases. Until now, researchers using short-read sequencing have been unable to identify most of the structural variants in each human genome because of the bias that comes from relying on a single reference sequence.
“The human pangenome reference will enable us to represent tens of thousands of novel genomic variants in regions of the genome that were previously inaccessible,” said Wen-Wei Liao, a Ph.D. student at Yale University.
“With a pangenome reference, we can accelerate clinical research by improving our understanding of the link between genes and disease traits.”
“While it’s still a work in progress, the pangenome is public and can be used by scientists around the world as a new standard human genome reference,” said Dr. Erich Jarvis, a researcher at Rockefeller University.
“This complex genomic collection represents significantly more accurate human genetic diversity than has ever been captured before.”
“With a greater breadth and depth of genetic data at their disposal, and greater quality of genome assemblies, researchers can refine their understanding of the link between genes and disease traits, and accelerate clinical research.”
The team’s work was published in the journal Nature.
_____
W.-W. Liao et al. 2023. A draft human pangenome reference. Nature 617, 312-324; doi: 10.1038/s41586-023-05896-x