A first in science: Human genome sequenced completely

The earlier Human Genome Project was hailed as a breakthrough, but it lacked eight percent of the human genome. Now, a group of about 100 scientists has sequenced the human genome in full and deciphered the complete DNA blueprint.

In new work published on March 31, 2022 in the journal Science, a team of scientists describes the first-ever sequencing of an entire human genome.

The Human Genome Project (HGP) “was an inward voyage of discovery led by an international team of researchers looking to sequence and map all of the genes -- together known as the genome -- of members of our species, Homo sapiens,” according to the US National Institutes of Health’s National Human Genome Research Institute.

Howard Hughes Medical Institute (HHMI) investigator Evan Eichler was always attracted to “the most complex regions of humanity’s genome – those with bizarrely long stretches of repeated DNA or with extra copies of genes.”

A news release notes that Eichler suspected these regions might play crucial roles in evolution and disease, which is why he joined the HGP more than 20 years ago. “Beginning on October 1, 1990 and completed in April 2003, the HGP gave us the ability, for the first time, to read nature's complete genetic blueprint for building a human being,” the NIH website explains.

The $3 billion effort was declared a success, but the sequencing effort did not bring Eichler to the completion of his scientific goal. More than eight percent of the genome was missing.

The press release points out that some scientists dismissed the missing chunks of DNA, which contained highly repetitive sequences, as junk. Eichler was of a different mind.

“It turned out that many of the regions I was interested in were in the gaps.” He became “committed to finishing the job – reading the entire genome, tricky bits and all.”

Eichler and a team of about 100 scientists, led by Adam Phillippy of the National Human Genome Research Institute (NHGRI) and Karen Miga of the University of California, Santa Cruz (UCSC), “have finally gotten it right.”

In new work, first posted as a preprint on bioRxiv.org and published on March 31, 2022 in the journal Science, the team describes the first-ever sequencing of an entire human genome. This time, the missing eight percent of previously hidden DNA is there, adding a whole chromosome’s worth of data.

In the genetic manuscript for life, “we are seeing chapters that were never read before,” says Eichler.

University of Washington geneticist Robert Waterston puts it this way: “There are no longer any hidden or unknown bits.”

“I think that is psychologically a big thing,” adds Waterston, a leader in the original Human Genome Project who was not involved in the new effort. “I just admire these scientists for sticking with it.”

A complex project

“A gene is a segment of DNA that provides the cell with instructions for making a specific protein, which then carries out a particular function in your body. Nearly all humans have the same genes arranged in roughly the same order and more than 99.9% of your DNA sequence is identical to any other human,” says the NHGRI website.

“On average, a human gene will have 1-3 letters that differ from person to person. These differences are enough to change the shape and function of a protein, how much protein is made, when it's made, or where it's made,” the NHGRI site explains. “They affect the color of your eyes, hair, and skin. More importantly, variations in your genome also influence your risk of developing diseases and your responses to medications.”

The human genome is made up of just over six billion individual letters of DNA, about the same number as in other primates such as chimpanzees, spread among 23 pairs of chromosomes, the news release explains.

To read a genome, scientists break all the DNA into pieces hundreds to thousands of letters long, run the pieces through sequencing machines that read each piece letter by letter, and then reassemble the pieces in the right order.
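The break-read-reassemble idea can be illustrated with a toy example. The Python sketch below is only an illustration, not the method the consortium used: it assumes error-free reads from a tiny made-up sequence with no repeats, and it merges the two reads with the longest overlap until nothing more can be joined. The names `overlap` and `greedy_assemble` and the example sequence are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Toy assumption: reads are error-free and the sequence has no repeats.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to join
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# A made-up 17-letter "genome" cut into overlapping pieces.
genome = "ATGCCGTAAGCTTGACA"
pieces = [genome[0:8], genome[4:12], genome[8:16], genome[12:17]]
assembled = greedy_assemble(pieces)
print(assembled)
```

With these four pieces the sketch recovers the original 17-letter sequence, whatever order the pieces arrive in. Real assemblers must also cope with read errors, two parental copies and, above all, repeated sequence.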

Some regions of the genome repeat the same letters over and over, such as the centromeres (the parts that hold the two strands of chromosomes together and that play a very important role in cell division) and ribosomal DNA (which provides instructions to the cell’s protein factories). 
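Why such repeats are hard to assemble can be shown with a small, hedged Python sketch. It uses two made-up sequences that differ only in how many copies of a three-letter unit they contain. Every possible short read is identical between the two, so short reads cannot tell how many copies are present, while reads long enough to span the whole repeat can. The sequences and the function name `reads_of_length` are assumptions for illustration only.

```python
def reads_of_length(genome: str, k: int) -> set:
    """All distinct error-free reads of length k from a sequence."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


# Hypothetical flanking sequence around a short repeated unit.
left, unit, right = "ACGTTGCA", "GAT", "CCTAGGTC"
three_copies = left + unit * 3 + right
five_copies = left + unit * 5 + right

# Short reads: the two sequences yield exactly the same set of reads.
short_same = reads_of_length(three_copies, 6) == reads_of_length(five_copies, 6)

# Long reads: some reads span the whole repeat, so the sets differ.
long_same = reads_of_length(three_copies, 20) == reads_of_length(five_copies, 20)

print(short_same, long_same)
```

In this toy case the six-letter reads are indistinguishable and the twenty-letter reads are not, which is the intuition behind using long-read sequencing for repetitive regions.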

The news release also mentions other repetitive parts that include new genes that may help species adapt. Then there are the two genomes, one paternal and one maternal, whose sequences can mix together and conceal the actual variation in each individual genome.

Before, all the repetition made it impossible to assemble DNA in the correct order. In the mid-2000s, faced with these challenges, “we came up with the idea of getting a complete genome by sequencing just one of the genomes instead of solving two at the same time,” recalls Eichler.

Eichler contacted University of Pittsburgh reproductive geneticist Urvashi Surti, who was studying a set of cell lines that had two copies of the father’s DNA and none of the mother’s.

Such a cell line, with only one genome, “is what made this genome assembly possible,” says HHMI Investigator Erich Jarvis, a Rockefeller University neurogeneticist who collaborated on the new work.

Over the years, sequencing machines improved and became better able to crunch data. In 2017, NHGRI’s Phillippy and UCSC’s Miga created the Telomere-to-Telomere (T2T) consortium to sequence each chromosome from one end (telomere) to the other.

T2T was a risk, but “we had the benefit of youthful optimism and we were fired up by the promise of these new technologies,” recalls Phillippy.

“The team ran their Nanopore machines nonstop for six months and brought in scores of scientists to assemble the pieces and analyze the results,” the press release notes. “At the same time, sequencing data were being generated by other team members and Pacific Biosciences using their long-read sequencing platform.”

By combining the readouts from two machines, one a sequencing machine by Pacific Biosciences that generated results that were more than 99 percent accurate, and the other, an Oxford Nanopore machine, they were able to “fill all the gaps.”

“It was the last piece of the puzzle – like putting on a new pair of glasses,” says Phillippy.

The consortium had assembled two chromosomes and arranged a “hackathon” to assemble the other 21 by summer 2020, working remotely over the lockdown.

The algorithms couldn’t handle the highly repetitive DNA in the centromeres, but the human eye could. So the researchers untangled repetitive sequences manually, “like untangling a string in your yo-yo,” Jarvis says. By autumn, the team had sequenced every chromosome.

‘Rosetta Stone’ of genetic material

Researchers ended up publishing six papers in Science and more than a dozen papers in other publications. Among the team's discoveries were unexpectedly high levels of genetic variation in centromeres and other regions – “a whole new treasure chest of variants that we can study to see if they have functional significance,” says Phillippy.

The data offer “the foundation for a new era” in studying centromeres, says Miga, who co-led the T2T centromere satellite working group. She also says that scientists will now be able to explore how this newly discovered variation contributes to disease, and how centromere DNA changes over time.

While the successful completion of a single genome is a breakthrough, there still remains a lot of work to be done. T2T consortium members are already working on sequencing a genome with different chromosomes inherited from the mother and the father.

Scientists are also starting a pan-genome effort to sequence the complete genomes of hundreds of people from around the world. “The goal is to create as complete a human genome as possible, representing much more of human diversity,” explains Jarvis, co-leader of the pan-genome effort.

Yet according to Eichler, the new sequence is the indispensable first step. “Now we have a Rosetta stone for looking at complete variation in hundreds of thousands of other genomes going forward.”
