Linking genes over generations

Editor’s note: This is a guest post by Dr. Ajay Royyuru, the senior manager of IBM’s Computational Biology Center.

We started the Genographic Project with National Geographic six years ago, and the first thing we worried about was how the general public would respond to our request for their DNA. There’s nothing more personal than that.

Well, we ran out of our initial supply of 30,000 kits – and reached 100,000 DNA samples – in the first year! (I personally didn’t think we would hand out the 30,000 kits over the five years of the project.)


Genetic Privacy

The ethical insight gained on the Genographic Project helped IBM form its Genetic Privacy Policy. The policy was part of the testimony IBM provided at a US Congressional hearing that led to the passage of the Genetic Information Nondiscrimination Act in 2008.

Read the IBM testimony to the House Education and Labor Subcommittee on Health, Employment, Labor and Pensions.

What did we do right?

A thorough, well-communicated ethics policy is a large part of why we succeeded in collecting 470,000 samples. We assured the public that, although we were taking their DNA, participation would be anonymous; the DNA would be used only to determine the migratory history of humankind; and it would not be analyzed for clinically informative markers (for example, markers linked to a family history of hypertension).

Tracking paternal and maternal ancestry

At the outset, we analyzed two pieces of genetic evidence in the Genographic Project. First, within the entire human genome, certain fragments pass from one parent to a child without mixing with the genetic information from the other parent. In males, this is the Y chromosome.

The Y chromosome goes from father to son with almost no modification. But the transition is a bit like copying a book by reading and re-writing – occasionally there’s a typo. That typo is what we call a mutation, or a marker of descent.

A mutation may appear in only one copy of the Y chromosome that a father passes on. For example, only one of two brothers may get the mutation. The brothers are now distinguished by this difference, and the brother with the mutation (and his male-line descendants) will carry that marker. We can detect that mutation and trace the male line back through the generations to the first male who showed that marker.
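To make the idea of a marker of descent concrete, here is a minimal sketch in Python. It is an illustration only, not the Genographic Project's software: the family, the Male class, and the marker names "M1" and "M2" are all made up for this example. Each man inherits his father's markers unchanged and may add one new mutation of his own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Male:
    """One man in a paternal line (hypothetical structure for illustration)."""
    name: str
    father: Optional["Male"] = None
    new_marker: Optional[str] = None  # mutation that first appeared in this man


def markers(man: Male) -> List[str]:
    """Return the Y-chromosome markers a man carries, oldest first."""
    inherited = markers(man.father) if man.father else []
    return inherited + ([man.new_marker] if man.new_marker else [])


def founder(man: Optional[Male], marker: str) -> Optional[Male]:
    """Walk up the paternal line to the first man who showed the marker."""
    while man is not None:
        if man.new_marker == marker:
            return man
        man = man.father
    return None  # this man's paternal line never carried the marker


# Two brothers: only brother_a picks up the new mutation "M2".
grandfather = Male("grandfather", new_marker="M1")
brother_a = Male("brother_a", father=grandfather, new_marker="M2")
brother_b = Male("brother_b", father=grandfather)
son_of_a = Male("son_of_a", father=brother_a)

print(markers(son_of_a))              # ['M1', 'M2']
print(markers(brother_b))             # ['M1']
print(founder(son_of_a, "M2").name)   # brother_a
```

The son of the brother with the mutation carries both markers, while the other brother's line carries only the older one. That difference is what lets the two lines be told apart generations later.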

The other piece of genetic evidence we tracked also comes from genetic fragments that can only pass from one parent to a child. DNA contained in mitochondria, structures in the body of every human cell, is passed from a mother to all of her children and provides a means of tracing a maternal line of ancestry.

Genome recombination

Y chromosome and mitochondrial DNA constitute less than 1 percent of the human genome. The rest of the genome is not directly inherited from a single parent. Rather, it undergoes a process of recombination, effectively shuffling fragments of DNA from each parent to create the unique genome of each child.
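The shuffling can be pictured with a toy model. The sketch below is a deliberate simplification with made-up inputs, not a model of real chromosomes: a parent's two copies of a chromosome are written as strings of "A" (the copy that parent got from their own father) and "B" (the copy from their own mother), and the copy passed to the child switches between them at randomly chosen crossover points. The function name recombine and its parameters are assumptions for this example.

```python
import random


def recombine(copy_a: str, copy_b: str, n_crossovers: int,
              rng: random.Random) -> str:
    """Build one inherited chromosome by alternating between two parental copies."""
    assert len(copy_a) == len(copy_b)
    # Pick distinct interior positions where the source copy switches.
    points = sorted(rng.sample(range(1, len(copy_a)), n_crossovers))
    # Start from either copy with equal probability.
    source, other = (copy_a, copy_b) if rng.random() < 0.5 else (copy_b, copy_a)
    pieces, start = [], 0
    for end in points + [len(copy_a)]:
        pieces.append(source[start:end])
        source, other = other, source  # switch copies at each crossover
        start = end
    return "".join(pieces)


rng = random.Random(0)             # fixed seed so the run is repeatable
from_grandfather = "A" * 20
from_grandmother = "B" * 20
child_copy = recombine(from_grandfather, from_grandmother, 2, rng)
print(child_copy)                  # a 20-letter mix of A and B segments
```

After a single generation the inherited copy is already a patchwork of two grandparents. Repeat this every generation and the segments get shorter and more interleaved, which is why the Y chromosome and mitochondrial DNA, which skip this step, are so much easier to trace.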

Tracing the ancestral history of genomic regions other than the Y chromosome and mitochondrial DNA is a daunting task, confounded by the active recombination that occurs every generation.

Crunching the genetic data

IBM in Genetics

IBM has invested in genomics and computational biology for more than 15 years. These are disciplines that inform the life science, pharmaceutical and biotechnology industries. IT plays a vital role in enabling new science and discovery in biology, transforming the field into an information science.

We worked with population geneticists in regional centers across the world to analyze this genetic data in populations of Sub-Saharan Africa, North Africa and the Middle East; analyses of other regions are being concluded. Results published so far cover the migratory history of the earliest humans in Africa and include genetic evidence of relatively recent migratory events, such as the arrival of Crusaders in the Middle East and the spread of the Phoenicians into the Mediterranean.

But the computational task of analyzing the data from our 470,000 samples is not a brute-force exercise. A supercomputer is not required. Laxmi Parida, a member of the IBM team, led a three-year effort, in collaboration with Jaume Bertranpetit at the University of Pompeu Fabra in Spain, to develop an elegant algorithm that reconstructs the recombination history of the genome using only workstations. They analyzed markers on the X chromosome of 1,240 male participants from 30 different ethnicities across Africa, the Middle East, Europe, and Asia.

On the point of migration, our findings showed that Eurasian groups were more similar to populations from southern India than they were to those in Africa. This supports a southern route of migration from Africa via the Bab-el-Mandeb Strait in Arabia, before any movement heading north. It suggests a special role for South Asia in the “out of Africa” expansion of modern humans.

Keep in mind that exactly which direction or route humans took in migrating out of Africa is still not settled. This new genetic evidence suggests that other fields of research, such as archaeology and anthropology, should look for additional evidence on the migration route of early humans to further explore this theory.

How to participate

Visit the Genographic Project website to order a DNA kit. IBM employees can order a kit, internally, here.

