Local Ancestry?

Where are you from?  I mean, genetically?

A lot of work in my lab involves genetic ancestry.  One project in particular done by Wen-Yun Yang in our lab involved taking a person’s genetic data and using it to pinpoint where that person is from on a map.  A few weeks ago he successfully defended his PhD! We are all very excited for him!

But I am asking a different question: not where are you from, but rather, where is your DNA from?  For an individual from England, chances are that all of their DNA is from Europe. A person from Japan most likely has all of their DNA from Asia, and a person from Kenya will likely have DNA from Africa.  But what about an African American? Their DNA is from both Africa and Europe (and often a bit from the indigenous peoples of North America). We call individuals with DNA from multiple populations admixed individuals.


Due to a genetic randomization process called recombination, the genome of an African American will be a mosaic of ancestry. Some stretches of the genome will be of only African ancestry or only European ancestry, and other regions will be of both European and African ancestry.

A locus is a particular location in the genome (e.g. ‘c’ is at the third locus of the word locus). Unfortunately for genetic researchers, the genome is not conveniently colored red and blue according to ancestry, and so my research has focused on developing an extremely fast and accurate method to determine the locus-specific ancestry (also called local ancestry) of an admixed individual.  More on that later, but first, why do we care about knowing this?

When we know the locus-specific ancestry of an individual, we can tell that person about their heritage and about where their ancestors may have come from. However, the main reason we want to determine locus-specific ancestry is to help with personalized medical treatments and risk assessments. Many diseases differ in risk and prevalence across ethnic populations, so knowing which ancestry an individual has at a risk location in the genome is very important. When looking at many admixed individuals, patterns in the ancestry mosaic can also give us interesting clues about the population history, migration rates and cultural dynamics of the interacting populations.

But how do we determine the locus-specific ancestry of an individual?  Recent advances in the field allow researchers to literally read every base pair. These are the As, Cs, Ts and Gs that stand for the nucleotides that make up the genome; in other words, the letters in the novel that is our genome. It used to be that only about a million of the 3 billion base pairs were read, but now we can read all of them.

From one individual to the next the genome doesn’t change very much. Differences occur when, at a base pair position, one individual has one nucleotide (e.g. A) and another has a different one (e.g. T).  We call these differences single nucleotide polymorphisms (SNPs). A few years ago we could only gather information on the SNPs that are common, such as one where 90% of the population has a T and the other 10% has a G.
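To make the idea concrete, here is a toy sketch in Python. It assumes two short, already-aligned sequences (real genomes need alignment and far more care), and the function names `differing_positions` and `allele_frequencies` are made up for this illustration. It lists the positions where two individuals differ and computes how common each base is at one position across a sample.

```python
from collections import Counter


def differing_positions(seq_a, seq_b):
    """Return (position, base_a, base_b) wherever two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def allele_frequencies(bases):
    """Fraction of sampled individuals carrying each base at one position."""
    counts = Counter(bases)
    total = sum(counts.values())
    return {base: count / total for base, count in counts.items()}


# Two made-up individuals who differ at a single position (0-based index 6).
person_1 = "GATTACAGATTACA"
person_2 = "GATTACTGATTACA"
print(differing_positions(person_1, person_2))  # [(6, 'A', 'T')]

# A common SNP like the one described above: 90% T, 10% G.
print(allele_frequencies("T" * 9 + "G"))  # {'T': 0.9, 'G': 0.1}
```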

Now that we can read all of the SNPs, we have determined that many of them are very rare and are only seen in certain ethnic groups. We find that some SNPs are seen in individuals from one continent, but never in individuals from any other continent. We call these continent-specific variants (CSVs).   While still not as good as color coding the genome for us, there are enough of these variants that they can quickly and easily be used to determine the locus-specific ancestry of an individual.

In the example below, the black lines represent the true ancestry of a hypothetical African American. The blue and green marks represent CSVs that were observed in that individual.  Because humans are diploid, having two copies of each chromosome, there are three possible ancestry combinations at a locus: both copies from Africa, both copies from Europe, or one copy from each. From the pattern of CSV marks, it is easy to infer the true ancestry.  Since you can do this with your eyes, we constructed a simple statistical model (a hidden Markov model) which very quickly and accurately determines the ancestry of admixed individuals.

[Figure: true ancestry (black lines) and observed CSVs (blue and green marks) for a hypothetical African American]
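For readers who like to see things in code, here is a minimal sketch of how a hidden Markov model can turn a string of CSV marks into ancestry calls. This is not our actual method: the three state labels, the `switch` and `error` parameters, and the function name `viterbi_ancestry` are all assumptions chosen for illustration. Each observed CSV is encoded as "A" (African-specific) or "E" (European-specific), and the standard Viterbi algorithm picks the most likely sequence of the three ancestry combinations.

```python
import math
from itertools import groupby

# The three diploid ancestry combinations at a locus (illustrative labels).
STATES = ("EUR/EUR", "AFR/EUR", "AFR/AFR")


def viterbi_ancestry(marks, switch=0.01, error=0.01):
    """Most likely ancestry state for each CSV mark ('A' or 'E').

    switch: assumed probability of changing state between adjacent marks.
    error:  assumed probability of seeing a CSV from the "wrong" continent.
    """
    if any(m not in "AE" for m in marks):
        raise ValueError("marks must contain only 'A' and 'E'")
    if not marks:
        return []

    # Probability that an observed CSV is African-specific, given the state.
    p_afr = {"EUR/EUR": error, "AFR/EUR": 0.5, "AFR/AFR": 1.0 - error}

    def log_emit(state, mark):
        p = p_afr[state] if mark == "A" else 1.0 - p_afr[state]
        return math.log(p)

    def log_trans(prev, cur):
        if prev == cur:
            return math.log(1.0 - switch)
        return math.log(switch / (len(STATES) - 1))

    # Uniform prior over states for the first mark.
    score = {s: -math.log(len(STATES)) + log_emit(s, marks[0]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at mark i + 1
    for mark in marks[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best = max(STATES, key=lambda prev: score[prev] + log_trans(prev, cur))
            pointers[cur] = best
            new_score[cur] = score[best] + log_trans(best, cur) + log_emit(cur, mark)
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# A made-up chromosome: an African stretch, a mixed stretch, a European stretch.
marks = "A" * 20 + "AE" * 10 + "E" * 20
for state, run in groupby(viterbi_ancestry(marks)):
    print(state, len(list(run)))
```

The model only sees the marks, not the "black lines", yet a long run of one kind of CSV is best explained by both copies coming from that continent, and an alternating run by one copy from each. A real implementation would also need to account for the spacing between variants and for how often each variant is seen, which this sketch ignores.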

With this method, researchers can more quickly and effectively investigate theories about population movements and histories, develop personalized genomics techniques, and better understand the recombination process and how it shapes our genomic landscapes.
