Reading SNP Data

Last year I tested my genome with 23andMe (luckily, just before the FDA freeze on 23andMe's health results).  23andMe has a wonderful interface for displaying mutations and the studies linking those mutations to phenotypes.  Unfortunately, this display is only available for tests completed prior to November 22, 2013.  Customers tested after this date have access only to their ancestry results and their raw data.  The raw data is important because it still lets you look into your health information without relying on 23andMe's reports.  Today, I'll show you how to navigate a SNP database (SNP stands for Single Nucleotide Polymorphism, pronounced "snip").  Let's say we read somewhere that people with the recessive genotype at SNP rs601338 are resistant to some strains of Norovirus.  We'll check the validity of this claim and see if the recessive genotype is in our 23andMe raw data.

The National Center for Biotechnology Information (NCBI) manages many biotechnology resources, including BLAST for nucleotide and protein sequences, PubMed for medical journal articles, dbSNP for SNPs, and many others.  We'll use dbSNP.  Search for rs601338 in dbSNP.

From the search results we can see that the SNP is on chromosome 19, the alleles are either A or G (resulting in the AA, AG, and GG genotypes), the gene is FUT2, and there is a link to the Online Mendelian Inheritance in Man (OMIM) database.  Follow this link.  (In addition, the Varview link provides a great view of the gene neighborhood.)


Again we see that this SNP is in the FUT2 gene on chromosome 19 (a quick sanity check), and we also see that this gene has some association with Norovirus (Norwalk Virus) resistance.  If we scroll further down we can find information about the polymorphisms.  In 1995, Kelly et al. showed that the AA genotype changes a Tryptophan codon to a terminating codon (TAG), which truncates the protein and results in a non-secreting phenotype.  In 2003, Lindesmith et al. showed that the non-secreting phenotype confers resistance to Norwalk Virus infection.  Unfortunately, Lindesmith et al.'s article is behind a paywall, which is an issue for another day.
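To make the Tryptophan-to-stop change concrete, here is a small Python sketch.  It assumes the standard genetic code, in which Tryptophan is encoded only by TGG, so the A allele must replace the middle G to give TAG.  The codon table is deliberately cut down to the two codons involved, and the helper name substitute is just for illustration.

```python
# Cut-down codon table: only the two codons needed for this example
# (standard genetic code).
CODON_TABLE = {
    "TGG": "Trp",   # Tryptophan, its only codon
    "TAG": "Stop",  # terminating codon
}


def substitute(codon, index, base):
    """Return a copy of codon with the base at index (0-based) replaced."""
    return codon[:index] + base + codon[index + 1:]


reference = "TGG"
# G -> A at the middle position of the codon (inferred from the genetic code).
variant = substitute(reference, 1, "A")

print(reference, "->", CODON_TABLE[reference])
print(variant, "->", CODON_TABLE[variant])
```

A single base change is enough to end translation early, which is why one SNP can switch off the secretor function entirely.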

Finally, we need to check our raw data from 23andMe to see which genotype we have.  23andMe organizes the raw data into a tab-separated file with the SNP id, chromosome, position, and genotype.  My file shows that I have the AA genotype, which is the non-secretor phenotype.  This does not mean I am immune to every flu or flu-like virus.  It only means I am resistant (not immune) to the most common strain (not all strains) of Norovirus.  We must be very careful and reserved when interpreting genome data.
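If you would rather not scroll through the file by hand, the lookup can be sketched in a few lines of Python.  This is a minimal sketch, assuming the four tab-separated columns described above and assuming that header or comment lines start with "#".  The function name find_genotype and the sample rows are made up for illustration, and the positions are placeholders, not real coordinates.

```python
import csv
import io


def find_genotype(lines, rsid):
    """Return the genotype recorded for rsid, or None if it is not present."""
    # Drop blank lines and comment lines (assumed to start with '#').
    data_lines = (
        line for line in lines
        if line.strip() and not line.startswith("#")
    )
    for row in csv.reader(data_lines, delimiter="\t"):
        # Assumed columns: SNP id, chromosome, position, genotype.
        if len(row) >= 4 and row[0] == rsid:
            return row[3].strip()
    return None


# Made-up sample in the assumed layout; positions are placeholders.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t12345\tGG\n"
    "rs601338\t19\t67890\tAA\n"
)

genotype = find_genotype(io.StringIO(SAMPLE), "rs601338")
print(genotype)
```

To run it against a real download, pass an open file instead of the sample, for example find_genotype(open("genome.txt"), "rs601338"), where genome.txt is a hypothetical name for your raw data file.  The function reads one line at a time, so the size of the file is not a problem.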

I find this process of digging through biotech databases and articles fun.  However, the tab-separated txt file covering every SNP tested by 23andMe is 24.8MB (huge!), and it would take forever to go through this process for every SNP.  Fortunately, there are many SNP services to help you comb through your data and understand it readily.  (Promethease is well regarded, but I have not used it.)