My workflow towards the start of 2018

I’ve often found it helpful to read about how others organize and attempt to optimize their time. Here are a few of the things that I’ve been doing recently on that front, in case anyone else finds them helpful:

  • Vitamin D 5000 IU daily. As far as I can tell the jury is still mixed on vitamin D, but to me it seems worth the cost/benefit analysis.
  • A daily scoop of Metamucil with water in the morning. A colorectal fellow at my school got me hooked on this, and I’ve found it has added a lot of value to my life. My sister doesn’t like the artificial flavoring ingredients, and I respect her point, but for me I think perfect is the enemy of good.
  • Bright light in the AM. Especially helpful for East Coast winters. I’ve been doing it for 2 years, but more consistently recently.
  • Smartphone on greyscale mode 24/7. I’ve been doing this for several months now, and I think it has helped to mitigate some of my phone addiction. It especially makes photo sharing sites like FB and instagram a lot less interesting.
  • Charging my smartphone in the living room at night. I’ve done this for a month or two as well. Beyond the probably helpful but more nebulous effects on sleep hygiene, this definitely decreases the probability of getting a stressful email/text right before bed or in the middle of the night when I wake up briefly.
  • Blocking FB and Twitter on web and mobile. I’ve been doing this for the past month or two as a mental health break from these websites, and I’ve found it calming. I occasionally break the lock to look up a particular thing, but I always try to add it back right away. Just adding barriers is helpful for me when I have the “itch” to check something that’s likely to make me unhappy in the long run.
  • Blue blocking orange glasses at night. I’ve been doing this for about 4.5 years. While I think it probably helps my sleep by preventing inhibition of melatonin production, the most obvious effect is that it helps decrease eye dryness.
  • Caffeine pills instead of coffee. The first thing I do on most days is to take a 100 mg caffeine pill. I like coffee and drink it occasionally, but for daily use I find caffeine pills cheaper, more convenient, and much easier to dose. The precise dosing is key for me because I’m very sensitive to caffeine withdrawal and can end up with withdrawal headaches just from drinking a particularly large cup of coffee one day.

Samuel Wilks on the need for multidisciplinary neurologic research (in 1864)

A history of dementia often starts in 1907 with the work of Alois Alzheimer, but in reality it should start much sooner. In 1864 Samuel Wilks wrote “Clinical Notes on the Atrophy of the Brain,” which was one of the first studies to point out gross atrophy of the sulci in the brains of persons who had dementia prior to death. This is a great paper! I loved the intro:

WERE an occasional comparison instituted between the experiences of those who practise in special but different departments of the profession, it would conduce not only to the fulfilment of some higher general truths than we now possess, but afford to the individual labourer in his department a more just and less narrow view of the field of observation which is always more immediately before his eye. A close observance to one section of medicine may produce much accurate and minute knowledge, but since the division of our art into branches is artificial rather than real, the knowledge therein obtained is regarded apart from its natural relations, and becomes so distorted as to lose much of its value as truth. If the various sciences into which we divide nature for the purposes of study are artificial, and it be true that an exclusive devotion to one of them can never give to its follower a correct insight into the operations of nature, so more true must it be that the general laws of human pathology can scarcely be gleaned in an exclusive practice in one single department.

It may seem almost impertinent to make these remarks in a Journal devoted to a special object, nor were they, indeed, intended to apply to the study of mental disorders, which must be undertaken in an almost isolated manner; and yet an opinion has obtained hold of me (which, however, may be erroneous) that even here some too narrow views may be held of cerebral pathology, and this opinion, right or wrong, has suggested the remarks in the present communication. To be more explicit: I have thought that those who are occupied in the practice and study of any one department might possibly look upon some morbid condition or other feature in a case, as peculiar to a certain form of disease. Thus, in connection with the subject on which I purpose to make a few remarks, it has seemed to be inferred that a certain morbid phenomenon has been found exclusively in lunatic asylums; and, at the same time, to be inferred by a writer on infantile diseases, and who is probably destitute of the knowledge just mentioned, that this phenomenon is intimately connected with the cerebral affections of children. So, also, with the general subject of the following observations, atrophy of the brain: this has appeared to me to have been regarded by some as a condition attaching to those who have died of mental affections, and not only so, but of some special form of insanity; others would describe a similar condition as resulting from repeated attacks of delirium tremens; whilst others write of a state not distinguishable from these as the ordinary result of old age. From having no inclination towards any of these special departments, I have endeavoured to take a comprehensive view of such pathological changes, and, as regards the subject before us, to discover at what stage our knowledge has reached of this morbid condition, and what is its true pathological significance; leaving it for further research to elucidate its varieties and the different methods by which these are brought about.

Is progeria related to aging in the same way that familial AD is related to sporadic AD?

Attention conservation notice: Someone has probably made this point before.

Progeria is a genetic disorder caused by mutations in the lamin A nuclear lamina protein. Since it manifests in several ways that resemble an aged state (eg wrinkled skin, atherosclerosis, kidney failure, loss of eyesight), it is widely believed to be an early-onset version of aging.

Yet, few people think that nuclear membranes are the only thing that is altered in aging, as aging is generally considered too complicated for that. Instead, nuclear membranes are recognized to be one aspect within a larger pathway that is altered in aging.

Familial Alzheimer’s disease (AD) is a genetic disorder caused by mutations in APP, PSEN1, or PSEN2, which are all part of the APP processing pathway and thus (among other things) amyloid plaque production. Since it manifests in several ways that resemble sporadic AD (episodic memory loss, Aβ plaques, tau tangles), it is widely believed to be an early-onset version of sporadic AD.

In contrast to progeria and aging, familial AD is generally thought to be a model of sporadic AD that captures almost all of the key pathways involved. As a result, one of the major justifications for clinical trials to treat sporadic AD by removing amyloid plaques is that the genetics of familial AD are all related to APP processing and thus amyloid plaque production.

There are probably several good arguments for why this progeria:aging::familial AD:sporadic AD contrast doesn’t make sense, but I still thought it might be interesting.

Making a shiny app to visualize brain cell type gene expression

Attention conservation notice: A post-mortem of a small side project that is probably not interesting to you unless you’re interested in molecular neuroscience.

This weekend I put together an R/Shiny app to visualize brain cell type gene expression patterns from 5 different public data sets. Here it is. Putting together a Shiny application turned out to be way easier than expected — I had something public within 3 hours, and most of the rest of my time on the project (for a total of ~ 10 hours?) was spent on cleaning the data on the back end to get it into a presentable format for the website.

What is the actual project? The goal is to visualize gene expression in different brain cell types. This is important because many disease-relevant genes are only expressed in one brain cell type but not others, and figuring this out can be critical to learning about the etiology of that disease.

There’s already a widely-used web app that does this for two data sets, but since this data is pretty noisy and there are subtle but important differences in the data collection processes, I figured that it’d be helpful to allow people to quickly query other data sets as well.

As an example, the gene that causes Huntington’s disease has the symbol HTT. (I say “cause” because variability in the number of repeat regions in this gene correlates almost perfectly with the risk of Huntington’s disease development and disease onset.) People usually discuss neurons when it comes to Huntington’s disease, and while this might be pathologically valid, by analyzing the data sets I’ve assembled you can see that this gene is expressed across a large number of brain cell types. This raises the question of why (and/or if) variation in its number of repeats only causes pathology in neurons.

Screen Shot 2016-06-13 at 11.35.10 AM

Here’s another link to the web app. If you get a chance to check it out, please let me know if you encounter any problems, and please share if you find it helpful.


Aziz NA, Jurgens CK, Landwehrmeyer GB, et al. Normal and mutant HTT interact to affect clinical severity and progression in Huntington disease. Neurology. 2009;73(16):1280-5.

Huang B, Wei W, Wang G, et al. Mutant huntingtin downregulates myelin regulatory factor-mediated myelin gene expression and affects mature oligodendrocytes. Neuron. 2015;85(6):1212-26.

Eight years of tracking my life statistics

Attention conservation notice: Borderline obsessive navel-gazing.

Most mornings, I start my day — after I lie in bed for a few minutes willing my eyes to open — by opening up a Google spreadsheet and filling in some data about how I spent that night and the previous day. I’ve been doing this for about eight years now and it’s awesome.

I decided to post about it now because self-tracking as a phenomenon seems to be trending down a bit. Take for example former WIRED editor Chris Anderson’s widely shared tweet:

Screen Shot 2016-05-07 at 6.07.48 PM

So this seems a good time to reflect upon the time I’ve spent self-tracking so far and whether I’m finding it useful.

But first, a Chesterton’s fence exercise: why did I start self-tracking? Although it’s hard to say for sure, here’s my current narrative:

  • When I was a senior in high school, I remember sitting in the library and wishing that I had extensive data on how I had spent my time in my life so far. That way when I died, I could at least make this data available so that people could learn from my experiences and not make the same mistakes that I did. I tried making a Word document to start doing this, but ultimately I gave up because — as was a common theme in my misspent youth — I became frustrated with myself for not having already started it and decided it was too late. (I hadn’t yet learned about future weaponry.)
  • I used to read the late Seth Roberts’ blog — it was one of my favorites for a time — and he once wrote a throwaway line about how he had access to 10 years of sleep data on himself that he could use to figure out the cause of his current sleep problems. When I read that early in college I thought to myself “I want that.”
  • In sophomore year of college my teacher and mentor Mark Cleaveland assigned me (as a part of a class I was taking) to write down my sleep and how I spent my time in various activities for a week. This was the major kick into action that I needed — after this, I started tracking my time every morning on the spreadsheet.

It takes about 66 days to develop a habit. The more complex the habit, the longer it takes. I think that by about 100-150 days in it was pretty ingrained in me that this was just something that I do every morning. After that, it didn’t take much effort. It certainly did take time though — about 3-5 minutes depending on how much detail I write. That’s the main opportunity cost.

Three of the categories I’ve tracked pretty consistently are sleep, exercise, and time spent working.

Here’s hours spent in bed (i.e., not necessarily “asleep”):

Screen Shot 2016-05-07 at 8.54.22 PM

black dots = data points from each day; red line = 20-day moving average

Somewhat shockingly, the mean number of hours I’ve spent in bed the last 8 years is 7.99 and the median is exactly 8.
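For anyone who wants to reproduce this kind of summary from their own spreadsheet export, here is a minimal Python sketch of the smoothing and summary statistics. It assumes a trailing window (the plot above may have used a centered one), and the `hours_in_bed` values are made-up placeholders rather than my real data.

```python
from statistics import mean, median


def moving_average(values, window=20):
    """Trailing moving average; None until a full window is available."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(mean(values[i + 1 - window : i + 1]))
    return smoothed


# Hypothetical hours-in-bed log, one entry per day (placeholder values).
hours_in_bed = [8.0, 7.5, 8.5, 9.0, 7.0, 8.0]

# A short window is used here only because the toy log is short.
smoothed = moving_average(hours_in_bed, window=3)
overall_mean = mean(hours_in_bed)
overall_median = median(hours_in_bed)
```

With a real multi-year log you would pass `window=20` to match the red line in the plot.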

Here’s exercise:

Screen Shot 2016-05-07 at 8.58.39 PM.png

I’m becoming a bit of a sloth! Hopefully I’ll be able to get this back up over the next few years. Although note that I have no exercise data for a few months in Summer ’15 because I thought that I would switch solely to Fitbit exercise data. I then got worried about vendor lock-in and started tracking manually again.

Here’s time spent working (including conventional and non-conventional work such as blogging):

Screen Shot 2016-05-07 at 8.49.54 PM

One of the other things I’ve been tracking over the past few years is my stress, on an arbitrary 1-10 scale. Here’s that data:

Screen Shot 2016-05-07 at 9.12.26 PM

In general, my PhD years have been much less stressful than my time studying for med school classes and Step 1. Although it’s not perfect, I’ve found this stress level data particularly valuable. That’s because every now and then I get stressed for some reason, and it’s nice to be able to see that my stress has peaked before and has always returned to reasonably low levels eventually. I think of this as a way to get some graphical perspective on the world.

I track a few other things, including time spent on administrative tasks (like laundry), time spent leisure reading, time spent watching movies, and time spent socializing.

I also track some things that are too raw to write about publicly. Not because I’m embarrassed to share them now, but because I’m worried that writing them in public will kill my motivation. This is definitely something to consider when it comes to self-tracking. For me, my goal has first and foremost been about self-reflection and honesty with myself. If I can eventually also share some of that with the world, then so much the better.

Overall, I’ve found three main benefits to self-tracking:

  1. Every now and then, I’ll try to measure whether a particular lifestyle intervention is helping me or not. For example, a couple of months ago I found that there was a good correlation between taking caffeine (+ L-theanine) pills and hours worked. Although this is subject to huge selection bias, I still found it to be an interesting effect and I think it has helped me optimize my caffeine use, which I currently cycle on and off of.
  2. There have been a few times these past 8 years when I’ve suddenly felt like I’ve done “nothing” in several months. One time this happened was about a year into my postbac doing science research at the NIH when it seemed like nothing was working, and it was pretty brutal. That time and others, it’s been valuable for me to look back and see that, even if I haven’t gotten many tangible results, I have been trying and putting in hours worked. Especially in science where so many experiments fail, it’s helpful for me to be able to measure progress in terms of ideas tried rather than papers published or some other metric that is much less in my control. GitHub commits could also work in this capacity for programmers, although that’s slightly less general.
  3. The main benefit, though, has not been my ability to review the data, but rather as a system for incentivizing me to build process-based habits that will help me achieve my goals. I enjoy the bursts of dopamine I get when I’m able to write that I worked hard or exercised the previous day — or that I got a lot of high-quality socializing in with friends or family — and it makes me want to do that again in the future.

Do you want to try a similar thing? Check out this blank Google spreadsheet for a quick start; it has a bunch of possible categories and a few example days for you to delete when you copy it over to your own private sheet. I like Google sheets because they are free and able to be accessed anywhere with an internet connection, but it’s certainly not a requirement.

Even if you don’t try it, thanks for reading this essay and I hope you got something out of it.


Arterial aging and its relation to Alzheimer’s dementia

I’m a big proponent of the role of arterial aging in explaining dementia risk variance, in large part because it explains the large role that vascular-related risk factors have in promoting the likelihood of Alzheimer’s disease (AD). However, some data suggests that the burden of ischemic events and stroke cannot explain all of the vascular-related AD risk. Recently, Gutierrez et al. published a nice paper which suggests that non-atherosclerotic artery changes with age may explain some of this residual vascular-related risk of AD. In particular, they used 194 autopsied brains and found five arterial features which strongly correlated with aging, including decreased elastin and concentric intimal thickening. Importantly, these features also correlated with AD risk independently of age.

The authors propose that the arterial aging features are a consequence of mechanical blood flow damage that accumulates over the years. If it is true that the damage is mechanical, it suggests that it may be difficult to reverse with existing cellular and molecular anti-aging therapies. For those people who are interested in slowing down aging, the brain must be a top priority because, unlike the other organs, it cannot be replaced even by highly advanced tissue engineering approaches. Thus, this sort of arterial damage needs to be addressed, but to the best of my knowledge it has not been, which is one of the many reasons that I expect that serious anti-aging therapies are much further out than is commonly speculated in the popular press.

Are four postulated disease spectra due to evolutionary trade-offs?

I recently read Crespi et al.’s interesting paper on this subject. They describe eight diseases as due to four underlying diametric sets that can be explained by evolutionary/genetic trade-offs:

  1. Autism spectrum vs psychotic-affective conditions
  2. Osteoarthritis vs osteoporosis
  3. Cancer vs neurodegenerative disorders
  4. Autoimmunity vs infectious disease

Of these, #2 and #4 seem obviously correct to me based on my fairly limited med school exposure, and they describe the evidence in a systematic way. I don’t know enough about the subject matter to speculate on #1, but I would like to see more genetic evidence.

Finally, I found their postulated explanations for #3 somewhat weak and I personally think that it is a selection bias trade-off, i.e. a case of Berkson’s bias as applied to trade-offs. That is, since both cancer and neurodegeneration are age-related conditions, you could think of aging as the “agent” that selects either neurodegeneration or cancer as the ultimate cause of age-related death. I could be persuaded to change my mind on the basis of genetic predisposition evidence or some other mechanism, but I found the mechanism of apoptosis to be weak since apoptosis occurs (or doesn’t occur when it should) in many, many diseases, and moreover it is far from clear that neurodegeneration is mostly due to apoptosis as opposed to some other mechanism of cell death. A mechanism that might be most persuasive to me is one related to immune cells, since they clearly play a large role in regulating cancer growth, and also have high expression for most of the GWAS risk factors for Alzheimer’s disease. But I still suspect that the selection bias is primary.

How to do Bayesian shrinkage estimation on proportion data in Stan

Attention conservation notice: In which I once again document my slow march towards Bayesian fundamentalism. Not of interest unless you are interested in shrinkage estimation and Stan.

As I describe in my essay on trying to determine the best director on imdb, you usually can’t trust average ratings from directors with a small number of movies to be a good estimate of the “actual” quality of that director.

Instead, a good strategy here is to shift the average rating from each director back to the overall median, but to shift it less the more movies that person has directed. This is known as shrinkage estimation, and in my opinion it’s one of the most underused statistical techniques (relative to how useful it is).

The past few weeks I’ve been trying to learn the Bayesian modeling language Stan, and I came across a pretty good model for shrinkage estimation using a beta-binomial approach in this language (described in 1, 2). Here’s the model, which uses batting averages from baseball players.

In order to determine the amount of shrinkage in this model, I plotted the “actual” (or “raw”) average versus the estimated average using this model, and colored the data points by the log of the at bats (lighter blue = more at bats).

Screen Shot 2016-03-10 at 7.39.03 PM

As you can see, players with more at bats have less shrinkage. At the extreme, two players who are 0/1 on the season still have an estimated average of ~ 0.26 (which is the median of the “actual” batting averages).
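To make the mechanics concrete, here is a minimal Python sketch of beta-binomial shrinkage. It is a simplification and not the Stan model itself: the Stan model estimates the prior from the data, whereas this sketch fixes a hypothetical Beta(26, 74) prior, chosen only so that the prior mean is 0.26.

```python
def shrunken_average(hits, at_bats, prior_alpha=26.0, prior_beta=74.0):
    """Posterior mean of a batting average under a Beta(prior_alpha, prior_beta) prior.

    The prior acts like prior_alpha + prior_beta "pseudo at bats" at the prior
    mean, so players with few real at bats are pulled strongly toward it.
    The default prior values are illustrative assumptions, not fitted values.
    """
    return (prior_alpha + hits) / (prior_alpha + prior_beta + at_bats)


# A player who is 0 for 1 stays close to the prior mean of 0.26 ...
rookie = shrunken_average(hits=0, at_bats=1)

# ... while a player who is 150 for 500 (raw average 0.300) moves only slightly.
regular = shrunken_average(hits=150, at_bats=500)
```

The more at bats a player has, the more the real data outweighs the pseudo at bats, which is the pattern visible in the plot.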

Notably, there are fewer players whose averages are decreased due to the shrinkage estimation than the reverse. Perhaps managers are inclined to give players a few more shots at it until they prove that their early success was just a fluke.

A clinical trial for omental transposition in early stage AD

A couple of years ago I wrote about treating AD with omental transposition, a radical therapy with success in ~ 35% of patients in one case series. Today I just noticed that there is a non-randomized, single-arm clinical trial on its use in patients with early stage AD (MoCA score 11-18), in Salt Lake City, UT. Estimated study completion date: May 2019.

This is especially interesting because they have a relatively thorough explanation of how the surgery works. In the general surgery portion of the procedure, an omental flap is created, which receives blood supply from the right gastric and gastroepiploic arteries. Next, a subcutaneous tunnel is created that travels up the chest wall and neck to behind the ear.

In the neurosurgery portion of the procedure, a portion of bone is removed near the temporal-frontal area, followed by removal of the dura and arachnoid membrane. The omentum is then placed on the parietal-temporal-frontal area of one cerebral hemisphere, and connected to the dura via a suture.

Besides this tissue grafting approach, other neurosurgical approaches to Alzheimer’s have included:

  1. CSF shunts (to the atria or ventricles)
  2. Intraventricular infusions (of bethanechol, NGF, or GM1)
  3. Gene therapy with infusion of NGF-expressing cells
  4. Electrical stimulation (of the vagus nerve, nucleus basalis of Meynert, or the fornix)

To what extent does whole genome sequencing add value above SNP arrays?

Attention conservation notice: I wrote this as the final essay for my course on personal genomics with Michael Linderman at Mt Sinai. The main question of the essay was: what is the point, in 2016, of getting your whole genome sequencing (WGS) data, if you already have your SNP data? Overall, I found analyzing my WGS data an interesting experience, but the vast majority of known genomic info is still at the SNP level, and there are some bugs in contemporary variant callers that make WGS calls more likely to be false-positives, as I experienced first-hand.

This fall, I was lucky enough to be a part of a course at ISMMS where we learned about genomics by analyzing our own whole genome sequencing results, which were graciously paid for by the school [1]. Amazingly, the cost of genome sequencing dropped from $5000 to $1500 (or even $1000) in just this past year, but it’s still a significant investment in our education by ISMMS and I appreciate it. According to the course director, Michael Linderman, there are only on the order of 1000 people with access to their own whole genome sequencing results and the ability to interpret them, which puts us in a pretty small, fortunate group. That said, it probably won’t be a small group for long, since just over the past few months, Veritas Genetics announced that it will offer WGS alongside analysis commercially for a new low price of $999 [2].

I already had access to single-nucleotide polymorphism (SNP) array results from 23&Me, so a very basic question was what kind of data I could get from having my whole genome sequenced that I didn’t already have access to. First, some terminology: the difference between a SNP and a normal genetic variant is that the alternate allele of a SNP must be present in at least 1% of the population. Not surprisingly, most of the papers published about genomics on PubMed study the effect of SNPs, in large part because those are the variants for which there is sufficient power to address biomedical questions robustly. So I already had access to the majority of the well-studied variants through my SNP data. So from one perspective, going from the ~300,000 SNPs that I got from 23&Me to the ~3,000,000,000 base pair calls in the human genome seems like a classic case of the big data trap: collecting more data without any point. And I’ll freely admit that I’ve fallen victim to this tendency at least a few times in my life.

After a little more literature searching and soul searching about what I expected to learn, it became apparent that what whole genome sequencing is best at is detecting very private variants – that is, unsurprisingly, things that are present in less than 1% of the population. Any such rare variants that I found might be present just in my immediate family a countable number of generations back, or they might even be found only in me. But these rare variants can add up to a fairly non-trivial number. As it turns out, the average person has about 100 heterozygous loss of function variants, which include stop insertions, frameshift mutations, splicing mutations, and large deletions [3]. And since my dad was on the older side when I was born, and older male age is associated with more new genetic variants [4], I knew that I was liable to have an especially large burden of new variants.

On the big day when our sequences had been finally aligned and the variants had been called, the first thing I did was to filter those variants down to the 2000 or so ones that were most likely to be damaging. I scanned down the gene list meticulously, looking for gene names that I recognized. Since I had to memorize a fairly large number of disease-causing genes during my preclinical med school courses, I figured recognizing a gene name would in general be a bad sign. I was relieved and felt lucky to discover no major disease-causing mutations in genes that I knew would cause major disease, such as the cancer-promoting genes BRCA1/2 [5]. Overall this process was not very efficient, but it was pretty fun.

The next time that I sat down to analyze my genetic variants, I decided to filter for variants that were likely to have an effect on the way I think. So I intersected the genes in which I had predicted function-altering variants with another list from a study [6] that measured which genes have the highest RNA expression – a proxy for “are made the most” – in neurons. Here’s a plot of the results:

Screen Shot 2016-02-22 at 6.31.02 PM

The green dot represents the gene in which I have a predicted damaging mutation with the strongest expression in neurons, which is the gene SYN2. The protein that this gene codes for is thought to be selectively produced in synapses, where it probably plays a role in synaptic vesicle transport [7]. Synaptic vesicles, in turn, are what neurotransmitters are stored in before they are released into the synaptic cleft to communicate with the postsynaptic neuron. You might think of them as the “cargo trucks” of the synapse, storing and carrying around the payload of neurotransmitters before they are sent to the next neuron. So naturally, I became curious about what the effect of that variant might be.
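The filtering step behind that plot (intersecting the genes that carry predicted function-altering variants with a table of neuronal expression, then ranking by expression) can be sketched in a few lines of Python. The gene names and expression values below are placeholders invented for illustration; they are not taken from my variant calls or from the expression study.

```python
def rank_variant_genes_by_expression(variant_genes, neuron_expression):
    """Keep genes present in both inputs, highest neuronal expression first."""
    shared = set(variant_genes) & set(neuron_expression)
    return sorted(shared, key=lambda gene: neuron_expression[gene], reverse=True)


# Hypothetical genes with predicted function-altering variants.
variant_genes = ["GENE_A", "GENE_B", "GENE_C"]

# Hypothetical neuronal expression levels (arbitrary units).
neuron_expression = {"GENE_A": 12.5, "GENE_C": 310.0, "GENE_D": 45.0}

ranked = rank_variant_genes_by_expression(variant_genes, neuron_expression)
```

The first entry of `ranked` plays the role of the green dot: the variant-carrying gene with the strongest expression in neurons.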

First, I took a look at what my actual predicted variant in the SYN2 gene was. Specifically, I was predicted to have a frameshift mutation, due to the deletion of a CGCGA sequence at chromosome 3, position 12,046,269. In general, frameshift mutations are pretty cool. DNA is made into proteins three nucleotides at a time, so mutations in multiples of three only alter a small number of amino acids. But if a frameshift mutation messes up this three nucleotide reading frame, then the whole rest of the protein is totally different. What was predicted to happen in my version of the SYN2 protein is that, 66 nucleotides after the frameshift, a new stop signal was introduced. So I would have 22 amino acids in my version of SYN2 that are not found in most people, and then the protein was predicted to end. Although it’s fun to speculate that maybe those 22 amino acids could turn me into a mutant supergenius if I could just learn how to tap into their mythical synaptic powers, most likely my predicted mutant version of SYN2 would be simply degraded. And since I’m predicted to be heterozygous for the mutation, my non-mutated version of SYN2 could simply pick up the slack. That said, in the absence of compensation, I’d be expected to have ~50% less of this key synaptic protein than the average person.
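The reading-frame arithmetic here is simple enough to check in code. This is just an illustrative Python sketch; the function names are my own and do not come from any variant-calling tool.

```python
CODON_LENGTH = 3  # nucleotides read per amino acid


def is_frameshift(indel_length):
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % CODON_LENGTH != 0


def codons_spanned(nucleotide_count):
    """Number of whole codons contained in a stretch of nucleotides."""
    return nucleotide_count // CODON_LENGTH


deleted_sequence = "CGCGA"
shifts_frame = is_frameshift(len(deleted_sequence))  # True: 5 is not a multiple of 3
novel_amino_acids = codons_spanned(66)               # 22 codons before the new stop
```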

Naturally, next I did a search for the functional role of a loss of function mutation in SYN2. The first paper I found [7] had the suddenly ominous title: “SYN2 is an autism predisposing gene: loss-of-function mutations alter synaptic vesicle cycling and axon outgrowth.” Specifically, this paper showed that two missense (amino-acid changing) and one frameshift mutation were found in male individuals with autism spectrum disorder, but none were found in male controls without autism spectrum disorder. They also showed that neurons lacking SYN2 have a lower number of synaptic vesicles ready to be released from their synapses, which is consistent with the predicted role of SYN2. I had some qualms about this paper, like the fact that they extrapolated from SYN2 homozygous knock-out mouse studies to humans that were heterozygous for a loss-of-function variant in SYN2, and indeed the mouse study that they built upon did not find a phenotype in SYN2 heterozygous knock-out mice [8]. But overall, this study was a sign that my predicted frameshift mutation might really be playing a significant functional role.

Given that I also had access to SNP data from both of my parents through 23&Me, my next step was to find out which of them I inherited the predicted SYN2 frameshift variant from, so that I could figure out which of my parents I would be able to subsequently blame for all of my problems. But this is where things took another unexpected turn. In order to discover which of my parents was the culprit, I had to analyze the raw reads in the Integrative Genomics Viewer (IGV), to find another tagging SNP that I could also see in the data from 23&Me. But when I actually looked at the reads, what I discovered here instead was way more homozygous variation (seen via the single-colored vertical lines) relative to the reference genome than I expected:

Screen Shot 2016-02-22 at 6.30.38 PM

This homozygosity of the variants is surprising and makes us suspicious that maybe there’s something going on other than just the mutation – maybe there was a problem in aligning my reads to the reference genome. And indeed, for technical reasons that are beyond the scope of this essay, in class we aligned to the hg19 build of the reference genome, which as it turns out, happens to differ from the hg38 reference genome at this region pretty substantially. And when I aligned one of the individual sequencing reads against the hg38 reference at this location, what I detected was not a deletion, but rather an insertion of 12 base pairs. Since 12 divided by 3 is a whole number, 4, that means that this is an in-frame mutation, which is much less likely to have the serious loss-of-function effect that a frameshift mutation would. And indeed, looking at the DNA sequence that was inserted, it appears that the insertion is probably due to a tandem repeat, with one mismatch:

[Image: Screen Shot 2016-02-22 at 6.29.56 PM]
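For anyone who wants the reasoning above spelled out, here is a minimal Python sketch of the two checks: whether an indel's length is a multiple of three (in-frame versus frameshift), and whether an inserted sequence nearly duplicates the bases immediately before it (a rough sign of a tandem repeat). The function names and the example sequences are made up for illustration; they are not my actual SYN2 reads, and this is not how a variant caller decides these things.

```python
def indel_effect(length_bp: int) -> str:
    """Classify a coding-sequence indel by whether its length is a multiple of 3."""
    if length_bp <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    return "in-frame" if length_bp % 3 == 0 else "frameshift"


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def looks_like_tandem_repeat(inserted: str, upstream: str, max_mismatches: int = 1) -> bool:
    """Return True if the inserted bases nearly duplicate the bases just upstream.

    `upstream` is the reference sequence ending right at the insertion point.
    """
    if len(upstream) < len(inserted):
        return False
    neighbor = upstream[-len(inserted):]
    return hamming(inserted, neighbor) <= max_mismatches


# Hypothetical sequences, chosen only to illustrate the idea.
inserted = "GCAGCCGCCACC"       # 12 inserted bases
upstream = "TTGCAGCCGCCGCC"     # reference bases ending at the insertion point

print(indel_effect(len(inserted)), len(inserted) // 3)   # in-frame, 4 extra codons
print(indel_effect(1))                                    # frameshift
print(looks_like_tandem_repeat(inserted, upstream))       # True (one mismatch)
```

The Hamming-distance comparison only checks the copy directly adjacent to the insertion, which is enough for this simple case but would miss repeats with a different unit length or an offset.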

So, to recap: after analyzing the raw reads against the updated reference genome, I found that I likely do not have a frameshift mutation in SYN2 after all. That said, the potential presence of a tandem repeat expansion within the coding sequence of SYN2, one that would add four extra amino acids to the protein, is itself pretty interesting and could still have some sort of biological effect. After all, this protein is likely a key component of the cargo truck for my neurotransmitters.

In summary, I think I can say that if you’ve had your SNP data analyzed, that’s going to make up the lion’s share of the digestible information. However, there are likely to be some interesting things for you to learn from having your WGS data analyzed as well. First, although I haven’t yet found any rare variants in my genome that might significantly increase my risk of disease in a potentially actionable way, I certainly could have. You don’t bring a life jacket on a boat because you think you’re going to fall overboard; you bring it because you might. Second, it was enlightening to learn first-hand about the lack of adequate tools for analyzing genomes, especially at the variant calling and variant analysis steps. We really are in the Wild West era of genomics. This is both exciting and motivating. I now have a better idea of what it is like to receive a likely false positive variant call, as I did with SYN2.

Finally, getting your genome sequenced isn’t just about your own health. It’s also about your family’s health and the health of society at large. For example, I’m also in the process of donating my whole genome sequencing data to the Personal Genome Project (I’ve already put up my VCF file). If you have access to SNP data and/or you want to try to have your whole genome sequenced, and you are willing to make the data publicly available, then you should consider joining too. I think that by pooling genome and phenotype data in an open way, we’re going to make some discoveries that will improve human health in a big way.


[1]: Linderman MD, Bashir A, Diaz GA, et al. Preparing the next generation of genomicists: a laboratory-style course in medical genomics. BMC Med Genomics. 2015;8:47.

[2]: whole-genome-barrier-300150585.html

[3]: MacArthur DG, Balasubramanian S, Frankish A, et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science. 2012;335(6070):823-8.

[4]: Kong A, Frigge ML, Masson G, et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature. 2012;488(7412):471-5.

[5]: You might be wondering: if you’re male, then why are you worried about a BRCA mutation? Well, although BRCA mutations are much more dangerous in women, they can also increase the risk of certain cancer types in men. For example, according to one study with 1000 participants, there is around a 5-fold increased risk for prostate cancer in men with a BRCA2 mutation. See: Kote-Jarai Z, Leongamornlert D, Saunders E, et al. BRCA2 is a moderate penetrance gene contributing to young-onset prostate cancer: implications for genetic testing in prostate cancer patients. Br J Cancer. 2011;105(8):1230-4.

[6]: Zhang Y, Chen K, Sloan SA, et al. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J Neurosci. 2014;34(36):11929-47.

[7]: Corradi A, Fadda M, Piton A, et al. SYN2 is an autism predisposing gene: loss-of-function mutations alter synaptic vesicle cycling and axon outgrowth. Hum Mol Genet. 2014;23(1):90-103.

[8]: Greco B, Managò F, Tucci V, Kao HT, Valtorta F, Benfenati F. Autism-related behavioral abnormalities in synapsin knockout mice. Behav Brain Res. 2013;251:65-74.