An R package to retrieve evolutionary selection calculations for gene lists

Earlier this year I became interested in learning which genes in the genome are more evolutionarily novel or have been under stronger selection. This can be informative for learning about disease states.

For example, since Alzheimer's dementia is present in humans, arguably in some non-human primates (more on that in a subsequent post), and probably in no other animals, genes coding for proteins associated with its development are, all else being equal, more likely to have undergone recent evolutionary selection. The APOE gene is a canonical example of this, since APOE alleles other than ε4 don’t exist in other animals. Of course, this is coarse, but it’s another layer of evidence in addition to many others.

A commonly used way of quantifying this is the dN/dS ratio. This is the ratio of the rate of non-synonymous substitutions, i.e. nucleotide changes that alter the encoded amino acid (thus, dN for Non-synonymous), to the rate of synonymous substitutions, i.e. nucleotide changes that leave the amino acid unchanged (thus, dS). As an example of what this ratio can tell you, a gene with dN/dS > 1 is usually considered to have undergone positive selection, an example being the FOXP2 gene, which has been associated with language development.
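To make the arithmetic concrete, here is a minimal sketch in R of computing the ratio and applying the rule of thumb above. The gene names and numbers are made up purely for illustration, and the helper name dnds_ratio is hypothetical, not part of any package.

```r
# Hypothetical per-gene estimates; the values are invented for illustration only.
estimates <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  dn   = c(0.02, 0.30, 0.05),
  ds   = c(0.40, 0.20, 0.00),
  stringsAsFactors = FALSE
)

# dN/dS is undefined when dS is zero, so return NA rather than Inf or NaN.
dnds_ratio <- function(dn, ds) {
  ifelse(ds > 0, dn / ds, NA_real_)
}

estimates$dnds <- dnds_ratio(estimates$dn, estimates$ds)

# Coarse labels following the usual rule of thumb (ratio > 1 suggests positive selection).
estimates$label <- ifelse(
  is.na(estimates$dnds), "undefined",
  ifelse(estimates$dnds > 1, "possible positive selection", "no evidence from ratio alone")
)

print(estimates)
```

Returning NA when dS is zero is a design choice for this sketch; a gene with no synonymous changes simply has no usable ratio, and it seems safer to flag that than to report an infinite value.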

A common way to find dN/dS values for particular genes in particular species is to use the estimates calculated by Ensembl (you can read about the methodology here), queried through the biomaRt package. I’ve written some code to do this for a list of genes (just to be clear, in R terms, actually a character vector). Writing it turned out to be mildly annoying, so I decided to make it available for others.
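For readers who want a feel for what such a query looks like, here is a rough sketch using biomaRt's getBM function. It is not the code from the package described in this post. The function names homolog_attributes and fetch_dnds are hypothetical, and the attribute names (for example mmusculus_homolog_dn and mmusculus_homolog_ds) are assumptions based on how Ensembl has named its homolog attributes; they may not be available in every Ensembl release, in which case an archived release (for example via useEnsembl with its version argument) might be needed.

```r
# Sketch only: assumes the biomaRt package is installed and that the Ensembl
# release being queried exposes homolog dN and dS attributes under these names.

# Build the assumed homolog attribute names for a comparison species, e.g. "mmusculus".
homolog_attributes <- function(species) {
  c(paste0(species, "_homolog_ensembl_gene"),
    paste0(species, "_homolog_dn"),
    paste0(species, "_homolog_ds"))
}

# Query dN and dS for a character vector of gene symbols and add their ratio.
# `mart` is a Mart object, e.g. from biomaRt::useMart().
fetch_dnds <- function(genes, species = "mmusculus", mart) {
  attrs <- homolog_attributes(species)
  res <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "external_gene_name", attrs),
    filters    = "external_gene_name",
    values     = genes,
    mart       = mart
  )
  dn <- res[[attrs[2]]]
  ds <- res[[attrs[3]]]
  # Leave the ratio as NA when dS is missing or zero.
  res$dnds <- ifelse(!is.na(ds) & ds > 0, dn / ds, NA_real_)
  res
}

# Usage (needs network access), left commented out:
# human <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# fetch_dnds(c("APOE", "FOXP2"), species = "mmusculus", mart = human)
```

A gene can have more than one homolog in the comparison species, so the result may contain several rows per gene; how to summarise those is left open here.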

If you want to download this as a package, and you have the devtools package installed, then the following R commands should do the trick:


This is still very much under development and was largely an exercise in me learning how to make my first package (for more, see Hilary Parker’s excellent tutorial here). Please let me know if you have any questions or problems!