How to do Bayesian shrinkage estimation on proportion data in Stan

Attention conservation notice: In which I once again document my slow march towards Bayesian fundamentalism. Not of interest unless you are interested in shrinkage estimation and Stan.

As I describe in my essay on trying to determine the best director on IMDb, you usually can't trust the average rating of a director with only a few movies to be a good estimate of that director's "actual" quality.

Instead, a good strategy is to shrink each director's average rating toward the overall median, and to shrink it less the more movies that person has directed. This is known as shrinkage estimation, and in my opinion it's one of the most underused statistical techniques relative to how useful it is.
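Here is a minimal sketch of the idea. It assumes a simple weighted-average form of shrinkage: each director's mean is pulled toward the overall center as if they had also made `k` extra movies rated exactly at that center. The pseudo-count `k` (here 5) and the ratings are hypothetical. In a full Bayesian model, the amount of shrinkage would be estimated from the data rather than fixed by hand.

```python
def shrink_toward_center(director_mean: float, n_movies: int,
                         overall_center: float, k: float = 5.0) -> float:
    """Shrink a director's mean rating toward the overall center.

    `k` is a hypothetical pseudo-count: the director is treated as having
    k additional movies rated exactly at `overall_center`. The weight on the
    director's own mean, n / (n + k), approaches 1 as n_movies grows.
    """
    weight = n_movies / (n_movies + k)
    return weight * director_mean + (1 - weight) * overall_center


# Hypothetical example: two directors with the same 9.0 average rating.
one_hit = shrink_toward_center(9.0, n_movies=1, overall_center=6.5)
veteran = shrink_toward_center(9.0, n_movies=30, overall_center=6.5)
print(f"1 movie:   {one_hit:.2f}")   # pulled most of the way back toward 6.5
print(f"30 movies: {veteran:.2f}")   # stays close to 9.0
```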

Over the past few weeks I've been trying to learn the Bayesian modeling language Stan, and I came across a pretty good Stan model for shrinkage estimation that uses a beta-binomial approach (described in 1, 2). Here's the model, which uses batting averages from baseball players.

To gauge the amount of shrinkage in this model, I plotted the "actual" (or "raw") batting average against the model's estimated average, coloring the points by the log of the number of at bats (lighter blue = more at bats).

Raw vs. estimated batting averages, colored by log(at bats)

As you can see, players with more at bats are shrunk less. At the extreme, two players who are 0 for 1 on the season still have an estimated average of ~0.26 (the median of the "actual" batting averages).
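The Stan model itself isn't reproduced in this post, and it samples the full hierarchical posterior. A cheaper approximation is empirical Bayes. You estimate a Beta(α, β) prior by the method of moments from players with many at bats, then give each player the posterior mean (hits + α) / (at bats + α + β). This sketch is a swapped-in technique, not the linked model. The player data are made up, the 100-at-bat cutoff is an arbitrary assumption, and the moment fit ignores the binomial noise in the high-at-bat averages. Even so, it shows why a 0-for-1 player lands close to the prior mean.

```python
import statistics

# Hypothetical (hits, at_bats) pairs; not real player data.
players = {
    "A": (150, 500), "B": (130, 520), "C": (140, 560),
    "D": (165, 600), "E": (120, 480),
    "F": (0, 1),     "G": (3, 4),
}


def fit_beta_prior(records, min_at_bats=100):
    """Method-of-moments Beta(alpha, beta) fit to the raw averages of
    players with at least `min_at_bats` at bats (an assumed cutoff)."""
    rates = [h / ab for h, ab in records if ab >= min_at_bats]
    m = statistics.mean(rates)
    v = statistics.pvariance(rates)
    # For a Beta distribution, var = m(1 - m) / (alpha + beta + 1).
    strength = m * (1 - m) / v - 1
    return m * strength, (1 - m) * strength


def shrunk_average(hits, at_bats, alpha, beta):
    """Posterior mean batting average under a Beta(alpha, beta) prior."""
    return (hits + alpha) / (at_bats + alpha + beta)


alpha, beta = fit_beta_prior(players.values())
for name, (h, ab) in players.items():
    est = shrunk_average(h, ab, alpha, beta)
    print(f"{name}: raw {h / ab:.3f} -> shrunk {est:.3f}")
```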

Notably, fewer players have their averages pulled down by the shrinkage than pushed up. Perhaps managers tend to give players a few more chances until they prove that their early success was just a fluke.

Four thoughts on how amyloid beta mediates cognitive decline

A new paper addresses the question “how does the presence of amyloid beta (Ab) on neuroimaging correlate with memory deficits?” in a variety of interesting ways. Here are some assorted thoughts:

1) Their strongest direct association (Table 2) is the -0.46 correlation between logical memory scores and Ab positivity on PET imaging in MCI patients. "Positive" status is based on a previously defined threshold that basically means "lots of Ab." This is much stronger than the -0.1 correlation between logical memory and Ab positivity on PET imaging in cognitively normal controls. As they note in their discussion, the finding that Ab correlates with memory deficits in MCI but not in cognitively normal people has been reported before, and it is rightly part of the justification for saying that Alzheimer's pathology is not normal.

2) This is admittedly nitpicky, since it is not the central point of their article, but I think it's worth pointing out that one line in their methods is troubling: "we examined whether the p value for the association between Ab and cognition changed from significant to nonsignificant when adjusting for GM volume or FDG-PET." The difference between a significant and a nonsignificant result is not necessarily itself statistically significant. That said, they address the same question in three ways, and this is just one of them.
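To make the general principle concrete, here is a minimal sketch with made-up numbers that have nothing to do with the paper. One estimate is significant at p < 0.05 and another is not, yet a direct test of their difference is nowhere near significant. The sketch assumes two independent, normally distributed estimates with known standard errors. The paper's adjusted and unadjusted estimates come from the same data, so it only illustrates the logic.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical estimates and standard errors (not from the paper).
est_1, se_1 = 25.0, 10.0   # z = 2.5
est_2, se_2 = 10.0, 10.0   # z = 1.0

p_1 = two_sided_p(est_1 / se_1)
p_2 = two_sided_p(est_2 / se_2)

# Test the difference directly; variances add for independent estimates.
se_diff = math.sqrt(se_1**2 + se_2**2)
p_diff = two_sided_p((est_1 - est_2) / se_diff)

print(f"p_1 = {p_1:.3f}, p_2 = {p_2:.3f}, p_diff = {p_diff:.3f}")
```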

3) I like their use of ridge regression to handle correlated predictors. I became especially interested in how ridge regression can be interpreted as Bayesian regression with a Gaussian prior on the coefficients, and tried out an example of it. You can find the code here, if you're interested in wasting 1-3 minutes of your life.
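The ridge/Bayes connection can be checked numerically. Assume a Gaussian prior β ~ N(0, τ²I) on the coefficients and Gaussian noise with variance σ². The posterior mean is then identical to the ridge estimate with penalty λ = σ²/τ². The simulated data, σ, and τ below are invented for illustration, with two deliberately near-collinear predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with two highly correlated predictors (made up).
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)        # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=1.0, size=n)

sigma, tau = 1.0, 0.5                      # assumed noise and prior scales
lam = sigma**2 / tau**2                    # implied ridge penalty

# Classical ridge estimate: (X'X + lambda I)^{-1} X'y
ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Bayesian posterior mean under beta ~ N(0, tau^2 I), y ~ N(X beta, sigma^2 I)
precision = X.T @ X / sigma**2 + np.eye(2) / tau**2
bayes = np.linalg.solve(precision, X.T @ y / sigma**2)

# Ordinary least squares for comparison; unstable under collinearity.
ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("OLS:  ", ols)
print("ridge:", ridge)
print("Bayes:", bayes)
```

The prior scale τ plays the role of the penalty. A tighter prior (smaller τ) means a larger λ and more shrinkage of the two correlated coefficients toward zero.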

4) Stepping back, and although this point has been made before, the finding that Ab positivity on PET imaging has a moderate-to-strong correlation with worse memory in MCI populations is non-trivial and clearly supports the amyloid hypothesis (AH). While tau seems to correlate more strongly with memory deficits, this is still an important point for those who doubt the AH [1] to keep in mind.


Mattsson et al., 2015. Brain structure and function as mediators of the effects of amyloid on memory. PubMed.


[1]: Then again, people may be right to doubt the AH. I'm not trying to use "doubter" in a pejorative sense here, but I don't immediately see a better way to phrase this.

How Neurologists Make Diagnoses

Nice article from Dhand et al. in the journal Neurology. The authors identified six neurologists with more than 10 years of experience, and both interviewed them and observed them in practice.

Based on these interviews and observations, they classified the extent to which a number of major diagnoses rely on three modalities of information: clinical (history and physical), laboratory and electrodiagnostic (e.g., EMG), and neuroimaging.

Along with other results, they present a "diagnosis cube" that visualizes how diagnosis types vary along these dimensions. Note that all diagnosis types receive a rating of "4" in the clinical category, so the variation is in the other two dimensions.

Diagnosis cube; doi: 10.1212/WNL.0b013e3182a840c7


It will be interesting to see whether certain diagnoses, especially psychiatric ones, come to rely more on non-clinical information modalities as our understanding of their pathophysiology improves. Also of note, Alzheimer's disease is included as a subset of dementia, which is rated C4L1N2 (clinical 4, laboratory 1, neuroimaging 2).


Dhand et al., 2013. How experienced community neurologists make diagnoses during clinical encounters. Neurology, doi: 10.1212/WNL.0b013e3182a840c7. (Image included via educational fair use.)

A Mock Case of Neonatal Meningitis

Attention Conservation Notice: 1138 words using stiflingly simple computations to work through an example of finding the cause of a made-up case of a medical condition that is unlikely to ever affect you personally. Also, I am not a doctor.

Imagine that a one-and-a-half-week-old girl comes in to the hospital with a four-day illness consisting of cyanosis, unresponsiveness to pain, neck tenderness, and temperature instability. You perform a lumbar puncture, and the CSF shows elevated neutrophils and decreased glucose, suggestive of bacterial invasion. Which organism is most likely to be causing the infection?

In order to calculate prior probabilities, let’s use this data set of empirical frequencies from recent years. The most likely cause is Group B Strep (GBS; p = 0.46), followed by E. coli (p = 0.15).

Below, the prior probability that each organism in the data set is causing the infection is shown in a bubble chart made with ggplot2. The probability is proportional to the area (not the radius) of each circle. The x-axis denotes the typical gram stain of each organism; Acinetobacter and Serratia are considered gram variable.
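Mapping probability to area rather than radius means the radius has to scale with the square root of the probability. Otherwise a bug that is three times as likely would look roughly nine times as big. In ggplot2, `scale_size()` scales area by default while `scale_radius()` scales radius. The sketch below just shows the arithmetic, using the GBS and E. coli priors quoted above. The helper name `bubble_radius` is hypothetical.

```python
import math


def bubble_radius(probability: float, max_radius: float = 1.0) -> float:
    """Radius for a bubble whose *area* is proportional to `probability`.

    A probability of 1 maps to `max_radius`.
    """
    return max_radius * math.sqrt(probability)


r_gbs = bubble_radius(0.46)
r_ecoli = bubble_radius(0.15)
# The area ratio matches the probability ratio; the radius ratio is its square root.
print(f"area ratio:   {(r_gbs / r_ecoli) ** 2:.2f}")
print(f"radius ratio: {r_gbs / r_ecoli:.2f}")
```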

The textbook empirical regimen for neonatal meningitis is cefotaxime and ampicillin. Most of the possible bacterial causes of neonatal meningitis are susceptible to cefotaxime, while ampicillin is used to cover Listeria and Enterococcus.

It is desirable to reduce the antibiotic spectrum as much as possible, so as soon as you know that the agent causing your infection is susceptible to one of the two antibiotics, the administration of the other is typically stopped.

The y-axis, therefore, denotes the resistance of each bacterium to cefotaxime, calculated using this pdf data set. Note that Acinetobacter, Enterococcus, and Listeria are actually "off the charts," insofar as their susceptibility isn't even determined as an MIC. Their values are set to an arbitrarily high value for visualization purposes.

Prior Probabilities For Neonatal Meningitis


Gram Stain Test

In order to narrow down the possibilities, we will first perform a gram stain of the CSF. Imagine that this is the result (modified from here):

gram positive bugs under the microscope


In addition to the color indicating gram positive bacteria, this result is informative because it lets us evaluate the morphology of the bacteria. Among the gram positive organisms, S. pneumo, GBS, S. aureus, and Enterococcus are all cocci, while Listeria is a rod. Among the gram variable organisms, Acinetobacter are coccobacilli and Serratia are rods.

In theory, we could distinguish among S. pneumo, GBS, S. aureus, and Enterococcus on the basis of how the cocci are arranged on the slide (i.e., in pairs, chains, or clusters). However, that type of information is harder to put into quantitative form given our currently available data and technology, and it wouldn't be as informative.

Gram stain sensitivities and specificities (e.g., for S. pneumo) are each about 97.5%, which corresponds to a likelihood ratio of 39 for a positive result and 0.026 for a negative result.
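These numbers follow from the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. A quick check:

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(0.975, 0.975)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")  # → LR+ = 39.0, LR- = 0.026
```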

So, multiplying each bug's prior probability by the likelihood of the observed gram stain given that bug, and then normalizing so that the results sum to 1, gives the posterior probabilities. These are plotted for each agent below; probabilities less than 0.01 are not shown.
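Here is a minimal sketch of that update. The GBS and E. coli priors are the ones quoted earlier. The Listeria, Enterococcus, and lumped "other gram-negative" priors are hypothetical fillers, not values from the data set. It also makes a simplifying assumption: the observation "gram-positive cocci" has probability 0.975 when a bug's stain and shape both match, and 0.025 otherwise.

```python
def bayes_update(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Multiply each prior by the likelihood of the observed result under that
    hypothesis, then normalize so the posterior sums to 1."""
    unnormalized = {bug: prior[bug] * likelihood[bug] for bug in prior}
    total = sum(unnormalized.values())
    return {bug: value / total for bug, value in unnormalized.items()}


# GBS and E. coli priors are from the text; the rest are hypothetical fillers.
prior = {
    "GBS": 0.46,                  # gram-positive cocci
    "E. coli": 0.15,              # gram-negative rod
    "Listeria": 0.05,             # gram-positive rod (hypothetical prior)
    "Enterococcus": 0.04,         # gram-positive cocci (hypothetical prior)
    "other gram-negative": 0.30,  # lumped remainder (hypothetical)
}

# Simplifying assumption: 0.975 if stain and shape both match, else 0.025.
matches_gpc = {"GBS", "Enterococcus"}
likelihood = {bug: 0.975 if bug in matches_gpc else 0.025 for bug in prior}

posterior = bayes_update(prior, likelihood)
for bug, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{bug:20s} {prior[bug]:.3f} -> {p:.3f}")
```

With these toy numbers, Listeria drops while Enterococcus roughly doubles. This matches the qualitative pattern described next.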

probabilities after staining shows gram positive cocci


In terms of case management, the probability of Listeria has dropped substantially, but we must still keep our hypothetical patient on ampicillin because of the possibility of Enterococcal infection, which is consistent with the gram stain and has therefore increased in probability.

Hemolysis Test

The next test that we perform is to streak the bacteria on sheep blood agar and see whether they demonstrate beta hemolysis. Imagine that this is the result:

weak beta-hemolysis on sheep blood agar


Although it is slightly weak, the region in the middle of the plate, from which bacteria have been removed, demonstrates beta-hemolysis. About 33% of Enterococci, 99%+ of S. aureus, 99%+ of GBS, and 1% or less of S. pneumo would show beta-hemolysis. (S. pneumo classically shows alpha-hemolysis.)

We can convert these proportions into likelihoods, use the posterior probabilities from the last test as the priors for this test, and multiply each prior by its likelihood (normalizing again) to get the new posterior. The results are below:

probabilities following beta hemolysis on sheep blood agar


Although Enterococcus is now less likely to be the bug because a smaller proportion of its typical clinical isolates show beta-hemolysis, it is still very possible, and the patient should remain on the course of ampicillin.
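The chaining step can be sketched like this: the posterior from one test becomes the prior for the next. The starting probabilities below are a hypothetical stand-in for the post-gram-stain posterior, not the plotted values. The hemolysis likelihoods use the proportions quoted above, rounding "99%+" to 0.99 and "1% or less" to 0.01.

```python
def bayes_update(prior, likelihood):
    """Posterior is proportional to prior times likelihood, normalized to sum to 1."""
    unnormalized = {bug: prior[bug] * likelihood[bug] for bug in prior}
    total = sum(unnormalized.values())
    return {bug: value / total for bug, value in unnormalized.items()}


# Hypothetical posterior from the gram stain step, reused as the prior here.
after_gram_stain = {"GBS": 0.80, "S. pneumo": 0.08, "Enterococcus": 0.08, "S. aureus": 0.04}

# P(beta-hemolysis | bug), from the proportions quoted above.
p_beta_hemolysis = {"GBS": 0.99, "S. pneumo": 0.01, "Enterococcus": 0.33, "S. aureus": 0.99}

after_hemolysis = bayes_update(after_gram_stain, p_beta_hemolysis)
for bug, p in after_hemolysis.items():
    print(f"{bug:12s} {after_gram_stain[bug]:.3f} -> {p:.3f}")
```

Enterococcus falls because only about a third of isolates are beta-hemolytic, but it doesn't vanish. S. pneumo, at 1%, nearly does.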

Lancefield Test

At this point, we pull out one of our big diagnostic guns to really narrow down the possibilities: the Lancefield antigen test. Specifically, we'll use the BBL Streptocard Acid Latex Test (pdf). Here is the result of that test, with Lancefield antigens A, B, and C in the first three positions of the upper row, from left to right:


Lancefield test shows agglutination when group B antigen is added (#2)

The test has a sensitivity of 98% and a specificity of 99% (the manufacturer claims 100%, but from a small sample), which yields a high positive likelihood ratio of 0.98 / 0.01 = 98.

Because the test is positive for the group B antigen and negative for the others, Group B Strep is now much more likely to be the cause of meningitis in this patient; this is reflected in its dominance of the probability mass below.

probabilities after positive test for Lancefield antigen B


At this point, you could probably take the patient off of ampicillin, since GBS is susceptible to cefotaxime.


  • In reality, most of the error in each of the tests is likely due to contamination, and since the same contamination might affect all three tests in the same way, it is somewhat foolhardy to assume, as we have, that the tests are conditionally independent. However, a) data on this kind of correlation between tests are very difficult, if not impossible, to find, and b) the positive result on the Lancefield antigen test in particular is unlikely to be confounded by contamination. Ways to mitigate this are to a) take multiple samples of CSF and b) be scrupulous in laboratory procedures.
  • The probability of a false result is treated as uniform across all alternative organisms. In reality, some bugs are probably much more likely than others to falsely appear to have a particular set of properties on a given test. If any bugs are "great imitators" on a few such tests, then small differences in probability would begin to add up, and this method might miss one of those bugs badly.
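Here is a minimal sketch of why the independence assumption matters. Multiplying likelihood ratios is only valid when the tests are conditionally independent given the organism. As an extreme, hypothetical case, suppose the same gram stain result were simply recorded twice. The second copy carries no new information, but naive multiplication squares the likelihood ratio. The prior of 0.10 is made up.

```python
def posterior_prob(prior: float, likelihood_ratios: list[float]) -> float:
    """Odds-form Bayes update that multiplies LRs, i.e. assumes the tests are
    conditionally independent given the true organism."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


prior = 0.10              # hypothetical prior for one organism
lr = 0.975 / 0.025        # LR+ for a test with 97.5% sensitivity and specificity

honest = posterior_prob(prior, [lr])      # the duplicate result adds nothing
naive = posterior_prob(prior, [lr, lr])   # the duplicate is wrongly counted twice
print(f"counted once:  {honest:.3f}")
print(f"counted twice: {naive:.3f}")
```

Real contamination would produce a partial version of this overconfidence rather than a full doubling. That partial effect is the concern raised in the first bullet.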


  • Raw data and computations can be found in this Google Spreadsheet.
  • R code for making the bubble plots is on GitHub.
  • Thanks to Mike Chary and Joe Lerman for help on this. All mistakes are mine.