Uncategorized

The Priest of Sepsis

Bayes’ theorem applied to neonatal sepsis diagnosis

The Man of the Cloth

Late nights in our neonatal intensive care unit, I am often asked to evaluate a baby with potential symptoms of sepsis. In small babies especially, those potential symptoms are incredibly broad, so the evaluations are always hard. Usually the request comes with a pretty strong hint from the nurse that we should do lab testing for sepsis. I decline to order those tests sometimes because of my deep devotion to my favorite priest.

This exceptional man of the cloth has been dead since 1761. Born in Southeast England and educated in Edinburgh, he followed his father into the Presbyterian ministry. He penned one treatise on “Divine Benevolence” that I’ve never read and another in defense of Newton’s calculus (1) that I’ve also never read but seems to have been well received at the time. He was in his grave before his work of genius was discovered, the work profound enough to spread its wisdom over 250 years to 2 am in our NICU.

Rev Thomas Bayes
Apparently there’s no evidence this is actually what Rev Thomas Bayes looked like, but it’s all we’ve got.

Like many brilliant insights, Thomas Bayes‘ theorem is quite simple. P(A|B) = [P(B|A) * P(A)] / P(B). That is, for 2 events A & B, the probability of A given B is the probability of B given times the overall probability of A divided by the overall probability of B. At first encounter in med school, it struck me as rather banal. Yet it informs every decision we make in clinical medicine…or at least it ought to.

Bayes in Clinical Practice

Consider an example I often pose to trainees. Say we think the risk of sepsis in a particular baby or in the overall population is 1%. Imagine (more on just how imaginative this is to follow) our lab tests for sepsis are nearly perfect – 99% sensitive (% of true sepsis cases they detect) and 99% specific (% of cases they detect which are truly sepsis rather than false positives). We do the tests. They come back positive. Now, knowing that, what is the probability that baby actually has sepsis? Is it 99%? 90%? 75%?

The correct answer, thanks to the genius of Bayes: from an initial probability of 1%, a test with 99% sensitivity and specificity – just about the best test possible – yields a post-test probability of merely 50%.

Hard to believe? Let’s do the math. Assume we have 10,000 babies to illustrate the point. 1% of them, or 100, actually have sepsis. We said the sensitivity was 99%, so our test will be positive in 99 of those septic babies while 1 will be falsely negative. We also said the specificity was 99%, so of the babies without sepsis, the test will only be falsely positive in 1% of them. Of course, there are 9,900 (i.e., 10,000 total – 100 septic) of those non-septic babies, so that leaves us with 99 false positives in our population. Thus, we have 99 true positives, 99 false positives and no way to distinguish between them – a 50% chance.

True SepsisNot Sepsis
Total100
10,000 total * 1% prevalence
9,900
10,000 total *
(1 – 1% prevalence)
+ve Test99
100 sepsis cases * 99% sensitivity
99
9,900 not sepsis *
(1 – 99% specificity)
-ve Test1
100 sepsis cases * (1 – 99% sensitivity)
9,801
9,900 not sepsis *
99% specificity

Bayes in the NICU

Of course, in the NICU, we’d gladly take a 50% chance of correctly identifying sepsis and would treat all 198 babies without hesitation. The risk of 99 unnecessary antibiotic courses is far outweighed by the risk of missing a single case of neonatal sepsis. Unfortunately, we are rarely so lucky.

The classic first test we use for sepsis in the complete blood count (CBC). Most folks look at the ratio of immature to total white blood cells (I:T ratio). The performance of this test varies considerably (depending on the baby’s age and the cutoff you choose), but in the best case, it offers a sensitivity of 78% and a specificity of 75% (2). Again starting with a 1% pre-test risk of sepsis, that gives a post-test probability of 3%. 78 true positives, 2475 false positives. Ouch. So bad, in fact, that one would be perfectly justified in looking at the test result, looking back at the baby and, if the baby looked well, ignoring the test and continuing close observation of the baby rather than starting antibiotics.

Thus, our first Bayesian conclusion – with a low initial (pre-test) probability, anything short of a truly exceptional test adds so little information that it may not be worth doing the test.

Fortunately, our situation in the NICU is just a little better than this as we also use the C-reactive protein (CRP) test. Data vary, but most folks accept the CRP is very sensitive but not at all specific. So, when we do an I:T together with a CRP, we could simplify the analysis by calling them a single test with sensitivity 100% & specificity 75%. Indeed, this is what I see most physicians do implicitly in their clinical practice. That helps a little…

Combining the I:T & CRP, a positive test takes a pre-test probability of 1% all the way up to a post-test probability of…4%.

If your clinical suspicion that a baby has sepsis is low, doing our current best tests adds very little information.

This stark conclusion begs two questions before we can put it all into clinical practice:

  1. At what pre-test probability does I:T & CRP testing add clinically relevant information?
  2. What is the actual pre-test probability of sepsis in our population?

Pre- & Post-Test Probabilities for the I:T + CRP

Fortunately for us clinicians, the Bayesian post-test probability rises rapidly as the pre-test probability rises:

The I:T + CRP combo gets good certainty once the pre-test probability gets up around 40%. Of course, when trying to avoid baby sepsis, we’d settle for far less than good certainty. Zooming in:

Once the pre-test probability gets to around 5%, the post-test probability reaches ~17% (3).

Even at that point, however, we have found a “probability gain” of only 12% from our labs over the initial clinical assessment – our exam, vital signs, risk factors, etc – and we should expect at least 4 out of 5 babies treated this way not to have sepsis (4).

The Pre-Test Probability of Sepsis

Given that we’ve seen how much hinges on the pre-test probability, it would be great to know what it is. Unfortunately, this is a tough number to pin down and so ends up being where the art of NICU medicine comes in.

The overall risk of sepsis in American babies, at least those born after 34 weeks gestation, is 0.005 – 0.1% – at least an order of magnitude below our thought experiments above. (5)

Even with the high end of that pre-test probability range, a positive I:T + CRP yields a post-test probability of just 0.3%. At that level, far better to examine the babies and test only those with symptoms that increase their pre-test probability. Indeed, this is just what highly cited Pediatrics guidelines suggest.

What about a baby with some worrying clinical symptoms? Fortunately, my former teacher Mike Kuzniewicz and colleagues explored this question using a huge dataset from the Kaiser system. They grouped symptoms into 2 buckets. Clear symptoms of newborn illness had a likelihood ratio of 21.2. For equivocal symptoms, it was 5.0 (6).

Thus, in an otherwise “normal”, term baby with clear symptoms – e.g., requiring CPAP for respiratory support – the pre-test probability jumps from 0.1% to 2.1% (7). Assuming a positive I:T + CRP result, we have a post-test probability of 8%. I think most would agree on treating such a baby with empiric antibiotics while awaiting the blood culture results. Then, it is worth keeping in mind that we fully expect 92% of such babies to have a negative blood culture and so be off antibiotics within 48 hours.

Preemies

In premature infants, it is harder to get concrete data. Thus, there are a few caveats in the footnotes, and if you have better data, please send them my way (8). The baseline prevalence of sepsis in premature infants is substantially higher, if still low in absolute terms, at ~0.7%. Thus, with a likelihood ratio of 21 for clearly concerning symptoms, we’d update our pre-test probability to ~15%. Unsurprisingly then given our graph above, a positive I:T + CRP then yields a post-test probability of ~40%.

Necrotizing enterocolitis (NEC) is another preemie condition with potentially devastating consequences, subtle clinical signs and very poorly discriminating diagnostic tests. I may do a dedicated post on the topic. In a nutshell, it’s just like the statistical conundrum here described for sepsis but worse.

Where’s My Bayesian Robot?

If you’ve read this far, a) thank you and b) you can tell I absolutely love this domain of applied statistics. Indeed, in my career to date, I can only think of a few physicians who love it more than I do, and several of their names are cited here. Even so, I struggle to apply Bayes theorem rigorously in clinical practice. With thousands of tests at our disposal, memorizing the sensitivity and specificity of each is nigh impossible. I don’t know anyone who consistently does Bayes’ calculation of post-test probability in their head. Even such a superhuman would still need some way of estimating pre-test probabilities for a host of clinical scenarios.

Bayesian inference is one of many areas where clinical medicine desperately needs better automated decision support. We frail human physicians need Bayesian robots to help us practice better medicine.

Computers are hard to beat when it comes to storing numbers and multiplying them together. Hounding medical students to compete with the robots makes little sense. I dream of a milieu where students learn why Bayes’ theorem is so important for their work…and then are handed digital tools that make it easy to apply (9).


Wonderful visualizations of Bayes’ theorem, along with many other statistical concepts are on the Seeing Theory project.

As I hope is obvious, nothing in the post, or indeed the entire website, should be construed as medical advice or any sort of diagnostic or treatment recommendation.


Footnotes

  1. In the Principia Mathematica & related works in which he pretty much invented calculus as we now know it, Newton coined the term fluxion to refer to what we’d now call a gradient or tangent line. That is, a fluxion was a change in a value in an infinitesimally small fragment of time. If you were anything like me when I learned calculus in high school, you thought that was a curious concept but figured “hey, we put a man on the moon with calculus” and shrugged it off. In Newton’s pre-moon landing day, things were not so simple. The concept of a limit (as the time interval approaches zero) was not formalized until 1821, but philosophers had been wrangling with infinitesimally small fragments of time since at least Zeno ~440 BC. Thus, in 1666, when Newton figured out most of calculus / fluxions, it was a crazy and controversial idea. Newton himself seemed troubled by it in his writings, and he was definitely not the only one. Most famously in 1734, Bishop George Berkeley wrote a scathing critique in which he said Newton (and Leibniz) were not just wrong but also godless heathens. Our man Bayes wrote in support of Newton in 1736.
  2. The best case for the I:T ratio is a baby > 4 hr old. Unfortunately, we often feel pressed to make treatment decisions based on an initial CBC obtained shortly after birth. In that case, the specificity & sensitivity are 62% & 73% respectively. Furthermore, the I:T ratio is, of course, continuous not dichotomous. A baby with 99 immature cells out of a 100 is higher risk than one with 20, but in practice, not many babies, and certainly not many whose management is uncertain, are far enough from the threshold for that to matter much. As above, these data are from my former UCSF mentor Tom Newman’s 2014 paper.
  3. Obviously, this conclusion must vary greatly depending on the clinical scenario and should not be construed as advice. For example, in the NICU, we have the luxury of continuous monitoring, both from technology and top-notch, experienced nurses. In few other settings is it so easy to defer testing & empirical treatment in lieu of further clinical monitoring.
  4. Once upon a time, many were inclined to call these cases of “culture negative sepsis” and to keep treating the babies for a full course of antibiotics. Fortunately, that practice seems to be on the wane. Recent studies affirming the sensitivity of blood culture tests as the gold standard for genuine sepsis, worthy of treatment, have helped with that change. But by thinking carefully about Bayes’ theorem, we might have arrived at that conclusion sooner. As our analysis here suggests, most clinicians are probably implicitly – and sometimes unwittingly – using post-test probability treatment thresholds around 10-20%. We need not invoke serious methodological flaws in blood cultures to explain why so many come back negative despite positive I:T + CRP results. It is simply that our screening tests generate many false positives despite the absence of dangerous bacteria to grow in the blood culture.
  5. These data actually quantify the risk of bacteremia which is not quite the same thing as sepsis but is a fair proxy in this population. The paper includes some discussion of contaminants (i.e., “bacteremia” without sepsis) as well.
  6. Clear symptoms include things like need for respiratory support (anything more than supplemental oxygen for at least 2 hours and/or high flow nasal cannula), need for vasoactive drugs for hemodynamic stability or evidence of seizure or serious perinatal depression. Equivocal symptoms were defined as at least 2 abnormalities – from among tachycardia, tachypnea, temperature instability & increased work of breathing – for at least 4 hours.
  7. A 2.1% updated pre-test probability of sepsis in a baby with clear symptoms including mechanical ventilation & vasoactive medications may not seem like much given that level of illness. There are 2 explanations for this that derive from the nature of the data problem Kuzniewicz et al have to solve. First, there are many things besides sepsis that could make a baby that sick, and so the overall probability sepsis is behind it all is lower than we might guess when sepsis is all we’re talking about. Mike partially addresses this by including risk factors like the mother’s GBS status elsewhere in the model. Second, like Dr Newman does for the I:T ratio, Mike’s group solves a thorny data problem by using a dichotomous variable. Certainly, a baby on mechanical ventilation is sicker than one on high flow nasal cannula, but breaking those data up into many levels of sickness creates all kinds of other data problem. Thus, they pragmatically leave this a dichotomous variable and let us in the clinic infer the finer levels of sickness. Just about any baby sick enough to be mechanically ventilated will also have gotten empiric sepsis treatment.
  8. The main caveat with this analysis is that the likelihood ratios from Kuzniewicz et al were developed in a term population. As far as I know, no similar analysis has been done in a preterm population. I don’t think anyone doubts that similar symptoms have directionally similar likelihood ratios, but what those LRs actually are is anyone’s guess. I’ve just used the ones for term babies for simplicity. If you know of better data, please send them my way!
  9. To their great credit, Mike Kuzniewicz et al have built one such user-friendly Bayesian robot for newborn sepsis evaluations and, indeed, early data suggest it has improved testing and treatment practices.
Tagged , , ,