Category Archives: Biostatistics

A branch of statistics with a focus on biological systems — in my case almost always human health.

Sharing data while not sharing data

There has been a major shift among journals towards making data available at the time of publication. The PLoS stable of journals, which includes PLoS Medicine, PLoS Biology, and PLoS One, for example, has a uniform publication policy that is quite forthright about the need to share data.

I have mixed feelings about this. I have certainly advocated for data sharing and (with Pascale Allotey) conducted one of the earliest empirical investigations of data sharing in Medicine. I can understand, however, why researchers are reluctant to provide open access to data. The data can represent hundreds, thousands, or tens of thousands of person-hours of collection and curation. The data also represent a form of Intellectual Property in the development of the ideas and methods that led to the data collection. For many researchers, there may be a sense that others are going to swoop in and collect the glory with none of the work. There have certainly been strong advocates for data sharing where the motivation looked to be potentially exploitative (see our commentary).

I recently stumbled across a slightly different issue in data sharing.  It arose in an article in PLoS One by Buttelmann and colleagues. Their study looked at whether great apes (Orangutans, Chimpanzees and Bonobos) could distinguish in a helping task between another’s true and false beliefs.  The data set comprised 378 observations from 34 apes in two different studies, and they made their data available … as a jpg file.  A small portion of it appears below, and you can download the whole image from PLoS One.

Partial data from Buttelmann et al. (2017)

It seems strange to me to share the data as an image file. If you wanted people to use the data, surely you would share it as a text file, CSV, xlsx, etc. If the intention was to satisfy the journal requirements but discourage use, then an image file looks (at first glance) to be a perfect medium. Fortunately, there are some excellent online tools for optical character recognition (OCR), and the one I used made quick work of the image file. I downloaded the result in xlsx format, read it into R, and cleaned up a few typographical errors that were introduced by the OCR. You can download their data in a machine-readable form here. I have included in the download an R script for reading the data in and running a simple mixed effects model to re-analyse their study data. My approach was a little better than theirs, because it accounts for the repeated measurement within ape, but the results look pretty similar. I am not sure why they did not account for the repeated measurement, but ignoring it seems to be the typical approach taken within the discipline.
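To give a rough sense of why repeated measurement within ape matters, here is a minimal Python sketch of the standard Kish design effect. The observation and ape counts come from the post. The intraclass correlation (ICC) values are purely hypothetical and are not estimated from the Buttelmann data, and the formula treats every ape as contributing the mean number of observations, which is only an approximation. This is not the R script in the download.

```python
# Kish design effect for clustered (repeated-measures) data.
# Counts come from the post; the ICC values are hypothetical, not estimates.

N_OBS = 378   # observations reported in the post
N_APES = 34   # apes reported in the post


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_obs: int, n_clusters: int, icc: float) -> float:
    """Number of independent observations the clustered data are worth."""
    mean_size = n_obs / n_clusters  # assumes roughly equal cluster sizes
    return n_obs / design_effect(mean_size, icc)


if __name__ == "__main__":
    for icc in (0.0, 0.1, 0.3, 1.0):  # assumed values for illustration only
        ess = effective_sample_size(N_OBS, N_APES, icc)
        print(f"ICC = {icc:.1f}: effective n = {ess:.1f}")
```

With no within-ape correlation the 378 observations count in full. If an ape's responses were perfectly correlated, the data would be worth only 34 independent observations, one per ape. A mixed effects model sits between those extremes by estimating the within-ape correlation from the data.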



Babies have less than a 1 in 3 chance of recovery from a poor 1 minute Apgar score

We recently completed a study of 272,472 live, singleton, term births without congenital anomalies recorded in the Malaysian National Obstetrics Registry (NOR). We wanted to know what proportion of births had a poor 1 minute Apgar score (<4); and the likelihood that they would recover (Apgar score ≥7) by 5 minutes.

As we noted in the paper:

While the Apgar score at 5 minutes is a better predictor of later outcomes than the Apgar score at 1 minute, there is a necessary temporal process involved, and a neonate must pass through the first minute of life to reach the fifth. Understanding the factors associated with the transition from intrauterine to extrauterine life, particularly for neonates with 1 min Apgar scores <4, has the potential to improve care.

Surprisingly, to me at least, we could find no research looking at that 1 minute to 5 minute transition.  Ours was a first.

From the 270,000+ births, you can see (Figure 1) that the probability of a 5 minute Apgar score ≥7 rises dramatically as the 1 minute Apgar score increases. For 1 minute Apgar scores from 1 to 6, the relationship between the 1 minute score and the chance of a 5 minute Apgar score ≥7 is almost a straight line.

Fig 1: The probability (with 95% CI) of an Apgar score at 5 min (≥7) given any Apgar score at 1 minute

A 1 minute Apgar of 6 almost guarantees a 5 minute Apgar score ≥7; in contrast, a 1 minute Apgar of 3 has only a 50% chance of recovery, and a 1 minute Apgar of 1 has less than a 10% chance of recovery.

Fortunately, only 0.6% of births had poor Apgar scores (<4). The type of delivery (Caesarean section or vaginal delivery) and the staff conducting the delivery (Doctor or Midwife) were both significantly associated with the chance of recovery. The challenge is working out the causal order. Do certain kinds of delivery cause poor recovery, or are babies who are likely to recover poorly delivered in particular ways? Does the training of Doctors or Midwives exacerbate/improve the risks of poor recovery, or are babies who are likely to recover poorly delivered by particular personnel?
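For a sense of scale, the 0.6% can be turned back into an approximate number of births. The sketch below uses only the two figures quoted in this post. Because 0.6% is a rounded percentage, the result is approximate, and the helper that brackets the rounding error assumes the figure was rounded to one decimal place.

```python
# Rough count of births with a poor 1 minute Apgar score (<4).
# Both inputs come from the post; 0.6% is rounded, so the count is approximate.

TOTAL_BIRTHS = 272_472
POOR_APGAR_PCT = 0.6  # percent, as reported


def approximate_count(total: int, pct: float) -> int:
    """Births implied by a percentage of the total, to the nearest birth."""
    return round(total * pct / 100)


def rounding_bounds(total: int, pct: float, half_step: float = 0.05):
    """Counts consistent with a percentage rounded to one decimal place
    (an assumption about how the 0.6% was reported)."""
    low = round(total * (pct - half_step) / 100)
    high = round(total * (pct + half_step) / 100)
    return low, high


if __name__ == "__main__":
    count = approximate_count(TOTAL_BIRTHS, POOR_APGAR_PCT)
    low, high = rounding_bounds(TOTAL_BIRTHS, POOR_APGAR_PCT)
    print(f"About {count:,} births (somewhere between {low:,} and {high:,})")
```

So "only 0.6%" still corresponds to roughly 1,600 neonates in the registry data, which is the group the recovery analysis is about.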

Our study cannot answer the questions, but it does raise interesting points for future studies of actual labor room practice — questions not easily answered with registry type data.



Zika Causes Birth Defects In 1 In 10 Pregnancies

Well … Not really. But that was the misleading headline of an article I saw in the “healthy living” section of The Huffington Post. I then chased it up to its source: an article published by Reuters journalist Julie Steenhuysen.

There were 3,978,497 births in the US in 2015. Assuming similar numbers in 2017 (and no seasonal variation, which is unlikely), you would be looking at a whopping 400,000 births with a Zika virus related birth defect. The usual rate of birth defects in the US from all causes is about 3 per 100, so with a cumulative total of more than four times the current numbers, one could anticipate a swift, dramatic (and possibly ineffective) response from the government.
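The back-of-envelope arithmetic can be written out as a short Python sketch. It uses only the figures quoted above and keeps the same simplifying assumptions: 2017 births roughly equal the 2015 total, and births stand in for pregnancies.

```python
# Back-of-envelope check of the headline, using only figures quoted in the post.
# Assumptions (the post's own): 2017 births roughly equal the 2015 total,
# and births are used as a stand-in for pregnancies.

BIRTHS_2015 = 3_978_497   # US births in 2015, as quoted
HEADLINE_RATE = 1 / 10    # "1 in 10 pregnancies", read literally
BASELINE_RATE = 3 / 100   # usual all-cause birth defect rate, as quoted


def implied_count(births: int, rate: float) -> int:
    """Affected births implied by a rate, rounded to a whole birth."""
    return round(births * rate)


zika_implied = implied_count(BIRTHS_2015, HEADLINE_RATE)
baseline = implied_count(BIRTHS_2015, BASELINE_RATE)
combined_ratio = (zika_implied + baseline) / baseline

if __name__ == "__main__":
    print(f"Headline-implied Zika-related defects: {zika_implied:,}")
    print(f"Usual all-cause defects:               {baseline:,}")
    print(f"Combined total vs usual:               {combined_ratio:.1f}x")
```

The headline, read literally, implies just under 400,000 affected births a year on top of roughly 120,000 from all other causes.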

Moving down from the headline, however, a very different story is revealed:

About one in 10 pregnant women with confirmed Zika infections had a fetus or baby with birth defects, offering the clearest picture yet of the risk of Zika infection during pregnancy, U.S. researchers said on Tuesday.

No longer is it 1 in 10 pregnancies. It's 1 in 10 pregnancies with Zika. The facts are not half as dramatic as the headline. What am I talking about? “Not half as dramatic”? The total number of pregnancies in the US Zika Pregnancy Register for 2016–2017 (on 8 April 2017) was 1,311. Fifty-six of the pregnancies resulted in liveborn infants with birth defects, and 7 of the pregnancies were associated with losses with birth defects. That just doesn’t sound as impressive a number as the headline suggested. Undoubtedly personally tragic, but far from as significant a population health issue.
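The register figures can be summed up the same way. This sketch simply adds the two counts quoted above and divides by the register total. The denominator here is every pregnancy in the register as quoted, which may not match the denominator behind the "one in 10" figure for confirmed infections, so the percentage is a rough guide only.

```python
# Affected pregnancies in the US Zika Pregnancy Register, as quoted in the post.
# The share uses all register pregnancies as the denominator, which may differ
# from the denominator behind the "one in 10" figure for confirmed infections.

REGISTER_PREGNANCIES = 1_311
LIVEBORN_WITH_DEFECTS = 56
LOSSES_WITH_DEFECTS = 7

affected = LIVEBORN_WITH_DEFECTS + LOSSES_WITH_DEFECTS
share = affected / REGISTER_PREGNANCIES

if __name__ == "__main__":
    print(f"{affected} of {REGISTER_PREGNANCIES:,} register pregnancies "
          f"({share:.1%}) involved birth defects")
```

That is 63 pregnancies in total, a count in the tens rather than the hundreds of thousands.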