Category Archives: Biostatistics

A branch of statistics with a focus on biological systems — in my case almost always human health.

Parsing the NIH Reform Debate

I was recently alerted to Martin Kulldorff’s Blueprint for NIH Reform — a document that’s stirred some intense reactions among my colleagues. A few view it as a needed critique of systemic inefficiencies. Most regard it as an ideological Trojan horse—an attack on science dressed as reform. So where does the truth lie?

The short answer is: it’s complicated—and the messenger matters.

Kulldorff, once a Harvard professor and biostatistician, became a polarising figure during the COVID-19 pandemic for promoting ideas widely dismissed by the mainstream scientific community, including opposition to lockdowns, masking, and even some aspects of vaccination policy. He was also a co-author of the controversial Great Barrington Declaration, which called for herd immunity through natural infection — a strategy many experts considered unscientific and dangerous at the time.

This background understandably colors how his recent proposals are received.

But here’s the nuance: the Blueprint itself raises a number of ideas that aren’t inherently fringe. Calls for reforming NIH grant structures, enhancing academic freedom, incentivising open science, and streamlining peer review are echoed by many researchers across disciplines — including those with no ties to politicised public health debates. Frustrations with bureaucratic inefficiencies and perverse incentives in scientific funding are real and shared.

Where it becomes tricky is in the framing. Kulldorff doesn’t just argue for reform — he implies that current structures are suppressing truth, and that controversial views (like his own during the pandemic) have been silenced not because they lack merit, but because of groupthink or institutional bias. That framing, for many, crosses the line from constructive critique into undermining the scientific process itself.

There’s also a risk that pushing for more “openness” in what research gets funded — while laudable in theory — could result in resources being diverted to low-evidence, high-noise pursuits. Or, as one colleague aptly put it, “sending the ferret down an empty warren.” Science thrives on curiosity, but it also requires discipline and evidence-based filters.

Venue choice also matters. If this proposal were intended as a serious intervention into science policy, it might have been published in a mainstream medical or policy journal where it could be openly debated across the full spectrum of scientific opinion. Instead, it was published in the Journal of the Academy of Public Health — a platform co-founded and edited by Kulldorff himself, with close ties to politically conservative and contrarian public health figures. That choice raises questions about whether the article is seeking reform through consensus, or carving out space for alternative narratives that have struggled to find support in mainstream science.

So how should we engage with this?

Acknowledge the valid points: There is room — and need — for reform in how science is funded, reviewed, and communicated.
Be vigilant about context: Not all calls for reform are neutral. Motivations and affiliations matter, especially when public trust is on the line.
Defend the integrity of science: We can advocate for better systems without abandoning the core principles of evidence, rigor, and accountability — including fair peer review and a balance of risk and reward.

In the end, this is not a binary question of “pro-science” vs “anti-science.” It’s about how science evolves, who gets to shape that evolution, and what values we prioritise along the way — openness, yes, but always in service of evidence and public good.

This is an independent submission, edited by D.D. Reidpath.

What is the optimal number of broken jaws?

I was chatting with a friend recently about the COVID-19 response in different countries. Reflecting on her own country, she said, “It is so hard to know what is right!”; that is, it is so hard to know what the right response to COVID-19 should be.

The variation, for instance, in countries’ lockdown responses is substantial, but which country is doing the right thing? In some countries, there has been no lockdown. The government asked the people to be sensible. In other countries, the government legally confined people to their homes — only one person was allowed out at very specific (restricted) times to buy essentials. Given these two policy extremes (be sensible and house arrest), which one is the right one, and how do you know?

An economist (I have forgotten who) once asked, tongue-in-cheek, “What is the optimal number of dead babies?” The very purpose of such a crass question is to make you stop and think. What tradeoffs are you prepared to make to save the lives of babies? Sure, you could be lazy, condemn the questioner as immoral (for even asking you to think), and declare zero dead babies to be the right number. As a simple policy proposition, if zero dead babies is the right number, then all the resources of society should be aimed at preventing neonatal deaths. ALL RESOURCES! Until the policy goal has been achieved, there is more work to be done to reduce the number. One dead baby is too many!!! Farmers may farm, but only to produce the food that supports the workforce that is striving to reduce baby deaths to zero. Teachers may teach, but only to educate the people to fill the jobs to support the policy goal to reduce baby deaths to zero. There is very limited use for art, music, cinema, sport, fashion, restaurants, etc. They will all have to go! If five-year-old deaths increase, that is something to live with, just as long as we can save another baby.

At this point, you’re probably thinking, well that’s stupid. That’s not what I meant when I said the optimal number of dead babies is zero. What I meant was something more along the lines of, “In an ideal world there would be zero dead babies”. Equally, if you were asked about poverty or crime, or amazing works of art, you presumably would have stated the ideals in terms of zero poverty, zero crime, and lots more wonderful art. And this is quite a different proposition. An ideal world is not ideal in virtue of its achievement of a single goal. It is ideal in having achieved all sorts of different outcomes. And that is why the real and the ideal do not intersect. In the real world, we do not achieve the ideal anything. We seek to achieve many ideals, and realistically, we hope to make progress against them, knowing that there is always more to be done. In striving to improve the societal position against a basket of goals, we allocate limited resources and make trade-offs.

This is one part of the COVID-19 problem, and, as my friend observed, why it is so hard to know what is right. What is the right number of COVID-19 deaths? There are lots of important, rational debates to be had around this topic because it is about the tradeoffs we are prepared to make across a basket of societal goals, as opposed to the myopic achievement of one. Muscular public health responses — effective house arrest — are very good at reducing the number of new COVID-19 cases. They are also very effective at increasing domestic violence, increasing depression, lowering child immunisation rates, degrading child education, increasing poverty and increasing unemployment. If the societal goal should be zero COVID-19 deaths, what is the optimal number of broken jaws, suicide attempts, measles encephalitis cases, illiterate and innumerate children, beggars, and soup kitchens?

All these issues, under normal circumstances, are things of concern to Public Health and maybe, one day, they will be again.

Another part of the COVID-19 problem is that whether a government “did the right thing” will be determined in hindsight, and by making (inadequate) historical comparisons between the outcomes across countries. In democracies, at least in the short term, “did the government do the right thing?” will often be decided at the ballot box. This will surely get the answer wrong. In less-than-democracies, astute rulers will write the history books themselves, ensuring that, without regard to the outcome, the government did the right thing.

One of the main reasons that “it is so hard to know what is right!” is that we rarely have a societal view about the long-term goals we wish to achieve and the tradeoffs we are prepared to make. Furthermore, we are reluctant to accept the fact that one can do the right thing and still fail. We assume that the right course of action will, by definition, result in success. We are prospective Kantians and retrospective Utilitarians.

Donald Trump’s BMI: getting the measure of the man.

I find myself fascinated by a pointless lie because it is inescapably tragic. All it can do is diminish the person in the eyes of others. And this brings us to Donald Trump’s height. In January 2018, the Physician to the President, Ronny L. Jackson MD, asserted that Donald Trump was 6’3″ tall (1.90m). This is so unlikely to be true that it strains credulity. There is no reason for Jackson to lie spontaneously about a patient’s height, and it seems probable that he was encouraged to add a few inches by the President himself.

When asked to self-report their height, both men and women in the US tend to overstate it. Burke and Carman have suggested that overstating height is motivated by social desirability — you can never be too tall. There is ample evidence of Donald Trump’s (misplaced) search for the socially desirable with respect to his hair, his tan, his ethnicity, his intelligence and now his height.

In 2018 we learnt that Donald Trump was officially not quite obese (body mass index (BMI) <30), and in 2019 he had nudged over the line into the obese range (BMI ≥30). Overstating height creates a problem in the calculation of BMI — which is mass (in kilograms) divided by the square of height (in meters). Given that Donald Trump is likely shorter than 1.9m (6’3″), and probably closer to 1.854m (6’1″), this will have implications for whether he was really obese in 2018 (not just overweight as stated by his Physician) and just how obese he probably is (Figure 1).

Figure 1: Donald Trump’s BMI in 2018 and 2019 given different assumptions about his height [R-code here].

In 2018 Donald Trump was just below the obese category if and only if he was really 6’3″ (1.9m) tall. At any height less than that he was obese in 2018 and he is obese today. His most likely true height given comparisons with others (cf. Barack Obama) is 6’1″, and this puts him comfortably in the obese range.
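The sensitivity of BMI to height can be checked with a few lines of arithmetic. The sketch below is not the R code behind Figure 1; it is a minimal Python illustration, and the function names are my own. Because the weight is not stated above, it does not assume one. Instead it takes a BMI sitting exactly on the obese threshold of 30 at the official 1.90m and asks what the same body mass implies at shorter heights. The intermediate height of 1.88m is a hypothetical value included only for comparison.

```python
OBESE_THRESHOLD = 30.0  # BMI at or above this value is classed as obese


def bmi(mass_kg: float, height_m: float) -> float:
    """Body mass index: mass in kilograms divided by the square of height in meters."""
    return mass_kg / height_m ** 2


def rescale_bmi(reported_bmi: float, reported_height_m: float, true_height_m: float) -> float:
    """BMI for the same body mass when the true height differs from the reported one."""
    # Back out the body mass implied by the reported BMI and reported height.
    mass_kg = reported_bmi * reported_height_m ** 2
    return bmi(mass_kg, true_height_m)


if __name__ == "__main__":
    # 1.90 m is the official height, 1.854 m the more likely one;
    # 1.88 m is a hypothetical in-between value.
    for height_m in (1.90, 1.88, 1.854):
        adjusted = rescale_bmi(OBESE_THRESHOLD, 1.90, height_m)
        print(f"{height_m:.3f} m -> BMI {adjusted:.1f}")
```

A BMI on the threshold at 1.90m becomes roughly 31.5 at 1.854m, which is why a reading "just below" 30 at the official height cannot survive any downward correction in height.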

Misrepresenting one’s height does not create a problem if the lie is reserved for others — except perhaps in a political sense. Problems arise if one deludes oneself. Telling others that you are taller and healthier than you really are is one thing; if you lie to yourself you cannot properly manage your health.

Prevalence of sexual assault at Australian Universities is … non-zero.

A few days ago the Australian Human Rights Commission (AHRC) launched Change the course, a national report on sexual assault and sexual harassment at Australian universities led by Commissioner Kate Jenkins. Sexual assault and sexual harassment are important social and criminal issues, and the AHRC report is misleading and unworthy of the gravity of the subject matter.

It is a statistical case study in “how not to.”

The report was released to much fanfare, receiving national media coverage including TV and newspapers, and a quick response from universities. “At a glance …” the report highlights among other things:

30,000+ students responded to the survey — remember this number, because (too) much is made of it.
21% of students were sexually harassed in a university setting.
1.6% of students were sexually assaulted in a university setting.
94% of sexually harassed and 87% of sexually assaulted students did not report the incidents.

From a reading of the survey’s methodology, any estimates of sexual harassment/assault should be taken with a shovel-full of salt and should generate no response other than that of the University of Queensland’s Vice-Chancellor, Peter Høj: that any number greater than zero is unacceptable. What we did not have before the publication of the report was a reasonable estimate of the magnitude of the problem and, notwithstanding the media hype, we still don’t. The AHRC’s research methodology was weak, and it looks like they knew the methodology was weak when they embarked on the venture.

Where does the weakness lie? The response rate!!!

A sample of 319,252 students was invited to participate in the survey. It was estimated at the design stage that between 10 and 15% of students would respond (i.e., 85-90% would not respond) (p.225 of the report). STOP NOW … READ NO FURTHER. Why would anyone try to estimate prevalence using a strategy like this? Go back to the drawing board. Find a way of obtaining a smaller, representative sample, of people who will respond to the questionnaire.

Giant samples with poor response rates are useless. They are a great way for market research companies to make money, but they do not advance knowledge in any meaningful way, and they are no basis for formulating policy. The classic example of a large sample with a poor response rate misleading researchers was the Literary Digest poll to predict the outcome of the 1936 US presidential election. They sent out 10 million surveys and received 2.3 million responses. By any measure, 2.3 million responses to a survey is an impressive number. Unfortunately for the Literary Digest, there were systematic differences between responders and non-responders. The Literary Digest predicted that Alf Landon (Who?) would win the presidency with 69.7% of the electoral college votes. He won 1.5% of the electoral college votes. This is a lesson about the US electoral college system, but it is also a significant lesson about non-response bias. The Literary Digest had a 77% non-response rate; the AHRC had a 90.3% non-response rate. Who knows how the 90.3% who did not respond compare with the 9.7% who did respond? Maybe people who were assaulted were less likely to respond and the number is a gross underestimate of assaults. Maybe they were more likely to respond and it is a gross overestimate of assaults. The point is that we are neither wiser nor better informed for reading the AHRC report.
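The under- or overestimate argument can be made concrete with a small calculation. What follows is a minimal sketch in Python, not anything taken from the AHRC report: the true prevalence and the response rates for affected and unaffected people are hypothetical numbers chosen only to show the direction of the bias, and the function name is my own.

```python
def observed_prevalence(true_prevalence: float,
                        response_rate_affected: float,
                        response_rate_unaffected: float) -> float:
    """Prevalence among responders when response depends on affected status."""
    responders_affected = true_prevalence * response_rate_affected
    responders_unaffected = (1 - true_prevalence) * response_rate_unaffected
    return responders_affected / (responders_affected + responders_unaffected)


if __name__ == "__main__":
    TRUE_PREVALENCE = 0.05  # hypothetical, for illustration only

    # (response rate if affected, response rate if unaffected), all hypothetical
    scenarios = {
        "affected respond less": (0.05, 0.10),
        "no difference": (0.10, 0.10),
        "affected respond more": (0.20, 0.10),
    }
    for label, (rate_affected, rate_unaffected) in scenarios.items():
        estimate = observed_prevalence(TRUE_PREVALENCE, rate_affected, rate_unaffected)
        print(f"{label}: survey estimate = {estimate:.3f}")
```

Note that the number of people invited appears nowhere in the calculation. When the two groups respond at different rates, the estimate is wrong by the same amount whether 3,000 or 300,000 people are surveyed, and without information on the non-responders there is no way to tell which scenario applies.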

Sadly, whoever estimated the (terrible) response rate was, even then, overly optimistic. The response rate was significantly lower than the worst-case scenario of 10% [Response Rate = 9.7%, 95%CI: 9.6%–9.8%].
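For readers who want to see where an interval like this comes from, here is a minimal Python sketch using the normal-approximation (Wald) interval for a proportion. That is one common method and may not be the one used for the figures above. The exact number of responders is not given here, so the sketch backs it out from the 9.7% response rate and the 319,252 invitations; treat that count as an assumption.

```python
import math


def response_rate_ci(responders: int, invited: int, z: float = 1.96):
    """Response rate with a normal-approximation (Wald) confidence interval."""
    rate = responders / invited
    standard_error = math.sqrt(rate * (1 - rate) / invited)
    return rate, rate - z * standard_error, rate + z * standard_error


INVITED = 319_252
# Assumption: responder count implied by the reported 9.7% response rate.
RESPONDERS = round(0.097 * INVITED)

if __name__ == "__main__":
    rate, lower, upper = response_rate_ci(RESPONDERS, INVITED)
    print(f"response rate = {rate:.1%}, 95% CI: {lower:.1%} to {upper:.1%}")
```

With so many invitations the interval is very narrow, so the upper bound sits below the 10% worst-case planning figure. The narrowness describes only how precisely the response rate is known; it says nothing about how representative the responders are.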

In sharp contrast to the bad response rate of the AHRC study, the Crime Victimisation Survey (CVS) 2015-2016, conducted by the Australian Bureau of Statistics (ABS), had a nationally representative sample and a 75% response rate — fully completed! That’s a survey you could actually use for policy. The CVS is a potentially less confronting instrument, which may account for the better response rate. It seems more likely, however, that recruiting students by sending them emails is neither sophisticated enough nor adequate.

Poorly conducted crime research is not merely a waste of money, it trivialises the issue. The media splash generates an illusion of urgency and seriousness, and the poor methodology means it can be quickly dismissed.

If there is a silver lining to this cloud, it is that AHRC has created an excellent learning opportunity for students involved in quantitative (social) research.

Addendum

It was pointed out to me by Mark Diamond that a better ABS resource is the 2012 Personal Safety Survey, which tried to answer the question about the national prevalence of sexual assault. A Crime Victimisation Survey is likely to receive a better response rate than a survey looking explicitly at sexual assault. I reproduce the section on sample size from the explanatory notes because it highlights the difference between a well conducted survey and the pile of detritus reported by AHRC.

There were 41,350 private dwellings approached for the survey, comprising 31,650 females and 9,700 males. The design catered for a higher than normal sample loss rate for instances where the household did not contain a resident of the assigned gender. Where the household did not contain an in scope resident of the assigned gender, no interview was required from that dwelling. For further information about how this procedure was implemented refer to Data Collection.

After removing households where residents were out of scope of the survey, where the household did not contain a resident of the assigned gender, and where dwellings proved to be vacant, under construction or derelict, a final sample of around 30,200 eligible dwellings were identified.

Given the voluntary nature of the survey a final response rate of 57% was achieved for the survey with 17,050 persons completing the survey questionnaire nationally. The response comprised 13,307 fully responding females and 3,743 fully responding males, achieving gendered response rates of 57% for females and 56% for males.