Category Archives: Policy

Ideology and the Illusion of Disagreement in Empirical Research

There is deep scepticism about the honesty of researchers and their capacity to say things that are true about the world. If one could demonstrate that their interpretation of data was motivated by their ideology, that would be powerful evidence for the distrust. A recent paper in Science Advances ostensibly showed just that. The authors, Borjas and Breznau (B&B), re-analysed data from a large experiment designed to study researchers. The researcher-participants were each given the same dataset and asked to analyse it to answer the same question: “Does immigration affect public support for social welfare programs?” Before conducting any analysis of the data, participant-researchers also reported their own views on immigration policy, ranging from very anti- to very pro-immigration. B&B reasoned that, if everyone was answering the same question, they would be able to infer something about the impact of prior ideological commitments on the interpretation of the data.

Each team independently chose how to operationalise variables, select sub-samples from the data, and specify statistical models to answer the question, which resulted in over a thousand distinct regression estimates. B&B used the observed diversity of modelling choices as data, examining how the research process unfolded and the relationship between the answers to the question and the researcher-participants’ prior views on immigration.

B&B suggested that participant-researchers with moderate prior views on immigration found the truth, although they never actually say it that cleanly. Indeed, in the Methods and Results they demonstrate appropriate caution about making causal claims. From the Title through to the Discussion, however, the narrative framing is that immoderate ideology distorts interpretation, and this is exactly the question their research does not and cannot answer, by design.

Readers of the paper did not miss the narrative spin in which B&B shrouded their more cautious science. Within a few days of publication, the paper had attracted hundreds of social-media posts and been picked up by international news feeds and blogs. Commentaries tended to frame pro-immigration positions as more ideologically suspect.

There are significant problems with the B&B study, however, which have been missed or given insufficient attention. To understand the problems more clearly, it helps to step away from immigration altogether and consider a simpler case. Suppose researchers are given the same dataset and asked to answer the question: “Do smaller class sizes improve student outcomes?” The data they are given include class size, test scores, and graduation rates (a proxy for student outcomes). On the surface, this looks like a single empirical question posed to multiple researchers using the same data.

Now introduce a variable that is both substantively central and methodologically ambiguous: a measure of the students’ socio-economic disadvantage. Some researchers treat socio-economic disadvantage as a covariate, adjusting for baseline differences to estimate an average effect of class size across all students. Others restrict the sample to disadvantaged pupils, on the grounds that education policy is primarily about remediation or equity. Still others model heterogeneity explicitly, asking whether smaller classes matter more for some students than for others. Each of these choices is orthodox. None involves questionable practice, and all of them are “answering” the same surface question. But each corresponds to a different definition of the effect being studied and, more precisely, to a different question being answered. By definition, different models answer different questions.
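A toy simulation makes the point concrete. All the numbers below are invented for illustration: when the effect of class size genuinely differs between disadvantaged and advantaged pupils, the covariate-adjusted “average effect” and the disadvantaged-only effect are different estimands, and both analyses are correct answers to different questions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Invented data-generating process: smaller classes help disadvantaged
# pupils (slope -2.0 points per extra pupil) far more than advantaged
# ones (slope -0.5). All numbers are hypothetical.
disadvantaged = rng.binomial(1, 0.4, n)          # 40% disadvantaged
class_size = rng.uniform(15, 35, n)
score = (70
         - 10 * disadvantaged                        # baseline gap
         - (0.5 + 1.5 * disadvantaged) * class_size  # heterogeneous effect
         + rng.normal(0, 5, n))

def class_size_slope(X, y):
    """OLS coefficient on the first column of X (intercept added)."""
    X = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]

# Choice 1: adjust for disadvantage -> population-average effect (~ -1.1)
avg = class_size_slope(np.column_stack([class_size, disadvantaged]), score)

# Choice 2: restrict to disadvantaged pupils -> equity-focused effect (~ -2.0)
mask = disadvantaged == 1
sub = class_size_slope(class_size[mask].reshape(-1, 1), score[mask])

print(f"covariate-adjusted average effect: {avg:.2f}")   # ~ -1.1
print(f"disadvantaged-only effect:         {sub:.2f}")   # ~ -2.0
```

Neither estimate is biased; the two specifications simply define the policy effect differently, which is the sense in which they answer different questions.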

In this setting, differences between researchers’ analyses would not normally be described as researchers answering the same question differently. Nor would we infer that analysts who focus on disadvantaged students are “biased” toward finding larger effects, or that those estimating population averages are distorting inference. We would recognise instead that the original prompt was under-specified, and that researchers made reasonable—if normatively loaded—decisions about which policy effect should be evaluated. B&B explicitly acknowledge this problem in their own work, writing: “[a]lthough it would be of interest to conduct a study of exactly how researchers end up using a specific ‘preferred’ specification, the experimental data do not allow examination of this crucial question” (p. 5). Even with this insight, however, they persist with the fiction that the researchers were indeed answering the same question, treating two different “preferred specifications” as if they answer the same question. It would be like our educationalists treating an analysis of outcomes for children from socio-economically deprived families as if it answered the same question as an analysis that included all family types.

B&B’s immigration experiment goes a step further, and in doing so introduces an additional complication. Participant-researchers’ prior policy positions on immigration were elicited in advance of their data analysis, and B&B then used those positions as an organising variable in their analysis of the participant-researchers.

Imagine a parallel design in the education case. Before analysing the data, researchers are asked whether they believe differences in educational outcome are primarily driven by school resources or by family deprivation. Their subsequent modelling choices—whether to focus on disadvantaged pupils, whether to emphasise average effects, whether to model strong heterogeneity—are then correlated with these priors. Such correlations would be unsurprising. If you think disadvantage is more important than school resources to student outcomes, you may well focus your analysis on students from deprived backgrounds. It would be a mistake, however, to conclude that researchers with strong views are biasing results, rather than pursuing different, defensible conceptions of the policy problem.

Once prior beliefs are foregrounded in this way, a basic ambiguity arises. Are we observing ideologically distorted inferences over the same shared question, or systematic differences in the questions being addressed given an under-specified prompt? Without agreement on what effect the analysis is meant to capture, those two interpretations cannot be disentangled. Conditioning on ideology (as B&B did) therefore risks converting a problem of an under-specified prompt into a story about ideologically biased reasoning. This critique does not deny that motivated reasoning exists, or that B&B’s researcher-participants were engaged in it. B&B simply do not show it, and the alternative explanation is more parsimonious.

The problems with the B&B paper are compounded when they attempt to measure “research quality” through peer evaluations. Researcher-participants in the experiment are asked to assess the quality of one another’s modelling strategies, introducing a second and distinct issue. The evaluation process is confounded by the distribution of views within the researcher-participant pool.

To see this, return again to the education example. Suppose researchers’ views about the importance of family deprivation for educational outcomes are normally distributed, with most clustered around a moderate position and fewer at the extremes. A randomly selected researcher asked to evaluate another randomly selected researcher will, with high probability, be paired with someone holding broadly similar views (around the middle of the distribution). In such cases, the modelling choices are likely to appear reasonable and well motivated, and to receive high quality scores. The evaluation implicitly invites the following reasoning: “you’re doing something similar to what I was doing, and I was doing high quality research, therefore you must be doing high quality research as well”.

By contrast, models produced by researchers in the tails of the distribution will more often be evaluated by researchers further away from their ideological view. Those models may be judged as poorly framed or unbalanced—not because they violate statistical standards, but because they depart from the modal conception of what the broadly framed question is about. Under these conditions, lower average quality scores for researchers with more extreme priors may reflect distance from the dominant framing, not inferior analytical practice. B&B, however, argued the results show that being ideologically in the middle produced higher quality research.
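A toy simulation shows the mechanism starkly (again, the numbers are invented; the point is qualitative). Every analysis below has identical true quality, and a randomly paired reviewer scores it purely on ideological proximity. Moderates still end up with higher average quality scores.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Invented prior views, standard-normally distributed: most researchers
# sit near the moderate middle, few in the ideological tails.
views = rng.normal(0.0, 1.0, n)

# Every analysis has IDENTICAL true quality. A randomly paired reviewer
# scores it purely by ideological proximity: closer views, higher score.
reviewers = rng.permutation(views)        # random reviewer assignment
scores = 10.0 - np.abs(views - reviewers)

moderate = np.abs(views) < 0.5            # reviewees near the middle
extreme = np.abs(views) > 1.5             # reviewees in the tails

print(f"mean score, moderate priors: {scores[moderate].mean():.2f}")
print(f"mean score, extreme priors:  {scores[extreme].mean():.2f}")
```

The score gap here is produced entirely by the shape of the reviewer pool, not by any difference in analytical quality, which is exactly the confound at issue.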

The issue here is not bias but design. When both peer reviewers and reviewees are drawn from the same population, and when quality is assessed without a fixed external benchmark for what counts as a good answer to the question, peer scores inevitably track conformity to the field’s modal worldview. Interpreting these scores as evidence that ideology degrades research quality is wrong.

B&B’s paper is useful. It shows that ideological commitments are associated with the questions that researchers answer. But that is as far as it goes. Researchers answer the questions they think are important. The small, accurate interpretation is not as impressive a finding as “ideology drives interpretation”, but B&B’s research is most valuable where it is most restrained. The further it moves from the firm ground of describing correlations in researchers’ modelling choices towards the quicksand of diagnosing ideological distortion of inference, the worse it gets. What they present as evidence of bias is more reasonably understood as evidence that the framing question itself was never well defined. Through its narrative style, and notwithstanding quiet disclaimers about causal inference, the paper invites the conclusion that researchers working on divisive, politically salient topics simply find what their ideologies lead them to find. And taken at face value, it licenses the distrust of empirical research on contested policy questions.

 

[Image: a goldfish in a glass fishbowl on a gas stove, blue flame beneath; the water is boiling and steam is rising.]

When Calm Is Not Enough

Douglas Adams fans will know that The Hitchhiker’s Guide to the Galaxy has the words “Don’t Panic” inscribed on the front cover in large friendly letters. Useful advice in challenging times, because panic implies thoughtless and impulsive action.

In contrast, in a recent Substack essay, Sandro Galea offers a thoughtful and philosophically grounded argument for equanimity in times of political and institutional upheaval. His call is for calm, intellectual humility, and measured action, especially in the face of uncertainty, polarisation, and the erosion of trust. His argument is rooted in moral seriousness and philosophical tradition. His invocation of stoicism, pragmatism, and the idea that passion should be channelled, not indulged, is both admirable and deeply needed in many contexts.

There are certainly lessons in the essay for me, because I am not beyond the cathartic tweet and angry rant.

Nonetheless, even while I appreciate his concern about performative outrage, alarm fatigue, and the risks of losing strategic focus, I worry more that equanimity will not fully meet the gravity of the current moment.

Calm is positioned as a centrepiece of a rational response, but in the face of authoritarian drift, this may not be true. Galea rightly warns against seeing every disagreement as existential. But in some cases, that is exactly what they are. And here I must be careful. By “existential” I do not mean that human life will cease to exist. I do mean, however, that political traditions that hold the rights of the individual as core are at existential risk. The current political landscape—marked by widespread disinformation, open contempt for liberal norms, and attempts to consolidate executive power through legal, rhetorical, and administrative means—is not a policy disagreement. It is a strategic project to transform liberal democracy into a performative, illiberal system.

In such a context, remaining calm (which is different from not panicking) is quite possibly the least rational response. It risks underestimating the nature of the threat—a threat that deliberately weaponises chaos, disorientation, and norm erosion to exhaust democratic opposition. When “everything, everywhere, all at once” (EEAAO) is the strategy for destroying a liberal democracy, false calm doesn’t preserve clarity—it masks danger. We become the goldfish slowly poaching in the ever-warming water.

Equanimity can blur the line between a policy disagreement and an ethical breach. Galea urges restraint in response to funding cuts, institutional restructuring, and ideological pressure. But what if these are not just isolated matters of administrative efficiency or political difference? What if they are tools in a broader campaign of harming the very system of government best designed to preserve the interests of the people? And that is exactly what the EEAAO strategy would suggest they are—autocracy trumped up as a policy disagreement.

When thousands are dismissed from public health agencies, when HHS disseminates misinformation about vaccines, when the infrastructure for climate science is actively dismantled, when court orders are ignored, when immigration laws are weaponised to silence dissent—this is not a policy disagreement. These are tactics with critical consequences, and they demand a response that acknowledges their moral stakes. Calm analysis may aid clarity, but when calm becomes habitual, it risks normalising that which should provoke action.

Equanimity is not ideologically neutral unless both sides approach an argument with the same calm. And this is one of the structural challenges in Galea’s framing. Equanimity is easier for those with institutional protection, social capital, and professional standing. It can become a posture of the “polite center-left”—technocrats, academics, and professionals committed to liberal norms—even while those norms are being strategically exploited or dismantled by authoritarian actors.

The political right in the U.S. has repeatedly shown a greater willingness to break norms, delegitimise elections, and mobilise extra-institutional power—sometimes violently. The political left, especially its center, remains norm-bound and institutionally deferential. This asymmetry means that equanimity, when over-applied, can function less as a virtue and more as a strategic vulnerability.

The threat of violence is asymmetrical. Before the 2020 election, journalists and political scientists openly worried about the possibility of civil war in the U.S.—largely premised on a Democratic victory being treated as illegitimate by the right. Those fears proved partially correct: Trump lost, and an attempted insurrection followed. In contrast, after Trump’s 2024 victory, those concerns vanished—not because the threat disappeared, but because the side most prone to violent refusal of democratic outcomes had won.

This reveals a deeper point: a center-left government is far more likely to provoke armed reaction than an authoritarian right-wing government is to provoke institutional noncompliance. Equanimity in such a context does not meet the moment. It plays into the imbalance and helps normalise a tilted playing field.

Triage under fire requires more than calm—it requires strategic urgency. “Don’t panic” is the better guide. In a war zone, triage doesn’t require serenity—it requires adrenaline management, urgency, and the ability to act decisively under pressure. Calm may feel virtuous, but if it becomes a default stance, it can dull the moral reflexes at precisely the moment they must be sharpest.

Not every act of protest needs to be loud. The language does not have to be obnoxious. But when the fundamental institutions of public health, science, and democracy are being deliberately undermined, a more direct, even disruptive, form of resistance may not only be justified—it may be morally required.

Sandro Galea is right that equanimity is not the same as inaction. And he is right that outrage alone does not build durable progress. But like any virtue, equanimity must be applied with discernment. When the rules are being rewritten, when the democratic compact is under open threat, and when harm is immediate and lasting, too much calm may serve not wisdom but delay—and delay is its own kind of complicity.

I admire Galea’s clarity of tone and seriousness of thought. My disagreement is with how best to meet a moment shaped not by healthy debate, but by coordinated disruption. In such times, clear-eyed, unpanicked urgency may better serve the cause of justice than calm.

[Image: a dark, dystopian government data centre; a lone researcher, lit by the cold blue light of a monitor, tries to recover lost data from a corrupted drive, symbolising the slow decay of knowledge in a forgotten digital vault.]

The Purge

The Trump administration has started one of the most significant assaults on human knowledge in centuries. Well-collected, curated and communicated data are facts—an evidence base. When facts contradict a political narrative, they are dangerous. The US government has realised the danger and begun The Purge. The government will now establish new “facts” to replace old facts. Purge-and-replace is part of the process of state capture. Evidence represents dissent, and the government must crush dissent. Reality is altered.

Until a week ago, successive US governments had invested in a data-driven, evidence-based policy enterprise with generous global access. It was a resource for the world that supported research and evidence-based decision-making. And, unless the information was classified or subject to privacy laws (e.g., HIPAA for health data), anyone could look at everything from labor and criminal justice statistics to environmental and health data.

Going, going … !

Starting late last week, government websites began to disappear; among them, the USAID website vanished without a trace. All the development evidence USAID published has disappeared. If you try to reach the website today (2 Feb, 2025), you will get a message from your internet provider informing you the site does not exist. Perhaps you have the wrong address…or maybe it was never really there. (Cue spooky music.)

Individual pages on government websites are also disappearing. The Centers for Disease Control and Prevention (CDC) webpage, for example, providing evidence-based contraceptive guidelines has vanished. A week ago, the guidelines helped people exercise their reproductive choice using the best available evidence. But facts are dangerous. The idea of personal autonomy in reproduction runs counter to the authoritarian narrative of the current US administration. CDC is being scrubbed clean.

Data are also disappearing. The Youth Risk Behavior Surveillance System (YRBSS) is a long-running survey of adolescent health risks coordinated by the CDC. If I search the CDC website for “YRBSS”, I get links. If I follow the links: “The page you’re looking for was not found”. This loss of data is a tragedy. A quick look at PubMed reveals the kind of research that has used YRBSS data: everything from adolescent mental health to smoking. Without those data, no one today could do the same kind of research that was done before. Trends in adolescent health are lost, and we will not know about any emerging health risk factors. It is hard to know precisely why the YRBSS has disappeared. However, in keeping with the religiously conservative nature of the current US government, maybe it is adolescent sex that is too dangerous for people to know about.

The US government is not content with just removing facts. They also want CDC scientists to rewrite their research to adopt a single, approved, authoritarian view of the world. Their research must conform to the Trump government’s ideology, an approach oddly reminiscent of Stalin’s insistence that Soviet researchers adopt the dead-end genetic science of Trofim Lysenko.

The CDC has instructed its scientists to retract or pause the publication of any research manuscript being considered by any medical or scientific journal, not merely its own internal periodicals…. The move aims to ensure that no “forbidden terms” appear in the work. The policy includes manuscripts that are in the revision stages at a journal (but not officially accepted) and those already accepted for publication but not yet live.

It hasn’t happened yet, but I have to wonder what will happen when the US Government targets PubMed and PubMed Central—exceptional scientific resources provided free of charge to the world by the National Library of Medicine (NLM)? NLM could be directed to purge from the database all abstracted data on every journal article that contains ideas that do not support the government’s worldview: gender, transgender, climate change, vaccines, air pollution (from fossil fuels)…. Commercial providers could still abstract those articles, but the damage would be enormous.

The vaccine denier, Robert F. Kennedy Jr, is currently being confirmed as Secretary of Health and Human Services. He believes the widely debunked, fraudulent claim that vaccines cause autism. What happens when he decides that the National Library of Medicine should selectively purge evidence debunking the vaccine-autism link? Will that mean vaccines cause autism in the US (a “US-fact”) but not in the rest of the world (a “fact-fact”)? Researchers in universities and institutions that can afford subscription services can avoid such excesses, but that will not be the case for many Global South researchers who rely on PubMed for their research, nor will it be the case for the general public, who also have free access to PubMed.

I have focused on health because it is the domain I know the best. There is, however, almost no factual resource of the US government that will be safe from the purge. Facts that endanger a Trump administration political narrative must not be allowed to exist.

The US government is a climate-denying administration that has again pulled out of the Paris Climate Accord. It has already targeted climate change research. Justice, labour, and population statistics that do not conform to the US government’s socially conservative, racist and xenophobic views about the world will also be in danger. Trade data that don’t support Trump’s political narrative of a “golden age” will need to be adjusted.

One of the great tragedies is that, now that the US government has shown itself to be institutionally uninterested in (or actively opposed to) facts, it has endangered the value of its entire evidence-based policy enterprise. If you visit a US government website in a year, will you trust the content? You shouldn’t. Instead, you should ask yourself what political interest influenced the information. Researchers, policymakers, journalists—everyone—will need to parse US government websites like they parse information from any other authoritarian regime. Sadly, research coming out of US universities will also require extra scrutiny. Where we once trusted these voices, we will now need to ask: has US government policy biased the research, what is the nature of the bias, and can we manage the bias?

Sometimes, it will be easier to ignore US research altogether because verification carries a cost.

There are small glimmers of hope. Archive.org (the Wayback Machine) has historical snapshots of US government websites, including some data snapshots, such as the YRBSS. These snapshots are BTP (before the purge). Unfortunately, the archive is not as easy to navigate as the World Wide Web, nor as easy to navigate as dedicated government websites. The value of the archived information also relies on the snapshot being taken at the right time to capture the latest BTP information. The CDC contraceptive-use guidelines, purged a few days ago, are available on archive.org from a snapshot taken on 25 December 2024. Assuming the CDC made no BTP updates after the last snapshot, the information is up to date…for now. Of course, contraceptive guidelines evolve with new data and new technology, and they will be out of date in the coming years.

If we are to survive the worst damage of The Purge, other government and non-government institutions worldwide will have to step into the breach. Historical data may need to be reconstructed and curated from sources such as archive.org. The PubMed and PubMed Central databases should be copied before the US government corrupts them. Where US data are still available, copy them. Outside the US, we will need to put in place prospective mechanisms to collect valuable global data that we can no longer trust from US sources.

…going…

We cannot assume that the facts from US government sources will remain uncorrupted tomorrow because they are uncorrupted today. The preservation of the truth will require resources and investment.

… GONE!

Welcome to The Purge

UKRI got its A.I. policy half right

UKRI AI policy: Authors on the left. Assessors on the right (image generated by DALL.E)

When UKRI released its policy on using generative artificial intelligence (A.I.) in funding applications this September, I found myself nodding until I wasn’t. Like many in the research community, I’ve been watching the integration of A.I. tools into academic work with excitement and trepidation. In contrast, UKRI’s approach is a puzzling mix of Byzantine architecture and modern chic.

The modern chic, the half they got right, is on using A.I. in research proposal development. By adopting what amounts to a “don’t ask, don’t tell” policy, they have side-stepped endless debates that swirl about university circles. Do you want to use an A.I. to help structure your proposal? Go ahead. Do you prefer to use it for brainstorming or polishing your prose? That’s fine, too. Maybe you like to write your proposal on blank sheets of paper using an HB pencil. You’re a responsible adult—we’ll trust you, and please don’t tell us about it.

The approach is sensible. It recognises A.I. as just one of the many tools in the researcher’s arsenal. It is no different in principle from grammar-checkers or reference managers. UKRI has avoided creating artificial distinctions between AI-assisted work and “human work” by not requiring disclosure. Such a distinction also becomes increasingly meaningless as A.I. tools integrate into our daily workflows, often completely unknown to us.

Now let’s turn to the Byzantine—the half UKRI got wrong—the part dealing with assessors of grant proposals. Here, UKRI seems to have lost its nerve. The complete prohibition on assessors using A.I. feels like a policy from a different era, some time “Before ChatGPT” (B.C.), that is, before ChatGPT’s release in November 2022. The B.C. policy fails to recognise the enormous potential of A.I. to support and improve human assessors’ judgment.

You’re a senior researcher who’s agreed to review for UKRI. You have just submitted a proposal using an A.I. to clean, polish and improve the work. As an assessor, you are now juggling multiple complex proposals, each crossing traditional disciplinary boundaries (which is increasingly regarded as a positive). You’re probably doing this alongside your day job because that’s how senior researchers work. Wouldn’t it be helpful to have an A.I. assistant to organise key points, flag potential concerns, help clarify technical concepts outside your immediate expertise, act as a sounding board, or provide an intelligent search of the text?

The current policy says no. Assessors must perform every aspect of the review manually, potentially reducing the time they can spend on a deep evaluation of the proposal. The restriction becomes particularly problematic when considering international reviewers, especially those from the Global South. Many brilliant researchers who could offer valuable perspectives might struggle with English as a second language and miss some nuance without support. A.I. could help bridge this gap, but the policy forbids it.

The dual-use policy leads to an ironic situation. Applicants can use A.I. to write their proposals, but assessors can’t use it to support the evaluation of those proposals. It is like allowing Formula 1 teams to use bleeding-edge technology to design their racing cars while insisting that race officials use an hourglass and the naked eye to monitor the race.

Strategically, the situation worries me. Research funding is a global enterprise; other funding bodies are unlikely to maintain such a conservative stance for long. As other funders integrate A.I. into their assessment processes, they will develop best-practice approaches and more efficient workflows. UKRI will fall behind. This could affect the quality of assessments and UKRI’s ability to attract busy reviewers. Why would a busy senior researcher review for UKRI when other funders value their reviewers’ time and encourage efficiency and quality?

There is a path forward. UKRI could maintain its thoughtful approach to applicants while developing more nuanced guidelines for assessors. One approach would be a policy that clearly outlines appropriate A.I. use cases at different stages of assessment, from initial review to technical clarification to quality control. By adding transparency requirements, proper training, and regular policy reviews, UKRI could lead the way with approaches that both protect research integrity and embrace innovation.

If UKRI is nervous, they could start with a pilot program. Evaluate the impact of AI-assisted assessment. Compare it to a traditional approach. This would provide evidence-based insights for policy development while demonstrating leadership in research governance and funding.

The current policy feels half-baked. UKRI has shown they can craft sophisticated policy around A.I. use. The approach to applicants proves this. They need to extend that same thoughtfulness to the assessment process. The goal is not to use A.I. to replace human judgment but to enhance it, allowing assessors to focus their expertise where it matters most.

This is about more than efficiency and keeping up with technology. It’s about creating the best possible system for identifying and supporting excellent research. If A.I. is a tool to support this process, we should celebrate. When we help assessors do their job more effectively, we help the entire research community.

The research landscape is changing rapidly. UKRI has taken an important first step in allowing A.I. to support the writing of funding grant applications. Now it’s time for the next one—using A.I. to support funding grant evaluation.