
Ideology and the Illusion of Disagreement in Empirical Research

There is deep scepticism about the honesty of researchers and their capacity to say things that are true about the world. If one could demonstrate that their interpretation of data was motivated by their ideology, that would be powerful evidence for that distrust. A recent paper in Science Advances ostensibly showed just that. The authors, Borjas and Breznau (B&B), re-analysed data from a large experiment designed to study researchers. The researcher-participants were each given the same dataset and asked to analyse it to answer the same question: “Does immigration affect public support for social welfare programs?” Before conducting any analysis of the data, the researcher-participants also reported their own views on immigration policy, ranging from very anti- to very pro-immigration. B&B reasoned that, if everyone was answering the same question, they would be able to infer something about the impact of prior ideological commitments on the interpretation of the data.

Each team independently chose how to operationalise variables, select sub-samples from the data, and specify statistical models to answer the question, which resulted in over a thousand distinct regression estimates. B&B used this observed diversity of modelling choices as data, examining how the research process unfolded as well as the relationship between the answers to the question and the researcher-participants’ prior views on immigration.

B&B suggested that researcher-participants with moderate prior views on immigration find the truth, although they never actually say it that cleanly. Indeed, in the Methods and Results they demonstrate appropriate caution about making causal claims. From the Title through to the Discussion, however, the narrative framing is that immoderate ideology distorts interpretation, which is exactly the question that, by design, their research does not and cannot answer.

Readers of the paper did not miss the narrative spin in which B&B shrouded their more cautious science. Within a few days of publication, the paper had attracted hundreds of posts and had been picked up by international news feeds and blogs. Commentaries tended to frame pro-immigration positions as more ideologically suspect.

There are significant problems with the B&B study, however, problems that have been missed or not afforded sufficient salience. To understand them more clearly, it helps to step away from immigration altogether and consider a simpler case. Suppose researchers are given the same dataset and asked to answer the question: “Do smaller class sizes improve student outcomes?” The data they are given include class size, test scores, and graduation rates (a proxy for student outcomes). On the surface, this looks like a single empirical question posed to multiple researchers using the same data.

Now introduce a variable that is both substantively central and methodologically ambiguous: a measure of the students’ socio-economic disadvantage. Some researchers treat socio-economic disadvantage as a covariate, adjusting for baseline differences to estimate an average effect of class size across all students. Others restrict the sample to disadvantaged pupils, on the grounds that education policy is primarily about remediation or equity. Still others model heterogeneity explicitly, asking whether smaller classes matter more for some students than for others. Each of these choices is orthodox. None involves questionable practice, and all of them are “answering” the same surface question. But each corresponds to a different definition of the effect being studied and, more precisely, to a different question being answered. By definition, different models answer different questions.
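To make the point concrete, here is a minimal sketch of the three orthodox specifications described above, written in Python with statsmodels. The dataset and its column names (class_size, test_score, disadvantaged) are hypothetical, invented purely for illustration; nothing here comes from the B&B experiment or its data.

```python
# A sketch of the three orthodox specifications described in the text.
# Column names (class_size, test_score, disadvantaged) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def three_specifications(df: pd.DataFrame):
    # 1. Covariate adjustment: average effect of class size across all students,
    #    holding socio-economic disadvantage constant.
    adjusted = smf.ols("test_score ~ class_size + disadvantaged", data=df).fit()

    # 2. Sub-sample restriction: effect of class size among disadvantaged pupils
    #    only (assumes 'disadvantaged' is a binary indicator).
    restricted = smf.ols(
        "test_score ~ class_size", data=df[df["disadvantaged"] == 1]
    ).fit()

    # 3. Explicit heterogeneity: does the class-size effect differ by disadvantage?
    interacted = smf.ols(
        "test_score ~ class_size * disadvantaged", data=df
    ).fit()

    # Three defensible models, three different coefficients on class_size,
    # three different questions about "the" effect of class size.
    return adjusted, restricted, interacted
```

All three models are defensible, yet the coefficient on class_size means something different in each: an adjusted average effect, an effect for disadvantaged pupils only, and an effect allowed to vary with disadvantage. None is a distortion of the others; they are answers to different questions.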

In this setting, differences between the researchers’ analyses would not normally be described as researchers answering the same question differently. Nor would we infer that analysts who focus on disadvantaged students are “biased” toward finding larger effects, or that those estimating population averages are distorting inference. We would recognise instead that the original prompt was under-specified, and that researchers made reasonable, if normatively loaded, decisions about which policy effect should be evaluated. B&B explicitly acknowledge this problem in their own work, writing: “[a]lthough it would be of interest to conduct a study of exactly how researchers end up using a specific ‘preferred’ specification, the experimental data do not allow examination of this crucial question” (p. 5). Even with this insight, however, they persist with the fiction that the researchers were indeed answering the same question, treating two different “preferred specifications” as if they answered the same question. It would be like our educationalists treating an analysis of outcomes for children from socio-economically deprived families as if it answered the same question as an analysis that included all family types.

B&B’s immigration experiment goes a step further, and in doing so introduces an additional complication. The researcher-participants’ prior policy positions on immigration were elicited in advance of their data analysis, and B&B then used those positions as an organising variable in their analysis of the researcher-participants.

Imagine a parallel design in the education case. Before analysing the data, researchers are asked whether they believe differences in educational outcomes are primarily driven by school resources or by family deprivation. Their subsequent modelling choices (whether to focus on disadvantaged pupils, whether to emphasise average effects, whether to model heterogeneity explicitly) are then correlated with these priors. Such correlations would be unsurprising. If you think disadvantage matters more for student outcomes than school resources do, you may well focus your analysis on students from deprived backgrounds. It would be a mistake, however, to conclude that researchers with strong views are biasing results rather than pursuing different, defensible conceptions of the policy problem.

Once prior beliefs are foregrounded in this way, a basic ambiguity arises. Are we observing ideologically distorted inferences over the same shared question, or systematic differences in the questions being addressed, given an under-specified prompt? Without agreement on what effect the analysis is meant to capture, those two interpretations cannot be disentangled. Conditioning on ideology, as B&B did, therefore risks converting a problem of an under-specified prompt into a story about ideologically biased reasoning. This critique does not deny that motivated reasoning exists, or even that B&B’s researcher-participants were engaged in it. B&B simply do not show it, and the alternative explanation is more parsimonious.

The problems with the B&B paper are compounded when they attempt to measure “research quality” through peer evaluations. Researcher-participants in the experiment were asked to assess the quality of one another’s modelling strategies, which introduces a second and distinct issue: the evaluation process is confounded by the distribution of views within the researcher-participant pool.

To see this, return again to the education example. Suppose researchers’ views about the importance of family deprivation for educational outcomes are normally distributed, with most clustered around a moderate position and fewer at the extremes. A randomly selected researcher asked to evaluate another randomly selected researcher will, with high probability, be paired with someone holding broadly similar views (around the middle of the distribution). In such cases, the modelling choices are likely to appear reasonable and well motivated, and to receive high quality scores. The evaluation implicitly invites the following reasoning: “you’re doing something similar to what I was doing, and I was doing high-quality research, therefore you must be doing high-quality research as well”.

By contrast, models produced by researchers in the tails of the distribution will more often be evaluated by researchers further away from their ideological position. Those models may be judged poorly framed or unbalanced, not because they violate statistical standards, but because they depart from the modal conception of what the broadly framed question is about. Under these conditions, lower average quality scores for researchers with more extreme priors may reflect distance from the dominant framing, not inferior analytical practice. B&B, however, argued that the results show that being ideologically in the middle produced higher-quality research.
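A toy simulation makes the mechanics of this confound visible. It assumes, purely for illustration, that every analysis has identical underlying quality and that peer scores fall with the ideological distance between evaluator and evaluated; the specific numbers and the linear penalty are invented assumptions, and nothing here reproduces B&B’s actual evaluation procedure.

```python
# Toy simulation of the peer-evaluation confound: identical underlying quality,
# but scores penalised by ideological distance between evaluator and evaluated.
# All parameters are illustrative assumptions, not taken from B&B.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Priors roughly normally distributed: most researchers near the middle.
priors = rng.normal(loc=0.0, scale=1.0, size=n)

# Each researcher is evaluated by a peer drawn at random from the same pool.
evaluators = rng.permutation(priors)

# Perceived quality = a constant true quality, minus a penalty that grows with
# the ideological distance within the pair, plus noise. No analysis is worse.
true_quality = 7.0
distance = np.abs(priors - evaluators)
scores = true_quality - 0.8 * distance + rng.normal(scale=0.5, size=n)

# Compare average peer scores by how extreme each researcher's own prior is.
extremity = np.abs(priors)
groups = [("moderate (|prior| < 0.5)", extremity < 0.5),
          ("middling (0.5 to 1.5)", (extremity >= 0.5) & (extremity < 1.5)),
          ("extreme  (|prior| > 1.5)", extremity >= 1.5)]
for label, mask in groups:
    print(f"{label:26s} mean peer score: {scores[mask].mean():.2f}")
```

Researchers in the tails receive visibly lower average scores even though, by construction, no one’s analysis is better or worse than anyone else’s; the gradient is produced entirely by who happens to be doing the scoring.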

The issue here is not bias but design. When both peer reviewers and reviewees are drawn from the same population, and when quality is assessed without a fixed external benchmark for what counts as a good answer to the question, peer scores inevitably track conformity to the field’s modal worldview. Interpreting these scores as evidence that ideology degrades research quality is wrong.

B&B’s paper is useful. It shows that ideological commitments are associated with the questions that researchers answer. Stated cleanly, that is as far as it goes. Researchers answer the questions they think are important. This small, accurate interpretation is not as impressive a finding as “ideology drives interpretation”, but B&B’s research is most valuable where it is most restrained. The further it moves from the firm ground of describing correlations in researchers’ modelling choices towards the quicksand of diagnosing ideological distortion of inference, the worse it gets. What they present as evidence of bias is more reasonably understood as evidence that their framing question was never well defined. Through its narrative style, and notwithstanding its quiet disclaimers about causal inference, the paper invites the conclusion that researchers working on divisive, politically salient topics simply find what their ideologies lead them to find. And taken at face value, it licenses the distrust of empirical research on contested policy questions.