Recognizing How Cognitive Biases Shape—and Distort—Clinical Evidence

Prof. Baxendale's new publication "How to be a Better Doctor" explains cognitive biases that plague gender medicine research


February 21, 2025

Key points:

Personal and group biases can perpetuate medical practices that lack rigorous evidence, particularly in off-label treatment areas such as hormone therapy for gender dysphoria.
Physicians face unique challenges in overcoming bias, as acknowledging new evidence may require them to confront the possibility of past harm to patients.
Pediatric gender medicine demonstrates the dangers of conflicting evidence hierarchies, with systematic reviews and consensus-based guidelines often at odds.
Cognitive biases can lead to the suppression of evidence, as seen in the delayed NIH study on puberty blockers.
Promoting critical thinking and prioritizing high-quality evidence over opinion-based consensus are essential steps to mitigate bias in medicine.

This week, the British Journal of Hospital Medicine published “How to be a Better Doctor: Recognizing How Cognitive Biases Shape—and Distort—Clinical Evidence,” by neuropsychologist Sallie Baxendale of University College London. Baxendale details how personal bias can often supersede objective evidence, perpetuating practices that might be ineffective or even harmful.

Baxendale cites the historical use of mercury to treat a range of illnesses, including infections, despite a lack of evidence of benefit and in disregard of obvious signs of harm. The practice of prescribing mercury persisted into the 20th century, with "the beleaguered patient battling both the disease and mercury poisoning, the effects of which included excessive salivation and tooth loss, neurological dysfunction, major organ failure and ultimately coma and death."

Baxendale’s critique zeroes in on biases in “off-label spaces,” where treatments approved by regulators for one indication come to be regarded as "standard practice" for another condition without rigorous study. As Baxendale writes, “it is in this treatment gap that the combination of cognitive biases and group dynamics can significantly influence the perception of the evidence base … [and] can perpetuate harmful practices long after the objective evidence points in a different direction.” She points out that pediatric gender medicine offers a striking example of physicians disregarding evidence and persisting with a practice that has not been found to be beneficial and that disrupts a normal physiological developmental process.

Baxendale explains that cognitive biases and problematic group dynamics allow non-beneficial or even harmful treatments to persist long after they should have been abandoned. The article offers a primer on a range of cognitive biases, from familiar examples like “confirmation bias” and the “sunk cost fallacy” to less well-known influences like “status quo bias” and the “bandwagon effect.” One particularly powerful bias in medicine is the “authority principle,” which stems from the trust placed in experts and leaders. When new evidence challenges entrenched beliefs, Baxendale observes, it can generate “powerful cognitive dissonance,” leading individuals to dismiss contrary evidence to maintain psychological comfort. For physicians, accepting such evidence can be especially daunting, as it may mean acknowledging that well-intentioned actions have caused inadvertent harm.

A Tale of Two Hierarchies

Baxendale’s analysis comes into sharper focus when she addresses the “sacred cows” of medicine: deeply embedded practices shielded from scrutiny by a mix of opinion and belief. Pediatric gender medicine again provides a striking example, marked by divergent standards: a near-prohibition on puberty blockers in the UK versus permissive policies in US “sanctuary states.” Baxendale points out that both sides sincerely believe they are acting in the best interests of gender-dysphoric youth, yet their positions are fundamentally incompatible.

Baxendale suggests that two hierarchies—the hierarchy of evidence and the hierarchy of disagreement—provide a path forward to build professional consensus that is based on evidence, rather than being guided by unchecked cognitive biases.

In the evidence pyramid, each level contributes evidence, but scientific certainty increases as one moves up the pyramid. Baxendale points out that when conclusions from the lower levels of the evidence pyramid contradict those that emerge from the higher levels, it is the top-of-the-pyramid evidence that provides the more reliable conclusions. Unfortunately, gender medicine commonly prioritizes evidence from the lowest levels of the pyramid, such as physicians' own opinions backed only by poor-quality studies, over findings from systematic reviews of evidence, which sit at the top of the pyramid.

To illustrate the conflict between systematic reviews and expert consensus in pediatric gender medicine, Baxendale contrasts the Cass Review’s policy recommendations to sharply restrict reliance on hormonal interventions for gender-dysphoric youth, which are based on systematic reviews of evidence, with WPATH's Standards of Care, which promote endocrine and surgical interventions for minors based on the WPATH authors' “consensus.” (Notably, WPATH initially intended to create evidence-based guidelines but abandoned that approach after the systematic reviews it commissioned failed to yield the hoped-for results.) Baxendale notes that “expert consensus is frequently wrong in medicine, sometimes with catastrophic consequences,” which underscores the urgent need for a shared evidence hierarchy to ground professional consensus.

"Comparing conclusions from evidence at the bottom of the pyramid (expert consensus) to that at the top (systematic reviews) is a sobering lesson in the impact of cognitive bias on medical practice. While we expect certainty and confidence to reduce as more robust evidence is required to support a conclusion, we hope the evidence emerging from every level points in the same direction. However, all too often, conclusions from studies at the top of the evidence pyramid directly contradict those from lower down. This is particularly stark when expert consensus goes head-to-head with evidence from systematic reviews." (Baxendale, 2025).

Debate is how science self-corrects and human knowledge grows, but just as not all forms of evidence are created equal, not all forms of argumentation carry the same weight in resolving scientific disputes. To address this, Baxendale provides a complementary tool: the Disagreement Pyramid, which, she explains, is derived from Graham's Hierarchy of Disagreement. This tool can help evaluate the credibility of arguments in scientific debate and foster constructive discourse by placing reasoned refutations and counterarguments above name-calling and tone-policing. The framework can not only restore decorum to debates but also help professionals challenge their own biases more effectively.

Baxendale invites readers to apply the Disagreement Pyramid to a recent scientific dispute over the Cass Review. She cites McNamara et al.’s 2024 attempt to critique the Cass Review (now thoroughly discredited in two peer-reviewed papers) as a textbook example of clinical biases substituting for solid evidence and sound reasoning. Unfortunately, McNamara et al.'s harsh criticism of the Cass Review operates at the lower levels of the disagreement pyramid. Baxendale contrasts it with Cheung et al.'s response to McNamara et al., which draws on the upper levels of the pyramid, with robust counterarguments and refutations. (After Baxendale's article was published, another high-quality peer-reviewed rebuttal of McNamara et al. appeared.)

Baxendale’s work underscores the collective responsibility for addressing bias in research and medical practice. By prioritizing rigorous evidence and fostering open-minded dialogue that adheres to the Disagreement Pyramid, the medical community can ground patient care in science rather than subjective belief.

SEGM Take-away

Two notable examples illustrate how cognitive biases among leading clinicians in pediatric gender medicine may have compromised not only the quality of care they provide to their own patients but also the integrity of the scientific record as a whole. When the UK first adopted the Dutch Protocol approach, it initiated an "early intervention" study intended to evaluate the practice. However, the then-director of the UK’s Gender Identity Development Service (GIDS) withheld the results after the study failed to demonstrate any improvement in youth treated with puberty blockers (with as many as one-third experiencing deterioration). Only after pressure from researchers (including SEGM advisor Michael Biggs) and journalists, and scrutiny in the Keira Bell court proceedings, was the study finally released in 2021. Even then, the GIDS authors camouflaged the disappointing results by making them appear positive, as explained in a recent analysis. The study's failure to replicate the psychological benefits of puberty suppression was passed over with a simple statement that it "identified no changes in psychological function,” followed by an attempt to refocus the reader on the positive “overall patient experience" with puberty blockers, while troubling evidence of problems with bone density in the treated children was described merely as “consistent with suppression of growth.”

A similar situation appears to have occurred in an NIH-funded study in the U.S. led by a group of America's leading gender clinicians. In October 2024, the New York Times (NYT) reported on the delayed release of a multi-year, multimillion-dollar NIH study investigating the mental health impacts of puberty blockers on gender-dysphoric youth. According to the NYT article, the lead investigator chose not to publish findings that showed no mental health improvements, in part because the findings contradicted her firsthand experience of seeing benefits. Following the NYT story, the lead investigator accused the reporter of making “false claims,” yet offered no explanation for why crucial research findings remain unpublished nearly a decade after the study's launch and nine years after its protocol was approved. Further, although sharing its data in anonymized form with other researchers is a funding requirement under NIH's 2003 Data Sharing Policy, SEGM is aware of several researchers who have submitted NIH Data Use Agreement requests; to date, all have been ignored by the project team.

What will it take for the medical community to begin providing safe and effective care for gender dysphoric youth? Although there are no foolproof solutions, Baxendale suggests a combination of strategies, including targeted training and comprehensive clinical feedback mechanisms. She advises clinicians to challenge initial judgments, seek second opinions, and remain open to new perspectives. Even so, Baxendale concedes that “no surefire methods exist to completely eradicate biases.”

The past makes it uncomfortably clear that non-beneficial and even self-evidently harmful treatments can persist for a long time, remaining entrenched until new treatments emerge to replace them. As Baxendale reminds us, only the advent of antibiotics finally curtailed the use of mercury as a medical “treatment.” In that light, the Cass Review's directive to develop “an explicit clinical pathway … for non-medical interventions” (p. 157) for youth confronting gender-related distress may well prove to be its most significant and consequential recommendation of all.