The "Utah Review" of Hormonal Treatments for Gender-Dysphoric Minors: A Methodological Appraisal

Assessing the Trustworthiness of the Analysis, Conclusions and Recommendations

April 2, 2026

Suggested citation:
Society for Evidence-Based Gender Medicine. The “Utah Review” of Hormonal Treatments for Gender-Dysphoric Minors: A Methodological Appraisal. April 2, 2026. https://doi.org/10.5281/zenodo.19388024

In May 2025, the Utah Department of Health and Human Services released a 1,051-page evidence review of hormonal interventions for minors with gender dysphoria, accompanied by a recommendation report. This analysis, which has been colloquially termed the “Utah Review,” concluded that hormonal interventions for gender-dysphoric youth are safe and effective and that their availability should not be restricted.

Proponents of pediatric gender transition have subsequently referenced the Utah Review as a counterpoint to the U.K. Cass Review and the U.S. Department of Health and Human Services evidence review of pediatric gender medicine, both of which examined similar evidence but concluded that there is no reliable evidence demonstrating benefit of endocrine interventions and that there is substantial risk of harm. Both the U.K. and the U.S. acted on the findings of those initiatives, instituting restrictions on the practice of pediatric transitions.

Our comprehensive methodological appraisal of the Utah Review does not aim to provide a new evidence appraisal, nor does it make recommendations about whether or how endocrine interventions for gender dysphoria in youth should be regulated. Our aim is more limited: because the Utah Review’s conclusions regarding the net benefits of such interventions are demonstrably at odds with multiple existing systematic reviews of evidence, we evaluate whether the Utah Review adhered to appropriate methodological standards in the process by which it arrived at its conclusions and recommendations.

We hope our analysis will help healthcare decision-makers—patients, families, and policymakers—engage thoughtfully with the Utah Review, develop a clearer understanding of its strengths and limitations, and be better positioned to make informed decisions.

I. Executive Summary

Background

This report provides a methodological analysis of the Utah Review, a set of evidence reviews and recommendations about hormonal interventions for minors with gender dysphoria, published by the Utah Department of Health and Human Services in May 2025. The Utah Review was commissioned by the Utah Legislature in the context of a moratorium on pediatric medical transition (S.B. 16). The analysis was conducted by the University of Utah College of Pharmacy’s Drug Regimen Review Center and was presented as a “systematic review of the medical evidence.” It concluded that puberty blockers and cross-sex hormones are safe and effective treatments for pediatric gender dysphoria and that the hormonal intervention pathway should remain open for adolescents in Utah.

Methods

Because systematic evidence reviews can substantially influence clinical practice and public policy, adherence to established methodological standards is essential. With more than 90% of the Utah Review comprising analyses of primary clinical studies, we evaluated that component using two standard appraisal tools for systematic reviews of intervention effects—ROBIS and AMSTAR 2. Other components of the Utah Review—including analyses of systematic reviews and clinical guidelines—were also assessed, with a focus on whether the methods were systematic, comprehensive, and reproducible, and whether the conclusions were supported by the evidence presented. We also evaluated the recommendations arising from the evidence review to assess whether the process of moving from evidence review to recommendations followed evidence-based methods.

In addition, we obtained and analyzed several supporting documents, including the Utah Legislature's commissioning requirements, interim results presentations, relevant publications in peer-reviewed journals, and the contractual terms that governed the analysis. Our process also involved a cursory review of the professional backgrounds of the individuals and institutions involved in the Review, to assess potential conflicts of interest. These steps were undertaken to better understand the broader context for the Utah Review, including the Review’s provenance, research goals, contractual obligations and any professional or personal positions that may have influenced the research process, findings, conclusions, and recommendations.

Findings

The Utah Evidence Review consists of four primary analytic components: an identification of the pharmacological agents used; a review of primary clinical studies on treatment effects, reported separately for short- and long-term outcomes; a review of systematic reviews; and a review of clinical guidelines. Our analysis found that only the review of pharmacological agents was comprehensive.

Although the Review described itself as a “systematic review of the medical evidence related to the use of hormones and hormone analogs in the treatment of pediatric gender dysphoria,” and prominent groups of policymakers and academics advocating for pediatric gender transitions have referenced it as such, none of the Utah Review analyses met the basic criteria for a systematic review of treatment effects.

The two analyses that most closely resembled systematic reviews—a review of primary clinical studies and a review of systematic reviews—failed to provide an evidence synthesis that assesses the quality/certainty of the evidence, which rendered them ineligible for classification as systematic reviews. Even as non-systematic reviews, the analyses within the Utah Review suffer from profound methodological limitations, outlined below.

Failure to consider relevant evidence. All analyses, apart from the pharmacological agents review, missed relevant evidence. The review of primary clinical studies analyzed fewer than 40% of the 277 studies identified as eligible. The review of systematic reviews omitted foundational U.K. evidence reviews by the National Institute for Health and Care Excellence (NICE) and the University of York that informed evidence-based restrictions on pediatric gender transition in the U.K. The clinical guidance analysis identified only five guidelines, compared to more than 20 evaluated in a recent systematic review, omitting major clinical guidance documents, such as those from the American Academy of Pediatrics, guidelines from Australia, and Swedish and Finnish national guidelines.
Omission of key health outcomes. The researchers made an explicit decision to deprioritize the analysis of studies reporting harms, including infertility, desistance/detransition, regret, and mortality. Infertility was deprioritized on the grounds that fertility harms are expected, which is a methodologically unjustified reason. No justification was provided for deprioritizing desistance-related studies, even though the analysis of desistance was an explicit mandate from the Utah Legislature. Long-term mortality was eventually examined separately in Part II of the analysis, but findings of elevated mortality were not incorporated into the Review’s main conclusions.
Unrecognized problems in guideline and study quality. The clinical guideline analysis did not evaluate guideline quality, despite the availability of established methodologies, such as AGREE II. Instead, the Review focused on summarizing the World Professional Association for Transgender Health (WPATH) and Endocrine Society treatment recommendations and asserted that those guidelines were inherently “evidence-based” because they originated from “recognized medical authorities.” The analysis of primary clinical studies overlooked multiple sources of confounding and bias and inflated quality ratings by treating uncontrolled studies with subgroup analyses (e.g., males and females) as though each subgroup functioned as a control group for the other.
Failure to assess overall quality/certainty of evidence. While the researchers assessed individual studies for risk of bias, they failed to provide an evidence synthesis that assesses the evidence across studies for quality/certainty, which is a key step in the systematic review methodology. The researchers stated that they were “not contracted” to perform this analysis. This omission indicates that none of the analyses within the Utah Review qualify as “systematic reviews.”
Interpretive bias. The Utah Review went beyond presenting the evidence, framing the analysis in ways that favored support for pediatric transition. For example, it minimized the significance of the off-label use of puberty blockers and cross-sex hormones in minors through a spurious comparison with pediatric off-label antibiotic prescribing; devoted most of its discussion of existing systematic reviews to criticizing the Swedish review that informed Sweden’s national restrictions on youth transition, at times attributing to it methodological flaws it does not have; and ultimately argued against restricting pediatric transition by invoking “high-quality” guidelines and dismissing “regret,” even though the Review itself did not properly analyze either the quality of those guidelines or the phenomenon of regret.

Three distinct but interrelated dynamics may have contributed to the serious limitations of the Utah Review. First are the constraints imposed by the research contract itself, which allocated fewer than four months for completion of the work. An unusually broad scope of research—which attempted multiple analyses—appears to have strained the team’s ability to finish any of them to an acceptable methodological standard. Additional work conducted later, such as the analysis of long-term outcomes, does not appear to have been appropriately integrated with the prior body of work and did not inform overall conclusions. Further, the vague language of the research contract, which specified the expectation of “systematic reviews of the evidence,” but did not specifically mandate adherence to systematic review methods, may have contributed to slippage in methodological rigor.

Second, the University of Utah DRRC researchers appear to have conflated two distinct analytic steps: assessing individual studies for methodological rigor through “risk of bias” analysis and assessing the body of evidence for quality/certainty. The latter, often conducted using the GRADE methodology, explicitly considers how risk of bias in individual studies and patterns across studies (such as the magnitude of reported effects and their precision, the consistency of published results, and publication bias that may have omitted negative outcome reporting) influence how certain one can be about the evidence. This step is required in any analysis that purports to be a “systematic review,” yet it was neither specified in the research proposals nor carried out in any of the Utah Review’s analyses.

Third is the Review’s explicitly advocacy-driven provenance. The Utah Review appears to have been commissioned as part of a coordinated campaign by advocacy groups to supply a new evidentiary basis for lifting the Utah moratorium on pediatric gender transition enacted by S.B. 16. In such a context, in addition to adhering to a rigorous and transparent methodological framework, the management of conflicts of interest during the research and recommendation-making process should be paramount.

Unfortunately, institutional and personal conflicts of interest were neither reported nor managed. The Review did not disclose that four of the six Review advisors had significant professional involvement in pediatric gender medicine, including directly providing or overseeing clinical services at the University of Utah-affiliated adolescent gender clinic. The University of Utah DRRC team’s ongoing financial dependence on Utah DHHS contract work was not reported. There is no discussion of how conflicts of interest were managed when the leader of the Utah Review—the DHHS executive medical director for Clinical Services, Dr. Michelle Hofmann—left DHHS mid-review to join the University of Utah faculty.

Conclusion

The analytic approaches in the Utah Review do not adhere to the standards for systematic evidence reviews established by the National Academies of Sciences, Cochrane, and other expert bodies. Since the Utah Review does not contain systematic evidence reviews, it cannot serve as a reliable basis for evidence-based decision-making. Numerous conflicts of interest do not appear to have been recognized as such, or appropriately managed.

The Utah Review’s findings and conclusions contradict nearly two dozen systematic evidence reviews. These include systematic reviews from the U.K. (the University of York, NICE, and NHS England) and North America (McMaster University, the U.S. Department of Health and Human Services). Against a backdrop of biologically plausible or even certain harms, the conclusions of these systematic reviews are consistent: the benefits of pediatric transition remain highly uncertain.

Repeated systematic reviews of the same deficient evidence base are unlikely to generate materially different conclusions. Healthcare policymakers should accept the findings of existing high-quality systematic reviews and focus on developing evidence-based policies that prioritize the well-being and long-term health of youth with gender dysphoria.

II. Detailed Analysis

1. Background

In January 2023, the Utah Legislature instituted a moratorium on initiating hormone treatments for minors diagnosed with gender dysphoria. The bill, S.B. 16, “Transgender Medical Treatments and Procedures Amendments,” specified that the moratorium could be lifted if a “systematic medical evidence review of hormonal transgender treatments” demonstrated their safety and efficacy. A published commentary indicates that the evidence review requirement was added to the final version of the bill as a result of a “coordinated effort across advocacy organizations working with the bill sponsors,” ostensibly as a pathway to reopen adolescent gender transitions in Utah.

Figure 1. Timeline of the Utah Review.

The final version of S.B. 16 contains numerous provisions regarding the requested research, which align with two types of research output:

Evidence Review. The bill called for a systematic evidence review of the short- and long-term impacts of puberty blockers and cross-sex hormones, including an assessment of the quality of the evidence. It also requested several additional analyses, such as an assessment of harms from interrupting natural puberty and estimates of desistance rates and timing.
Recommendation Report. The bill requested recommendations regarding the treatments, including identifying situations in which endocrine interventions should not be provided; specifying the information minors and parents should receive before consenting to hormone treatments; outlining best practices for how that information should be provided; and describing the “assumptions and value determinations used to reach a recommendation.”

The Legislature tasked the Utah Department of Health and Human Services (DHHS) with overseeing the research process. Utah DHHS subsequently commissioned the research from the University of Utah College of Pharmacy’s Drug Regimen Review Center (DRRC) under a contract amendment to an existing multi-year research contract.

Initially, DHHS allocated $150,000 (First Contract Amendment, Attachment D) to the initiative, and the contract stipulated that the analysis be completed in under four months (April 17 to July 31, 2023). When the presentation of the results in February 2024 revealed the absence of the long-term outcome analysis requested by the Legislature, the contract was amended. Utah DHHS added $33,000 to the budget and allocated an additional eight weeks for the DRRC team to complete the work (Second Contract Amendment, Attachment E). The analysis of long-term outcomes, which became "Part II" of the Utah Review document, was presented in June 2024.

The final document—a two-part, 1,051-page report titled Gender-Affirming Medical Treatments for Pediatric Patients with Gender Dysphoria—was officially submitted to Utah DHHS in August 2024 and published in May 2025.

The Utah Review document comprised multiple research outputs, summarized below:

Identification of pharmacological agents used and their FDA licensure status (Part I). The Evidence Review identified 66 pharmacological agents and determined that none are FDA-approved for the indication of gender dysphoria.
Summary of clinical practice guidelines for the treatment of pediatric gender dysphoria (Part I). The Evidence Review identified five guidelines, but omitted several influential guidance documents, such as the American Academy of Pediatrics' Policy Statement. The quality of the guidelines was not assessed.
Review of systematic reviews and/or meta-analyses (SRMAs) and primary clinical studies (Part I). Both a review of primary clinical studies and a review of existing systematic reviews were conducted. Neither analysis contained an evidence synthesis that assessed the body of evidence for quality/certainty.
Review of long-term outcomes (Part II). This analysis was added after the interim presentation of results revealed the absence of the long-term outcomes analysis that was explicitly requested by the Legislature. The analysis did not contain an evidence synthesis that assessed the body of evidence for quality/certainty, and the findings from this analysis were not incorporated into the main conclusions of Part I.
Supplementary data. Over 90% of the 1,051-page Evidence Review consists of tables and study lists produced during the analysis. The unusual practice of redacting study details to conceal clinic names and locations hindered meaningful engagement with some of the supplementary material.

The Evidence Review concluded that, according to the “consensus of the evidence,” hormonal interventions for minors are safe and effective, and recommended against treatment restrictions:

[T]he consensus of the evidence supports that the treatments are effective in terms of mental health, psychosocial outcomes and the induction of body changes consistent with the affirmed gender in pediatric GD patients. The evidence also supports that the treatments are safe in terms of changes to bone density, cardiovascular risk factors, metabolic changes, and cancer […]

Based on the reviewed evidence included in this report, it is our expert opinion that policies to prevent access to and use of GAHT for treatment of GD in pediatric patients cannot be justified based on the quantity or quality of medical science findings or concerns about potential regret in the future, and that high-quality guidelines are available to guide qualified providers in treating pediatric patients who meet diagnostic criteria (pp. 90-91).

Following completion of the Evidence Review, Utah DHHS issued a separate Recommendation Report. The Report referenced the Review’s conclusions but did not make a recommendation as to whether the moratorium should be lifted. Instead, it offered recommendations on how the provision of endocrine interventions could be restructured at the state level if the moratorium were to be lifted. In May 2025, both documents were published on the Utah State Legislature website.

In March 2026, the Utah Legislature passed H.B. 174, which converts the state’s prior moratorium on initiating gender-affirming hormone treatments for minors into a permanent prohibition. The law requires most minors currently receiving such treatments to discontinue gender-related hormonal interventions by January 28, 2027. However, minors who were already receiving cross-sex hormones and are at least 16 years old at the time the law takes effect are allowed to continue treatment. The governor signed the bill on March 18, 2026.

2. The Utah Review: Not a Systematic Review

Summary: The Utah Review was commissioned with the expectation that the analysis would follow systematic review methodology. However, the Review had serious limitations at each step, culminating in the omission of a key methodological requirement of all systematic reviews—namely, an evidence synthesis that assesses the body of evidence for quality/certainty. This disqualifies the Utah Review from being classified as a “systematic review.”

Systematic reviews are a cornerstone of evidence-based decision-making because they provide the most reliable evidence regarding the benefits and harms of medical treatments. The Utah Legislature explicitly required a “systematic evidence review” of the effects of puberty blockers and cross-sex hormones as the basis for deciding whether to lift the moratorium on their use in minors. Given the centrality of this “systematic review” requirement, it is important to determine whether the Utah Review meets established methodological standards for a systematic review.

The research protocol provides an important basis for assessing any review’s adherence to a systematic review methodology. The protocol describes the review’s objectives, eligibility criteria, and analytic methods—all of which must be set out before the analysis begins. Best practices call for protocols to be shared via registries such as PROSPERO or the Open Science Framework. When that is not feasible, the protocol should be published with the final review, with any deviations transparently disclosed and justified.

While the Utah Review references the existence of prespecified internal protocols (p. 12), it does not make them available. As a result, evaluating the research process requires analysis of multiple sources, including the S.B. 16 bill that commissioned the Utah Review, the University of Utah DRRC research proposals, the interim results presentation, and the Evidence Review and Recommendation Report.

The need for such reconstruction, rather than straightforward verification against a prespecified protocol, is itself a methodological concern because it limits the ability to independently assess the rigor of the review. Accordingly, we present a constrained analysis based on the available materials, highlighting what appear to be significant deviations from planned methods and standard research practices.

Although specific approaches to systematic reviews may vary, systematic reviews include the following core steps:

Determine the research questions and eligibility criteria
Search for evidence and select studies
Abstract data and assess risk of bias in eligible studies
Create a formal evidence synthesis for each outcome
Assess the quality/certainty of the evidence and present evidence-based conclusions

The rest of this section provides an assessment of the extent to which the Utah Review’s methodological process adheres to systematic review methodology. Subsequent sections elaborate on each analysis in detail.

Research questions, eligibility criteria, and study selection (Steps 1, 2)

The systematic review process begins by articulating key research questions in PICO (Population, Intervention, Comparator, Outcome) format. While the Utah Review articulated the definition of the “population” (patients <18 years with gender dysphoria and related conditions) and the “interventions” (hormonal interventions used in the context of treating gender dysphoria), the definitions of “comparator” and “outcome” appear to have undergone significant revisions throughout the process—often without a methodologically appropriate justification, introducing bias.

Initially, the Utah Evidence Review planned to allow all study types, regardless of whether the study used a comparative design or which comparator was used, and it also intended to treat all reported outcomes as outcomes of interest (p. 12). This broad strategy resulted in the identification of an infeasibly large number of studies. To narrow the analytic study set, the researchers created a post-hoc category of “high priority” study types and outcomes and then analyzed only studies with outcomes that met the new definition of “high priority.”

During this post-hoc revision, the researchers either declined to consider or explicitly removed the following outcomes from their “high priority” list:

Fertility. The accompanying Recommendation Report states that “fertility” was not considered a high-priority outcome because “infertility is a known risk of CSHT [cross-sex hormone therapy] and was not an outcome of focus in the DRRC systematic review” (Recommendation Report, p. 4). This is a problematic decision, given that the Legislature explicitly requested consideration of the “harms of interrupting natural puberty” and that fertility counseling is a core component of pediatric gender transition protocols precisely because loss of fertility is a possible consequence.
Desistance. The Utah Legislature explicitly requested that the Utah Review address the “rates of desistance and time to desistance.” The University of Utah DRRC acknowledged that desistance and related concepts were “pointedly of interest to the legislature,” and included the question of short- and long-term rates of discontinuation in its contract with Utah DHHS and as a research objective in the Review itself. Despite this, the Review also informs readers that the researchers chose not to consider them “high priority” outcomes (Evidence Review, p. 83).
Regret. Initially, it appears that “regret” was a “high priority outcome” (see Figure 2 below). However, in the final Evidence Review, “regret” was no longer listed as such. No explanation is available for this change; instead, the Evidence Review explicitly states that although “regret” was “pointedly of interest to the legislature,” it was “not among our high-priority outcome categories for this review” (Evidence Review, p. 83).

Post-hoc exclusions of these patient-important outcomes, some of which were identified as “pointedly of interest” to the Review’s commissioning body, are inherently problematic. Furthermore, since “high-priority” outcomes were used to separate studies into “relevant” and “not relevant,” omitting outcomes pertaining to fertility, regret, or detransition/desistance from the high-priority list meant that studies focused on those outcomes were removed from the analytic set. As a result, the Evidence Review’s conclusions systematically overlooked the very domains where harms are likely to be observed.
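The mechanism described above can be made concrete with a minimal sketch. The study records, outcome labels, and the `HIGH_PRIORITY` set below are hypothetical and purely illustrative; they are not drawn from the Utah Review's study list. The sketch only shows how using an outcome list as a relevance filter removes every study whose sole focus is an outcome left off that list.

```python
# Hypothetical illustration only: these records and labels are invented
# and do not correspond to any study in the Utah Review.
HIGH_PRIORITY = {"mental health", "bone density", "body changes"}

studies = [
    {"id": "A", "outcomes": {"mental health"}},
    {"id": "B", "outcomes": {"fertility"}},
    {"id": "C", "outcomes": {"regret", "desistance"}},
    {"id": "D", "outcomes": {"bone density", "fertility"}},
]


def is_relevant(study, priority):
    """Keep a study only if it reports at least one high-priority outcome."""
    return bool(study["outcomes"] & priority)


kept = [s["id"] for s in studies if is_relevant(s, HIGH_PRIORITY)]
dropped = [s["id"] for s in studies if not is_relevant(s, HIGH_PRIORITY)]

print("kept:", kept)        # studies with at least one listed outcome
print("dropped:", dropped)  # studies focused only on unlisted outcomes
```

In this toy set, the studies that report only fertility, regret, or desistance never reach data extraction, so nothing they found can influence the conclusions.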

Figure 2. “High-Priority” Studies, Presentation to the Utah DHHS Interim Committee. Hofmann, June 2024, SB16 (2023) Transgender Medical Treatments and Procedures Amendments Progress Report (1:19:54-1:26:57)

Slide 7 listing regret as a high-priority outcome in the June 19, 2024 Utah DHHS presentation

Abstracting data and risk of bias assessment (Step 3)

All studies that meet eligibility criteria typically undergo risk of bias assessment, even if only a subset is later included in the subsequent step of evidence synthesis. While the Utah researchers identified a total of 277 eligible studies, fewer than 40% (101/277) underwent data extraction and risk of bias assessment (p. 34). The removal of over 60% of eligible studies from the analysis, apparently due to “insufficient resources” (Recommendation Report, p. 4), introduced a critical risk of bias into the Evidence Review’s conclusions.
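The proportions cited above follow directly from the two counts reported in the Evidence Review (277 eligible studies, 101 analyzed). A short check of the arithmetic, using only those two figures (the variable names are ours):

```python
# Counts as cited above from the Evidence Review (p. 34).
eligible = 277
analyzed = 101

not_analyzed = eligible - analyzed
analyzed_share = analyzed / eligible * 100
not_analyzed_share = not_analyzed / eligible * 100

print(f"analyzed:     {analyzed}/{eligible} = {analyzed_share:.1f}%")
print(f"not analyzed: {not_analyzed}/{eligible} = {not_analyzed_share:.1f}%")
```

This gives roughly 36.5% analyzed and 63.5% not analyzed, consistent with the "fewer than 40%" and "over 60%" figures.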

In addition, a number of the studies subjected to risk of bias analysis received inflated quality ratings, in part because uncontrolled studies with subgroup analyses (for example, males and females) were mistakenly treated as though they were comparative studies. For example, Chen et al. (2023) was given a near-perfect study quality score (8/9 on the Newcastle-Ottawa Scale) with the justification that the study’s “non-exposed cohort” was “drawn from the same community as the exposed cohort” (p. 556). However, Chen et al. lacked a comparator group—as the study's authors openly acknowledged (Chen et al., 2023, p. 249).

Evidence synthesis and quality/certainty of evidence assessment (Steps 4, 5)

An evidence synthesis that assesses the body of evidence for quality (also known as certainty) is a hallmark of any systematic review. The researchers explicitly acknowledge omitting this analysis: “We were not contracted to include a synthesis of the evidence that we found: only to assess ROB [risk of bias] and provide evidence tables summarizing safety and efficacy” (p. 90).

For clarity, risk of bias in Step 3 is assessed at the individual study level and addresses whether features of a study’s design, conduct, or analysis may have biased the estimated intervention effect. By contrast, synthesis and assessment of the body of evidence for quality/certainty are conducted for each specific outcome across studies. This can be done using GRADE (Grading of Recommendations Assessment, Development and Evaluation) or through narrative synthesis methods, as in the University of York systematic reviews.

These steps are important because they move the analysis beyond individual studies to assess what the totality of the evidence shows about a specific outcome, and how much confidence that conclusion warrants. Whether conducted using GRADE or narrative synthesis, this process requires reviewers to weigh factors such as study design and risk of bias, the consistency (or inconsistency) of findings across studies, heterogeneity, and the magnitude of observed effects to judge how trustworthy the body of evidence is.

Because the Utah Review did not undertake these essential steps, it does not meet the criteria for a systematic review. A risk of bias assessment alone is not sufficient; reviewers must also synthesize the evidence for each outcome and assess the certainty of the body of evidence.
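The difference between a per-study risk of bias rating and a certainty rating for a body of evidence can be sketched in code. What follows is a deliberately simplified illustration of GRADE-style logic, not the GRADE method itself and not something the Utah Review or any review cited here implemented. The function name, the one-level-per-concern rule, and the example input are assumptions made for illustration, and GRADE's criteria for rating up are omitted. It relies only on two well-known features of GRADE: evidence from observational studies starts at low certainty, and certainty can be rated down for risk of bias, inconsistency, indirectness, imprecision, and publication bias.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = (
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
)


def certainty_for_outcome(randomized, concerns):
    """Rate ONE outcome across ALL studies (simplified GRADE-style sketch).

    randomized: True if the body of evidence comes from randomized trials.
    concerns:   dict mapping a domain name to True when there is a serious
                concern in that domain for this outcome.
    """
    start = LEVELS.index("high") if randomized else LEVELS.index("low")
    # Assumption for illustration: each serious concern lowers one level.
    downgrades = sum(1 for d in DOMAINS if concerns.get(d, False))
    return LEVELS[max(0, start - downgrades)]


# Hypothetical input: observational evidence for a single outcome, with
# serious concerns about risk of bias and imprecision.
example = {"risk_of_bias": True, "imprecision": True}
rating = certainty_for_outcome(randomized=False, concerns=example)
print(rating)
```

Note that the per-study risk of bias assessment from Step 3 enters here as only one of five inputs, and the rating attaches to an outcome across studies rather than to any individual study.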

Presentation of the Utah Review as a “systematic review”

It remains unclear why the University of Utah DRRC team, described as “experts in the methodology of evidence synthesis” by DHHS (Recommendation Report, p. 2), responded to a mandate to conduct a “systematic evidence review” with an analysis plan that omitted a key step for such a review—a formal evidence synthesis that assesses the quality/certainty of evidence.

Part of the problem may stem from the fact that the work was guided by a pre-existing evidence review contract between Utah DHHS and the University of Utah College of Pharmacy’s DRRC department. The purpose of the ongoing multi-year contract is to support routine Utah Medicaid pharmacy decision-making, such as identifying prior authorization criteria (First Contract Amendment). To support the Utah Legislature's request for a systematic review related to pediatric gender dysphoria, the contract was updated with Attachment D, “Gender Dysphoria Report Proposal,” submitted by the University of Utah DRRC team (First Contract Amendment, Attachment D).

While the contract contains general language stating that the Utah DRRC team would “perform systematic reviews of the evidence,” nothing in the main contract itself or Attachment D commits the researchers to adhering to an explicit systematic review methodology, such as performing an assessment of the body of evidence for quality/certainty using GRADE or similar evidence synthesis methods.

Although Attachment D states that the research would assess the “quality … of evidence” (First Contract Amendment, p. 6), the DRRC team equated this analysis with “risk-of-bias [ROB]” assessment of individual studies—a distinct analytic exercise. This conflation suggests either a misunderstanding of research methodology or a deliberate departure from established methods. In either case, the absence of an explicit assessment of the body of evidence for quality/certainty—an essential component of any systematic review—indicates that the Utah Review does not meet the criteria for a systematic review of evidence.

While the Utah Review is demonstrably not a systematic review, multiple stakeholder groups have described the Utah Review as a “systematic review” and emphasized that designation as a marker of methodological rigor:

The University of Utah DRRC research team described their analysis as a “systematic review” in the Review itself (p. 899).
The University of Utah DRRC presentation of the Review’s results at the American Academy of Pediatrics’ 2024 national conference described the analysis as a “systematic review.”
Utah DHHS described the work as a “systematic review” during the presentation of interim results (DHHS presentation, slides 8–9) and again in its final Recommendation Report (Recommendation Report, pp. 5, 6, 10).
A congressional comment letter signed by 106 Members of Congress described the Utah Review as a “systematic review.”
A letter from the American Public Health Association to the U.S. Department of Health and Human Services, signed by 123 public health and health policy deans, chairs, and scholars, described the Utah Review as a “comprehensive systematic review.”

While the Utah Review does not contain any analyses that qualify as “systematic reviews,” it does present literature reviews. Because literature reviews (also known as “narrative reviews”) do not use systematic, reproducible methods to identify, appraise, and synthesize evidence, they are vulnerable to selection and interpretive bias and are less reliable for drawing conclusions about treatment effects.

The following sections examine each of the Utah Review’s analyses to assess whether the methods used are rigorous and whether the conclusions are trustworthy.

3. Identification of Pharmacological Agents

Summary: The analysis of hormonal agents used in the treatment of pediatric gender dysphoria provides a helpful compilation of specific drugs used in “gender-affirming” contexts. However, substantial editorializing, which positions off-label pediatric prescribing of puberty blockers and cross-sex hormones as inherently safe, suggests interpretive bias.

The identification of pharmacological agents and their Food and Drug Administration (FDA) licensing status, the first analysis undertaken by the Utah Review, served two purposes. First, the analysis fulfilled the requirements of the Utah Legislature by identifying the hormonal interventions used in the treatment of pediatric gender dysphoria. Second, the names of the identified drugs became the search terms for the literature search that formed the evidentiary basis for the rest of the analyses.

The University of Utah DRRC team used standard databases including Micromedex, UpToDate, and the FDA Orange Book to “identify a comprehensive list of all drug product hormones and hormonally active agents that are administered to pediatric TGNB [transgender and nonbinary] patients in the United States” (p. 6). The researchers identified 66 such drugs and reported that none was FDA-approved for that indication.

However, the analysis went beyond stating these findings as facts, and introduced a narrative supporting off-label use of hormonal agents in gender-dysphoric youth:

Justifying off-label drug use on the basis of weak evidence. In describing the high prevalence of off-label prescribing in pediatric practice (“as high as 38.1% of prescriptions, and as many as 78.9% of children”) (p. 6), the Review implied that the principal obstacle to obtaining an on-label indication for hormonal interventions for minors is the added cost of obtaining regulatory approval. The Review did not mention alternative explanations, such as the possibility that the use of these interventions in pediatric populations may be unable to meet the FDA evidentiary threshold.
Drawing a false equivalence to routine pediatric off-label prescribing. The researchers invoked the common pediatric practice of prescribing antibiotics for infections to suggest that endocrine interventions for gender dysphoria in youth represent a comparable clinical practice. However, they failed to address the key differences between these contexts. Antibiotics are intended for short-term use, have been extensively studied in adults, and are used to treat conditions with a well-characterized disease course, providing a strong evidentiary basis for pediatric extrapolation. By contrast, endocrine interventions for gender dysphoria have not demonstrated clear benefit in adult populations, and the combination of puberty blockers with cross-sex hormones has bypassed even preclinical and early-phase clinical trial evaluation. Moreover, unlike short courses of antibiotic therapy, masculinizing and feminizing hormones are typically intended for lifelong use and will have irreversible lifelong effects even if discontinued.
Framing off-label prescribing as intrinsically safe. In discussing safety considerations, the researchers did not mention multiple studies demonstrating that off-label use in pediatrics is associated with worse health outcomes and a significantly increased risk of adverse reactions and events (risk/odds ratios of 1.67–2.25). Instead, they cited “one study [that] showed no differences in the risk of adverse events” associated with off-label prescribing (p. 6). Further, the fact that some interventions may be offered based on weak evidence does not necessarily justify additional off-label uses; instead, it highlights a broader problem requiring closer scrutiny.

Notwithstanding these limitations, the review of agents provides a useful contribution to an understanding of the clinical realities of treating pediatric gender dysphoria.

4. Evidence Review of Clinical Studies

Summary: The University of Utah DRRC team’s analysis of primary clinical studies did not fulfill the essential requirements of a systematic review. Multiple questionable methodological decisions compounded throughout the analysis, culminating in the omission of a key step—the evidence synthesis assessing the body of evidence for quality/certainty. As such, any conclusions about the evidence that emerged from the Utah Review’s analysis of primary clinical studies cannot be considered reliable.

The analysis of primary clinical studies, frequently cited for including “more than 28,056 pediatric patients” (p. 44), constitutes the bulk of the Evidence Review. On this basis, it has been consistently characterized as a comprehensive systematic evidence review by University of Utah researchers and Utah DHHS (Interim DHHS presentation, slide 8; Recommendation Report, p. 5), as well as by some users of the report, including policymakers.

Systematic reviews sit at the top of the evidence pyramid because they examine the totality of available evidence in a systematic and reproducible manner. A credible systematic review addresses a predefined question, applies explicit study selection criteria, conducts a systematic search across multiple sources, synthesizes characteristics of the studies and findings across all studies at the outcome level, and assesses the quality/certainty of the evidence. These features distinguish systematic reviews from literature (narrative) reviews, which are not reproducible, and from scoping reviews, which may be reproducible but do not evaluate treatment effects.

We have applied two well-recognized systematic review assessment tools to assess the quality of the Evidence Review’s analysis of primary clinical studies: ROBIS (Risk Of Bias In Systematic reviews) and AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews).

These analyses yielded ratings of “high risk of bias” and “critically low” confidence respectively—the lowest ratings possible.

ROBIS. ROBIS is a widely used, structured tool for appraising systematic reviews for “risk of bias”—the degree to which the review process (how it was designed, conducted, or interpreted) may have skewed conclusions. According to the ROBIS assessment of the Utah analysis of primary studies, serious methodological concerns were identified across all four ROBIS domains, yielding the overall rating of “high risk of bias”—the lowest possible rating (Table 1, below). For the full ROBIS assessment see Supplement 1.

Table 1. ROBIS Assessment of the Evidence Review of Clinical Studies: Summary

	Study Eligibility Criteria	Identification and Selection of Studies	Data Collection and Study Appraisal	Synthesis and Findings	Overall Risk of Bias
Risk of Bias	High	High	High	High	High

AMSTAR 2. AMSTAR 2 is a methodological tool designed to assess the overall quality of systematic reviews of healthcare interventions. Applying AMSTAR 2 criteria to the Utah analysis of primary clinical studies reveals problems in four “critical” domains: the absence of a pre-registered protocol or prospectively defined methods (critical domain 2), inadequate risk of bias assessment in individual studies (critical domain 9), failure to incorporate risk of bias assessments into the review’s conclusions (critical domain 13), and failure to evaluate publication bias (critical domain 15). The overall rating of “critically low” confidence places the Utah analysis of primary clinical studies in the lowest possible quality category. For the full AMSTAR 2 assessment see Supplement 2.
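
As a rough illustration of how critical flaws translate into an overall AMSTAR 2 rating, the sketch below encodes the rating scheme published with the tool, under which more than one critical flaw yields “critically low” confidence. The function name and input format are hypothetical and chosen only for illustration. The four flagged items are those listed above. The sketch is not a substitute for the full assessment in Supplement 2.

```python
# Items that the AMSTAR 2 developers designate as "critical".
AMSTAR2_CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})


def amstar2_overall_confidence(flawed_items):
    """Map a collection of flawed AMSTAR 2 item numbers to an overall rating.

    Hypothetical helper: it follows the published rating scheme, which the
    tool's developers describe as guidance rather than a rigid rule.
    """
    flawed = set(flawed_items)
    critical_flaws = len(flawed & AMSTAR2_CRITICAL_ITEMS)
    noncritical_weaknesses = len(flawed - AMSTAR2_CRITICAL_ITEMS)

    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# The four critical domains identified in the text above.
utah_flagged_items = [2, 9, 13, 15]
rating = amstar2_overall_confidence(utah_flagged_items)
print(rating)
```

Under this scheme, any two of the four flagged critical items would by themselves be enough to place a review in the lowest category.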

Taken together, ROBIS and AMSTAR 2 point to the same conclusion: the findings of the Evidence Review’s analysis of primary clinical studies are not trustworthy and the analysis should not be relied upon to provide an accurate and comprehensive summary of the available evidence.

Some of the most significant deficiencies are:

Lack of a prespecified protocol. A clearly articulated research protocol is essential for the transparency and reproducibility of systematic reviews. While the researchers assert that some prespecified “internal protocols” existed for some of the steps they followed (p. 12), the protocol is not presented anywhere in the Evidence Review. Further, the research process underwent significant post-hoc revisions without a methodologically acceptable rationale.
Unfocused eligibility and inclusion criteria. All study types reporting any outcomes related to hormonal treatments for minors with gender dysphoria were considered “eligible” for inclusion in the analysis. This broad strategy yielded 230 studies (p. 34), which is unusually large for this type of analysis. For context, McMaster University systematic evidence reviews of the same topic conducted around the same time, which had a higher maximum age of 25, judged only 10 puberty blocker and 24 cross-sex hormone studies as eligible for inclusion. Because the University of Utah researchers were unable to analyze this large study set, arbitrary post hoc decisions were taken to limit the number to a more manageable size, thereby introducing selection bias.
Exclusion of important outcomes. The researchers reduced the list of 230 eligible studies to a smaller set of 90 “relevant” studies by analyzing only those studies that contained “high priority” outcomes. The definition of “high priority” appears to have been constructed post hoc and did not include important outcomes such as fertility, regret, desistance/detransition, sexual function, and mortality. Consequently, any eligible studies focusing on these outcomes were filtered out as “not relevant” and did not inform the analysis or its conclusions.
Failure to analyze over 60% of eligible studies. Only 90 of the 230 eligible studies were subjected to subsequent analysis. The remaining 140 eligible studies (61%) did not undergo data extraction or risk of bias assessment and were relegated to a bibliography-only listing, which is a major deviation from standard research methods.
The final set of 90 “relevant” studies fails to form a cohesive analytic framework and instead represents a heterogeneous mix of populations, interventions, outcomes, and study designs. Below are examples of studies that made it to the final “relevant” study list:
- A study of insurance coverage in patients treated at a single-site clinic.
- A study of how the brains of testosterone-treated adolescent females perceive familiar versus unfamiliar voices.
- A study comparing the use of different doses of the same pharmacological puberty-suppression agent.
At the same time, studies of rates of discontinuation of hormonal treatment, fertility and sexual function outcomes, regret, and mortality were deemed not “relevant” and had no opportunity to inform the conclusions of this analysis.
Inaccurate study assessments and inflated quality ratings. Even after reducing the number of included studies, the researchers were still left with 90 studies—a large number—to assess for risk of bias. Best practices recommend that at least two researchers should be involved in extracting data from the studies and conducting risk of bias assessments. The University of Utah DRRC team deviated from this best practice and instead relied on a single researcher, which rendered the analysis vulnerable to errors, with an apparent bias toward inflated study quality ratings. The Review’s assessment of two well-known studies illustrates this:
- Chen et al., 2023: The University of Utah DRRC researchers assigned Chen et al., 2023 a near-perfect score of 8/9 on the Newcastle-Ottawa Scale (NOS), despite the study’s well-documented limitations. The researchers treated this study as having “exposed” and “non-exposed” cohorts, stating that the non-exposed cohort was “drawn from the same community as the exposed cohort” (p. 556), suggesting a robust study design. However, Chen et al. explicitly acknowledged that the study lacked a non-exposed comparison group: “… our study lacked a comparison group, which limits our ability to establish causality” (Chen et al., 2023, p. 249).
  The assessment also overlooked that Chen et al. failed to report most prespecified outcomes, had approximately 30% attrition at 24 months that was unexplained (aside from noting that two youths died by suicide during treatment), and did not adequately control for important confounders. An independent assessment of this study assigned it five stars on the NOS, which suggests moderate quality, while the more rigorous ROBINS-I assessment from the same source deemed it at “critical risk of bias”—the lowest rating possible.
- Tordoff et al., 2022: The researchers assessed Tordoff et al. as moderate quality, assigning it an NOS score of 6/9, despite the study’s serious methodological limitations (p. 570). Although the researchers assessed the study as though it had an untreated comparator group, the majority of participants in that group either crossed over to the treatment arm over the 12-month duration of the study, or dropped out from the gender clinic and the study entirely, leaving only seven untreated subjects at the end of follow-up, compared with the original 97 (Tordoff et al., 2022, Supplemental online content, eTable 2). Further, since all patients were eligible for hormonal treatment unless they had severe mental health problems that prevented consent, youth who remained untreated despite continuing to attend the gender clinic may have differed systematically at baseline from those who received treatment, further compromising comparability.
  Moreover, nearly 40% of participants (39/104) had dropped out by the 12-month follow-up, which itself was too short to meaningfully assess long-term outcomes. Of note, the York systematic review, which evaluated the same study using the same tool (NOS), gave it a much lower score of 3.5, classifying it as a low-quality study.
Missing evidence synthesis assessing quality/certainty of evidence. A formal evidence synthesis, which includes an assessment of the evidence for quality/certainty, is a cornerstone of a trustworthy systematic review. The University of Utah DRRC researchers did not synthesize results at the outcome level across studies and did not assess the body of evidence for certainty using either the GRADE assessment or a narrative synthesis (as was done in the U.K. York systematic reviews).
The researchers explain this omission by stating that they “were not contracted to include a synthesis of the evidence that we found: only to assess risk of bias and provide evidence tables summarizing safety and efficacy findings” (p. 90). It is unclear why the DRRC research team advanced a methodology for a “systematic review” that omits one of the most critical aspects of a systematic review—a formal evidence synthesis. As a result, readers have no basis for determining whether their findings are based on robust comparative data or are derived from low-quality observational evidence and small samples with high attrition.
For example, they assert conclusions such as “rates of depression and suicidal thoughts/self-harm tended to be lower among hormonally treated transgender youth compared to untreated transgender individuals” (p. 77) without specifying 1) which treatments were assessed; 2) how many studies contributed to this conclusion; 3) how many participants were included; 4) which specific outcome measures were assessed; or 5) the magnitude of effect sizes.
Deviations from accepted research processes to arrive at conclusions. The University of Utah DRRC team stated that, “[a]fter having spent many months searching for, reading, and evaluating the available literature, it was impossible for us to avoid drawing some high-level conclusions” (p. 90). The researchers’ conclusion about the use of puberty blockers and cross-sex hormones was that “the consensus of the evidence supports that the treatments are effective” in improving mental health and that the “evidence also supports that the treatments are safe in terms of changes to bone density, cardiovascular risk factors, metabolic changes, and cancer” (p. 90).
However, “consensus of the evidence” is not a recognized term in evidence-based medicine. Acceptable terminology refers to the “body of evidence” or the “totality of evidence” and such conclusions require adherence to systematic review methodology, which assesses the quality/certainty of the evidence.
High number of errors. The analysis contains multiple errors and irregularities. While the text claims that “there were N=134 primary clinical studies reporting findings in TGNB [transgender and nonbinary] populations all over the world” (p. 44), the relevant table and appendix include 230 studies (pp. 489–538), and the same number (230) is referenced in the conclusion (p. 90). At several points, the researchers conflate two different studies by Klaver et al. In Nahata et al. (2017), anxiety counts were misreported (p. 659). In van de Grift et al. (2020), follow-up age and several outcome counts were inaccurately recorded (pp. 726–729). Spot-checking suggests that such errors are not isolated. Their apparent prevalence is likely due, in part, to the researchers’ deviation from the established practice of having two researchers verify extracted data.
Incoherent presentation of study detail. Over 90% of the 1,051-page document consists of data tables and other material normally considered supplementary. Absent a proper evidence synthesis, much of the information about the evidence assessed must be gathered from these data tables. Unfortunately, this material is challenging to interpret due to poor organization and a high number of redactions of basic study information, purportedly to “protect” the identities of patients and clinical centers (see Figure 3).
It is unclear what the redactions achieved beyond making parts of the report incomprehensible, since the information the authors sought to conceal was basic study information already available in the underlying studies, all of which are in the public domain.
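
The proportions cited in this section can be checked with simple arithmetic. The sketch below uses only figures quoted above: 230 eligible and 90 analyzed studies; 39 of 104 participants lost to follow-up in Tordoff et al.; and 7 of the original 97 untreated participants remaining. The helper name `pct` is a hypothetical convenience and is not drawn from any cited tool.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Study-selection funnel in the analysis of primary clinical studies.
eligible, analyzed = 230, 90
not_analyzed = eligible - analyzed
share_not_analyzed = pct(not_analyzed, eligible)

# Tordoff et al., 2022: overall dropout and remaining untreated comparators.
tordoff_dropout = pct(39, 104)
tordoff_untreated_remaining = pct(7, 97)

print(not_analyzed, share_not_analyzed)
print(tordoff_dropout, tordoff_untreated_remaining)
```

These values are consistent with the descriptions “over 60%” for unanalyzed studies and “nearly 40%” for attrition in Tordoff et al.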

Figure 3. Example of redactions (pp. 49–50).

As our analysis demonstrates, despite being presented as a systematic review, the University of Utah DRRC analysis of primary clinical studies omitted a fundamental requirement—namely, a formal evidence synthesis assessing the evidence for quality/certainty. This reduces the analysis, at best, to a “literature review” rather than a “systematic review.” Even as a “literature review,” the analysis suffers from profound limitations. As such, the conclusions of the Utah Review on primary clinical studies cannot be considered a credible basis for evidence-based decision-making.

5. Overview of Systematic Reviews

Summary: The University of Utah DRRC team's analysis of existing systematic reviews lacks essential features of a rigorous review: a comprehensive search, transparent and defensible selection criteria, structured synthesis of findings across reviews, and an assessment of certainty. As such, this analysis cannot be considered a “systematic review” and cannot serve as the basis for evidence-based decision-making.

Overviews of existing systematic reviews (also known as “umbrella reviews”) typically follow the same methodology as systematic reviews of clinical studies; the key difference is that the unit of analysis is a systematic review rather than a primary clinical study. Otherwise, similar systematic methods are expected at every stage of the research process. To the extent that umbrella reviews are used to draw conclusions about the effects of interventions, they must assess the evidence for certainty for each outcome of interest through formal evidence synthesis.

The University of Utah DRRC research team analyzed seven systematic reviews. As with the systematic review of clinical studies, the most serious deficiency in this analysis is the absence of an evidence synthesis and an assessment of the quality/certainty of the evidence. We assessed the Utah Review’s analysis of systematic reviews using the PRIOR (Preferred Reporting Items for Overviews of Reviews) checklist for umbrella reviews. Several key limitations are outlined below:

Inadequate and outdated search strategy. The researchers overlooked major databases and systematic review repositories—including the Cochrane Database of Systematic Reviews (CDSR), PROSPERO, CINAHL, and Epistemonikos. They also failed to search relevant gray literature, such as systematic reviews posted on public health authorities’ websites, including the U.K. NICE evidence reviews (2020) on puberty blockers and cross-sex hormones.
In addition, the researchers did not update their search after the publication of the seminal Cass Review, which relied on the University of York systematic reviews on puberty blockers and cross-sex hormones. These highly relevant systematic reviews were released nearly a year before the public release of the Utah Review. As a result, the Utah analysis failed to incorporate a body of systematic review evidence that proved foundational to U.K. policy restrictions on these hormonal interventions for minors with gender dysphoria.
Subjective systematic review selection. The research team used a broad definition of “systematic review,” treating any evidence review that describes itself as “systematic” as such, regardless of the methods used. The research team identified 38 systematic reviews, seven of which were designated “high priority” for evaluation. However, the inclusion criteria appear to be inconsistently applied. For example, Mahfouda et al. (2019)—a literature review which did not follow a systematic review methodology and never claimed to be a systematic review—was included. Its inclusion may be due to the fact that it is commonly cited in support of pediatric gender transition, but it nevertheless failed to meet the Evidence Review’s eligibility criteria.
Evidence of bias in reviews’ assessments. The seven systematic reviews identified as relevant were subjected to AMSTAR 2 analysis (p. 43). The resulting quality ratings suggest researcher bias. For example, the systematic review by Ludvigsson et al., 2023, which originated from the work of Sweden’s SBU and underpinned that country’s 2022 restrictive guidelines, was assessed as a low-quality review and an “outlier in terms of several key methods” (pp. 43, 63). The University of Utah researchers asserted that because the Swedish researchers excluded high-risk-of-bias studies from the evidence synthesis, they committed a major methodological violation. This is inaccurate. Excluding high-risk-of-bias studies from a formal evidence synthesis is an acceptable approach and is recommended in the Cochrane Handbook.
Furthermore, the discussion section focused almost exclusively on criticizing the Swedish review and questioning its conclusions (pp. 42, 63). At the same time, systematic reviews whose conclusions are supportive of the practice of pediatric transitions, including those that received similarly low ratings, were not mentioned in the discussion.
Of note, like all research, the Swedish systematic review has limitations, but it was judged to be among the field’s higher-quality systematic reviews by a recent umbrella review appraisal.
Missing evidence synthesis assessing quality/certainty. As with the analysis of primary clinical studies, the University of Utah’s analysis of systematic reviews lacks a key component of a systematic review—an evidence synthesis which summarizes what is known about each outcome of interest, and how certain we can be in that knowledge.

Given the profound limitations of this analysis, culminating in the omission of a formal evidence synthesis assessing the body of evidence for quality/certainty, the Utah Review’s analysis of systematic reviews cannot be considered a systematic evidence review. Consequently, it cannot be used as the basis for evidence-based decision-making.

6. Long-term Outcomes

Summary: The Utah Review’s analysis of long-term outcomes suffers from serious limitations, as it omitted some of the most robust studies of long-term outcomes for endocrine interventions for gender dysphoria. Moreover, signals of harm that emerged from the long-term analysis, including elevated mortality, were not incorporated into the main conclusions of the Review. As with the other analyses in the Utah Review, the analysis of long-term outcomes lacks an evidence synthesis assessing quality/certainty, meaning that it does not meet the standard of a systematic review.

One of the key requirements of the Utah Legislature was that the evidence review assess long-term outcomes of hormonal interventions. When the researchers presented the analysis results in February 2024, it was noted that most included studies followed patients for only 1–2 years after hormone initiation and therefore did not provide meaningful long-term data (p. 899). The University of Utah DRRC research team was asked to revisit long-term outcomes. Subsequently, the researchers issued a new research proposal. This analysis of 17 “long-term” studies became a separate research output comprising Part II of the Evidence Review (p. 898).

With respect to benefits, the researchers noted that some studies reported psychological improvements associated with hormonal treatment, including reductions in anxiety, depression, and social distress, as well as improvements in other measures of mental health and functioning, while other studies found no significant improvements.

With respect to harms, the researchers observed that some studies reported statistically elevated risks of cardiovascular mortality associated with ethinyl estradiol use, increased incidence of certain benign brain tumors, and higher suicide and all-cause mortality rates compared with the general population, whereas other studies did not detect statistically significant increases in the outcomes assessed.

The long-term outcomes analysis was based on the same study search and followed the same methodological approach as the Part I analysis of the primary clinical studies (with the exception of adding one more “high priority” outcome of “mortality”); it is therefore subject to the limitations described in the earlier sections. Below, we highlight the additional irregularities in the research process that uniquely affected the long-term outcomes analysis:

Inconsistent treatment of studies of adults. The researchers did not conduct a separate literature search to support the long-term outcomes analysis, asserting that the prior search was “comprehensive enough” (p. 899). Instead, they merely re-screened previously identified evidence, looking for studies of patients who were at least five years post-treatment initiation. However, since the original search was explicitly governed by pediatric search terms such as “Pediatrics,” “Child,” “Minors,” “Adolescent,” and “Puberty” (see Figure 4), this strategy introduced serious methodological inconsistencies in how the research engaged with studies of hormonally treated adults.

Figure 4. Necessary search criteria for studies of long-term harms.

Because the search was limited to pediatric-specific search terms, adult-only outcome studies were missed. However, whenever adult outcomes happened to be reported alongside pediatric outcomes, they were allowed to enter the analytic set—even when such studies had very few pediatric patients (e.g., de Blok et al., 2021). As a result, the selection of long-term outcome studies was driven more by happenstance than by any cogent methodological principle.

Not only did this introduce incoherence into the evidence base, with some adult studies but not others informing the long-term analysis, but this search strategy also excluded some of the most robust evidence on long-term outcomes of hormonal treatments, which comes from studies of hormonally treated adults (for example, Bränström and Pachankis (2019), Dhejne et al. (2011), and Jackson et al. (2023)). Notably, the omitted studies consistently report elevated rates of morbidity and mortality among hormonally treated individuals.

Lack of adherence to the stated inclusion criteria. In re-screening the original pediatrics-focused study pool for the long-term outcome analysis, the researchers looked for studies reporting outcomes at least five years post-transition-initiation. Yet some studies that met this criterion were excluded for unclear reasons. For example, the researchers excluded Klaver et al. (2020)—a seven-year follow-up study of adolescents. De Blok et al. (2019) is also missing, despite meeting the stated eligibility criteria. Of note, de Blok et al. reported a higher incidence of breast cancer among males treated with estrogen than in the general male population. These exclusions raise concerns that study selection was not conducted in a systematic and reproducible manner.
Results dominated by a single patient cohort with questionable eligibility. Over 80% of the 10,147 subjects with long-term outcomes in the Utah Review (p. 904) came from a single study of the Amsterdam patient cohort (Wiepjes et al., 2020). That study includes patients at all stages of transition, including some who had not received hormones. It is also unclear whether the study met the “long-term” eligibility threshold of ≥5 years after treatment initiation, because it does not report the average duration of hormone exposure or specify how many participants had been on hormones for at least five years.
Underestimated risk of bias due to unrecognized confounding and co-interventions. Long-term hormone outcome studies are especially vulnerable to confounding: patients receiving cross-sex hormones often differ from the general population in baseline mental health and socioeconomic status and frequently receive co-interventions, such as psychotherapy or surgery. This makes it difficult to isolate the independent effects of hormones. These limitations do not appear to have been adequately recognized in the Utah Review’s risk of bias appraisal of individual studies.
The Utah Review’s assessment of the “Dutch Protocol” study (de Vries et al., 2014), which launched pediatric gender transition worldwide, illustrates these problems. The Review awarded that study 6 out of 9 points on the NOS, thereby characterizing it as of moderate quality with respect to long-term hormonal outcomes. But the de Vries et al. study cannot provide reliable evidence on the long-term effects of hormones because it never reported on the outcomes of cross-sex hormones, and all participants underwent surgery before the final assessment. The NICE systematic review of cross-sex hormones conducted in 2020 recognized this and excluded the study from the analysis, stating, “all participants had surgery after gender-affirming hormones. Unable to determine whether changes were due to hormones or surgery” (NICE, p. 72). This is a critical limitation since the Utah Review was focused on evaluating hormones, not surgery.
Failure to integrate long-term findings into overall conclusions. Even though much of the relevant long-term evidence was overlooked, the 17 analyzed studies did identify signals of long-term harm associated with hormonal interventions, particularly elevated mortality. However, these conclusions from Part II were not incorporated into the main Evidence Review’s conclusions, which continued to state that “the evidence … supports that the treatments are safe in terms of changes to bone density, cardiovascular risk factors, metabolic changes, and cancer” (p. 90).
Absence of certainty-of-evidence assessment. As with the other analyses, no formal evidence synthesis was performed to assess the certainty of the evidence.

Overall, the Utah Review’s long-term outcomes analysis is insufficiently rigorous to support conclusions about the long-term safety or benefits of endocrine interventions. The failure to identify key long-term studies and to integrate signals of harm, such as elevated mortality, into the Utah Review’s main conclusions further undermines its value for evidence-based decision-making.

7. Summary of Clinical Practice Guidelines

Summary: The analysis of guidelines used an inadequate search strategy that identified only five eligible guidelines and omitted the critical step of evaluating the guidelines for quality. Much of the analysis focuses on summarizing the WPATH and Endocrine Society’s recommendations, which the Utah Review treated as inherently evidence-based because they originate from “recognized medical authorities.” Taken together, these features raise serious questions about methodological rigor, consistency, and neutrality in the guideline analysis, rendering the document unfit for evidence-based decision-making.

Synthesizing multiple clinical practice guidelines poses recognized methodological challenges: recommendations are often numerous and qualitative, terminology and scope vary, interventions may not align, and evidence-grading systems often differ, with identical labels reflecting different standards. These complexities require explicit, systematic methods for comparison, synthesis, and transparent reporting (Johnston et al., 2019)—typically through a systematic or scoping review.

Applying the Johnston et al. (2019) framework to the Utah Review’s synthesis of clinical guidelines indicates that the University of Utah DRRC analysis failed to meet the majority of the criteria: it lacked a prespecified protocol, used an inadequate search, did not assess guideline quality, and did not identify gaps, inconsistencies, and trends across guidelines.

Below, we provide a high-level summary of the key limitations of the guideline analysis.

Unusual and subjective definition of guidelines for inclusion. The Utah Review adopted an unusual criterion for inclusion of guidelines, asserting that any “guidance from a recognized medical authority” was worthy of inclusion as long as a systematic review was conducted “for at least 1 part of the guideline,” even if that aspect of the guideline was not relevant to the topic of adolescent hormonal treatments. For guidance documents where adolescent recommendations did not include a systematic review, it was sufficient to merely “cite and discuss published literature” (pp. 17–18).
This criterion has a strong subjective component (i.e., “recognized medical authority”), making the inclusion criteria vulnerable to bias. It is also methodologically difficult to justify, since the relevance of adolescent treatment recommendations does not depend on whether elsewhere in the document another unrelated recommendation is supported by a systematic review.
Notably, using this definition led to the identification of only five guidance documents. By comparison, two recent systematic reviews of guidelines on the same topic identified 12 and 23 clinical guidance documents, respectively. At the same time, these unusual criteria perfectly match the process found in the World Professional Association for Transgender Health (WPATH) and Endocrine Society guidelines. Those guidelines include some systematic reviews of a narrow set of outcomes in adults, but they do not reference any systematic review of the evidence on pediatric hormonal interventions and instead rely on discussing selected studies.
Omission of relevant guidelines. As with their review of systematic reviews, the failure to search gray literature, including the websites of public health authorities, was a serious omission in the search strategy. The Review is notably missing the Swedish and Finnish national guidelines—the only two clinical guidance documents recommended for implementation by a recent systematic review of guidelines from the U.K.
Among the identified guidelines, two are well-known English-language guidelines by WPATH and the Endocrine Society. Three others are lesser known: a position statement on menstrual suppression by the American College of Obstetricians and Gynecologists, a position statement from the European Society for Sexual Medicine, and an outdated 2013 German guideline for children and adolescents with gender dysphoria.
Inconsistent adherence to eligibility criteria. Inexplicably, among the five "qualifying" guidelines, two do not qualify even under the University of Utah’s modified eligibility criteria. Specifically, neither the European Society for Sexual Medicine Position Statement nor the 2013 German "S1" guideline for children and adolescents meets the requirement of having performed a systematic review for any part of the clinical guidance document. Further, if the University of Utah researchers considered such consensus-based documents eligible, then additional clinical guidance documents issued by "recognized medical authorities" but lacking any systematic reviews, such as the influential American Academy of Pediatrics' 2018 Policy Statement and the Australian Standards of Care, should also have qualified.
Failure to engage with the Cass Review. The 2024 Cass Review—a seminal report providing a foundational analysis of pediatric gender medicine that informed the adoption of more cautious treatment policies for gender dysphoria in the U.K. for children and adolescents—appears to match the criteria for a clinical guideline set out by the University of Utah. It was commissioned by a recognized medical authority, NHS England, and was supported by a series of systematic reviews. The Cass Review was published five months before the Evidence Review was submitted to Utah DHHS and more than a year before its eventual publication. It is unclear why this important work was not included in the Utah analysis.
No guideline quality assessment. Rather than using a validated instrument, such as AGREE II, to assess the methodological quality of clinical practice guidelines, the authors relied on the issuing body’s status as a “recognized medical authority,” substituting an appeal to authority for formal appraisal (p. 19). The reputation of the guideline developer is a biased criterion that is unrelated to the quality of the guidelines. There are several examples of guidelines developed by reputable organizations that do not meet expected quality criteria. No established tool for assessing clinical guideline quality treats institutional prestige as a validated proxy for methodological rigor.
Evaluating guideline quality is crucial for interpreting recommendations, as it helps readers understand how trustworthy they are. The AGREE II tool allows evaluation of the entire guideline process, including scope and purpose, stakeholder involvement, methodological rigor of development, clarity of presentation, and applicability of the recommendations. It also assesses editorial independence, including funding and the disclosure and management of financial and nonfinancial competing interests. It is unclear why this assessment was not performed.
Failure to independently analyze levels of evidence (LOE) supporting recommendations. A stated objective of the Evidence Review's analysis of guidelines was to assess the "levels of evidence" (LOEs) that support guideline recommendations (p. 3; First Contract Amendment, Attachment D). The University of Utah researchers did not perform an independent assessment of LOE ratings for the guidelines, instead reporting only what the guidance documents themselves reported. Concerningly, the researchers uncritically repeated WPATH’s misrepresentation of the strength of evidence in Standards of Care 8, recording that “strong” treatment recommendations were supported by (“correlated with”) "high" quality evidence (p. 298). An independent assessment of LOE for WPATH would have revealed that the quality of evidence behind its recommendations is “low” or “very low.” Court documents appear to confirm that in at least one Standards of Care 8 chapter cited in the Evidence Review, WPATH graded recommendations as strong without regard to the level of evidence.

Rather than formally appraising guideline quality, the researchers focused on summarizing recommendations from English-language guidelines, with an emphasis on those issued by WPATH and the Endocrine Society, underscoring the following key points:

Puberty blockers and cross-sex hormones are the standard of care. The researchers present treatment with puberty blockers and cross-sex hormones as the undisputed standard of care for gender-dysphoric youth, emphasizing that “Guidelines providing hormonal therapy recommendations for TGNB [transgender and nonbinary] adolescents are generally in agreement about recommended therapies and treatment approach” (p. 37). In addition to inappropriately framing these interventions as evidence-based, the Utah researchers frame them as “medically necessary” (p. 36).
Children are not eligible to receive hormonal interventions. The summary emphasizes that prepubertal children are not eligible for hormonal interventions for gender dysphoria. This is an accurate but trivial claim, as there is no biological basis for altering sex hormone production in bodies that do not yet produce such hormones in any meaningful quantity. This claim is often used to ease ethical concerns about hormonal treatment in children just starting puberty, who may be as young as 8 or 9—at which point they are referred to as “adolescents.” Notably, the Utah Review adopts the definition of “children” as youth who have not begun puberty, and “adolescent” for those who have begun puberty, regardless of age (p. 4).
Both puberty blockers and cross-sex hormones can be provided to adolescents at the first signs of puberty. The Review emphasizes that according to the WPATH guidelines, not only puberty blockers but also cross-sex hormones can be administered at the earliest pubertal signs (potentially “adolescents” as young as eight or nine years of age).

Despite not having appraised the quality of the guidelines, the Evidence Review asserts that pediatric gender transitions should remain available in part because "high-quality guidelines are available to guide qualified providers in treating pediatric patients who meet diagnostic criteria."

Of note, several peer-reviewed guideline quality appraisals that used standard tools found that all guidelines in the field of pediatric gender medicine, with the exception of the Finnish and Swedish national guidelines, are of poor quality, highly interdependent, and circular. The most recently published appraisal of WPATH's guideline quality found that recommendations relating to adolescents have serious limitations in “scientific and methodological rigor, applicability, and transparency in managing competing interests.” The researchers concluded that “uncritical adoption or endorsement of WPATH’s guidelines may result in a disservice or even harm to this vulnerable population.”

Overall, the Utah Review’s guideline analysis is incomplete and methodologically weak. By omitting key guidance documents, failing to appraise guideline quality, and substituting institutional authority for methodological evaluation, it does not provide a reliable basis for evidence-based decision-making.

8. Desistance and Regret

Summary: The Utah Review’s section entitled “Persistence, desistance, and regrets” omitted key studies focused on these outcomes. Notwithstanding the profound limitations in their study search and selection strategy, and despite the lack of an assessment of the evidence for quality/certainty, the researchers concluded that there is “virtually no regret associated with receiving the treatments, even in the very small percentages of patients who ultimately discontinued them.” The inadequate analysis of the emerging phenomenon of desistance, detransition, and regret is difficult to defend in light of the critical importance of these topics.

The Utah Legislature requested an analysis of “rates of desistance and time to desistance.” The University of Utah researchers broadened the request and presented their findings in a section titled “Objective 4: Persistence, Desistance, and Regrets” (pp. 83–89).

The analysis did not directly address the question of desistance and detransition, but it did conclude that “there is virtually no regret associated with receiving the treatments, even in the very small percentages of patients who ultimately discontinued them” (p. 91). However, a close reading of the section reveals that the analysis is inadequate at every stage. We highlight some of the most serious limitations below.

Failure to assess detransition/desistance due to omission of relevant evidence. The researchers made no attempt to analyze studies on desistance, commonly defined as cessation of the desire to undergo medical transition prior to medicalization. Nearly a dozen studies report desistance rates between 61% and 98% among individuals with prepubertal onset of gender dysphoria who are not medically treated during adolescence, with desistance typically occurring before mature adulthood and a homosexual orientation being a common adult outcome.
Furthermore, nearly all the studies related to detransition among recently presenting patient groups had been omitted by the analysis, including survey-based detransition research by Littman (2021); the U.K. detransition and regret analyses using clinic data (Boyd et al., 2022; Hall et al., 2021); and a study of adolescent hormone discontinuation rates from medical and pharmacy records from the US Military Healthcare System reported by Roberts et al. (2022). These studies indicate that the rate of medical detransition among those hormonally treated in adolescence and young adulthood is as high as 10%–30% within several years after initiating transition.
Two major problems in the study search and inclusion strategy led to this oversight. First, the search strategy only looked for studies of adolescents, whereas desistance/detransition and potentially associated regret are more likely to be found in studies of adults, due to a widely recognized “honeymoon” period that can follow the initiation of hormonal interventions. Second, despite DRRC’s contract with Utah DHHS (First Contract Amendment, Attachment D) and the Evidence Review’s own research objectives (p. 3) including an examination of “rates of discontinuation,” the outcomes of “detransition,” “desistance,” and “regret” were excluded from the “high priority” outcome list that governed which studies would undergo analysis. As such, the studies most relevant to the question at hand were either never identified or were identified but relegated to bibliography-only—with no chance to inform conclusions.
Unreasonably narrow definition of “regret.” It is well known that most studies reporting detransition and regret suffer from serious methodological limitations, including short follow-up periods, high dropout rates, and narrow definitions of regret. The University of Utah DRRC team does not acknowledge these limitations.
For example, the researchers’ conclusion of “virtually no regret” is supported by a reference to a large Dutch study whose definition of “regret” likely excludes many individuals who experience regret following treatment (p. 91). Using the study’s definition, well-known detransitioners—such as Chloe Cole and Fox Varian in the U.S., who are suing their providers, or Keira Bell, who was involved in judicial review litigation in the U.K.—would not be classified as having experienced regret. This is because they never underwent the removal of ovaries, whereas under the definition used by the study, individuals would be counted as experiencing regret only if they had undergone gonad removal (ovaries or testes) and subsequently restarted natal-sex hormone supplementation.
Errors in study summary statistics. As in other tables throughout the Evidence Review, there are numerous errors in these data tables. For example, when describing the Wiepjes et al. (2018) study cited above, the researchers report the following incorrect statistics on pubertal suppression and cross-sex hormone therapy: “No cases of regret were observed among the 1,360 individuals who were first seen before the age of 18 years. 1.9% of adolescents who started PS [pubertal suppression] (n=812) stopped PS and did not start HT [hormone therapy]” (p. 89). However, the number of adolescents who commenced pubertal suppression was 333, not 812. Further, regret rates pertained only to those adolescents who were on cross-sex hormones and underwent gonadectomy, a group numbering 309, not 1,360.
Lack of assessment of quality/certainty of evidence. As with all other analyses in the Evidence Review, no formal evidence synthesis was offered that summarized the body of evidence at the outcome level and assessed the findings for certainty. Instead, readers are directed to Table I.26 (pp. 83–89), which lists the (frequently inaccurate) characteristics of 32 studies. This table does not, however, specify which studies informed the conclusion of “virtually no regret,” nor does it caution readers about the narrow definition of regret or the degree to which short follow-up duration and high drop-out rates can bias estimates downward or otherwise distort conclusions.

The inadequate analysis of desistance, detransition, and regret is difficult to defend in light of the critical importance of these topics both to the commissioning body and to the patient population, which has undergone a marked epidemiological shift in recent years—and in light of rapidly growing reports of detransition, including detransitioners who regret the gender transition interventions they received in adolescence. It is even more problematic that the Utah DHHS Recommendation Report, which accompanied the Evidence Review, failed to recognize the profound problems in the analysis and instead bolstered the analysis’s credibility by suggesting that the DRRC conducted a “systematic review” providing “a complete summary” of persistence, desistance, and regret (Recommendation Report, p. 7).

9. Interruption of Normally Timed Puberty

Summary: The Utah Review omitted the Legislature-mandated analysis of the effects of interrupting natural puberty, thereby failing to assess the potential harms to physical, cognitive, and psychosocial development. The researchers’ explicit decision to exclude “infertility” from the list of outcomes of interest on the grounds that harms to fertility were expected is methodologically indefensible and all but ensured that this known harm of the puberty blocker/cross-sex hormone treatment pathway would be excluded from the Review’s conclusions.

The Utah Legislature explicitly requested that the evidence review assess the “short-term and long-term benefits and harms of interrupting the natural puberty and development processes of the child.” The Utah Review left this request unaddressed and provided no explanation.

Below, we highlight several key problems associated with the decision not to conduct this important analysis.

Lack of consideration of evidence from basic sciences and physiology. Puberty is a time-sensitive, hormone-driven developmental process affecting bone, brain, fertility, and metabolism. Disrupting it may produce cumulative or delayed effects that short-term observational studies miss. For this reason, evidence of the potential harms of endocrine disruption to physical health and overall development must be supplemented by knowledge from basic science and physiology relating to endocrine mechanisms and developmental timing. It is concerning that the Utah Review did not consider this analytic component.
Incomprehensible exclusion of “fertility” as an outcome of interest. As explained elsewhere in our analysis, the University of Utah researchers’ explicit decision not to consider “fertility” a “high-priority” outcome on the grounds that hormone-related fertility harms are already a “known risk” (Recommendation Report, p. 4) is methodologically incomprehensible. This decision ensured that one of the best understood and most ethically problematic harms of pediatric medical transition—likely infertility—was unaccounted for in the Evidence Review’s analysis of benefits and harms.
Failure to consider sexual function as an outcome of interest. Early puberty suppression followed by cross-sex hormones may impair sexual function. The lack of attention to this important aspect of human health is concerning.
Lack of consideration of the neurodevelopmental effects of interrupting normal puberty. Puberty drives structural and hormonal brain maturation, including synaptic pruning and myelination. The University of Utah researchers made no attempt to assess what is known about the effects of suppressing this biological developmental process on the brain.

The Evidence Review offers no examination of these or numerous other risks and potential harms of interrupting normal puberty. Nor does it explain why the researchers chose not to undertake this required analysis, which is entirely feasible and was conducted in the HHS Evidence Review. Instead, the University of Utah researchers appear to sidestep this analysis entirely based on the assertion that “conditions like GD [gender dysphoria] … have no other effective interventions” (p. 63). This highly consequential claim is offered without citation or analysis and is unjustifiably presented as an established fact.

10. Interpretive Bias in the Evidence Review

Summary: The discussions, conclusions, and policy recommendations in the Evidence Review go beyond what its methods and evidence can support, with a consistent tendency toward interpretive bias in favor of pediatric gender transitions.

The discussion sections contained in the Evidence Review raise questions about whether considerations beyond the evidence itself may have influenced the researchers’ conclusions, and whether prior assumptions about the contested topic of pediatric gender medicine may have affected what should have been an impartial assessment of the evidence.

Examples of non-neutral framing are present at every stage of the research process, from the research proposal to the concluding sections of the Evidence Review:

Politicized presentation of clinical dilemmas. The Evidence Review does not engage with clinical and ethical dilemmas that have led to reconsideration of pediatric gender transitions—such as major epidemiological shifts in adolescent gender dysphoria, the emergence of visible detransition and regret, and restrictions on these interventions initiated by public health authorities in several European countries. Instead, it presents these issues through a political lens, describing them as debates in “public discourse” rather than clinically uncertain matters with urgent public health implications (p. 1).
Strikingly, in place of a discussion of the epidemiological shift at the core of the clinical uncertainty—namely, the sharp rise in adolescent gender dysphoria with an overrepresentation of females and a high prevalence of pre-existing mental health conditions—the Review invokes a "1st-century BCE" Roman collection of stories “about a transgender figure, Tiresias” as evidence of the ancient and enduring nature of transgender identities (p. 2). The authors do not note that, according to the same myth, after several years of living as a woman, Tiresias resumed living as a man. By their own logic this would make Tiresias a detransitioner—a term that does not appear anywhere in the Evidence Review’s analysis.
Evidence of interpretive bias in all the analyses. Interpretive bias, also known as “spin,” is a well-documented phenomenon in research which “distort[s] the interpretation of results and mislead[s] readers.” For example, in discussing the results of the seminal Dutch study that launched the practice of pediatric gender transition, the Review stated that “over time, treated transgender men showed reduced anger and anxiety, whereas treated transgender women were more stable” (p. 911, emphasis added). This choice of language—“more stable”—obscured the finding that male patients (“transgender women”) did not experience any post-treatment improvement in either mental health outcome.
Every section we reviewed showed evidence of spin, including the following examples:
- In the analysis of pharmacological agents used, the researchers went beyond a factual account of the agents and their FDA licensing status, proactively framing pediatric off-label prescribing of puberty blockers and cross-sex hormones during normally timed puberty as an inherently normative medical practice that requires no scrutiny.
- In the analysis of studies and systematic reviews, the researchers appear to consistently assess more favorably those studies and reviews that conclude hormonal treatments are beneficial than those that identify problems.
- In the analysis of guidelines, the researchers justify their decision not to assess the WPATH and Endocrine Society guidelines for quality by appealing to the authority of “recognized” medical organizations.
Strong but non-evidence-based conclusions of net benefits of pediatric transitions. The Evidence Review’s conclusions are that:

[T]he consensus of the evidence supports that the treatments are effective in terms of mental health, psychosocial outcomes and the induction of body changes consistent with the affirmed gender in pediatric GD patients. The evidence also supports that the treatments are safe in terms of changes to bone density, cardiovascular risk factors, metabolic changes, and cancer [...] (p. 90).

“Consensus of the evidence” is non-standard terminology and lacks a clear meaning. If the researchers had used standard terminology such as body of evidence or totality of evidence, and adhered to established methods for assessing the quality/certainty of the evidence, they could not have arrived at such a conclusion—which contradicts the findings of nearly two dozen systematic evidence reviews conducted to date.

Leap from analysis to advocacy and policy recommendation. The University of Utah researchers went beyond drawing conclusions about the evidence and issued a policy recommendation stating that restrictions on pediatric gender transitions “cannot be justified.” Making recommendations—including whether certain medical interventions should be offered widely, restricted, made available in research settings, or removed from clinical practice entirely—requires an explicit process that considers the quality/certainty of the evidence alongside other factors, such as the balance of benefits and harms, individual and societal values and preferences, as well as consideration of resources, cost-effectiveness, feasibility, acceptability, and equity. There is no evidence that the Utah DRRC Evidence Review authors engaged in such an analysis.
The strong policy recommendation against restrictions on endocrine interventions for minors with gender dysphoria also sits uneasily with the contract disclaimer that the University of Utah DRRC does not endorse or recommend “the use of any particular drug” and does not assume “any liability for persons administering or receiving drugs or other medical care in reliance upon” the DRRC review. Notably, in the Recommendation Report, Utah DHHS was more circumspect, stating, “DHHS takes no position on whether the Legislature should lift the moratorium.”

Initiating an evidence review with a policy goal in mind is not unprecedented in the contentious field of pediatric gender medicine. The political context should heighten scrutiny of the results to ensure that the researchers maintained their independence, conducted an unbiased analysis, and reached conclusions grounded in the evidence.

Frameworks such as GRADE were developed in part to ensure that conclusions remain proportionate to the certainty of the evidence. By not following GRADE or a comparable framework, the Utah Review is more susceptible to biased analysis and conclusions not grounded in the evidence.

11. Recommendation Report

Summary: The DHHS Recommendation Report, issued alongside the University of Utah’s DRRC Evidence Review, relied on the Review’s conclusions regarding the purported “benefits” of pediatric gender transition interventions. The DHHS recommendations addressed how hormonal interventions for minors should be delivered if the moratorium were lifted, focusing on centralized clinician training and enhancing informed consent procedures. No recommendations were provided regarding how to structure or improve clinical care for youth with gender dysphoria if the moratorium remained in place.

The 16-page Recommendation Report was completed nine months after the University of Utah DRRC team submitted its final version of the Evidence Review to Utah DHHS, with both documents published on May 19, 2025.

S.B. 16 assigned DHHS a statutory task: to issue evidence-based recommendations regarding the potential lifting of the moratorium by basing the recommendations on “systematic evidence reviews,” and describing the “assumptions and value determinations underlying its recommendations.” However, the Recommendation Report sidestepped this request, stating that DHHS takes “no position on whether the Legislature should lift the moratorium” (Recommendation Report, pp. 4, 10).

As a result, the Report does not recommend whether, when, or in which cases hormonal interventions for minors should or should not be provided—which was explicitly requested by the Legislature. Instead, the Recommendation Report addresses only how hormonal interventions should be delivered to minors if the moratorium is lifted, with a focus on enhanced provider training and improved informed consent procedures.

The Report presented four recommendations (reproduced verbatim):

If the Legislature lifts the moratorium, consider creating a hormonal transgender treatment board managed by DHHS, in partnership with the Department of Commerce, that advises on certification, continuing education, minimum standards of care, and consent procedures.
Limit providers who can deliver care to demonstrated experts.
If the Legislature decides to lift the moratorium, it should consider limiting care to a comprehensive interdisciplinary care team model that provides integrated physical and mental health care using evidence-based protocols, evaluates its outcomes, and reports regularly to the Utah Legislature.
Institute an enhanced and explicit informed consent and assent process.

The Recommendation Report suffers from several key limitations:

Misrepresentation of the Evidence Review. The Recommendation Report characterized the Evidence Review as a “full systematic review” (Recommendation Report, p. 5). As explained above, the Evidence Review does not meet the minimum threshold for a systematic review of evidence because it omitted an evidence synthesis that assessed for quality/certainty.
Misattribution of Cass Review analysis. The Recommendation Report’s assertion that “[t]he work conducted by DRRC included an analysis of the Cass Review that was presented to the Health and Human Services Interim Committee” (Recommendation Report, p. 4) is untrue. For clarity, the 1,051-page Evidence Review never mentions the Cass Review. The cited reference in the Recommendation Report refers to two presentations that the DRRC neither authored nor incorporated into the Evidence Review.
Reliance on evidence excluded by the Evidence Review. Although the Recommendation Report was intended to be based on the Evidence Review, it is apparent that it relied on evidence from other sources. Notably, none of the supporting references cited in the Recommendation Report were analyzed in the Evidence Review. This includes, for example, studies on desistance, suicide and self-harm, and mental health, which the Evidence Review either did not identify or identified but did not consider “high priority,” excluding them from subsequent data analysis. This creates uncertainty about the evidence base used by the Report’s authors to formulate their recommendations and introduces further risk of bias.
Inaccurate assumptions. The Recommendation Report relied on inaccurate assumptions, such as equating restrictions on hormonal interventions with “not treating gender dysphoria” and suggesting that restrictions on hormonal treatments for minors would lead to psychological and social harms (Recommendation Report, p. 3). The latter claim is presented without supporting evidence and goes beyond the claims made in the Evidence Review itself.
Inadequate guidance on informed consent content. The Utah Legislature explicitly directed DHHS to recommend what information a minor and the minor’s parent should understand before consenting to hormonal transgender treatment. Although the Report calls for enhanced consent and assent processes and states that risks, benefits, and alternatives should be discussed, it does not specify what parents and minors should be told in practice. Nor does it distinguish between well-established effects and those that remain uncertain, or clearly delineate what the potential benefits and harms are and how individual values and preferences should factor into decision-making.
Lack of process transparency. The process by which the Recommendation Report was developed is not disclosed. While the Report states that several advisors submitted individual reports to Utah DHHS, no explanation was provided for how those individual reports were reconciled and integrated into the final document, how disagreements were resolved, or how consensus among the numerous DHHS advisors was reached. At least one advisor testified that the Recommendation Report was largely written by Dr. Hofmann, the Executive Director of Utah DHHS, with dissenting views from within the advisory group not incorporated, and with only cursory consultation with the Utah Division of Professional Licensing, the Utah Physicians Licensing Board, and the Utah Osteopathic Physician and Surgeon's Licensing Board—entities designated as required stakeholders in the process.

Evidence-based decision-making requires that recommendations be grounded in trustworthy systematic reviews and that other relevant considerations—such as the balance of benefits and harms, and values and preferences—be explicitly addressed. The Recommendation Report departs from these principles in important respects, relying on a non-systematic Evidence Review with significant methodological flaws that undermine its capacity to support sound evidence-based recommendations.

12. Conflict of Interest Management

Summary: In a highly politicized area such as pediatric gender medicine, robust conflict-of-interest management is essential. Yet, despite the Utah Review's disclosures of no relevant conflicts of interest, substantial conflicts of interest are apparent. The lack of transparency regarding whether these conflicts were managed raises concerns about the impartiality and credibility of the overall process.

Conflicts of interest in pediatric gender medicine

Conflicts of interest (COI) are situations “when a past, current, or expected interest creates a significant risk of inappropriately influencing an individual’s judgment, decision or action when carrying out a specific duty.” COIs are recognized at both the individual and institutional levels and can be financial and non-financial (professional status, intellectual, and/or personal).

In the context of developing policy recommendations concerning restrictions on pediatric gender transition services, deriving income from the provision of such interventions—at either the individual or institutional level—constitutes a financial interest. Similarly, strong personal views on pediatric gender transitions—including political, philosophical, or religious commitments—as well as professional and reputational investments in this area of medical practice constitute non-financial interests.

Having an “interest” alone does not automatically represent a conflict. Conflicts only arise when there is a significant risk of inappropriate judgment, decisions, or actions in analyzing or interpreting the evidence or making recommendations. Since most experts have strong positions within their area of expertise, avoidance of all conflicts of interest is frequently untenable. Instead, interests should be transparently disclosed to allow proper scrutiny of whether those interests constitute a conflict. If an interest is not judged to be a significant risk to one’s decision-making process, it does not represent a COI.

Financial COIs are frequently the easiest to identify since they are directly tied to compensation. Non-financial COIs can be equally or more powerful in influencing one’s judgment but are often harder to identify—in part because they require self-reflection and disclosure of personal information with potential for undesirable personal or professional consequences.

Inadequate conflict-of-interest disclosures in the Utah Review

Both the Evidence Review, prepared by the University of Utah DRRC, and the Recommendation Report, prepared by Utah DHHS in consultation with external advisors, provided disclosures of interests. However, in neither case do the disclosures appear adequate, raising questions about the integrity of the process.

There are three distinct sources of undisclosed and unaddressed interests: the University of Utah DRRC team, responsible for the evidence review; the advisors to Utah DHHS, responsible for interpreting the evidence and providing draft recommendations; and the Utah DHHS staff, responsible for formulating final recommendations to the Utah Legislature. Each represents an independent source of potential bias, and their interaction may have compounded these risks.

A. University of Utah DRRC

The University of Utah DRRC’s disclosure of interests denied any interests outside payment for the work performed (p. xiii). However, there is no disclosure that the University of Utah DRRC operates under a broader contractual relationship with Utah DHHS (approximately $2,000,000 over five years), creating potential financial dependence on DHHS at the departmental level (First Contract Amendment, p. 1). Moreover, the University of Utah’s partnership with the Intermountain Primary Children’s Hospital, which housed the Gender Management and Support Clinic (GeMS)—Utah’s primary pediatric gender clinic—suggests both financial and reputational interests that are not captured when disclosures are limited to individual employees.

There is evidence that the DRRC's approach to the research was not neutral from the outset. The first research proposal the DRRC research team submitted suggests a priori alignment with the American Academy of Pediatrics’ position in favor of pediatric transitions. The proposal emphasized the AAP’s endorsement of gender-affirming interventions for minors, including claims that such interventions yield broad mental health benefits and reduce suicide, and characterized Utah legislators’ decision to restrict those interventions as occurring “despite the AAP advocacy” (First Contract Amendment, p. 16). Such framing is inconsistent with the neutral posture expected in a systematic assessment of the evidence.

B. Advisors to Utah DHHS

Independent advisors served a critical role in the process, interpreting the evidence submitted by the Utah DRRC and providing individual reports to Utah DHHS in preparation for the final DHHS Recommendation Report. The Recommendation Report provides interest disclosures in Appendix A. However, these disclosures were narrowly framed around pharmaceutical and industry ties, with all the advisors stating an absence of such interests (Recommendation Report, Appendix A: Advisors and Qualifications).

This form of disclosure overlooks the fact that the majority of the advisors—four of six—had professional, institutional, scholarly, and/or advocacy-related interests directly related to the provision of gender transitions of minors.

Angelo Giardino: The Recommendation Report notes no relevant interests for Dr. Giardino apart from his role as a representative of Intermountain Primary Children’s Hospital with personal expertise in “specialty pediatric care” (Recommendation Report, p. 15).
However, this disclosure does not adequately explain that Dr. Giardino is the Chief Medical Officer of Intermountain Primary Children’s Hospital, which housed the University of Utah-affiliated gender clinic (GeMS). As the hospital’s (and therefore the clinic’s) clinical leader, Dr. Giardino’s interest in pediatric gender medicine (financial, professional, and reputational) is significant and should have been transparently disclosed and managed.
Nicole (Nikki) Mihalopoulos: The Recommendation Report notes no relevant interests for Dr. Mihalopoulos beyond her role as a representative of the University of Utah.
However, there is no disclosure that Dr. Mihalopoulos provided gender transition services to minors in her role as the director of GeMS. She also publicly opposed legislative restrictions on gender transition for minors in Utah. After S.B. 16, which instituted a moratorium, was approved, Dr. Mihalopoulos asserted that it would harm youth and expressed hope for legal challenges to the bill. In October 2024, while the Recommendation Report was being completed, Dr. Mihalopoulos received an award from the University of Utah School of Medicine for her “dedicated humanitarian and advocacy work supporting parents, caregivers, and youth on evidence-based medicine of gender-affirming health care and access.” Dr. Mihalopoulos also co-authored a book on pediatric gender medicine along with several other University of Utah-affiliated GeMS colleagues and is listed by WPATH as both a member and donor.
Of note, following the enactment of H.B. 174 in March 2026, Dr. Mihalopoulos established a private practice serving patients aged ten and older, with publicly available materials indicating that the clinic offers services related to “gender health.”
These positions and interests are directly relevant to her advisory role in interpreting the evidence and formulating recommendations to the Legislature.
Katherine Smith: The Recommendation Report notes no relevant interests for Dr. Smith beyond her role as a representative of Roseman University of Health Sciences.
A brief review of Dr. Smith’s professional activities indicates that she is listed by WPATH as both a member and donor. She authored two book chapters on “gender-affirming care” and “transgender patients,” and published peer-reviewed papers on the topic of gender transitions between 2014 and 2024. These professional activities and affiliations represent relevant non-financial interests and should have been disclosed.
Brooks Keeshin: The Recommendation Report notes no relevant interests for Dr. Keeshin apart from his role as a representative of the University of Utah.
However, a published abstract from Dr. Keeshin’s presentation at the American Academy of Child and Adolescent Psychiatry suggests his opposition to the bill and his support for using the Utah Review as a means of challenging legislative restrictions on pediatric gender transition. In the abstract, Dr. Keeshin described the Review as stemming from a “coordinated effort” by transgender advocacy groups objecting to the moratorium, with the implication that the Review’s conclusions could help lift it. He also suggested that similar processes of initiating evidence reviews could be adopted by other “restrictive states” to help lift restrictions. These statements raise questions about whether his prior positions regarding the practice under review were adequately disclosed and appropriately managed.

C. Utah DHHS

The relationship between Utah DHHS and the University of Utah raises additional questions about potential conflicts of interest. Dr. Michelle Hofmann, the Utah DHHS executive medical director of Clinical Services who “led the development of the recommendations provided to the Utah Legislature,” left DHHS in October 2024 to join the University of Utah School of Medicine (Recommendation Report, p. 15), before the publication of the Utah Review in May 2025.

It is unclear whether any additional work on the Utah Review recommendations took place between October 2024 and May 2025, and if so, how the potential conflicts of interest were managed, given that, from a contractual standpoint, the University of Utah was the contractor supplying the evidence to DHHS. The University’s interest in its pediatric gender clinic was also a direct conflict of interest.

Further complicating matters, the interim presentation of results in June 2024 identified Michelle Hofmann, Yoon Kim-Butterfield, and Jennifer Strohecker as “DHHS representatives,” distinguishing them from the “advisors.” However, by the time the final Recommendation Report was published in May 2025, all three were listed only as “advisors,” leaving the report without any named DHHS representatives responsible for the final product (Recommendation Report, Appendix A: Advisors and Qualifications).

Thus, although the Utah Review reported no conflicts of interest, significant potential conflicts were present. The failure to disclose them raises serious concerns about whether these conflicts were properly managed. This, in turn, calls into question the integrity of the Utah Review’s interpretation of the evidence, its recommendations, and its overall credibility.

13. Conclusions

Amid the politically charged discourse surrounding gender transition practices for minors, evidence reviews are increasingly commissioned within regulatory and political contexts. In such settings, heightened scrutiny is required to ensure that these reviews adhere to rigorous methodological standards and are governed by robust conflict-of-interest safeguards. The Utah Review failed to meet these standards—raising serious concerns about its analysis, conclusions, and recommendations.

The Utah Review has been increasingly presented as an objective, independent systematic evidence review, and as one of the most robust to date. This claim is frequently justified by pointing to the Review’s nearly two-year analytic process that culminated in a 1,000+ page document. However, this framing reflects a misunderstanding of the Utah Review’s process and content.

The Utah Review is neither objective nor independent. The Utah Review was commissioned in response to advocacy by transgender rights groups, ostensibly as a means of overturning the moratorium on minor gender transitions in Utah. Serious potential conflicts of interest among those responsible for various components of the Review are evident at all stages, from the first research proposal through the finalized review and the accompanying policy recommendations. Notably, the majority of those appointed as independent advisors to the Utah Review had undisclosed interests related to pediatric gender medicine, and at least two had direct connections to the state’s primary pediatric gender clinic.
The Review does not contain any systematic reviews. None of the numerous analyses within the Utah Evidence Review adhere to acceptable methodologies for conducting systematic reviews, as they omit a key step—namely, synthesizing the evidence and assessing it for quality/certainty. Therefore, the Utah Review cannot be used to support evidence-based decision-making.
The main analysis was completed in under four months and ignored over 60% of eligible studies. Despite being represented as a two-year review, the main analysis was completed in under four months due to contractual limitations. To accommodate this tight timeline, the researchers excluded over 60% of the studies that their own analysis identified as “eligible.” The Review extended to a two-year timeframe in part because the presentation of the results revealed the absence of a key analysis of long-term outcomes. This analysis was added later; however, its findings, which indicated increased long-term mortality, were not incorporated into the Review’s main conclusions.
The bulk of the 1,051 pages of the Evidence Review consists of supplementary tables. While the length of any analysis should not be used as an indicator of its quality, in the case of the Utah Evidence Review, more than 90% of the content consists of supplementary tables. In addition, significant redactions of basic study information in these tables, purportedly “to protect the identities” of patients, render much of the report effectively unusable.
The Review’s conclusions overlooked highly relevant evidence and ignored signals of harm. The Review overlooked or excluded foundational systematic reviews from the U.K., national clinical guidelines from Sweden and Finland, the Cass Review, studies on the recently observed increase in detransition, and key long-term studies reporting elevated rates of morbidity and mortality. Notably, the conclusion of “no harms” of pediatric gender transition was facilitated by the explicit decision not to analyze studies in domains where harms would be most likely to be identified, such as studies focused on the outcomes of fertility, desistance/detransition, and regret—among others.

The Utah Review reflects a concerning pattern in pediatric gender medicine: the continued production of documents that project methodological rigor and scientific credibility while remaining, in important respects, inadequate, and at times actively misleading. This pattern is especially problematic because the broader evidentiary landscape is no longer materially unsettled.

The field of pediatric gender medicine must accept the findings of nearly two dozen systematic evidence reviews indicating that the evidence of benefit of pediatric gender transitions remains highly uncertain, while the evidence of harm is more certain, particularly when biologically likely harms such as impaired reproductive and sexual function are treated as patient-important outcomes. Further reviews of the same limited and methodologically weak evidence base are unlikely to yield materially different conclusions.

Societal resources should now be directed toward aligning policy with the evidence, while at the same time working to resolve the clinical controversy over the best approach to treating pediatric gender dysphoria. This includes:

Studying the etiology and epidemiology of gender dysphoria in youth to better understand contributing factors, varying presentations, and the natural history of the condition.
Rigorously analyzing existing data on patients treated during the past two decades.
Producing higher-quality primary evidence on specific interventions, including psychotherapy, where it is ethical and feasible.
Strengthening outcome surveillance and long-term follow-up.
Confronting current disagreements regarding the proper role of adolescent patient preferences and the principle of autonomy when providing life-altering interventions to otherwise healthy children struggling psychologically with their developing, sexed bodies.

Whether the evidentiary reality of certain harms but only uncertain benefits indicates that pediatric transitions should be reserved for exceptional circumstances only, restricted to properly designed clinical research trials, or removed from routine clinical practice altogether is a valid subject for debate. What cannot continue, however, is the ongoing expenditure of societal resources on weak studies, flawed evidence reviews, and “consensus” guidelines that seek to defend or promote the practice of pediatric gender transitions, while failing to adhere to the principles of evidence-based medicine.

Youth with gender dysphoria deserve access to high-quality evidence-based care—not to well-meaning but ultimately misguided efforts to shield this area of medicine from scrutiny, self-correction, and—where necessary—regulatory oversight.

Attachments

Supplement 1. Utah Review. AMSTAR 2 Assessment.pdf

Supplement 2. Utah Review. ROBIS Assessment.pdf

Supplement 3. Utah Review. First contract amendment.pdf

Supplement 4. Utah Review. Second contract amendment.pdf

Supplement 5. Utah Review. Hofmann June 19, 2024. Progress Report Presentation.pdf