The German Guidelines for Diagnosis and Treatment of Gender Incongruence and Gender Dysphoria of Childhood and Adolescence

The German guidelines' marked divergence from the Cass recommendations is explained by their failure to systematically appraise the evidence

Note 1: This analysis has been corrected. Please see correction notice at the end of this spotlight.
Note 2: The final version of the Guidelines discussed in this Spotlight has been released. It is discussed here.

In March 2024, the Association of Scientific Medical Societies in Germany (AWMF) published the final draft of the guidelines "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment." The AWMF is an important "pillar" in the German healthcare system, as it is the umbrella organization that organizes guideline updates and certifies treatment guidelines. The guideline development process was formally led by the German Society for Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy (DGKJP), with 26 other medical organizations from Germany, Switzerland and Austria participating. The draft is scheduled to be voted on by the Boards for the 27 societies, and, if accepted, will be published in June 2024 as the final guideline.

Germany's draft recommendations immediately drew international attention for their marked departure from England's Cass Report recommendations. The divergence between the two is remarkable. The Cass Report recommended withdrawing puberty blockers from commissioned treatments for youth gender dysphoria, advised extreme caution regarding cross-sex hormone use, did not consider surgery as a possible option for minors, and asserted that most gender dysphoric youth should be treated psychotherapeutically. In contrast, Germany's draft recommendations relaxed the prior age and eligibility requirements for minors wishing to access body-modifying endocrine and surgical interventions, and asserted that the requirement that minors undergo psychotherapy prior to accessing body-modifying procedures is "not ethically justified for reasons of respect for the dignity and self-determination of the person."

Having analyzed the two sets of recommendations and the processes used to create them, we find that their divergence can be largely explained by differing assumptions about the role of evidence in making recommendations. The Cass Report started with the assumption that the best treatment approach for gender dysphoric youth is unknown and commissioned 8 systematic reviews of evidence to develop its recommendations. In contrast, the German guideline update started with the assumption that the reclassification of the ICD diagnosis of "gender incongruence" from a mental to a physical health condition (which itself reflected a "societal paradigm shift") demands that body-modifying procedures be available to all those who desire them, including minors. The intention to align treatment recommendations with the "societal paradigm shift" is stated in the guideline registration with AWMF in 2020, and is apparent in the approach that the guideline development team took toward the evidence, which appears to have served as a mere backdrop to an a priori decision to liberalize access to medical intervention for minors.

Originally, the updated German guidelines for treating gender dysphoric children and adolescents were supposed to carry the classification of "S3," which signifies the highest level, evidence-based guidelines. However, the guideline development team abandoned the systematic evidence search after 2019, stating it was no longer "feasible with the Commission's resources." The decision to stop systematically searching for the evidence during the last four years (2020–2023) resulted in a failure to systematically appraise 50% or more of the relevant evidence, depending on the topic (as the recent UK York systematic reviews commissioned by the Cass review demonstrated, more than 50% of the relevant studies were published after 2019). After the final draft was completed in early 2024, AWMF downgraded it from the originally intended highest-level "S3" evidence-based guidelines to their current lower status of S2K "consensus guidelines." 

Our analysis concurs with the conclusion that the current draft of the guidelines cannot be graded as "S3" due to its failure to systematically assess much of the relevant evidence, and due to many other deviations from the evidence-based process as outlined by the AWMF-published German Instrument for Methodological Guideline Appraisal (DELBI). However, our methodological assessment suggests that even the lower S2K standard may not have been met. This additional concern deserves consideration, as any guidelines that are considered for implementation must be trustworthy. As the German Instrument for Methodological Guideline Appraisal (DELBI) states, "the primary aim of clinical practice guidelines is to enhance good clinical practice" by assessing "comprehensive knowledge (scientific evidence and clinical experience) about problems of care, to reconcile opposite views and to define current optimal practice by trading off benefits and harms." It does not appear that the basic requirement for trustworthy quality guidelines has been met.

Below, we present a brief summary of the content of the German guideline recommendations and list the key methodological concerns (both sections contain detailed tables that can be expanded). We then discuss the evolution of the German approach to the care of gender-dysphoric/gender-incongruent children and adolescents. We conclude with the SEGM take-aways.

1. Guideline Recommendations

The "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment" guideline contains 72 recommendation statements for 8 topics. All 72 are consensus-based, rather than evidence-based. The guideline methods report ("Leitlinienreport") states that evidence-based recommendations were not possible because the quality of the evidence itself was weak:

 

"After the discussions on the literature-based evidence situation, it was clear to the steering group that there would be no evidence-based recommendations on individual interventions in the treatment of gender incongruence or gender dysphoria in this field due to a lack of controlled evidence of effectiveness and an overall weak evidence situation with regard to uncontrolled evidence of effectiveness from case-cohort studies." 

 

However, the claim that evidence-based recommendations are not possible when the evidence is of low quality is inaccurate. Evidence-based recommendations are always possible regardless of the state of the evidence. The key requirement for a high-quality evidence-based guideline is not that the guidelines are based on high-quality evidence but that they are based on the best available evidence. This is achieved by conducting a systematic search, appraising the evidence for quality, and by basing the recommendations on the evidence while assigning an appropriate level of "strength" to each specific recommendation. This methodologically rigorous process was not followed in the case of the update to the German child and adolescent gender dysphoria guidelines.

Below is a brief summary of the recommendations contained in the draft guidelines:

  • Comprehensive mental health assessment. The stated goal of the assessment is to establish the diagnosis of "stable/persistent gender incongruence or gender dysphoria" of adolescence (either ICD-11 or DSM-5) with "sufficient diagnostic clarity." How this should be achieved is not explained other than stating that mental health professionals should "work out an individual assessment in a joint discussion with the affected person and their guardians based on the overall picture of the existing psychological findings, the descriptions and reflections of the affected person and their life story." The guidelines acknowledge that "there are no empirically validated individual criteria for determining the long-term stability/persistence of gender incongruence or gender dysphoria."
  • Psychotherapeutic treatment. The guidelines state that "gender dysphoric children and adolescents presenting to health care facilities are more likely to have clinically relevant psychopathological abnormalities that go beyond reported gender dysphoric distress." If such conditions interfere with the clarity of diagnosis or success of treatment, treatment should be offered, but not required: "an obligation to undergo psychotherapy as a condition for access to medical treatment is not ethically justified for reasons of respect for the dignity and self-determination of the person." If an adolescent wishes to undergo medical or surgical gender transition, but the parents disagree, family therapy is recommended. Should an impasse be reached, the role of the therapist may include helping the young person emancipate from the parents to facilitate "shaping their life in accordance with their gender identity." 
  • Social gender transition (SGT). SGT (including social transition of prepubertal children) is presented as a likely beneficial intervention with non-existent risks. The guidelines recommend that the decision whether to undergo SGT should be led by the child and guided by the child's right to self-determination. The involvement of mental health experts is only necessary to help the family adjust to the child's decision and to help the child and family manage any potential adverse societal reactions. Parents are advised to remain open to the possibility that a child's identity may continue to change. Social transition is not necessary prior to undergoing treatment with puberty blockers (PBs) or surgery; however, it is recommended (but not required) before starting cross-sex hormones.
  • Puberty blockers (PB). PBs can be provided upon reaching Tanner stage 2 (earliest stage of puberty) and obtaining the diagnosis of "gender incongruence" (GI) or "gender dysphoria" (GD) which, according to the guidelines, signifies a "very probable persistent gender incongruence." The guidelines do not specify how to achieve this and acknowledge a lack of "empirically validated individual criteria for determining the long-term stability/persistence of gender incongruence or gender dysphoria." There is no requirement of a pre-pubertal onset of GD/GI and no time criteria for the minimum duration of "concomitant body-related gender dysphoric distress." Unlike the "distress" criterion in the DSM-5 diagnosis of GD, the distress in the context of the German guidelines can take the form of "an anticipatory fear" of the anticipated pubertal body changes. While a comprehensive mental health assessment is listed as an "absolute requirement," PBs can be initiated "provisionally" ahead of the assessment if the pubertal changes are causing an urgency to start the PB treatment. Overall, PBs are viewed as a fully reversible and likely beneficial intervention with likely minimal risks, which can be further mitigated by limiting the duration of puberty blockade. Fertility counseling appears to be recommended. The treatment is available both to teens with a cross-sex identification and to those who identify as non-binary.
  • Cross-sex hormones (CSH). The GI diagnosis with "concomitant body-related gender dysphoric distress" is required, with the additional criterion that the sense of incongruence should have lasted for a "sufficiently long period of time." The guideline team made a "deliberate decision against specifying a formal time frame" since the "inner coming-out" happens "before the gender incongruence is apparent to others," making the time frame impossible to validate. A comprehensive mental health assessment is required, but treatment of co-occurring mental health conditions is merely "offered," not required. Overall, CSH are viewed as a beneficial intervention, but their partial irreversibility and the potential harms, especially harm to fertility, are acknowledged. Fertility counseling is recommended. The treatment is available both to teens with a cross-sex identification and to those who identify as non-binary.
  • Surgery. Mastectomy or breast reduction requires the diagnosis of GI and "concomitant body-related gender dysphoric distress" combined with a "clear desire for a change in the organ or characteristic to be operated on." Prior treatment with testosterone is not required, but if it has occurred, then a 6-month waiting period before breast surgery is recommended to ensure that both the identity and post-testosterone breast size stabilize. Surgery is recognized as an irreversible step with profound consequences. A comprehensive mental health assessment is required, but attempts at psychotherapeutic treatment are not required as a prerequisite for surgery. The latter recommendation is recognized to be in conflict with the "current assessment guidelines of the Medical Service of the National Association of Statutory Health Insurance Funds (2020), according to which, before approval of the assumption of costs for gender reassignment surgery... additional proof must be provided that the GD could not have been treated sufficiently effectively with psychotherapeutic means alone." The treatment is available both to teens with a cross-sex identification and to those who identify as non-binary.

    Genital surgery (i.e. orchiectomy, ovariectomy or hysterectomy) is not allowed in minors due to German laws that prohibit medical sterilization in minors.

Tables A1-A3 below present a more detailed analysis of the content of the recommendations comparing the German recommendations and rationale to those in the Cass Report. The Cass Report cautioned against social transition in pre-pubertal children and recommended caution in older adolescents; disallowed the use of puberty blockers as a treatment for gender dysphoria; suggested "extreme caution" in the use of cross-sex hormones for both adolescents and young adults; and recommended psychosocial interventions as the first and likely only line of treatment for most youth (surgeries for minors, including mastectomies, have never been allowed in England). NHS England accepted the Cass Report's recommendations for implementation.

Table A1: Social Gender Transition (SGT)
Recommendation
German 2024 Guideline:
  • Youth Social Gender Transition (SGT) is a benign act of self-expression that should be child-led.
  • The decision to undergo SGT should be made based on the child’s wish for self-expression. The parents should remain open to future identity changes. Mental health providers can help the family accept the child’s decision and manage adverse societal reactions.
Cass Report, 2024:
  • Youth Social Gender Transition (SGT) is an active intervention with potential for benefit and harm.
  • Complete SGT is generally discouraged, especially for prepubertal children. If undertaken, it should involve the parents, and be done in consultation with a mental health provider who can advise on the benefits, risks and long-term consequences.
Rationale
German 2024 Guideline: The guideline acknowledged the poor quality of evidence, but judged the evidence for benefits to be more compelling than evidence for harms:
  • Benefits: “There is evidence that an affirmatively supported role change can have a positive effect on social integration and the child's self-confidence through the development of the child's personality in the course of prepubertal development… There is evidence that an affirmatively supported role change before the onset of puberty can have a positive effect on socio-emotional development.”
  • Harms: “There was no evidence of increased psychosexual confusion, identity insecurity or otherwise conspicuous gender-related cognitions in a group of children with GI who had undergone a social role change with the support of their parents.”
  • Harm-benefit ratio: Based on the presumption of benefits of social gender transition and the absence of harm, the child’s right to self-determination was used as the guiding principle in making the recommendation in favor of SGT.
Cass Report, 2024: The review noted the poor quality of evidence and inconsistent results, and urged caution in interpreting individual study results:
  • Benefits: “For example, two studies suggest there may be some benefit associated with use of chosen name in adolescence. However, in another study lifetime suicide attempt and past-year suicidal ideation was higher among those socially transitioning as adolescents compared with those socially transitioning in adulthood.”
  • Harms: “In this review, two studies suggest that children who socially transition are more likely to continue to experience gender dysphoria/incongruence in adolescence, though one study found differences by birth-registered sex. One of these studies also reported that the majority of those who socially transitioned progressed to medical interventions.”
  • Harm-benefit ratio: Potential harms of changing “sex of rearing” on long-term development, and heightened risk of future medicalization are weighed heavier than the potential benefits.
Supporting Evidence
German 2024 Guideline:
  • The guideline methods “Leitlinienreport” referenced a systematic evidence search. The systematic search was stopped in September 2017.
  • The analysis, including non-systematic search after 2017, identified and discussed 12 studies, but did not appear to conduct any structured appraisal of the studies for risk of bias.
Cass Report, 2024:
  • Commissioned a systematic evidence review for SGT. The systematic search was stopped in April 2022.
  • The systematic review assessed 3,181 studies for eligibility, yielding 11 eligible studies, which were subsequently assessed for risk of bias using the modified Newcastle-Ottawa Scale.
SEGM fact-check / Notes
  • Over 50% of the studies in the UK SGT systematic review — 6 out of 11 — were not included in the German SGT analysis. Five of the six omitted studies were published after the German team suspended its systematic evidence search for SGT in 2017.
  • The non-systematic nature of the literature search and appraisal after September 2017 led to an apparent bias. For example, while the review included Olson et al. 2022, accepting the study's conclusion that social transition is beneficial, it overlooked the Sievert et al. 2021 study, which came to a different conclusion. This omission is all the more surprising since the data came from a German clinical sample of children with gender dysphoria; the topic was directly related to SGT and the article was descriptively titled “Not social transition status, but peer relations and family functioning predict psychological functioning in a German clinical sample of children with Gender Dysphoria”; and the review team discussed the study's other findings unrelated to SGT in other sections of the guidelines.
  • The German review displays a preference toward a positive interpretation of uncertain outcomes. For example, it interprets the Olson et al. 2022 finding that over 90% of youth who underwent early SGT continued to identify as transgender five years later, with the majority initiating medical transition, as a sign of long-term stability of trans identities that emerge before puberty. The alternative interpretation — that early social gender transition may create a potential “lock-in” effect of transgender identity — is not considered.
  • These issues would be mitigated by a systematic evidence search, synthesis of evidence, and appraisal of quality of evidence in regard to each recommendation question. The lack of transparent evidence synthesis and appraisal is concerning.

 

Table A2: Puberty Blockers (PB)
Recommendations
German 2024 Guideline:
  • PBs for gender dysphoria (GD) are indicated upon commencing puberty and obtaining ICD-11 diagnosis of “gender incongruence” or DSM-5 diagnosis of “gender dysphoria” of adolescence/adulthood. Youth with a cross-sex identification as well as non-binary are similarly eligible for puberty blockade.
  • No requirement of distress beyond “anticipatory fear” of developing secondary sexual characteristics. Explicitly states that the DSM-5 “gender dysphoria” diagnosis is no longer required.
  • No requirement of childhood-onset of gender incongruence: While “lasting” incongruence is mentioned, it is not defined beyond “weeks to months” of assessments prior to starting PBs.
  • Psychiatric assessment is recommended but PBs can be provisionally prescribed without assessment. The high rate of co-occurring mental illness that can complicate the diagnosis, and the difficulty in determining future persistence of trans identity, are cited as main reasons to have “weeks to months” of assessments prior to starting PBs. However, endocrinologists can prescribe PBs provisionally, ahead of the psychiatric assessment, if the pubertal changes are creating an urgency.
  • Ongoing psychotherapy not required if no mental health problems are apparent. Gender incongruence is seen as a healthy identity variation which, in and of itself, warrants no additional exploration.
  • Parental disagreement may lead to involvement of child protective services. If the child desires PB but parents do not agree, and if counseling cannot reconcile the disagreement, child protective services and court systems may be called on to protect “best interests of the child.”
Cass Review, 2024:
  • PBs for gender dysphoria (GD) are no longer allowed in medical practice. Clinical research trials may be approved at a later point for narrow indications.
  • If distress is present, standard evidence-based treatments should be used: “standard evidence based psychological and psychopharmacological treatment approaches should be used to support the management of the associated distress and cooccurring conditions. This should include support for parents/carers and siblings as appropriate.”
Rationale
German 2024 Guideline:
  • Benefits: UK NICE systematic reviews indicate “favorable outcome of the measured parameters” for psychological benefits of pubertal suppression:

    “The systematic reviews of the British NICE (National Institute for Health and Care Excellence (NICE), …on the state of the evidence on puberty-blocking .. in adolescence state that the studies available at the time of the reviews point overall in the direction of a favorable outcome after medical interventions for the mental health of adolescents with gender dysphoria.”

    SEGM fact-check note: the actual conclusion of the referenced NICE review, and the subsequent Cass Report recommendation to decommission the use of PBs, contradict this assertion (see SEGM fact-check 1 at the end of the table).

    • Also discusses individual studies that suggest positive outcomes. Specifically, the guidelines describe the positive findings of the original Dutch study but fail to acknowledge the highly relevant UK study from the largest pediatric gender clinic in the world, Carmichael et al., 2021, which failed to replicate the Dutch study’s positive results (see SEGM fact-check 6 at the end of the table).
  • Harms:
    • There are no concerns about psychological harms:
      • Does not consider the risk of permanently altering gender identity and sexual development credible. Notes high rate of initiation of cross-sex hormones following puberty blockade but attributes this to excellent diagnostic/prognostic ability of the clinicians, rather than to puberty blocker potential to permanently alter gender and sexual identity.
      • Does not consider negative impact on neurocognitive development significant. Recognizes research that points to possible problems in cognitive development, but points to research that suggests high levels of educational attainment post-transition.
    • There are no serious concerns about physical harms:
      • Recognizes potential physical harms of PBs, for example bone development challenges, insufficient penile tissue for future gender-affirming surgery, sexual side-effects, negative fertility-preservation implications, BMI increase, and menopausal symptoms, but does not consider any of these effects significant other than those on bone health.
      • Specific to bone density concerns, recommends limiting the duration of puberty blockade.
  • There are no accepted alternatives: “In particular, psychotherapy alone cannot be regarded as a suitable treatment to effectively reduce or avert gender dysphoria in cases of diagnosed persistent gender incongruence.”
  • Harm-benefit ratio: Because most teens place a higher value on preventing “irreversible progression of the development of secondary sexual characteristics” over avoiding uncertain long-term harms, puberty blockade use is justified (see SEGM fact-check 5 at the end of the table).
Cass Review, 2024:
  • Benefits: There is no trustworthy evidence of psychological benefits from systematic evidence reviews from either NICE or the updated York systematic evidence reviews:
    • The 2020 NICE systematic review “found no evidence that puberty blockers improve body image or dysphoria, and very limited evidence for positive mental health outcomes, which without a control group could be due to placebo effect or concomitant psychological support.”
    • The 2024 York systematic review found a “lack of high-quality research assessing puberty suppression in adolescents experiencing gender dysphoria/incongruence. No conclusions can be drawn about the impact on gender dysphoria, mental and psychosocial health or cognitive development.”
  • Harms:
    • There are serious concerns about psychological harms:
      • Raises the possibility of permanently altering the trajectory of development of sexuality and gender identity. “Blocking natal sex hormone production means that young people have to understand their identity and sexuality based only on their discomfort about puberty and a sense of their gender identity developed at an early stage of the pubertal process.”
      • Considers negative impact on neurocognitive development. “Adolescent sex hormone surges may trigger the opening of a critical period for experience dependent rewiring of neural circuits underlying executive function… Brain maturation may be temporarily or permanently disrupted by the use of puberty blockers, which could have a significant impact on the young person’s ability to make complex risk-laden decisions, as well as having possible longer term neuropsychological consequences.”
    • There are serious concerns about physical harms:
      • The York review notes that “bone health and height may be compromised during treatment.”
      • Discusses the risk of other adverse effects, including metabolic health and weight, insufficient penile tissue, etc.
  • Urges the development of evidence base for alternative treatment approaches: “An explicit clinical pathway must be developed for non-medical interventions, as well as a research strategy for evaluating their effectiveness.”
  • Harm-benefit ratio: Because of the uncertain benefits and because the harms could be significant, PBs can only be used in clinical research settings with “very narrow indication,” i.e., for natal males as an explicit start of a transition pathway in order to stop irreversible pubertal changes, and subject to standard ethics approvals.
Supporting Evidence
German 2024 Guideline:
  • Systematic evidence reviews:
    • The guideline methods “Leitlinienreport” stated that a systematic evidence search was conducted which included search terms specific to the topic of PBs. However, the systematic search was stopped in August 2017, with non-systematic surveillance up until 2023. The body of evidence that was appraised has not been presented.
    • The guideline also referenced the UK systematic review of puberty blockers. However, it inaccurately represented its key finding (see SEGM fact-check 1 at the end of the table).
    • In practice, most of the recommendations were supported by primary studies, but the studies were not formally appraised for risk of bias (see SEGM fact-check 6 at the end of the table).
  • Consideration of other guidelines and recommendations:
    • The review relied on WPATH Standards of Care 8 (SOC8) after 2017, asserting that SOC8 meets the German S3-level evidence-based guidelines, the highest possible level. This is not accurate (see SEGM fact-check 2 at the end of the table).
    • The review also references the Endocrine Society 2017 (ES2017) guidelines, asserting that ES2017 meets the German S3-level evidence-based guidelines, the highest possible level. This is questionable (see SEGM fact-check 3 at the end of the table).
    • The guideline also references the Cass Review and implies that it would concur with the German recommendation. This is inaccurate (see SEGM fact-check 4 at the end of the table).
Cass Review, 2024:
  • Consideration of other guidelines and recommendations:
    • Analyzed 23 guidelines and clinical recommendations in two systematic reviews. Concluded, “Two international guidelines (World Professional Association for Transgender Health and Endocrine Society) formed the basis for most other guidance, influencing their development and recommendations.”
    • For WPATH Standards of Care 8 (SOC8) and the Endocrine Society 2017 (ES2017) concluded, “Most clinical guidance lacks an evidence-based approach and provides limited information about how recommendations were developed. The WPATH and Endocrine Society international guidelines, which like other guidance lack developmental rigour and transparency have, until recently, dominated the development of other guidelines. Healthcare professionals should consider the lack of quality and independence of available guidance when utilising this for practice.”
SEGM fact-check / Notes
  1. NICE Systematic Evidence Review. The German guideline does not accurately describe the conclusion of the referenced NICE review. Rather than concluding that there are psychological benefits of puberty blockade, the review concluded just the opposite — “little change” — noting that the small reported changes are likely the result of poor study designs:

    “The results… suggest little change with GnRH analogues [PBs] from baseline to follow-up. Studies that found differences in outcomes could represent changes that are either of questionable clinical value, or the studies themselves are not reliable and changes could be due to confounding, bias or chance.” (“Evidence review: Gonadotrophin releasing hormone analogues for children and adolescents with gender dysphoria,” 2020, p. 13.)

  2. WPATH SOC8 Guidelines. Contrary to the assertion in the German guideline, WPATH SOC8 is not an evidence-based guideline for adolescents. The Adolescent section of SOC8 explicitly states that it was based on a narrative, rather than systematic, review. A recent systematic review commissioned by the Cass Review assessed SOC8 using AGREE II, a tool widely used by AWMF, which oversaw the guideline, and concluded that these guidelines “lack developmental rigor and transparency.” The guideline attained 35 out of 100 possible points on the “methodological rigor” domain, and 24 out of 100 possible points in the “applicability” domain. The German process also appraised SOC8 using AGREE II and gave it somewhat higher ratings, 55/100 for methodology and 28/100 for applicability.

    If properly appraised, WPATH SOC8 is unlikely to qualify as an S3 guideline, the highest possible level, because it fails to meet several of the current S3 requirements, e.g., DELBI item 8 “systematic methods were used in the search for evidence”; item 9 “the criteria for the selection for the evidence are clearly described”; item 12 “the link between the recommendations and the underlying evidence is presented”; and a number of other criteria. A recent BMJ article concurred that SOC8 cannot be considered an evidence-based guideline.

  3. The Endocrine Society 2017 guidelines (ES2017). Contrary to the assertion in the German guideline, the Endocrine Society 2017 guideline cannot be considered an evidence-based guideline for puberty blockade because it did not use a systematic review of evidence for pubertal suppression. S3 guidelines require that systematic methods be used to search for evidence (AGREE criterion 8).
    • Neither of the two ES2017 commissioned systematic reviews of evidence focused on adolescents or pubertal suppression.
    • The first systematic review, Maraka et al., 2017, analyzed the effects of cross-sex hormones on cardiometabolic outcomes of adults and is not applicable to the population of youth or the question of pubertal suppression.
    • The second systematic review, Singh-Ospina et al., 2017, analyzed the effects of gender-affirming endocrine interventions on bone health, but of the total of 13 studies, only 1 study dealt with pubertal suppression in youth. The rest of the studies were for cross-sex hormone use by adults. The one study, Klink et al., 2015, concluded that there were adverse effects of pubertal suppression on bone health of youth that were not attenuated even after initiation of cross-sex hormones. However, because the other 12 studies concerned mature adults’ use of cross-sex hormones and found no adverse effects on bone, the review concluded no adverse effects on bone, which the ES2017 recommendation for puberty suppression for youth ultimately relied upon.
    • Of note, no systematic evidence reviews of psychological effects of pubertal suppression on youth were conducted despite psychological benefits being the primary indication for puberty blockade in gender-dysphoric youth.
    • A recent systematic review commissioned by the Cass Review assessed ES2017 using AGREE II, a tool widely used by AWMF, which oversaw the guideline, and concluded that these guidelines, like WPATH, “lack developmental rigor and transparency.” The guideline attained 44 out of 100 possible points in the “methodological rigor” domain, and 22 out of 100 possible points in the “applicability” domain. The German process also appraised ES2017 using AGREE II and gave it similar ratings, 40/100 for methodology and 22/100 for applicability.
    • If properly appraised, ES2017 recommendations for adolescents are unlikely to meet the S3 level requirements, the highest possible level, due to failure to meet key requirement 8: “systematic methods were used in the search for evidence.” This requirement was demonstrably not met for the population of adolescents, as no specific search of the literature about benefits or harms for adolescents was conducted. A recent BMJ article concurred that ES2017 recommendations cannot be considered evidence-based.
  4. Cass Review. The German guideline appears to suggest that the Cass Report supports the notion that puberty blockers should be prescribed based on consensus since the evidence is of very low certainty: “The clinical recommendations derived from [UK NICE systematic review] have so far not included any proven clinical experience expertise, although this is explicitly formulated as a requirement in the Cass Review. There it is stated that as long as the evidence is uncertain, the broadest possible consensus of clinical experts should be sought as a basis for preliminary treatment recommendations.” The Cass Report did not support the use of puberty blockers based on “consensus,” and in fact strongly recommended decommissioning their use for gender dysphoria — the recommendation that NHS England recently accepted.
  5. Patient values and preferences. Research into how patients and caregivers trade off the benefits and risks of pubertal suppression in the short and long term has not been conducted. High-quality values and preferences research should be established on the basis of understanding the benefits, harms, and other desirable and undesirable consequences of different alternative interventions. Lack of evidence on the benefits and harms of interventions is one major hurdle for understanding values and preferences. Another concern is the cognitive development of children and adolescents, and their ability to appreciate the benefits and harms affecting aspects of their lives that typically do not come into consideration until later in life, e.g., desire for children and sexual function. The lack of quality research into this complex area makes the argument of teen “preference” for immediate physical changes over avoiding long-term harms deeply problematic.
  6. Interpretation of individual study findings. Instead of relying on a systematically appraised body of evidence, the review relies on findings from individual studies that were not systematically searched. It does not appear that a structured appraisal of risk of bias of individual studies has been conducted. The discussion of study findings shows a preference toward a positive interpretation of uncertain outcomes of youth transitions. For example:
    • The guidelines reference the Dutch research, de Vries et al., 2014, as evidence of no/low regret of youth transitions, by stating “Of the 55 people reported, no case of regret and/or detransition was reported.” They fail to mention the very short follow-up, on average two years after surgery. They also do not mention several adverse effects among the original cohort of 70, affecting patients who were reclassified as “non-completers,” including one transition-associated death, three instances of patients developing severe diabetes and obesity, and at least one apparent “stopped treatment” which could signal detransition.
    • The guidelines' analysis of several US studies, Tordoff et al., 2022 and Turban et al., 2020, fails to critically appraise the studies for methodological flaws, and does not discuss notable studies that contradict the conclusions of benefit of pubertal suppression, e.g., Carmichael et al., 2021, McPherson & Freedman, 2023.
    • This preference for positive interpretation could be mitigated by systematic search and synthesis of the relevant evidence for each outcome in question, and appraisal of quality of evidence using a widely accepted tool such as the GRADE system. The lack of transparent evidence synthesis and evidence appraisal is concerning.

 

Table A3. Cross-Sex Hormones (CSH)
German 2024 Guideline | Cass Report, 2024
Recommendations
German 2024 Guideline:
  • Cross-sex hormone treatment is indicated for any adolescent with the ICD-11 diagnosis of gender incongruence (GI) who experiences a “long-term desire” for the “physical changes expected as a result of hormone treatment.” No minimum age is specified.
  • The DSM-5 “distress” criterion no longer applies, and distress appears to be understood not as impairment in functioning, but as the “desire to develop the gender-specific physical changes” not associated with natal puberty.
  • No requirement of childhood-onset of gender incongruence: While “long-standing” gender incongruence is required, the minimum duration is deliberately not specified. The timing of the onset, pre-pubertal vs post-pubertal, appears unimportant; it is sufficient that the “distress developed or intensified after the onset of puberty.”
  • A mental health assessment is required to establish “stable/persistent gender incongruence.” However, the guideline acknowledges a lack of “empirically validated individual criteria for the determination of a permanent stability/persistence of gender incongruence or gender dysphoria.”
  • Ongoing psychotherapy not required. The decision of psychotherapy should be made on a case-by-case basis and prioritized with the patient. Treatment of other mental disorders is recommended but should not interfere with the body-modifying treatment.
  • Parental involvement and co-consent are recommended. If the child desires cross-sex hormones but parents do not agree, and if counseling cannot reconcile the disagreement, child protective services and court systems may be called on to protect “best interests of the child.”
Cass Report, 2024:
  • Cross-sex hormone treatment for those diagnosed with Gender Dysphoria is currently available but with a new qualification of “extreme caution.” The minimum age is 16.
  • Psychotherapy and psychosocial support should be the first line of treatment. There should be a clear clinical rationale for providing hormones for minors rather than waiting until an individual reaches 18.
  • A new centralized team not directly involved in care of the young person would need to approve the medical necessity.
  • NHS England will use the Cass recommendations to develop a policy on masculinizing/feminizing hormones for those aged 16 and older.
Rationale
  • Benefits (German 2024 Guideline):
  • Claims systematic reviews show “favorable outcome of the measured parameters.”

“The systematic reviews of the British NICE (National Institute for Health and Care Excellence) … on the state of the evidence on puberty-blocking and gender reassignment hormone treatment … in adolescence state that the studies available at the time of the reviews point overall in the direction of a favorable outcome after medical interventions for the mental health of adolescents with gender dysphoria.”

SEGM fact-check note: This is an inaccurate representation of the findings of the NICE review. See SEGM fact-check 1 at the end of the table.

  • Quotes individual studies to assert there is evidence of benefit for the “overall package” of treatments, rather than evidence for benefits of cross-sex hormones:

“The reported data from previous non-controlled clinical cohort studies on hormonal interventions in adolescents with diagnosed gender incongruence or gender dysphoria provide consistent evidence for a favorable outcome of the measured parameters for mental health and life satisfaction if gender reassignment hormone treatment was at least part of the treatment.”

SEGM fact-check note: It is not appropriate to analyze results of individual studies. Instead, conclusions must be drawn from the entire body of evidence which was systematically searched and appraised for quality at each outcome level. See SEGM fact-check 2 at the end of the table.

  • Benefits (Cass Report, 2024):
  • Concludes there is no trustworthy evidence of psychological benefits of cross-sex hormone treatments, quoting the 2024 York systematic review:
  • “There is a lack of high-quality research assessing the outcomes of hormone interventions in adolescents experiencing gender dysphoria/incongruence, and few studies that undertake long-term follow-up. No conclusions can be drawn about the effect on gender-related outcomes, body satisfaction, psychosocial health, cognitive development or fertility. Uncertainty remains about the outcomes for height/growth, cardiometabolic and bone health.”
  • The evidence also did not support the notion that hormone treatment decreases risk of death by suicide.
  • Notes about the small effect sizes and the possibility that the improvements may be short-lived:

“When a young person has been on puberty blockers, a short-term boost in mental wellbeing is to be expected when sex hormones are introduced....The start of long anticipated physical changes would be expected to improve mood, at least in the short term, and it is perhaps surprising that there is not a greater effect size. However, much longer term follow-up is needed to understand the full psychological impact of medical transition.”

  • Harms (German 2024 Guideline):
  • Physical harms: The recommendations for cross-sex hormone treatment are not accompanied by evidence on the probability and severity of potential harms, although a number of potential harms are mentioned, e.g., increased BMI, decreased HDL, and increased risk of thrombosis. The harm to fertility is also recognized.
  • Harms (Cass Report, 2024):
  • Physical harms: The systematic review evaluated a range of physical health outcomes. It found only one high-quality study that examined side effects. Inconsistent results were observed for height/growth, bone health and cardiometabolic effects. There was insufficient evidence to assess impact on fertility; no study assessed fertility in birth-registered females. Most studies included adolescents who received puberty suppression, making it difficult to determine the effects of hormones alone.
  • Psychological harms from overtreatment (German 2024 Guideline):
    • Diagnostic reliability: Acknowledges a lack of predictive validity of the ICD-11 diagnosis of gender incongruence, and a lack of validated criteria to predict persistence. However, it is not viewed as a risk for overtreatment.
    • Gay youth: Acknowledges sexual orientation may be related to persistent gender incongruence in some adolescents, but recommends cross-sex hormone treatment regardless of sexual orientation.
    • Autistic youth: Although the guidelines recognize a high rate of co-occurrence between autistic diagnosis and gender incongruence, there is no apparent concern over potential overdiagnosis or overtreatment of autistic youth.
  • Psychological harms from overtreatment (Cass Report, 2024):
    • Diagnostic reliability: Recognizes that the diagnoses of “gender dysphoria” (DSM-5) or “gender incongruence” (ICD-11) lack predictive validity. It is unknown whether that young person will have longstanding gender incongruence in the future, or whether medical intervention will be the best option for them.
    • Gay and autistic youth: These youth frequently exhibit gender non-conforming behaviors and are susceptible to developing GD/GI. There is a concern about inappropriately treating such youth with gender transition.
  • Youth with fluid/evolving identities (German 2024 Guideline): Though the guidelines recognize the rise of non-binary identities and a lack of understanding of how such identities may develop, the guidelines still recommend cross-sex hormone treatment for children reporting a non-binary gender identity.
  • Youth with fluid/evolving identities (Cass Report, 2024): Recognizes that nonbinary identities are on the rise, and that identity in youth is still developing, which raises questions about medical interventions.
  • Detransition (German 2024 Guideline): Detransition is recognized as a phenomenon but assumed to be rare and not a signal of overtreatment/harm.
  • Detransition (Cass Report, 2024): The percentage of people treated with hormones who subsequently detransition remains unknown due to the lack of long-term follow-up studies, although there is a suggestion that numbers are increasing.
  • Issues related to consent (German 2024 Guideline): Recognizes that the irreversible nature of many hormone-induced changes, including risk to fertility and sexual function, makes it imperative that the young person is capable of consent. In the event that a minor cannot consent, legal guardians should not be allowed to consent on the minor's behalf. Instead, the minor's own capacity to consent should be developed.
  • Issues related to consent (Cass Report, 2024): Recognizes a key barrier to informed decision-making: the poor evidence base makes it challenging to provide adequate information on which a young person and their family can make an informed choice.
  • Alternatives (German 2024 Guideline): Asserts that there are no accepted alternatives to hormone treatment: “there is a lack of a justifiable evidence-based alternative treatment option in the sense of a previously established and proven treatment.”
  • Because adolescents are understood to prefer developing “the gender-specific physical changes expected as a result of hormone treatment” over avoiding uncertain long-term harms, cross-sex hormone treatment is considered justified.
  • Alternatives (Cass Report, 2024): Urges the development of an evidence base for alternative treatment approaches: “An explicit clinical pathway must be developed for non-medical interventions, as well as a research strategy for evaluating their effectiveness.”
  • Harm-benefit ratio (German 2024 Guideline): While the evidence is recognized as uncertain, the benefits are assumed to outweigh the harms, and the principle of self-determination of minors should guide the decision to treat with cross-sex hormones, as long as minors are deemed capable of consent.
  • Harm-benefit ratio (Cass Report, 2024): There is insufficient and/or inconsistent evidence about the risks and benefits of hormone interventions in this population.
Supporting Evidence
  • Systematic evidence reviews (German 2024 Guideline):
  • The guideline methods report (“Leitlinienreport”) referenced a systematic evidence search which included search terms specific to the topic of hormone treatment. The systematic search was stopped in August 2017.
  • The guideline discussed a number of studies but did not formally appraise them for risk of bias.
  • The guideline references the evidence for the “whole package,” which included psychotherapy and hormone treatment, rather than evidence on hormone treatment alone.
  • Systematic evidence reviews (Cass Report, 2024):
    • The Cass Review commissioned a systematic evidence review for cross-sex hormones, systematically searching for studies through April 2022.
    • The systematic review assessed 3,181 studies for eligibility, yielding 53 eligible studies, which were subsequently assessed for risk of bias using the modified Newcastle-Ottawa Scale.
  • Consideration of other guidelines and reviews (German 2024 Guideline):
  • The review relied on WPATH Standards of Care 8 (SOC8) after 2017, asserting that SOC8 meets the German S3-level evidence-based guidelines, the highest possible level.
  • The review also references the Endocrine Society 2017 (ES2017) guidelines, asserting their equivalence to German S3-level evidence-based guidelines, the highest possible level.
  • The guideline also references the Cass Review, with the incorrect implication that the Cass Review would concur with the German recommendation.

See SEGM fact-check notes 3–5 at the end of the table.

  • Consideration of other guidelines and recommendations (Cass Report, 2024):
  • Analyzed 23 guidelines and clinical recommendations in two systematic reviews and concluded, “Two international guidelines (World Professional Association for Transgender Health and Endocrine Society) formed the basis for most other guidance, influencing their development and recommendations.”
  • For WPATH Standards of Care 8 (SOC8) and the Endocrine Society 2017 (ES2017) concluded, “Most clinical guidance lacks an evidence-based approach and provides limited information about how recommendations were developed. The WPATH and Endocrine Society international guidelines, which like other guidance lack developmental rigour and transparency have, until recently, dominated the development of other guidelines. Healthcare professionals should consider the lack of quality and independence of available guidance when utilising this for practice.”
SEGM fact-check / Notes
  1. NICE systematic evidence review. The German guideline does not accurately describe the conclusion of the referenced 2020 NICE review for cross-sex hormones. Rather than concluding “favorable outcomes,” the 2020 NICE cross-sex hormones review noted:

    “The key limitation to identifying the effectiveness and safety of gender-affirming hormones for children and adolescents with gender dysphoria is the lack of reliable comparative studies. All the studies included in the evidence review are uncontrolled observational studies, which are subject to bias and confounding and were of very low certainty using modified GRADE. A fundamental limitation of all the uncontrolled studies included in this review is that any changes in scores from baseline to follow-up could be attributed to a regression-to-the mean...Any potential benefits of gender-affirming hormones must be weighed against the largely unknown long-term safety profile of these treatments in children and adolescents with gender dysphoria.”

  2. Using results from individual studies: The recommendations on cross-sex hormone treatment should be informed by systematically appraising the entire body of evidence regarding each specific outcome. Further, discussing individual findings from studies, without assessing the study for risk of bias, is not appropriate, as any given study's result may not be trustworthy.
  3. WPATH SOC8 Guidelines. Contrary to the assertion in the German guideline, WPATH SOC8 is not an evidence-based guideline for adolescents. The Adolescent section of SOC8 explicitly states that it was based on a narrative, rather than systematic, review. A recent systematic review commissioned by the Cass Review assessed SOC8 using AGREE II, a tool widely used by AWMF, which oversaw the guideline, and concluded that these guidelines “lack developmental rigor and transparency.” The guideline attained 35 out of 100 possible points on the “methodological rigor” domain, and 24 out of 100 possible points in the “applicability” domain. The German process also appraised SOC8 using AGREE II and gave it somewhat higher ratings, 55/100 for methodology and 28/100 for applicability.

    If properly appraised, WPATH SOC8 is unlikely to meet the S3 guideline level, the highest possible level, because it fails several of the current S3 requirements, e.g., DELBI item 8 “systematic methods were used in the search for evidence”; item 9 “the criteria for the selection for the evidence are clearly described”; item 12 “the link between the recommendations and the underlying evidence is presented”; and a number of other criteria. A recent BMJ article concurred that SOC8 cannot be considered an evidence-based guideline.

  4. The Endocrine Society 2017 guidelines (ES2017). Contrary to the assertion in the German guideline, the Endocrine Society 2017 guideline cannot be considered an evidence-based guideline for puberty blockade because it did not use a systematic review of evidence for pubertal suppression. S3 guidelines require that systematic methods were used to search for evidence, AGREE criterion 8.
    • Neither of the two ES2017 commissioned systematic reviews of evidence focused on adolescents.
    • The first systematic review, Maraka et al., 2017, analyzed the effects of cross-sex hormones on cardiometabolic outcomes of adults and is not applicable to the population of youth.
    • The second systematic review, Singh-Ospina et al., 2017, analyzed the effects of gender-affirming endocrine interventions on bone health, but of the total of 13 studies, only 1 study dealt with cross-sex hormones in youth; the rest of the studies were for cross-sex hormone use by adults. The one study, Klink et al., 2015, concluded that there were adverse effects of pubertal suppression on bone health of youth that were not attenuated even after initiation of cross-sex hormones. However, because the other 12 studies concerned mature adults’ use of cross-sex hormones and found no adverse effects on bone, the review concluded no adverse effects on bone, which the ES2017 recommendations for endocrine interventions for youth ultimately relied upon.
    • A recent systematic review commissioned by the Cass Review assessed ES2017 using AGREE II, a tool widely used by AWMF, which oversaw the guideline, and concluded that these guidelines, like WPATH, “lack developmental rigor and transparency.” The guideline attained 44 out of 100 possible points in the “methodological rigor” domain, and 22 out of 100 possible points in the “applicability” domain. The German process also appraised ES2017 using AGREE II and gave it similar ratings, 40/100 for methodology and 22/100 for applicability.
    • If properly appraised, ES2017 recommendations for adolescents are unlikely to meet the S3 level requirements, the highest possible level, due to failure to meet the S3 level key requirement 8: “systematic methods were used in the search for evidence.” This requirement was demonstrably not met for the population of adolescents, as no specific search of the literature about benefits or harms for adolescents was conducted. A recent BMJ article concurred that ES2017 recommendations cannot be considered evidence-based.
  5. The Cass review revealed that most included studies on cross-sex hormones included adolescents who received puberty suppression, making it difficult to determine the effects of hormones alone. It is not appropriate to draw conclusions about cross-sex hormone treatment based on the “whole package” of treatment, which included various steps of medical transition, and was confounded by psychological interventions.

 

2. Methodological Issues

High-quality guidelines share the following characteristics: the recommendations are clear and actionable; the evidence is summarized using rigorous systematic review methods; the guideline panel considers all outcomes important to patients; and the guideline panel makes appropriate judgments in the interpretation of the evidence and the final recommendation. Having assessed the final draft of the guidelines "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment" for methodological rigor, we conclude that it does not meet the standard for a credible evidence-based guideline.

While the guidelines acknowledge that their recommendations are not evidence-based, they wrongly state that evidence-based recommendations were not possible due to the low level of evidence. In explaining the reason for the downgrading of the guideline from its originally-intended S3 evidence-based guidelines to the current consensus-based S2k level, the guidelines provided this justification: 

 

"Due to a lack of controlled evidence of efficacy and an overall weak evidence base with regard to uncontrolled evidence of efficacy from case-cohort studies, no evidence-based recommendations were made in this guideline for the treatment of GI or GD; instead, all recommendations were developed on the basis of consensus."

 

However, as is widely known, evidence-based guidelines can be created regardless of the level or strength of the available evidence. As long as a rigorous process for guideline development is followed, an evidence-based guideline can be produced even in the context of extremely limited, low-quality evidence. Below is a summary of the key methodological shortcomings of the guidelines. This summary is followed by a detailed assessment that suggests that not only did the guidelines fail to meet the S3 criteria, but even the lower-level S2k criteria may not have been met.

  • Systematic search for evidence stopped between 2017 and 2019. The decision by the guidelines committee to stop the systematic evidence search after 2019 (and for some topics, as early as 2017) led to a failure to systematically assess 50% or more of the relevant literature, depending on the topic (e.g., see Table 1A, "SEGM fact-check / Notes"). The cessation of the systematic search so early in the guideline development renders the recommendations not evidence-based; it omits a large body of recent literature, which is most applicable to the current populations of youth presenting with gender dysphoria/gender incongruence.
  • The search for the evidence was conducted without clearly defined criteria. Even during the early timeframe when a systematic search was conducted, the approach to defining search criteria was inadequate. The study inclusion criteria were overly broad and vague (e.g., they were articulated at a high level instead of being stated separately for each intervention, and they did not specify target outcomes, comparator groups, or study designs). This makes the guideline susceptible to concerns of bias over which studies were allowed to influence the recommendations.
  • The evidence was not critically appraised at the study level and not rated for certainty overall. While the guideline acknowledged that the overall quality of evidence was poor, the appraisal fell far short of what the AWMF "DELBI" standards for guideline appraisal consider adequate. Individual studies were not appraised for risk of bias (RoB), and the overall body of evidence was not appraised for quality/certainty using tools such as GRADE. Notably, the guidelines also misrepresented the findings of the NICE systematic review for puberty blockers, wrongly suggesting that it concluded that puberty suppression was beneficial to young gender-dysphoric people.
  • There was no explicit link between the recommendations and the evidence base. None of the over 70 topic-specific recommendations, including the specific recommendations regarding psychotherapy, social transition, puberty blockers, cross-sex hormones, and surgery, are linked to a body of evidence that is graded for certainty. Instead, the guidelines justify specific treatment recommendations with findings from individual studies that were not assessed for risk of bias, frequently presenting highly biased findings as a trustworthy basis for recommendations.
  • Failure to properly engage stakeholders with a range of views representative of the relevant clinician and patient communities. According to the AWMF "DELBI" standards for guideline appraisal, the guideline development effort should seek the engagement of professionals who will be tasked with implementing the recommendations, and patient/citizen groups whose care will be affected by the recommendations. While the guidelines did include professionals from 27 organizations, it appears that a diversity of views was lacking.
    • The failure to ensure intellectual diversity and manage disagreement was suggested by the fact that one of the guideline steering committee members quit the effort. The guideline methods report ("Leitlinienreport") reveals that "Prof. Dr. med. Florian Daniel Zepf left the steering group at his own request after two years on the steering group due to his stated professional ethical concerns and 'concerns regarding aspects of child and youth protection'. At no time was he entitled to vote in consensus conferences."
    • The failure to properly engage professionals with dissenting opinions became apparent when, during the comment period, 15 Chairs and senior members of the Child and Adolescent Psychiatry Association submitted a 100+ page dissenting opinion.
    • The failure to ensure a broad representation of opinions held by German clinicians was also evident when the German Medical Assembly, which comprises 250 delegates representing 17 German medical associations, passed resolution Ic-48 calling for a markedly different approach to treating gender-dysphoric youth than the one outlined in the draft guidelines. The resolution asked to restrict all gender-transitioning treatments for youth to clinical trials.
    • The diversity of patient perspectives was also not represented, as evidenced by the dissenting opinion published by several parent stakeholder groups shortly after the draft guideline was completed.
  • Failure to manage conflicts of interest. The guideline methods report ("Leitlinienreport") states, "no conflicts of interest were found which have been considered problematic with regard to the involvement of the members of the guideline commission in the consensus process." However, a detailed analysis of the guideline development group's composition, compiled by the parent groups, noted what appeared to be significant intellectual and potential financial conflicts of interest for a number of the guideline development group members, and their lack of independence from powerful interest groups that promote a medicalized approach to treating minors.

 

German guideline group composition COI

 

 

Our detailed methodological assessment of the German guidelines relative to the AWMF DELBI standard is presented in Table B below. Note: The DELBI instrument comprises 34 items across 8 domains, 7 of which are drawn from the internationally recognized AGREE II tool, with the eighth domain relating to "applicability in the German healthcare system." A full assessment was out of scope, so instead we focused on the subset of the criteria which the AWMF lists as specifically differentiating between the guideline levels (S1-S3).

Table B - Assessment of "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment" relative to the DELBI standards outlined by AWMF


 

What is the appropriate "S" classification for the current draft guidelines?

The AWMF classifies guidelines from S1 to S3. The "S1" rating is the lowest level, reserved for the guidelines based only on recommendations by experts; "S2" guidelines require a structured consensus process ("S2k") or a systematic literature review ("S2e"); while "S3" guidelines are based both on a systematic literature review, and employ a structured consensus process to make recommendations.

As mentioned earlier, the guidelines "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment" were originally registered at the "S3" level, which represents the highest level of rigorous and trustworthy guidelines. The guideline draft continued to carry the designation of S3 until January 2024. However, following a meeting with AWMF in January, the final draft was downgraded to "S2k." Two major reasons were offered publicly as the explanation for the downgrade. One was that the decision to downgrade the guideline was due to the poor quality of the evidence itself (a surprising explanation, as it is widely known that even when the evidence itself is of very low quality, it is possible to create high-quality evidence-based guidelines, as long as the process follows a high methodological standard).

Another stated reason for the downgrade was that this was merely a "preemptive" downgrade in anticipation of AWMF strengthening its S3 criteria, which would have rendered the guideline non-compliant with "S3" in the near future: the upcoming change in the S3 standard will require that at least 50% of the recommendations are evidence-based rather than consensus-based. Of note, however, this requirement appears to have been already present in the 2023 AWMF Guidance Manual and Rules for Guideline Development (p. 40, see below), and according to another AWMF document, it may have been in place as early as July 2023 (p. 3), a full 6 months before the guidelines were finalized.

50 percent requirement

In trying to understand the marked discrepancy between the Cass Report and the German draft recommendations, we embarked on assessing the guideline development process used by the German guideline development group using the "DELBI" standard. Our analysis suggests that not only do the guidelines fail to meet the stated future S3 requirements, but they fail to meet the S3 standard as it exists currently. Per the 2023 AWMF Guidance Manual and Rules for Guideline Development (p. 57), S3 guidelines require assigning a formal "grade" to each recommendation*:

In the case of S3 guidelines, the formal consensus development process for adopting recommendations focuses on clinical aspects to judge the methodologically synthesised evidence. The recommendations are then discussed on this basis. Next, the strength of the recommendations is determined and a grade of recommendation assigned [emphasis added].

By additionally indicating the strength of consensus (percentage of agreement within the guideline development group) for each recommendation, the guideline users are given an impression of the extent to which all participants were in agreement. [emphasis added].

 

As our detailed analysis in Table B demonstrates, most of the S3 requirements have not been met. However, our analysis also suggests even the lower S2k level may not have been reached. According to the various AWMF documentation sources, while the S2 guidelines do not require a formal grading of the evidence using structured tools such as GRADE, a less formal assessment of the quality of the recommendation is still required. In 2013 the AWMF made it clear that for S2k guidelines, the consensus rating should be used in addition to, rather than instead of, providing a grading for the strength of the recommendation:

For consensus-based guidelines (S2k), the strength of recommendations is identified and adopted during the formal consensus process, although an indication of grades of recommendation (and levels of evidence) is not included because recommendations are not based on a systematic review of the evidence. Here, the strength of a recommendation is expressed in words only. Additionally, the strength of consensus (percentage of agreement within the guideline development group) can be indicated for each recommendation [emphasis added] (p. 42).

 

The updated 2023 AWMF Guidance Manual and Rules for Guideline Development language is similar but less specific:

For consensus-based guidelines (S2k), the strength of recommendations is identified and adopted during the formal consensus process. Nevertheless, it is not planned to state schematic grades of recommendation or levels of evidence because recommendations are not based on any systematic processing of the evidence. The grade of a recommendation is expressed in words. 

 

It further specifies, however, that the consensus process is used to arrive at the ranking of the recommendation, rather than to substitute for it:

AWMF Guideline Register Rule: Classification of S2 and S3 guidelines (excerpt): If it is an S2k or S3 guideline:

  • The methods for formulating recommendations are clearly described. This requires formal consensus techniques (e.g. consensus conference, nominal group process or Delphi method) (see AGREE II Criterion 10).
  • Every recommendation is discussed and voted on as part of a structured consensus development with a neutral moderator. The objectives are to find a solution to pending decision-making issues, to establish a final ranking of the recommendations (S2k guideline) and determine the grade of recommendation (S3 guideline), and to measure the strength of consensus [emphasis added] (p. 58)

The requirement that "the strength of the recommendation is expressed in words" also appears in the "aids and appendices" section for S2k.

There does not appear to be any clear indication of the strength of the recommendations (even informal) in the guidelines text. However, the requirement that the "strength of the recommendation is expressed in words" is vague, and therefore, it is unclear if the S2k criterion has been met. What is clear, however, is that the strength of consensus is not considered to be a substitute for the strength of recommendation by AWMF, as the two measure different constructs. The strength of consensus indicates the proportion of the guideline panel that agrees with the recommendation statement. The strength of the recommendation signals whether almost all, most, or only a few of the patients would benefit from the recommendation.

While it is up to AWMF to determine whether or not the rating/grading of the strength of the recommendation "expressed in words" criterion has been met, to the extent that S2k guidelines are considered credible for implementation by AWMF, the lack of any indication of the strength of the recommendation (formal or informal) is a key limitation of the current draft, regardless of the final AWMF rating.

*This text has been updated based on the 2023 AWMF Guidance Manual and Rules for Guideline Development (the prior language was based exclusively on the 2012 AWMF Guidance Manual).

3. The Evolution of German Guidance to Care of Children and Adolescents 

As in other Western countries, the care for gender incongruent/gender dysphoric children and adolescents in the 21st century was largely informed by innovations in the care for gender-dysphoric mature adults in the 20th century. Germany occupies a special place in the history of adult gender medicine. In the 20th century, it was the German sexologist and founder of the Institute for Sexual Science in Berlin, Magnus Hirschfeld, who coined the term “transsexual” and oversaw the first “gender-affirming” procedures in the 1920s-1930s. Hirschfeld also mentored his younger German colleague Harry Benjamin, who later immigrated to the United States, and became known as the founder of the field of gender medicine in the U.S. and worldwide in the 1950s. Benjamin’s research led to the formation of WPATH, which, from its inception until 2006, carried Benjamin’s name.

Not only was Germany instrumental in initiating the practice of gender transitions, but it has been the worldwide leader in promoting the "depathologizing" approach to gender incongruence in adults. In 2015, the German Medical Association proposed new transgender guidelines to the World Medical Association (WMA), which asserted that “everyone has the right to determine one’s own gender,” that “gender incongruence is not in itself a mental disorder,” and recommended that “every effort be made to make individualised, multi-professional, interdisciplinary and affordable transgender healthcare (including speech therapy, hormonal treatment, surgical interventions and mental healthcare) available to all people who experience gender incongruence in order to reduce or to prevent pronounced gender dysphoria.” The WMA accepted the German proposal and adopted the guideline, with many medical associations worldwide (including the American Medical Association, which is a member of WMA) embracing it.

In contrast to the adult 2015 German guidelines, the original German treatment guidelines for children and adolescents, written in 1999 and updated in 2013, retained a cautious approach to medicalizing gender-dysphoric youth. The guidelines required youth to undergo psychotherapy for at least 1 year prior to commencing medical interventions to rule out the possibility of transient distress due to emerging sexuality or any other developmental struggles. For youth whose desire to transition persisted, an additional 1-year “real-life test” was required. Puberty blockers were allowed at Tanner stage 2, while the minimum age for cross-sex hormones was set at 16, but both treatments were only provided to those who were expected to continue to suffer from “life-long transsexualism.”

The 2013 guidelines recognized the significant uncertainty regarding the medicalization of gender-dysphoric minors. In addition to discussing the complexities of adolescent development, the 2013 guideline acknowledged that endocrine interventions in general, and puberty blockers in particular, could alter the natural psychosocial development, with a particular effect on gay youth. 

To diagnose “transsexualism” (an ICD-10 diagnosis) in a young person, a number of conditions had to be ruled out first. The differential diagnosis included temporary distress, personality disorders, development of a homosexual orientation, or a “sexual maturation crisis,” especially in cases where gender distress appeared “shortly before or during puberty.”

The 2013 guidelines were classified as "S1" (lowest level, as the consensus process was not "structured"). To be rated as an S1-level guideline, "a representative group of experts from the specialist society(s) draws up a recommendation by informal consensus, which is finally approved by the board of the specialist society(s) and any other organizations involved."

As discussed above, this changed in 2020, when the AWMF—the umbrella organization for Germany’s medical societies which also organizes guideline updates—officially registered its plan to update the 2013 guideline for the management of gender dysphoria in youth. According to the guideline registration information, the primary justification for the update was a “paradigm shift” in the world whereby gender-related variance was no longer considered to be a form of mental illness. The registration specifically referenced the diagnostic change in ICD-11, which officially depathologized “gender incongruence” by removing this diagnosis from the mental health disorders section. The guideline registration did, however, mention that the approach to minors would take into account “particularities of the still incomplete biological development of maturity and identity and personality development in childhood and adolescence.”

The registration information provided additional detail on the planned guideline update. The “Guideline goal” section (“Zielorientierung der Leitlinie”) stated that the goal was to create a “valid guideline in which the treatment standards for childhood and adolescence are summarized with the greatest possible evidence-base and consensus among the professional associations involved in order to achieve an improvement in quality standards in medical care.” The "methodology" section (“Methodik”) referenced the plan to utilize systematic reviews of evidence as well as the use of a structured consensus process.

In accordance with this description, the guideline was registered as an S3 guideline, and in fact, the early draft in January 2024 still carried the S3 designation, suggesting that the guideline would utilize a systematic review of evidence. As explained earlier, German guidelines can carry a designation from S1 to S3 depending on the rigor of the process. Guidelines classified as S1 are of the lowest quality, since they are based only on recommendations by experts; S2 guidelines require a structured consensus process (S2k) or a systematic literature review (S2e); while S3 guidelines include both elements.

Despite registering and publishing the guideline draft as “S3,” the draft had to be downgraded to “S2k” after the AWMF meeting in January. According to the process outlined by AWMF, the next steps include the meeting of the Boards of all 27 participating organizations, with plans to vote on the adoption of the guideline in June 2024. Should the guideline be adopted, it will replace the 2013 version. However, it is unclear how binding the guidelines are, especially in light of the significant dissent from both professional and patient/community stakeholder groups, as discussed above.

SEGM Take-away

Guidelines exist to enable physicians, patients, and policy-makers to make informed decisions about treatments. While AWMF allows for these different "levels" of guidelines, the fundamental requirement that guidelines be trustworthy is not optional. The German AWMF Instrument for Methodological Guideline Appraisal (DELBI) acknowledges that "the primary aim of clinical practice guidelines is to enhance good clinical practice" by assessing "comprehensive knowledge (scientific evidence and clinical experience) about problems of care, to reconcile opposite views and to define current optimal practice by trading off benefits and harms." DELBI recognizes that if a guideline is not viewed as reliable, it will not be accepted by the end users: "The effectiveness of guidelines depends on their acceptance and the reliability of their recommendations" (p. 9). 

From the German AWMF/DELBI standard perspective, the current guidelines fall far below the originally intended S3 evidence-based guidelines. More importantly, irrespective of the exact DELBI rating, treatment guidelines must be trustworthy. The requirement that guidelines should be evidence-based is no longer optional in the world of evidence-based medicine of the 21st century. As our analysis demonstrates, the "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment" guidelines do not meet the basic requirement of credible, trustworthy, evidence-based guidelines. 

 

Correction Notices (March 26, 2025)

The previously stated position that the draft recommendations were not graded for strength has been identified as incorrect. Consequently, the following three corrections have been implemented:

  1. The following text has been removed:

"The recommendations were not graded for strength. The hallmark of an evidence-based guideline is the grading of the guideline recommendations for strength. This informs guideline users how to interpret each recommendation. Recommendations may advocate either "for" or "against" specific treatments, with gradations of "strong" or "conditional." A "strong" recommendation suggests nearly all patients would benefit from (or be harmed by) the intervention, whereas a "conditional" recommendation indicates that most, though not all, patients would benefit (or be harmed). None of the 70+ recommendations in the guideline were graded for strength using either a formal methodology (e.g., GRADE) or a less formal method. Instead, only the strength of "consensus" was provided. However, according to the AWMF Guidance Manual and Rules for Guideline Development, the strength of consensus should supplement—not replace—the grading of recommendation strength."

 

  2. The following sentence has been removed:

"It appears that even the S2k level, which the guideline is purportedly certified at by AWMF, may not have been adequately met, as the guideline did not indicate the strength of recommendations (only providing the strength of consensus, which is insufficient per AWMF documentation for S2k certification)."

 

  3. In one extensive passage, it was determined optimal to retain the original text while explicitly noting the error and its implications directly in the text. The supplementary clarification reads:

"Above we provide seven points to support our concern that the draft guidelines might not meet the criteria for an S2k level. One of these seven points is that the draft recommendations do not include a grading of strength. As noted in a correction notice to this Spotlight, this is not correct. This misunderstanding arose partly because neither the Guidelines nor the associated "Leitlinienreport" clarify this point, and partly due to nuances lost when the recommendations were translated into English. Consequently, the criticisms made throughout this subsection, arguing that the guidelines might not meet S2k certification criteria, are undermined by this correction. The other six points remain valid."