The German guidelines' marked divergence from the Cass recommendations is explained by their failure to systematically appraise the evidence
In March 2024, the Association of Scientific Medical Societies in Germany (AWMF) published the final draft of the guidelines "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment." The AWMF is an important "pillar" in the German healthcare system, as it is the umbrella organization that organizes guideline updates and certifies treatment guidelines. The guideline development process was formally led by the German Society for Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy (DGKJP), with 26 other medical organizations from Germany, Switzerland and Austria participating. The draft is scheduled to be voted on by the boards of the 27 societies, and, if accepted, will be published in June 2024 as the final guideline.
Germany's draft recommendations immediately drew international attention for their marked departure from England's Cass Report recommendations. The divergence between the two is remarkable. The Cass Report recommended withdrawing puberty blockers from commissioned treatments for youth gender dysphoria, advised extreme caution regarding cross-sex hormone use, did not consider surgery as a possible option for minors, and asserted that most gender dysphoric youth should be treated psychotherapeutically. In contrast, Germany's draft recommendations relaxed the prior age and eligibility requirements for minors wishing to access body-modifying endocrine and surgical interventions, and asserted that the requirement that minors undergo psychotherapy prior to accessing body-modifying procedures is "not ethically justified for reasons of respect for the dignity and self-determination of the person."
Having analyzed the two sets of recommendations and the processes used to create them, we find that their divergence can be largely explained by differing assumptions about the role of evidence in the process of making the recommendations. The Cass Report started with the assumption that the best treatment approach for gender dysphoric youth is unknown and commissioned 8 systematic reviews of evidence to develop its recommendations. In contrast, the German guideline update started with the assumption that the reclassification of the ICD diagnosis of "gender incongruence" from a mental to a physical health condition (which itself reflected a "societal paradigm shift") demands that body-modifying procedures be available to all those who desire them, including minors. The intention to align treatment recommendations with the "societal paradigm shift" is stated in the guideline registration with AWMF in 2020, and is apparent in the approach that the guideline development team took toward the evidence, which appears to have served as a mere backdrop to an a priori decision to liberalize access to medical intervention for minors.
Originally, the updated German guidelines for treating gender dysphoric children and adolescents were supposed to carry the classification of "S3," which signifies the highest level, evidence-based guidelines. However, the guideline development team abandoned the systematic evidence search after 2019, stating it was no longer "feasible with the Commission's resources." The decision to stop systematically searching for the evidence during the last four years (2020–2023) resulted in a failure to systematically appraise 50% or more of the relevant evidence, depending on the topic (as the recent UK York systematic reviews commissioned by the Cass review demonstrated, more than 50% of the relevant studies were published after 2019). After the final draft was completed in early 2024, AWMF downgraded it from the originally intended highest-level "S3" evidence-based guidelines to their current lower status of S2K "consensus guidelines."
Our analysis concurs with the conclusion that the current draft of the guidelines cannot be graded as "S3" due to its failure to systematically assess much of the relevant evidence, and due to many other deviations from the evidence-based process as outlined by the AWMF-published German Instrument for Methodological Guideline Appraisal (DELBI). However, our methodological assessment suggests that even the lower S2K standard may not have been met. This additional concern deserves consideration, as any guidelines that are considered for implementation must be trustworthy. As DELBI states, "the primary aim of clinical practice guidelines is to enhance good clinical practice" by assessing "comprehensive knowledge (scientific evidence and clinical experience) about problems of care, to reconcile opposite views and to define current optimal practice by trading off benefits and harms." It does not appear that this basic requirement for trustworthy, quality guidelines has been met.
Below, we present a brief summary of the content of the German guideline recommendations and list the key methodological concerns (both sections contain detailed tables that can be expanded). We then discuss the evolution of the German approach to the care of gender-dysphoric/gender-incongruent children and adolescents. We conclude with the SEGM take-aways.
1. Guideline Recommendations
The "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment" guideline contains 72 recommendation statements for 8 topics. All 72 are consensus-based, rather than evidence-based. The guideline methods report ("Leitlinienreport") states that evidence-based guidelines were not possible because the quality of the evidence itself was weak:
"After the discussions on the literature-based evidence situation, it was clear to the steering group that there would be no evidence-based recommendations on individual interventions in the treatment of gender incongruence or gender dysphoria in this field due to a lack of controlled evidence of effectiveness and an overall weak evidence situation with regard to uncontrolled evidence of effectiveness from case-cohort studies."
However, the claim that evidence-based recommendations are not possible when the evidence is of low quality is inaccurate. Evidence-based recommendations are always possible regardless of the state of the evidence. The key requirement for a high-quality evidence-based guideline is not that the guidelines are based on high-quality evidence but that they are based on the best available evidence. This is achieved by conducting a systematic search, appraising the evidence for quality, and basing the recommendations on the evidence while assigning an appropriate level of "strength" to each specific recommendation. This methodologically rigorous process was not followed in the case of the update to the German child and adolescent gender dysphoria guidelines.
Below is a brief summary of the recommendations contained in the draft guidelines:
Tables A1–A3 below present a more detailed analysis of the content of the recommendations, comparing the German recommendations and rationale to those in the Cass Report. The Cass Report cautioned against social transition in pre-pubertal children and recommended caution in older adolescents; disallowed the use of puberty blockers as a treatment for gender dysphoria; suggested "extreme caution" in the use of cross-sex hormones for both adolescents and young adults; and recommended psychosocial interventions as the first and likely only line of treatment for most youth (surgeries for minors, including mastectomies, have never been allowed in England). NHS England accepted the Cass Report's recommendations for implementation.
Table A1 – Social Gender Transition (SGT)
German 2024 Guideline |
Cass Report, 2024 |
Recommendation |
- Youth Social Gender Transition (SGT) is a benign act of self-expression that should be child-led.
- The decision to undergo SGT should be made based on the child’s wish for self-expression. The parents should remain open to future identity changes. Mental health providers can help the family accept the child’s decision and manage adverse societal reactions.
|
- Youth Social Gender Transition (SGT) is an active intervention with potential for benefit and harm.
- Complete SGT is generally discouraged, especially for prepubertal children. If undertaken, it should involve the parents, and be done in consultation with a mental health provider who can advise on the benefits, risks and long-term consequences.
|
Rationale |
The guideline acknowledged the poor quality of evidence, but judged the evidence for benefits to be more compelling than evidence for harms: |
The review noted the poor quality of evidence and inconsistent results, urged caution in interpreting individual study results: |
- Benefits: “There is evidence that an affirmatively supported role change can have a positive effect on social integration and the child's self-confidence through the development of the child's personality in the course of prepubertal development… There is evidence that an affirmatively supported role change before the onset of puberty can have a positive effect on socio-emotional development.”
|
- Benefits: “For example, two studies suggest there may be some benefit associated with use of chosen name in adolescence. However, in another study lifetime suicide attempt and past-year suicidal ideation was higher among those socially transitioning as adolescents compared with those socially transitioning in adulthood.”
|
- Harms: “There was no evidence of increased psychosexual confusion, identity insecurity or otherwise conspicuous gender-related cognitions in a group of children with GI who had undergone a social role change with the support of their parents.”
|
- Harms: “In this review, two studies suggest that children who socially transition are more likely to continue to experience gender dysphoria/incongruence in adolescence, though one study found differences by birth-registered sex. One of these studies also reported that the majority of those who socially transitioned progressed to medical interventions.”
|
- Harm-benefit ratio: Based on the presumption of benefits of social gender transition and the absence of harm, the child’s right to self-determination was used as the guiding principle in making the recommendation in favor of SGT.
|
- Harm-benefit ratio: Potential harms of changing “sex of rearing” on long-term development, and heightened risk of future medicalization are weighed heavier than the potential benefits.
|
Supporting Evidence |
- The guideline methods report ("Leitlinienreport") referenced a systematic evidence search. The systematic search was stopped in September 2017.
- The analysis (including non-systematic search after 2017) identified and discussed 12 studies, but did not appear to conduct any structured appraisal of the studies for risk of bias (RoB).
|
- Commissioned a systematic evidence review for SGT. The systematic search was stopped in April 2022.
- The systematic review assessed 3,181 studies for eligibility, yielding 11 eligible studies, which were subsequently assessed for risk of bias (RoB) using modified Newcastle-Ottawa Scale (NOS).
|
SEGM fact-check / Notes |
- Over 50% of the studies in the UK SGT systematic review (6 out of 11) were not included in the German SGT analysis. Five of six omitted studies were published after the German team suspended its systematic evidence search for SGT in 2017.
- The non-systematic nature of the literature search and appraisal after September 2017 led to an apparent bias. For example, while the review included Olson et al. 2022, accepting the study's conclusion that social transition is beneficial, it overlooked the Sievert et al. 2021 study, which came to a different conclusion. This omission is all the more surprising since the data came from a German clinical sample of children with gender dysphoria; the topic was directly related to SGT and the article was descriptively titled "Not social transition status, but peer relations and family functioning predict psychological functioning in a German clinical sample of children with Gender Dysphoria"; and the review team discussed the study's other findings unrelated to SGT (e.g., the finding of elevated levels of mental health disorders in GD youth) in other sections of the guidelines.
- The German review displays a preference toward a positive interpretation of uncertain outcomes. For example, it interprets the Olson et al. (2022) finding that over 90% of youth who underwent early SGT continue to identify as transgender 5 years later (with the majority initiating medical transition), as a sign of long-term stability of trans identities that emerge before puberty. The alternative interpretation of this finding, i.e., that early social gender transition may create a potential "lock-in" effect of transgender identity, is not considered.
- These issues would be mitigated by a systematic evidence search, synthesis of evidence, and appraisal of quality of evidence with regard to each recommendation question. The lack of transparent evidence synthesis and appraisal is concerning.
|
Table A2 – Puberty Blockers (PB)
German 2024 Guideline |
Cass Review, 2024 |
Recommendations |
- PBs for gender dysphoria (GD) are indicated upon commencing puberty and obtaining ICD-11 diagnosis of “gender incongruence” or DSM-5 diagnosis of "gender dysphoria" of adolescence/adulthood. Youth with a cross-sex identification as well as non-binary are similarly eligible for puberty blockade.
- No requirement of distress beyond “anticipatory fear” of developing secondary sexual characteristics. Explicitly states that the DSM-5 “gender dysphoria” diagnosis is no longer required.
- No requirement of childhood-onset of gender incongruence: While “lasting” incongruence is mentioned, it is not defined beyond “weeks to months” of assessments prior to starting PBs.
- Psychiatric assessment is recommended but PBs can be provisionally prescribed without assessment. The high rate of co-occurring mental illness that can complicate the diagnosis, and the difficulty in determining future persistence of trans identity, are cited as main reasons to have “weeks to months” of assessments prior to starting PBs. However, endocrinologists can prescribe PBs provisionally, ahead of the psychiatric assessment, if the pubertal changes are creating an urgency.
- Ongoing psychotherapy not required if no mental health problems are apparent. Gender incongruence is seen as a healthy identity variation which, in and of itself, warrants no additional exploration.
- Parental disagreement may lead to involvement of child protective services. If the child desires PB but parents do not agree, and if counseling cannot reconcile the disagreement, child protective services and court systems may be called on to protect “best interests of the child.”
|
- PBs for gender dysphoria (GD) are no longer allowed in medical practice. Clinical research trials may be approved at a later point for narrow indications.
- If distress is present, standard evidence-based treatments should be used: “standard evidence based psychological and psychopharmacological treatment approaches should be used to support the management of the associated distress and cooccurring conditions. This should include support for parents/carers and siblings as appropriate.”
|
Rationale |
- Benefits: UK NICE systematic reviews indicate “favorable outcome of the measured parameters” for psychological benefits of pubertal suppression:
“The systematic reviews of the British NICE (National Institute for Health and Care Excellence (NICE), …on the state of the evidence on puberty-blocking .. in adolescence state that the studies available at the time of the reviews point overall in the direction of a favorable outcome after medical interventions for the mental health of adolescents with gender dysphoria.”
SEGM fact-check note: the actual conclusion of the referenced NICE review (and the subsequent Cass Report recommendation to decommission the use of PBs) contradicts this assertion —*see SEGM fact-check 1 at the end of the table:
- Also discusses individual studies that suggest positive outcomes. Specifically, the guidelines describe the positive findings of the original Dutch study but fail to acknowledge the highly relevant UK study from the largest pediatric gender clinic in the world, Carmichael et al., 2021, which failed to replicate the Dutch study’s positive results —*see SEGM fact-check 6 at the end of the table.
|
- Benefits: There is no trustworthy evidence of psychological benefits from systematic evidence reviews from either NICE or the updated York systematic evidence reviews:
- The 2020 NICE systematic review “found no evidence that puberty blockers improve body image or dysphoria, and very limited evidence for positive mental health outcomes, which without a control group could be due to placebo effect or concomitant psychological support.”
- The 2024 York systematic review found a “lack of high-quality research assessing puberty suppression in adolescents experiencing gender dysphoria/incongruence. No conclusions can be drawn about the impact on gender dysphoria, mental and psychosocial health or cognitive development.”
|
- Harms:
- There are no concerns about psychological harms:
- Does not consider the risk of permanently altering gender identity and sexual development credible. Notes high rate of initiation of cross-sex hormones following puberty blockade but attributes this to excellent diagnostic/prognostic ability of the clinicians, rather than to puberty blocker potential to permanently alter gender and sexual identity.
- Does not consider negative impact on neurocognitive development significant. Recognizes research that points to possible problems in cognitive development, but points to research that suggests high levels of educational attainment post-transition.
|
- Harms:
- There are serious concerns about psychological harms:
- Raises the possibility of permanently altering the trajectory of development of sexuality and gender identity. “Blocking natal sex hormone production means that young people have to understand their identity and sexuality based only on their discomfort about puberty and a sense of their gender identity developed at an early stage of the pubertal process.”
- Considers negative impact on neurocognitive development. “Adolescent sex hormone surges may trigger the opening of a critical period for experience dependent rewiring of neural circuits underlying executive function… Brain maturation may be temporarily or permanently disrupted by the use of puberty blockers, which could have a significant impact on the young person’s ability to make complex risk-laden decisions, as well as having possible longer term neuropsychological consequences.”
|
- There are no serious concerns about physical harms:
- Recognizes potential physical harms of PBs (e.g., bone development challenges, insufficient penile tissue for future gender-affirming surgery, sexual side-effects, negative fertility-preservation implications, BMI increase, menopausal symptoms) but does not consider any effects significant besides those on bone health.
- Specific to bone density concerns, recommends limiting the duration of puberty blockade.
|
- There are serious concerns about physical harms:
- The York review notes that “bone health and height may be compromised during treatment.”
- Discusses the risk of other adverse effects (metabolic health and weight, insufficient penile tissue, etc).
|
- There are no accepted alternatives: “In particular, psychotherapy alone cannot be regarded as a suitable treatment to effectively reduce or avert gender dysphoria in cases of diagnosed persistent gender incongruence.”
|
|
- Harm-benefit ratio: Because most teens place a higher value on preventing “irreversible progression of the development of secondary sexual characteristics” over avoiding uncertain long-term harms, puberty blockade use is justified —*see SEGM fact-check 5 at the end of the table.
|
- Harm-benefit ratio: Because of the uncertain benefits and because the harms could be significant, PBs can only be used in clinical research settings with “very narrow indication” (i.e., for natal males as an explicit start of a transition pathway in order to stop irreversible pubertal changes) and subject to standard ethics approvals.
|
Supporting Evidence |
- Systematic evidence reviews:
- The guideline methods report ("Leitlinienreport") stated that a systematic evidence search was conducted which included search terms specific to the topic of PBs. However, the systematic search was stopped in August 2017, with non-systematic surveillance up until 2023. The body of evidence that was appraised has not been presented.
- The guideline also referenced the UK systematic review of puberty blockers. However, it inaccurately represented its key finding —*see SEGM fact-check 1 at the end of the table.
- In practice, most of the recommendations were supported by primary studies, but the studies were not formally appraised for risk of bias (RoB) —*see SEGM fact-check 6 at the end of the table.
|
- Systematic evidence reviews:
|
- Consideration of other guidelines and recommendations:
- The review relied on WPATH Standard of Care 8 (SOC8) after 2017, asserting that SOC8 meets the German S3-level evidence-based guidelines (highest possible level). This is not accurate —*see SEGM fact-check 2 at the end of the table.
- The review also references the Endocrine Society 2017 (ES2017) guidelines, asserting that ES2017 meet the German S3-level evidence-based guidelines (highest possible level). This is questionable —*see SEGM fact-check 3 at the end of the table.
- The guideline also references the Cass Review and implies that it would concur with the German recommendation. This is inaccurate — *See SEGM fact-check 4 at the end of the table.
|
- Consideration of other guidelines and recommendations:
- Analyzed 23 guidelines and clinical recommendations in two systematic reviews. Concluded, "Two international guidelines (World Professional Association for Transgender Health and Endocrine Society) formed the basis for most other guidance, influencing their development and recommendations."
- For WPATH Standards of Care 8 (SOC8) and the Endocrine Society 2017 (ES2017) concluded, "Most clinical guidance lacks an evidence-based approach and provides limited information about how recommendations were developed. The WPATH and Endocrine Society international guidelines, which like other guidance lack developmental rigour and transparency have, until recently, dominated the development of other guidelines. Healthcare professionals should consider the lack of quality and independence of available guidance when utilising this for practice."
|
SEGM fact-check / Notes |
- NICE Systematic Evidence Review. The German guideline does not accurately describe the conclusion of the referenced NICE review. Rather than concluding that there are psychological benefits of puberty blockade, the review concluded just the opposite (“little change”), noting that the small reported changes are likely the result of poor study designs:
“The results… suggest little change with GnRH analogues [PBs] from baseline to follow-up. Studies that found differences in outcomes could represent changes that are either of questionable clinical value, or the studies themselves are not reliable and changes could be due to confounding, bias or chance.” (“Evidence review: Gonadotrophin releasing hormone analogues for children and adolescents with gender dysphoria”, 2020, p. 13)
- WPATH SOC8 Guidelines. Contrary to the assertion in the German guideline, WPATH SOC8 is not an evidence-based guideline for adolescents. The Adolescent section of SOC8 explicitly states that it was based on a narrative, rather than systematic, review. A recent systematic review commissioned by the Cass Review assessed SOC8 using AGREE II (a tool widely used by AWMF, which oversaw the guideline) and concluded that these guidelines "lack developmental rigor and transparency." The guideline attained 35 out of 100 possible points on the "methodological rigor" domain, and 24 out of 100 possible points in the "applicability" domain. The German process also appraised SOC8 using AGREE II and gave it somewhat higher ratings (55/100 for methodology and 28/100 for applicability).
If properly appraised, WPATH SOC8 is unlikely to meet the S3 guideline standard (highest possible level) due to failure to meet several of the current S3 requirements (e.g., DELBI item 8 "systematic methods were used in the search for evidence"; item 9 "the criteria for the selection for the evidence are clearly described"; item 12 "the link between the recommendations and the underlying evidence is presented"; and a number of other criteria). A recent BMJ article concurred that SOC8 cannot be considered an evidence-based guideline.
- The Endocrine Society 2017 guidelines (ES2017). Contrary to the assertion in the German guideline, the Endocrine Society 2017 (ES2017) guideline cannot be considered an evidence-based guideline for puberty blockade because it did not use a systematic review of evidence for pubertal suppression. S3 guidelines require that systematic methods were used to search for evidence (AGREE criterion 8).
- Neither of the two ES2017 commissioned systematic reviews of evidence focused on adolescents or pubertal suppression.
- The first systematic review (Maraka et al., 2017) analyzed the effects of cross-sex hormones on cardiometabolic outcomes of adults and is not applicable to the population of youth or the question of pubertal suppression.
- The second systematic review (Singh-Ospina et al., 2017) analyzed the effects of gender-affirming endocrine interventions on bone health, but of the total of 13 studies, only 1 study dealt with pubertal suppression in youth (the rest of the studies were for cross-sex hormone use by adults). The one study (Klink et al., 2015) concluded that there were adverse effects of pubertal suppression on bone health of youth that were not attenuated even after initiation of cross-sex hormones. However, because the other 12 studies concerned mature adults’ use of cross-sex hormones and found no adverse effects on bone, the review concluded no adverse effects on bone, which the ES2017 recommendation for puberty suppression for youth ultimately relied upon.
- Of note, no systematic evidence reviews of psychological effects of pubertal suppression on youth were conducted despite the fact that psychological benefits are the primary indication for puberty blockade in gender-dysphoric youth.
- A recent systematic review commissioned by the Cass Review assessed ES2017 using AGREE II (a tool widely used by AWMF, which oversaw the guideline) and concluded that these guidelines, like WPATH, "lack developmental rigor and transparency." The guideline attained 44 out of 100 possible points in the "methodological rigor" domain, and 22 out of 100 possible points in the "applicability" domain. The German process also appraised ES2017 using AGREE II and gave it similar ratings (40/100 for methodology and 22/100 for applicability).
- If properly appraised, ES2017 recommendations for adolescents are unlikely to meet the S3 level requirements (highest possible level) due to failure to meet key requirement 8: "systematic methods were used in the search for evidence." This requirement was demonstrably not met for the population of adolescents, as no specific search of the literature about benefits or harms for adolescents was conducted. A recent BMJ article concurred that ES2017 recommendations cannot be considered evidence-based.
- Cass Review. The German guideline appears to suggest that the Cass Report supports the notion that puberty blockers should be prescribed based on consensus since the evidence is of very low certainty: “The clinical recommendations derived from [UK NICE systematic review] have so far not included any proven clinical experience expertise, although this is explicitly formulated as a requirement in the Cass Review. There it is stated that as long as the evidence is uncertain, the broadest possible consensus of clinical experts should be sought as a basis for preliminary treatment recommendations.” The Cass Report did not support the use of puberty blockers based on “consensus,” and in fact strongly recommended decommissioning their use for gender dysphoria, a recommendation that NHS England recently accepted.
- Patient values and preferences. Research into how patients and caregivers trade off benefits and risks of pubertal suppression in the short and long term has not been conducted. High-quality values and preferences research should be established on the basis of understanding the benefits, harms, and other desirable and undesirable consequences of different alternative interventions. Lack of evidence on the benefits and harms of interventions is one major hurdle for understanding values and preferences. Another concern is the cognitive development of children and adolescents, and their ability to appreciate benefits and harms of aspects of their lives that typically do not come into consideration until later in life (e.g., desire for children, sexual function). The lack of quality research into this complex area makes the argument of teen "preference" for immediate physical changes over avoiding long-term harms deeply problematic.
- Interpretation of individual study findings. Instead of relying on a systematically appraised body of evidence, the review relies on findings from individual non-systematically searched studies. It does not appear that a structured appraisal of individual studies for risk of bias (RoB) has been conducted. The discussion of study findings shows a preference toward a positive interpretation of uncertain outcomes of youth transitions. For example:
- The guidelines reference the Dutch research (de Vries et al., 2014) as evidence of no/low regret of youth transitions, by stating “Of the 55 people reported, no case of regret and/or detransition was reported.” They fail to mention the very short follow-up (average 2 years after surgery). They also do not mention several adverse effects among the original cohort of 70 which became reclassified as “non-completers,” including 1 transition-associated death, 3 instances of patients developing severe diabetes and obesity, and at least one apparent “stopped treatment” which could signal detransition.
- The guidelines' analysis of several US studies (Tordoff et al., 2022; Turban et al., 2020) fails to critically appraise the studies for methodological flaws, and does not discuss notable studies that contradict the conclusions of benefit of pubertal suppression (e.g., Carmichael et al., 2021, McPherson & Freedman, 2023).
- This preference for positive interpretation could be mitigated by systematic search and synthesis of the relevant evidence for each outcome in question, and appraisal of quality of evidence using a widely accepted tool such as the GRADE system. The lack of transparent evidence synthesis and evidence appraisal is concerning.
|
Table A3 – Cross-Sex Hormones (CSH)
German 2024 Guideline |
Cass Report, 2024 |
Recommendations |
- Cross-sex hormone treatment is indicated for any adolescent with the ICD-11 diagnosis of gender incongruence (GI) who experiences a "long-term desire" for the "physical changes expected as a result of hormone treatment." No minimum age is specified.
- The DSM-5 "distress" criterion no longer applies, and distress appears to be understood not as impairment in functioning, but as the "desire to develop the gender-specific physical changes" not associated with natal puberty.
- No requirement of childhood-onset of gender incongruence: While “long-standing” gender incongruence is required, the minimum duration is deliberately not specified. The timing of the onset (pre-pubertal vs post-pubertal) appears unimportant; it is sufficient that the "distress developed or intensified after the onset of puberty."
- A mental health assessment is required to establish "stable/persistent gender incongruence." However, the guideline acknowledges a lack of "empirically validated individual criteria for the determination of a permanent Stability/persistence of gender incongruence or gender dysphoria."
- Ongoing psychotherapy is not required. The decision about psychotherapy should be made on a case-by-case basis and prioritized with the patient. Treatment of other mental disorders is recommended but should not interfere with the body-modifying treatment.
- Parental involvement and co-consent is recommended. If the child desires cross-sex hormones but parents do not agree, and if counseling cannot reconcile the disagreement, child protective services and court systems may be called on to protect “best interests of the child.”
|
- Cross-sex hormone treatment for those diagnosed with Gender Dysphoria is currently available but with a new qualification of "extreme caution." The minimum age is 16.
- Psychotherapy and psychosocial support should be the first line of treatment. There should be a clear clinical rationale for providing hormones for minors rather than waiting until an individual reaches 18.
- A new centralized team not directly involved in care of the young person would need to approve the medical necessity.
- NHS England will use the Cass recommendations to develop a policy on masculinizing/feminizing hormones for those aged 16 and older.
|
Rationale |
- Benefits:
- Claims systematic reviews show “favorable outcome of the measured parameters”
“The systematic reviews of the British NICE (National Institute for Health and Care Excellence (NICE), …on the state of the evidence on puberty-blocking and gender reassignment hormone treatment.. in adolescence state that the studies available at the time of the reviews point overall in the direction of a favorable outcome after medical interventions for the mental health of adolescents with gender dysphoria."
SEGM fact-check note: This is an inaccurate representation of the findings of the NICE review. See SEGM fact-check 1 at the end of the table.
- Cites individual studies to assert there is evidence of benefit for the “overall package” of treatments (rather than evidence for the benefits of cross-sex hormones specifically):
“The reported data from previous non-controlled clinical cohort studies on hormonal interventions in adolescents with diagnosed gender incongruence or gender dysphoria provide consistent evidence for a favorable outcome of the measured parameters for mental health and life satisfaction if gender reassignment hormone treatment was at least part of the treatment.”
SEGM fact-check note: It is not appropriate to analyze results of individual studies. Instead, conclusions must be drawn from the entire body of evidence which was systematically searched and appraised for quality at each outcome level. See SEGM fact-check 2 at the end of the table.
|
- Benefits:
- Concludes there is no trustworthy evidence of psychological benefits of cross-sex hormone treatment, quoting the 2024 York systematic review:
- “There is a lack of high-quality research assessing the outcomes of hormone interventions in adolescents experiencing gender dysphoria/incongruence, and few studies that undertake long-term follow-up. No conclusions can be drawn about the effect on gender-related outcomes, body satisfaction, psychosocial health, cognitive development or fertility. Uncertainty remains about the outcomes for height/growth, cardiometabolic and bone health."
- The evidence also did not support the notion that hormone treatment decreases risk of death by suicide.
- Notes the small effect sizes and the possibility that the improvements may be short-lived:
"When a young person has been on puberty blockers, a short-term boost in mental wellbeing is to be expected when sex hormones are introduced....The start of long anticipated physical changes would be expected to improve mood, at least in the short term, and it is perhaps surprising that there is not a greater effect size. However, much longer term follow-up is needed to understand the full psychological impact of medical transition."
|
- Harms:
- Physical harms: The recommendations for cross-sex hormone treatment are not accompanied by evidence on the probability and severity of potential harms, although a number of potential harms are mentioned (e.g., increased BMI, decreased HDL, increased risk of thrombosis, etc.). The harm to fertility is also recognized.
|
- Harms:
- Physical harms: The systematic review evaluated a range of physical health outcomes. It found only one high-quality study that examined side effects. Inconsistent results were observed for height/growth, bone health and cardiometabolic effects. There was insufficient evidence to assess impact on fertility (no study assessed fertility in birth-registered females). Most studies included adolescents who received puberty suppression, making it difficult to determine the effects of hormones alone.
|
- Psychological harms from overtreatment:
- Diagnostic reliability: Acknowledges a lack of predictive validity of the ICD-11 diagnosis of gender incongruence (and a lack of validated criteria to predict persistence). However, this is not viewed as a risk for overtreatment.
- Gay youth: Acknowledges that sexual orientation may be related to persistent gender incongruence in some adolescents, but recommends cross-sex hormone treatment regardless of sexual orientation.
- Autistic youth: Although the guidelines recognize a high rate of co-occurrence between autism diagnoses and gender incongruence, there is no apparent concern over potential overdiagnosis or overtreatment of autistic youth.
|
- Psychological harms from overtreatment:
- Diagnostic reliability: Recognizes that the diagnoses of "gender dysphoria" (DSM-5) or "gender incongruence" (ICD-11) lack predictive validity. It is unknown whether a given young person will have longstanding gender incongruence in the future, or whether medical intervention will be the best option for them.
- Gay and autistic youth: These youth frequently exhibit gender non-conforming behaviors and are susceptible to developing GD/GI. There is a concern about inappropriately treating such youth with gender transition.
|
- Youth with fluid/evolving identities: Though the guidelines recognize the rise of a non-binary identity and a lack of understanding how such identities may develop, the guidelines still recommend cross-sex hormone treatment for children reporting non-binary gender identity.
|
- Youth with fluid/evolving identities: Recognizes that nonbinary identities are on the rise, and that identity in youth is still developing, which raises questions about medical interventions.
|
- Detransition is recognized as a phenomenon but assumed to be rare and not a signal of overtreatment/harm.
|
- Detransition: The percentage of people treated with hormones who subsequently detransition remains unknown due to the lack of long-term follow-up studies, although there is suggestion that numbers are increasing.
|
- Issues related to consent: Recognizes that the irreversible nature of many hormone-induced changes (including risk to fertility and sexual function) makes it imperative that the young person is capable of consent. In the event that a minor cannot consent, legal guardians should not be allowed to consent on the minor's behalf. Instead, the minor's own capacity to consent should be developed.
|
- Issues related to consent: Recognizes a key barrier to informed decision making: the poor evidence base makes it challenging to provide adequate information on which a young person and their family can make an informed choice.
|
- There are no accepted alternatives to hormone treatment: “there is a lack of a justifiable evidence-based alternative treatment option in the sense of a previously established and proven treatment.”
- Because adolescents prioritize their desire to develop “the gender-specific physical changes expected as a result of hormone treatment” over avoiding uncertain long-term harms, cross-sex hormone treatment is considered justified.
|
- Urges the development of an evidence base for alternative treatment approaches: “An explicit clinical pathway must be developed for non-medical interventions, as well as a research strategy for evaluating their effectiveness.”
|
- Harm-benefit ratio: While the evidence is recognized as uncertain, the benefits are assumed to outweigh the harms, and the principle of self-determination of minors should guide the decision to treat with cross-sex hormones (as long as minors are deemed capable of consent).
|
- Harm-benefit ratio: There is insufficient and/or inconsistent evidence about the risks and benefits of hormone interventions in this population.
|
Supporting Evidence |
- Systematic evidence reviews:
- The guideline methods "Leitlinienreport" report referenced a systematic evidence search which included search terms specific to the topic of hormone treatment. The systematic search was stopped in August 2017.
- The guideline discussed a number of studies but did not formally appraise them for risk of bias (RoB).
- The guideline refers to evidence on the “whole package,” which included psychotherapy and hormone treatment, rather than evidence on hormone treatment alone.
|
- Systematic evidence reviews:
- The Cass Review commissioned a systematic evidence review for cross-sex hormones, systematically searching for studies through April 2022.
- The systematic review assessed 3,181 studies for eligibility, yielding 53 eligible studies, which were subsequently assessed for risk of bias (RoB) using a modified Newcastle-Ottawa Scale (NOS).
|
- Consideration of other guidelines and reviews:
- The review relied on WPATH Standard of Care 8 (SOC8) after 2017, asserting that SOC8 meets the German S3-level evidence-based guidelines (highest possible level).
- The review also references Endocrine Society 2017 (ES2017) guidelines, asserting their equivalence to German S3-level evidence-based guidelines (highest possible level)
- The guideline also references the Cass Review, with the (incorrect) implication that the Cass Review would concur with the German recommendation.
* See SEGM fact-check notes 3-5 at the end of the table
|
- Consideration of other guidelines and recommendations:
- Analyzed 23 guidelines and clinical recommendations in two systematic reviews, which concluded, "Two international guidelines (World Professional Association for Transgender Health and Endocrine Society) formed the basis for most other guidance, influencing their development and recommendations."
- For WPATH Standards of Care 8 (SOC8) and the Endocrine Society 2017 (ES2017) concluded, "Most clinical guidance lacks an evidence-based approach and provides limited information about how recommendations were developed. The WPATH and Endocrine Society international guidelines, which like other guidance lack developmental rigour and transparency have, until recently, dominated the development of other guidelines. Healthcare professionals should consider the lack of quality and independence of available guidance when utilising this for practice."
|
SEGM fact-check / Notes |
1. NICE systematic evidence review. The German guideline does not accurately describe the conclusion of the referenced 2020 NICE review for cross-sex hormones. Rather than concluding "favorable outcomes," the 2020 NICE cross-sex hormones review noted:
“The key limitation to identifying the effectiveness and safety of gender-affirming hormones for children and adolescents with gender dysphoria is the lack of reliable comparative studies. All the studies included in the evidence review are uncontrolled observational studies, which are subject to bias and confounding and were of very low certainty using modified GRADE. A fundamental limitation of all the uncontrolled studies included in this review is that any changes in scores from baseline to follow-up could be attributed to a regression-to-the mean...Any potential benefits of gender-affirming hormones must be weighed against the largely unknown long-term safety profile of these treatments in children and adolescents with gender dysphoria. "
2. Using results from individual studies: The recommendations on cross-sex hormone treatment should be informed by systematically appraising the entire body of evidence regarding each specific outcome. Further, discussing individual findings from studies without assessing each study for risk of bias (RoB) is not appropriate, as any given study's result may not be trustworthy.
3. WPATH SOC8 Guidelines. Contrary to the assertion in the German guideline, WPATH SOC8 is not an evidence-based guideline for adolescents. The Adolescent section of SOC8 explicitly states that it was based on a narrative, rather than systematic, review. A recent systematic review commissioned by the Cass Review assessed SOC8 using AGREE II (a tool widely used by AWMF, which oversaw the guideline) and concluded that these guidelines "lack developmental rigor and transparency.” The guideline attained 35 out of 100 possible points on the "methodological rigor" domain, and 24 out of 100 possible points in the "applicability" domain. The German process also appraised SOC8 using AGREE II and gave it somewhat higher ratings (55/100 for methodology and 28/100 for applicability).
If properly appraised, WPATH SOC8 is unlikely to meet the S3 guideline level (highest possible level) due to failure to meet several of the current S3 requirements (e.g., DELBI item 8 "systematic methods were used in the search for evidence"; item 9 "the criteria for the selection for the evidence are clearly described"; item 12 "the link between the recommendations and the underlying evidence is presented"; and a number of other criteria). A recent BMJ article concurred that SOC8 cannot be considered an evidence-based guideline.
4. The Endocrine Society 2017 guidelines (ES2017). Contrary to the assertion in the German guideline, the Endocrine Society 2017 (ES2017) guideline cannot be considered an evidence-based guideline for puberty blockade because it did not use a systematic review of evidence for pubertal suppression. S3 guidelines require that systematic methods were used to search for evidence (AGREE criterion 8).
- Neither of the two ES2017 commissioned systematic reviews of evidence focused on adolescents.
- The first systematic review (Maraka et al., 2017) analyzed the effects of cross-sex hormones on cardiometabolic outcomes of adults and is not applicable to the population of youth.
- The second systematic review (Singh-Ospina et al., 2017) analyzed the effects of gender-affirming endocrine interventions on bone health, but of the total of 13 studies, only 1 study dealt with cross-sex hormones in youth (the rest of the studies were for cross-sex hormone use by adults). The one study (Klink et al., 2015) concluded that there were adverse effects of pubertal suppression on bone health of youth that were not attenuated even after initiation of cross-sex hormones. However, because the other 12 studies concerned mature adults’ use of cross-sex hormones and found no adverse effects on bone, the review concluded no adverse effects on bone, which the ES2017 recommendations for endocrine interventions for youth ultimately relied upon.
- A recent systematic review commissioned by the Cass Review assessed ES2017 using AGREE II (a tool widely used by AWMF, which oversaw the guideline) and concluded that these guidelines, like WPATH, "lack developmental rigor and transparency.” The guideline attained 44 out of 100 possible points in the "methodological rigor" domain, and 22 out of 100 possible points in the "applicability" domain. The German process also appraised ES2017 using AGREE II and gave it similar ratings (40/100 for methodology and 22/100 for applicability).
- If properly appraised, ES2017 recommendations for adolescents are unlikely to meet the S3 level requirements (highest possible level) due to failure to meet key requirement 8: "systematic methods were used in the search for evidence." This requirement was demonstrably not met for the population of adolescents, as no specific search of the literature about benefits or harms for adolescents was conducted. A recent BMJ article concurred that ES2017 recommendations cannot be considered evidence-based.
5. The Cass Review revealed that most included studies on cross-sex hormones included adolescents who received puberty suppression, making it difficult to determine the effects of hormones alone. It is not appropriate to draw conclusions about cross-sex hormone treatment based on the “whole package” of treatment, which included various steps of medical transition, and was confounded by psychological interventions.
|
2. Methodological Issues
High-quality guidelines share the following characteristics: the recommendations are clear and actionable; the evidence is summarized using rigorous systematic review methods; the guideline panel considers all outcomes important to patients; and the guideline panel makes appropriate judgments in the interpretation of the evidence and the final recommendation. Having assessed the final draft of the guidelines "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment" for methodological rigor, we conclude that it does not meet the standard for a credible evidence-based guideline.
While the guidelines acknowledge that their recommendations are not evidence-based, they wrongly state that evidence-based recommendations were not possible due to the low level of evidence. In explaining the reason for the downgrading of the guideline from its originally-intended S3 evidence-based guidelines to the current consensus-based S2k level, the guidelines provided this justification:
"Due to a lack of controlled evidence of efficacy and an overall weak evidence base with regard to uncontrolled evidence of efficacy from case-cohort studies, no evidence-based recommendations were made in this guideline for the treatment of GI or GD; instead, all recommendations were developed on the basis of consensus."
However, as is widely known, evidence-based guidelines can be created regardless of the level or strength of the available evidence. As long as a rigorous process for guideline development is followed, an evidence-based guideline can be produced even in the context of extremely limited, low-quality evidence. Below is a summary of the key methodological shortcomings of the guideline. This summary is followed by a detailed assessment that suggests that not only did the guidelines fail to meet the S3 criteria, but even the lower-level S2k criteria may not have been met.
- Systematic search for evidence stopped between 2017–2019. The decision by the guidelines committee to stop the systematic evidence search after 2019 (and for some topics, as early as 2017) led to a failure to systematically assess 50% or more of the relevant literature, depending on the topic (e.g., see Table 1A, "SEGM fact-check / Notes"). The cessation of the systematic search so early in the guideline development renders the recommendations not evidence-based; it omits a large body of recent literature, which is most applicable to the current populations of youth presenting with gender dysphoria/gender incongruence.
- The search for the evidence was conducted without clearly defined criteria. Even during the early timeframe when a systematic search was conducted, the approach to defining search criteria was inadequate. The study inclusion criteria were overly broad and vague (e.g., they were articulated at a high level instead of being stated separately for each intervention; did not specify target outcomes; did not list comparator groups; did not specify study designs, etc.). This makes the guideline susceptible to concerns of bias over which studies were allowed to influence the recommendations.
- The evidence was not critically appraised at the study level and not rated for certainty overall. While the guideline acknowledged that the overall quality of evidence was poor, the appraisal fell far short of what the AWMF "DELBI" standards for guideline appraisal consider adequate. Individual studies were not appraised for risk of bias (RoB), and the overall body of evidence was not appraised for quality/certainty using tools such as GRADE. Notably, the guidelines also misrepresented the findings of the NICE systematic review for puberty blockers, wrongly suggesting that it concluded that puberty suppression was beneficial to young gender-dysphoric people.
- The recommendations were not graded for strength. The hallmark of an evidence-based guideline is the grading of the guideline recommendations for strength. This tells guideline users how to interpret any given recommendation. The recommendations can be both "for" and "against" certain treatments, and they can be graded "strong" and "conditional." If a recommendation is graded as "strong," this indicates that almost all of the patients would benefit from (or would be harmed by) the intervention, while a recommendation graded as "conditional" suggests that the majority, but not all, of the patients would benefit (or would be harmed). None of the 70+ recommendations in the guideline are graded for strength using either a formal (e.g. GRADE) or a less formal method. Instead, only the strength of "consensus" is provided. However, according to the AWMF Guidance Manual and Rules for Guideline Development, the strength of consensus is provided in addition to the grading/rating of the strength of recommendation — not in place of it.
- There was no explicit link between the recommendations and the evidence base. None of the over 70 topic-specific recommendations, including the specific recommendations regarding psychotherapy, social transition, puberty blockers, cross-sex hormones, and surgery, are linked to a body of evidence that is graded for certainty. Instead, the guidelines make specific treatment recommendations and justify them with findings from individual studies (which were not assessed for risk of bias and frequently presented highly biased findings as a trustworthy basis for recommendations).
- Failure to properly engage stakeholders with a range of views representative of the relevant clinician and patient communities. According to the AWMF "DELBI" standards for guideline appraisal, the guideline development effort should seek the engagement of professionals who will be tasked with implementing the recommendations, and patient/citizen groups whose care will be affected by the recommendations. While the guidelines did include professionals from 27 organizations, it appears that a diversity of views was lacking.
- The failure to ensure intellectual diversity and manage disagreement was suggested by the fact that one of the guideline steering committee members quit the effort. The guideline methods "Leitlinienreport" report reveals that "Prof. Dr. med. Florian Daniel Zepf left the steering group at his own request after two years on the steering group due to his stated professional ethical concerns and 'concerns regarding aspects of child and youth protection'. At no time was he entitled to vote in consensus conferences."
- The failure to properly engage professionals with dissenting opinions became apparent when, during the comment period, 15 Chairs and senior members of the Child and Adolescent Psychiatry Association submitted a 100+ page dissenting opinion.
- The failure to ensure a broad representation of opinions held by German clinicians was also evident when the German Medical Assembly, which comprises 250 delegates representing 17 German medical associations, passed resolution Ic-48 calling for a markedly different approach to treating gender-dysphoric youth than the one outlined in the draft guidelines. The resolution asked to restrict all gender-transitioning treatments for youth to clinical trials.
- The diversity of patient perspectives was also not represented, as evidenced by the dissenting opinion published by several parent stakeholder groups shortly after the draft guideline was completed.
- Failure to manage conflicts of interest. The guideline methods "Leitlinienreport" report states, "no conflicts of interest were found which have been considered problematic with regard to the involvement of the members of the guideline commission in the consensus process." However, a detailed analysis of the guideline development group's composition, compiled by the parent groups, noted what appeared to be significant intellectual and potential financial conflicts of interest for a number of the guideline development group members, and their lack of independence from powerful interest groups that promote a medicalized approach to treating minors.
Our detailed methodological assessment of the German guidelines relative to the AWMF DELBI standard is presented in Table B below. Note: The DELBI instrument comprises 34 items across 8 domains, 7 of which are drawn from the internationally recognized AGREE II tool, with the eighth domain relating to "applicability in the German healthcare system." A full assessment was out of scope, so instead we focused on the subset of the criteria which the AWMF lists as specifically differentiating between the level of guidelines (S1-S3).
Table B - Assessment of "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment" relative to the DELBI standards outlined by AWMF.
AWMF Language (translation) |
Assessment |
DELBI DOMAIN 2 — "Stakeholder participation" |
The guideline group is representative of the target group and representatives of the professional society(s) and/or organization(s) to be involved, including patients/citizens, are involved in the guideline development at an early stage (see AGREE II criteria 4 + 5)
AGREE II Criterion 4: The guideline development group includes individuals from all the relevant professional groups.
AGREE II Criterion 5: The views and preferences of the target population (patients, public, etc.) have been sought.
[ist die Leitliniengruppe repräsentativ für den Adressat*innenkreis und sind Vertretende der entsprechend zu beteiligenden Fachgesellschaft(en) und/oder Organisation(en) inkl. der Patient*innen /Bürger*innen in die Leitlinienentwicklung frühzeitig eingebunden (s. AGREE ll Kriterium 4 + 5)]
|
- Adherence to this rule is mandatory in order to classify a guideline as S2k or S3.
- The guideline did not meet the requirement of appropriately representing professional and patient/target population perspectives, as demonstrated by the facts below:
Per the AWMF DELBI guidance, the guideline development group must be representative of groups of professionals who will implement the guideline, and of the patient populations for whom the guideline is intended. While the professional representation was broad (26 medical societies), the diversity of professional views was lacking.
The lack of appropriate representation of the views held by the professional community is suggested by the resignation of one steering group member due to "professional ethical concerns" and "concerns regarding aspects of child and youth protection" (see the guideline methods "Leitlinienreport" report, p. 3) and further evidenced by the significant professional disagreement that occurred after the draft was published.
During the comment period, 15 Chairs and senior members of the Child and Adolescent Psychiatry Association published a dissenting 100+ page opinion. Shortly after the comment period ended, the majority of the delegates at the German Medical Assembly (which comprises 250 delegates representing 17 medical associations) passed resolution Ic-48, which called for a markedly different approach to treating gender-dysphoric youth than the one endorsed by the guideline development group.
Likewise, the views and preferences of patients and the public have not been adequately sought. The patient representative excluded the voices of parents of minors concerned about the medicalized "gender-affirmative" approach (see below).
|
The perspective and preferences of patients/citizens are determined (see AGREE II criterion 5)
AGREE II Criterion 5: The patients’ views and preferences have been sought.
[werden die Sichtweise und die Präferenzen der Patient*innen/Bürger*innen ermittelt (s. AGREE ll Kriterium 5)]
|
- Adherence to this rule is mandatory in order to classify a guideline as S2k or S3.
- The guideline did not meet the requirement of appropriately representing patient perspectives, as demonstrated by the facts below:
Per the AWMF DELBI guidance, "the application of a guideline may be impeded if the patients’ preferences or needs have not or not adequately been considered. Patients / relatives should thus be involved in the guideline development process."
The guideline committee did include 2 patient groups. However, the diversity of patient perspectives was not represented. This is evidenced by the subsequent dissenting opinion by several parent stakeholder groups. Since the guidelines concern the health of minors, parents represent a key stakeholder voice, as they serve as their children's healthcare proxies and represent patient interests.
This exclusion may be explained by the fact that the recruitment of patient groups took place in 2017, before the sharp rise of trans-identification in youth reached its peak, and before such parent groups were organized. However, failure to include the diversity of patient views compromises the safeguarding of children's best interests, and threatens the acceptability of this guideline.
|
DELBI DOMAIN 3 — "Methodological accuracy of guideline development" |
Systematic research, selection and evaluation of scientific evidence on the relevant clinical questions is required
[ist eine systematische Recherche, Auswahl und Bewertung wissenschaftlicher Belege (Evidenz) zu den relevanten klinischen Fragestellungen erforderlich]
|
- Adherence to this rule is mandatory for S3, but not S2k classification.
- The guideline did not meet the S3 requirement of systematically searching and evaluating the evidence on relevant clinical questions. See below.
Per the 2023 AWMF Guidance Manual and Rules for Guideline Development, "A recommendation is “evidence-based” when the underlying evidence for the key question to be answered by the pertinent/applicable recommendation was systematically researched, the patients were included in the study based on prospectively established Inclusion and exclusion criteria, a critical appraisal of the power and reliability of the studies was undertaken, and a level of evidence or quality was assigned. This also includes documentation of the steps for research, critical appraisal, assigning a level of evidence or quality. The contents of the study should either be presented in the background text or as supplements in evidence tables. As a result of formal evidence-basing, the quality of the published evidence on a key question becomes transparent." (p. 40)*
As evidenced by the guideline methods "Leitlinienreport" report, while an attempt was made to systematically search and appraise the evidence, the rigor of the process fell significantly below the threshold outlined by these AWMF-required steps.
The study inclusion criteria were overly broad and the exclusion criteria were not listed. The systematic literature search stopped between 2017 and mid-2020, depending on the topic, and only a non-systematic literature surveillance method was used afterwards. Individual studies were not consistently appraised for quality/risk of bias. Finally, the assessment of the overall quality/certainty of the evidence relating to each key recommendation question was not conducted. This indicates that the guideline fell far short of the S3 standard in terms of methodological rigor.
*This text has been updated based on the 2023 updated language (the prior language was based on the 2012 AWMF Guidance).
|
Systematic methods are used to search for evidence, i.e., the search strategy is described in detail with a list of the search terms and sources used (such as electronic databases, databases for systematic reviews or guidelines, hand-searched specialist journals or congress proceedings); period of literature search and number of hits (see AGREE II criterion 7)
AGREE II Criterion 7: Systematic methods were used to search for evidence.
[werden zur Suche nach der Evidenz systematische Methoden angewandt, d.h. die Suchstrategie ist detailliert beschrieben mit der Auflistung der verwendeten Suchbegriffe und Quellen (wie elektronische Datenbanken, Datenbanken für systematische Übersichtsarbeiten oder für Leitlinie, von Hand durchsuchte Fachzeitschriften oder Kongressberichte), Zeitraum der Literatursuche und Trefferzahlen (s. AGREE ll Kriterium 7)]
|
- Adherence to this rule is mandatory for S3, but not S2k classification.
- The guidelines did not meet the S3 requirement of systematic search of the evidence, as demonstrated by the facts below:
This requirement provides further detail regarding the overall requirement for "systematic research, selection and evaluation of scientific evidence" above, with a specific focus on the methods used to search for the evidence. Per the AWMF DELBI guidance, due to the "vast number of publications on different clinical issues," it is critical to conduct a topic-specific systematic search for "as many resources as possible" in order to be able to generate recommendations during the following assessment procedure.
A systematic search requires a "systematic methodology," which includes listing the search strategy in detail, including "the applied search terms, sources (electronic databases, databases systematic reviews, hand searched journals, conference reports, other guidelines)."
The guideline methods report ("Leitlinienreport") reveals that while an attempt at a systematic search was made, it was suspended between 2017 and 2020, depending on the topic: "At least one further topic-specific systematic literature search was conducted for each working group. The timing of these searches was between August 2017 and April 2020." The report states that further systematic search was not feasible: "For publication years 2020 to 2022, the renewed effort of a new systematic literature search would not have been feasible" (see Subsection 3.3.2, "Topic-specific systematic literature research" [Themenspezifische systematische Literaturrecherche]).
The decision to stop the systematic search between 2017 and 2020, depending on the topic, strongly suggests that much of the relevant literature has not been systematically searched. Recent systematic reviews indicate that around 60% of the relevant literature in the field was published after 2019.
To compensate for a lack of a systematic search after 2017-2020, the guideline development group made the decision to informally surveil the literature, with significant reliance on the list of studies referenced by the WPATH Standards of Care 8. However, since WPATH itself did not utilize a systematic review of evidence for its adolescent recommendations, this does not constitute an adequate strategy. As the WPATH SOC8 Adolescent Section explicitly stated, it used a "short narrative review," rather than a systematic review, as the basis for its recommendations (Coleman et al., 2022, p. 46). As such, the German guideline literature search after 2019 inherited the biases of the non-systematic search of the WPATH SOC8 guideline.
The lack of clearly defined study selection criteria that transparently show why certain studies were included vs excluded makes the literature search vulnerable to selection bias. For example, the non-systematic search for the topic of "Social Gender Transition" (SGT) identified and included the Olson et al. 2022 study, which concluded that social transition is beneficial. However, the topic-specific search overlooked the Sievert et al. 2021 study, which concluded that "not social transition status, but peer relations and family functioning predict psychological functioning" of children. In contrast, both studies were part of the UK systematic review. The omission of the Sievert study's findings regarding SGT is surprising since the data came from a German clinical sample of children with gender dysphoria; the topic was directly related to SGT; and the review team discussed the study's other findings (e.g., the finding of elevated levels of mental health disorders in GD youth) in other sections of the guidelines.
|
The selection criteria for the evidence are explicitly stated. Reasons for inclusion (target population, study design, comparators, endpoints, language, context) and for exclusion are explained (see AGREE II criterion 8)
AGREE II Criterion 8: The criteria for selecting the evidence are clearly described.
[werden die Auswahlkriterien für die Evidenz explizit dargelegt. Dabei werden Gründe für den Einschluss (Zielpopulation, Studiendesign, Vergleiche, Endpunkte, Sprache, Kontext) und für den Ausschluss dargelegt (s. AGREE ll Kriterium 8)]
|
- Adherence to this rule is mandatory for S3, but not S2k classification.
- The guidelines did not meet the S3 requirement for clearly describing the criteria for selecting the evidence.
Per the AWMF DELBI guidance, "the criteria for selecting the scientific evidence constitute the basis of guideline recommendations." AWMF instructs guideline developers to state which criteria are applied in selecting the evidence, with special emphasis on the exclusion criteria: "In particular, the reasons for excluding evidence must be clearly defined and stated."
The guideline methods report ("Leitlinienreport") does list the study inclusion criteria, but they are overly broad: the report does not specify study designs, comparisons, or study endpoints (which specific outcomes are considered critical or important). The exclusion criteria are not listed. This creates the opportunity for subjective study selection (see the bias concerns regarding the inclusion of the Olson et al. 2022 study vs the exclusion of the Sievert et al. 2021 study, described above).
Further, it is important to specify the study selection criteria at the specific guideline-question (or topic) level. Different research questions often lead to different study qualification requirements. For example, the recommendation question, "Which questions should be considered when diagnosing gender incongruence or gender dysphoria in adolescence with regard to the indication for body-modifying medical measures?" may have a lower bar for study qualification criteria as compared to the question "Can treatment with GnRH analogs for puberty suppression in adolescents with persistent gender incongruence/gender dysphoria be considered sufficiently safe with regard to known risks?"
|
The evidence researched and selected according to a priori established criteria is assessed with regard to its methodological quality and the results are presented in an evidence summary. This can be done in table form with comments on quality aspects or through the use of formal instruments or strategies (e.g. Cochrane Risk of Bias Tool, GRADE methodology) (see AGREE II criteria 8 + 9)
AGREE II Criterion 8: The criteria for selecting the evidence are clearly described.
AGREE II Criterion 9: The strengths and limitations of the body of evidence are clearly described.
[wird die nach a priori festgelegten Kriterien recherchierte und ausgewählte Evidenz hinsichtlich ihrer methodischen Qualität kritisch bewertet und die Ergebnisse in einer Evidenz-Zusammenfassung dargelegt. Dies kann in Tabellenform mit Kommentaren zu Qualitätsaspekten oder durch die Anwendung von formalen Instrumenten oder Strategien (z.B. Cochrane Risk of Bias Tool, GRADE Methodik) erfolgen (s. AGREE ll Kriterium 8 + 9)]
|
- Adherence to this rule is mandatory for S3, but not S2k classification.
- The guidelines did not meet the S3 requirement for assessing the methodological quality of the results using a risk of bias (RoB) tool at the study level, or GRADE methodology at the overall evidence level.
At face value, the guidelines fell short of the standards of selecting the evidence and appraising it for certainty, as described in the AWMF DELBI guidance.
The issues with AGREE II Criterion 8 (the criteria for selecting the evidence) are discussed in the section above. The issues with AGREE II Criterion 9 (the strengths and limitations of the body of evidence are clearly described) are outlined below:
Individual studies:
It does not appear that individual studies were assessed for risk of bias (RoB). Further, individual study findings are discussed as credible even when a RoB analysis would show the findings to be at a critical risk of bias (e.g., Tordoff et al., 2022).
Body of evidence:
It appears that some subsets of the body of evidence may have been assessed for certainty; however, this was not done at the level of specific treatment recommendations, and it is not clear which methodology was used to rate the quality of the evidence.
The guidelines section of "Indications for medical interventions" recommends treatment with puberty blockers, cross-sex hormones, and surgery for minors. This direction is supported by one evidence-based statement (and no evidence-based recommendations):
- Evidence-based statement: There is evidence from uncontrolled follow-up studies that patients with persistent gender dysphoria diagnosed in adolescence who receive stepped body-modifying treatment in the context of a socially supported transition show a long-term improvement in quality of life and mental health in adulthood.
- Evidence level: low (2 studies with different cohorts from the same center)
- References: (Cohen-Kettenis & van Goozen, 1997; de Vries et al., 2011, 2014).
This statement, which encompasses a wide range of treatments, is supported by a total of 3 cited studies (the latest dated 2014), which do not appear to have been critically appraised for risk of bias (RoB), with only a single-line explanation for the level of evidence overall.
|
The result of the assessment leads to a determination of confidence in the quality of the evidence (level of evidence)
[führt das Ergebnis der Bewertung zur Feststellung des Vertrauens in die Qualität der Evidenz (Evidenzgrad)]
|
- Adherence to this rule is mandatory for S3, but not S2k classification.
- The guidelines did not meet the S3 requirement for clearly describing the certainty of evidence for any of the recommendations (see above).
The guidelines mention that the evidence is "low." However, the determination of the certainty of the evidence (also known as the quality of the evidence or the level of evidence) has not been done at the recommendation level. This departs from the practices of credible evidence-based guidelines.
None of the over 70 topic-specific recommendations spread across 8 key topics (including psychotherapy, social transition, puberty blockers, cross-sex hormones, surgery) are linked to a body of evidence that is graded for certainty. Instead, the guidelines make specific recommendations justifying them by findings from individual studies (which were not assessed for risk of bias).
In addition to over 70 recommendations, the guideline includes 8 "evidence-based statements" which do mention the certainty of the evidence. However, these "statements" are largely disconnected from the actual recommendations (see example in the section above).
|
The methods for formulating the recommendations are clearly described; formal consensus techniques are required, e.g. consensus conference, nominal group process or Delphi process (see AGREE II criterion 10)
AGREE II Criterion 10: The methods for formulating the recommendations are clearly described.
[sind die Methoden zur Formulierung der Empfehlungen klar beschrieben, dazu sind formale Konsensustechniken erforderlich, z.B. Konsensuskonferenz, Nominaler Gruppenprozess oder Delphi-Verfahren (s. AGREE ll Kriterium 10)]
|
- Adherence to this rule is mandatory in order to classify a guideline as S2k or S3.
- It is unclear whether the guidelines met this requirement. The guideline clearly stated the method for formulating recommendations, but the method used contradicts the practice of evidence-based guideline development.
The guidelines state that the recommendations were consensus-based rather than evidence-based due to the poor quality of the evidence: "Due to a lack of controlled evidence of efficacy and an overall weak evidence base with regard to uncontrolled evidence of efficacy from case-cohort studies, no evidence-based recommendations were made in this guideline for the treatment of GI or GD; instead, all recommendations were developed on the basis of consensus." However, this explanation contradicts the principles of evidence-based guideline development, and is not a valid reason for not creating evidence-based guidelines that adhere to a high methodological standard.
As outlined in JAMA, evidence-based guidelines can be created regardless of the level or strength of the available evidence. As long as a rigorous process for guideline development is followed, an evidence-based guideline can be produced even in the context of extremely limited, low-quality evidence.
|
The recommendations are comprehensibly linked to the description of the underlying evidence in a corresponding section (background text) and/or an evidence summary with a reference list (see AGREE II criterion 12)
AGREE II Criterion 12: There is an explicit link between the recommendations and the supporting evidence.
[sind die Empfehlungen mit der Beschreibung der zugrunde liegenden Evidenz in einem entsprechenden Abschnitt (Hintergrundtext) und/oder einer Evidenzzusammenfassung mit Referenzliste nachvollziehbar verknüpft (s. AGREE ll Kriterium 12)]
|
- Adherence to this rule is mandatory for S3, but not S2k classification.
- The guidelines did not meet the S3 requirement of comprehensibly linking the recommendations to the description of the underlying evidence.
The guidelines comprise over 70 recommendations across 8 topics. None of the "recommendations" provided an explicit link to the underlying evidence. While individual studies are discussed and often used as rationale for the recommendations, the body of evidence is not clearly defined, not assessed for certainty, and not linked to individual recommendations.
|
Each recommendation is discussed and voted on as part of a structured consensus-building process under neutral moderation, the goals of which are to solve still open decision-making problems as well as a final grading of the recommendations (S2k guidelines) or determination of the recommendation level (S3 guidelines) and the measurement of the strength of the consensus
[wird jede Empfehlung im Rahmen einer strukturierten Konsensfindung unter neutraler Moderation diskutiert und abgestimmt, deren Ziele die Lösung noch offener Entscheidungsprobleme sowie eine abschließende Graduierung der Empfehlungen (S2k-Leitlinie) bzw. Festlegung des Empfehlungsgrades (S3-Leitlinie) und die Messung der Konsensstärke sind]
|
- Adherence to this rule is mandatory in order to classify a guideline as S2k or S3.
- The criteria for the S3 level threshold have not been met.
- It is uncertain whether the S2k threshold has been met.
Criteria for the S3 level:
Per the 2023 AWMF Guidance Manual and Rules for Guideline Development (p. 57), S3 guidelines require assigning a "grade" to each recommendation.* This has not been done:
In the case of S3 guidelines, the formal consensus development process for adopting recommendations focuses on clinical aspects to judge the methodologically synthesised evidence. The recommendations are then discussed on this basis. Next, the strength of the recommendations is determined and a grade of recommendation assigned [emphasis added].
By additionally indicating the strength of consensus (percentage of agreement within the guideline development group) for each recommendation, the guideline users are given an impression of the extent to which all participants were in agreement. [emphasis added].
The guidelines omitted the step of assigning a grade to the recommendations. Thus, the S3 level threshold has not been met.
Criteria for the S2k level:
According to the various AWMF documentation sources, while the S2 guidelines do not require a formal grading of the evidence using structured tools such as GRADE, a less formal assessment of the quality of the recommendation is still required. In 2013 the AWMF made it clear that for S2k guidelines, the consensus rating should be used in addition to, rather than instead of, providing a grading for the strength of the recommendation.
For consensus-based guidelines (S2k), the strength of recommendations is identified and adopted during the formal consensus process, although an indication of grades of recommendation (and levels of evidence) is not included because recommendations are not based on a systematic review of the evidence. Here, the strength of a recommendation is expressed in words only. Additionally, the strength of consensus (percentage of agreement within the guideline development group) can be indicated for each recommendation [emphasis added] (p. 42).
The updated 2023 AWMF Guidance Manual and Rules for Guideline Development language is similar but less specific:
For consensus-based guidelines (S2k), the strength of recommendations is identified and adopted during the formal consensus process. Nevertheless, it is not planned to state schematic grades of recommendation or levels of evidence because recommendations are not based on any systematic processing of the evidence. The grade of a recommendation is expressed in words.
It further specifies however that the consensus is used to arrive at the ranking of the recommendation, rather than substitute it:
AWMF Guideline Register Rule: Classification of S2 and S3 guidelines (excerpt): If it is an S2k or S3 guideline:
- The methods for formulating recommendations are clearly described. This requires formal consensus techniques (e.g. consensus conference, nominal group process or Delphi method) (see AGREE II Criterion 10).
- Every recommendation is discussed and voted on as part of a structured consensus development with a neutral moderator. The objectives are to find a solution to pending decision-making issues, to establish a final ranking of the recommendations (S2k guideline) and determine the grade of recommendation (S3 guideline), and to measure the strength of consensus [emphasis added] (p. 58)
The requirement that "the strength of the recommendation is expressed in words" also appears in the "aids and appendices" section for S2k.
There does not appear to be any clear indication of the strength of the recommendations (even informal) in the guideline text. However, the requirement that the "strength of the recommendation is expressed in words" is vague, and therefore it is unclear if the S2k criterion has been met.
*This text has been updated based on the 2023 AWMF Guidance Manual and Rules for Guideline Development (the prior analysis was based only on the 2013 AWMF Guidance Manual).
|
Levels of evidence and/or recommendation are specified for each recommendation in the finished guideline
[werden in der fertigen Leitlinie zu jeder Empfehlung Evidenz- und/oder Empfehlungsgrade angegeben]
|
- Adherence to this rule is mandatory for S3, but not S2k classification.
- The guidelines did not meet the S3 requirement of assessing the level of evidence or strength of recommendation at the recommendation level.
As discussed above, the guidelines did not provide any rating of the evidence or the recommendations at the recommendation level for any of the 70+ recommendations.
|
The guideline contains a description of the methodological approach (guideline report).
Note: Recommendations from S2k guidelines do not contain a schematic indication of levels of evidence and recommendations, as the evidence is not systematically processed.
[ist der Leitlinie eine Beschreibung zum methodischen Vorgehen (Leitlinien-Report) hinterlegt (Hinweis: Empfehlungen aus S2k Leitlinien enthalten keine schematische Angabe von Evidenz- und Empfehlungsgraden, da keine systematische Aufbereitung der Evidenz zugrunde liegt.)]
|
- Adherence to this rule is mandatory in order to classify a guideline as S2k or S3.
- The guidelines met this requirement by providing a guideline methodology report.
The guidelines are accompanied by the guideline methods report ("Leitlinienreport"). However, the methods described in the report fall short of the methodological rigor required to create trustworthy guidelines, as described above. In addition, the report makes the highly inaccurate assertion that an evidence-based guideline is not possible if the evidence is of low certainty. As stated above, evidence-based recommendations are possible even when the certainty of evidence is very low.
|
Information about the period of validity and the update of the guideline is available (see AGREE II criterion 14) and a contact person for the update is named. The planned update periods are specified for "Living Guidelines", which are a maximum of 12 months
AGREE II Criterion 14: A procedure for updating the guideline is provided.
[sind Angaben zum Gültigkeitszeitraum und zur Aktualisierung der Leitlinie vorhanden (s. AGREE ll Kriterium 14) und ist ein/e Ansprechpartner*in für die Aktualisierung genannt. Für "Living Guidelines" sind die geplanten Aktualisierungszeiträume benannt, diese betragen höchstens 12 Monate]
|
- Adherence to this rule is mandatory in order to classify a guideline as S2k or S3.
- The criterion of stating the timing of the update has been met, but the planned timing is inadequate.
The guidelines indicate that a regular update will take place in 5 years. However, given that the systematic evidence search stopped in 2017–2020, the guidelines are already 3-5 years behind. In such a rapidly developing field undergoing major changes (e.g., the rise of nonbinary identity is very recent; many countries issued new guidelines which were considered eligible documents in the study search criteria), a much more timely update is needed, as these guidelines appear outdated even before they have been published.
|
Other |
|
The guidelines are finally approved by the boards of all professional societies and organizations involved
[wird die Leitlinie final von den Vorständen aller beteiligten Fachgesellschaften und Organisationen verabschiedet]
|
- Adherence to this rule is mandatory in order to classify a guideline as S2k or S3.
- Since this event is in the future, it cannot currently be assessed.
The letter to the medical societies that accompanied the guideline draft stated that the meeting of the Boards of the professional societies was being scheduled for May 2024 (it may or may not have already occurred). During the meeting the Boards can either accept the guidelines, reject them, or accept them with some changes to specific recommendations.
Two developments make it unclear whether approval by all of the medical societies is possible. First, 15 Chairs and senior members of the Child and Adolescent Psychiatry Association (the professional organization formally designated as the lead organization for the development of these guidelines) published their dissent. Second, a majority of the German Medical Assembly (250 delegates representing 17 medical associations) passed resolution Ic-48, which called for a markedly different approach to treating gender-dysphoric youth than the one endorsed by the guideline development group.
|
What is the appropriate "S" classification for the current draft guidelines?
The AWMF classifies guidelines from S1 to S3. The "S1" rating is the lowest level, reserved for the guidelines based only on recommendations by experts; "S2" guidelines require a structured consensus process ("S2k") or a systematic literature review ("S2e"); while "S3" guidelines are based both on a systematic literature review, and employ a structured consensus process to make recommendations.
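The classification rule described above can be summarized as a simple decision on two properties of the development process. The following Python sketch is illustrative only: the names `SClass` and `classify`, and the two boolean parameters, are assumptions introduced here and are not AWMF terminology, and the actual AWMF rules include additional requirements (such as those assessed elsewhere in this analysis) that are not modeled.

```python
from enum import Enum


class SClass(Enum):
    """AWMF guideline classes as described in the text (illustrative)."""
    S1 = "S1"    # expert recommendations only
    S2K = "S2k"  # structured consensus process, no systematic review
    S2E = "S2e"  # systematic literature review, no structured consensus
    S3 = "S3"    # both systematic review and structured consensus


def classify(structured_consensus: bool, systematic_review: bool) -> SClass:
    """Map the two process properties named in the text to an S class.

    Hypothetical helper: it encodes only the two-factor summary given
    in this article, not the full AWMF rule set.
    """
    if structured_consensus and systematic_review:
        return SClass.S3
    if systematic_review:
        return SClass.S2E
    if structured_consensus:
        return SClass.S2K
    return SClass.S1
```

Under this simplified reading, a guideline developed through a structured consensus process but without a completed systematic review of the evidence, i.e. `classify(True, False)`, would fall into the S2k class.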
As mentioned earlier, the guidelines "Gender Incongruence and Gender Dysphoria in Childhood and Adolescence: Diagnosis and Treatment" were originally registered at the "S3" level, the highest level, which denotes the most rigorous and trustworthy guidelines. The guideline draft continued to carry the designation of S3 until January 2024. However, following a meeting with AWMF in January, the final draft was downgraded to "S2k." Two major reasons were offered publicly as the explanation for the downgrade. One was the poor quality of the evidence itself (a surprising explanation, as it is widely known that even when the evidence itself is of very low quality, it is possible to create high-quality evidence-based guidelines, as long as the process follows a high methodological standard).
Another stated reason for the downgrade was that this was merely a "preemptive" downgrade in anticipation of AWMF strengthening its S3 criteria, which would have rendered the guideline non-compliant with "S3" in the near future: the upcoming change in the S3 standard will require that at least 50% of the recommendations are evidence- rather than consensus-based. Of note, however, it appears that this requirement was already present in the 2023 AWMF Guidance Manual and Rules for Guideline Development (p. 40, see below) and according to another AWMF document, it appears that this requirement may have been in place as early as July 2023 (p. 3), a full 6 months before the guidelines were finalized.
In trying to understand the marked discrepancy between the Cass Report and the German draft recommendations, we embarked on assessing the guideline development process used by the German guideline development group using the "DELBI" standard. Our analysis suggests that not only do the guidelines fail to meet the stated future S3 requirements, but they fail to meet the S3 standard as it exists currently. Per the 2023 AWMF Guidance Manual and Rules for Guideline Development (p. 57), S3 guidelines require assigning a formal "grade" to each recommendation*:
In the case of S3 guidelines, the formal consensus development process for adopting recommendations focuses on clinical aspects to judge the methodologically synthesised evidence. The recommendations are then discussed on this basis. Next, the strength of the recommendations is determined and a grade of recommendation assigned [emphasis added].
By additionally indicating the strength of consensus (percentage of agreement within the guideline development group) for each recommendation, the guideline users are given an impression of the extent to which all participants were in agreement. [emphasis added].
As our detailed analysis in Table B demonstrates, most of the S3 requirements have not been met. However, our analysis also suggests even the lower S2k level may not have been reached. According to the various AWMF documentation sources, while the S2 guidelines do not require a formal grading of the evidence using structured tools such as GRADE, a less formal assessment of the quality of the recommendation is still required. In 2013 the AWMF made it clear that for S2k guidelines, the consensus rating should be used in addition to, rather than instead of, providing a grading for the strength of the recommendation:
For consensus-based guidelines (S2k), the strength of recommendations is identified and adopted during the formal consensus process, although an indication of grades of recommendation (and levels of evidence) is not included because recommendations are not based on a systematic review of the evidence. Here, the strength of a recommendation is expressed in words only. Additionally, the strength of consensus (percentage of agreement within the guideline development group) can be indicated for each recommendation [emphasis added] (p. 42).
The updated 2023 AWMF Guidance Manual and Rules for Guideline Development language is similar but less specific:
For consensus-based guidelines (S2k), the strength of recommendations is identified and adopted during the formal consensus process. Nevertheless, it is not planned to state schematic grades of recommendation or levels of evidence because recommendations are not based on any systematic processing of the evidence. The grade of a recommendation is expressed in words.
It further specifies however that the consensus is used to arrive at the ranking of the recommendation, rather than substitute it:
AWMF Guideline Register Rule: Classification of S2 and S3 guidelines (excerpt): If it is an S2k or S3 guideline:
- The methods for formulating recommendations are clearly described. This requires formal consensus techniques (e.g. consensus conference, nominal group process or Delphi method) (see AGREE II Criterion 10).
- Every recommendation is discussed and voted on as part of a structured consensus development with a neutral moderator. The objectives are to find a solution to pending decision-making issues, to establish a final ranking of the recommendations (S2k guideline) and determine the grade of recommendation (S3 guideline), and to measure the strength of consensus [emphasis added] (p. 58)
The requirement that "the strength of the recommendation is expressed in words" also appears in the "aids and appendices" section for S2k.
There does not appear to be any clear indication of the strength of the recommendations (even informal) in the guidelines text. However, the requirement that the "strength of the recommendation is expressed in words" is vague, and therefore, it is unclear if the S2k criterion has been met. What is clear, however, is that the strength of consensus is not considered to be a substitute for the strength of recommendation by AWMF, as the two measure different constructs. The strength of consensus indicates the proportion of the guideline panel that agrees with the recommendation statement. The strength of the recommendation signals whether almost all, most, or only a few of the patients would benefit from the recommendation.
While it is up to AWMF to determine whether or not the criterion that the strength of the recommendation be "expressed in words" has been met, to the extent that S2k guidelines are considered credible for implementation by AWMF, the lack of any indication of the strength of the recommendation (formal or informal) is a key limitation of the current draft, regardless of the final AWMF rating.
*This text has been updated based on the 2023 AWMF Guidance Manual and Rules for Guideline Development (the prior language was based exclusively on the 2012 AWMF Guidance Manual).
3. The Evolution of German Guidance to Care of Children and Adolescents
As in other Western countries, the care for gender incongruent/gender dysphoric children and adolescents in the 21st century was largely informed by innovations in the care for gender-dysphoric mature adults in the 20th century. Germany occupies a special place in the history of adult gender medicine. In the 20th century, it was the German sexologist and founder of the Institute for Sexual Science in Berlin, Magnus Hirschfeld, who coined the term “transsexual” and oversaw the first “gender-affirming” procedures in the 1920s and 1930s. Hirschfeld also mentored his younger German colleague Harry Benjamin, who later immigrated to the United States and, in the 1950s, became known as the founder of the field of gender medicine in the U.S. and worldwide. Benjamin’s research led to the formation of WPATH, which, from its inception until 2006, carried Benjamin’s name.
Not only was Germany instrumental in initiating the practice of gender transitions, but it has also been the worldwide leader in promoting the "depathologizing" approach to gender incongruence in adults. In 2015, the German Medical Association proposed new transgender guidelines to the World Medical Association (WMA), which asserted that “everyone has the right to determine one’s own gender,” that “gender incongruence is not in itself a mental disorder,” and recommended that “every effort be made to make individualised, multi-professional, interdisciplinary and affordable transgender healthcare (including speech therapy, hormonal treatment, surgical interventions and mental healthcare) available to all people who experience gender incongruence in order to reduce or to prevent pronounced gender dysphoria.” The WMA accepted the German proposal and adopted the guideline, with many medical associations worldwide (including the American Medical Association, which is a member of the WMA) embracing it.
In contrast to the adult 2015 German guidelines, the original German treatment guidelines for children and adolescents, written in 1999 and updated in 2013, retained a cautious approach to medicalizing gender-dysphoric youth. The guidelines required youth to undergo psychotherapy for at least 1 year prior to commencing medical interventions to rule out the possibility of transient distress due to emerging sexuality or any other developmental struggles. For youth whose desire to transition persisted, an additional 1-year “real-life test” was required. Puberty blockers were allowed at Tanner stage 2, while the minimum age for cross-sex hormones was set at 16, but both treatments were only provided to those who were expected to continue to suffer from “life-long transsexualism.”
The 2013 guidelines recognized the significant uncertainty regarding the medicalization of gender-dysphoric minors. In addition to discussing the complexities of adolescent development, the 2013 guideline acknowledged that endocrine interventions in general, and puberty blockers in particular, could alter the natural psychosocial development, with a particular effect on gay youth.
To diagnose “transsexualism” (an ICD-10 diagnosis) in a young person, a number of conditions had to be ruled out first. The differential diagnosis included temporary distress, personality disorders, development of a homosexual orientation, or a “sexual maturation crisis,” especially in cases where gender distress appeared “shortly before or during puberty.”