Critical Appraisal of “Long-Term Regret and Satisfaction With Decision Following Gender-Affirming Mastectomy”

“Near-zero regret” findings among adults suffer from a critical risk of bias and have low applicability to youth

Recent research published in JAMA Surgery evaluated satisfaction and regret among individuals who had undergone chest masculinizing mastectomy at the University of Michigan hospital. The average patient age at the time of mastectomy was 27 years; no patients who were under age 18 were allowed to participate in the study.

The participants reported high levels of satisfaction and low levels of regret at an average of 3.6 years following mastectomy. The study authors lauded the “overwhelmingly low levels of regret following gender-affirming surgery,” and framed their findings as in conflict with the “increasing legislative interest in regulating gender-affirming surgery,” referring to current legislative attempts to restrict or ban “gender-affirming” procedures for minors. Another group of authors provided an invited commentary on the paper, reinforcing the view held by the study authors, and asserting the presence of a “double standard:” “gender-affirming” mastectomies have come under undue scrutiny by states’ legislators, while other surgical procedures with higher regret rates do not appear to concern legislative bodies.

The study suffers from serious methodological limitations, which render the findings of high levels of long-term satisfaction with mastectomy among adults at a "critical risk of bias"—the lowest rating according to the Risk of Bias (ROBINS-I) analysis. ROBINS-I is used to assess non-randomized studies for methodological bias. The "critical risk of bias" rating signals that the results reported by the study may substantially deviate from the truth. The results also suffer from low applicability to the central issue the study and the invited commentary sought to address, which was whether legislative attempts to regulate “gender-affirming” surgeries are warranted in minors. Unfortunately, these highly questionable findings are misrepresented as certain and highly positive by both the study authors and the invited commentators, several of whom have significant conflicts of interest.

Below, we provide a detailed explanation of the key methodological issues in the study which render its claims untrustworthy and not applicable to the patient population at the center of the debate: youth undergoing gender reassignment. We also comment on an alarming trend: several prestigious scientific journals appear to have deviated from their previously high standards for scholarly work and instead have become vehicles for promoting poor-quality research, seemingly to influence judicial policy decisions rather than advance scientific understanding. We conclude with recommendations about how journal editors can restore the integrity of scientific debate and raise the bar on the quality of published studies in the field of gender medicine.

Key issues

The claim of “long-term” follow-up is inaccurate. The study authors acknowledge that much of the research about gender medicine outcomes suffers from short-term follow-up but emphasize that their study fills the gap because it has “long-term” follow-up. This claim of “long-term” follow-up is so central to the study’s conclusions that it has been elevated into the study title, “Long-Term Regret and Satisfaction with Decision Following Gender-Affirming Mastectomy.”
While the study endpoints spanned 30 years (1990-2020), the median time post-surgery was only 3.6 years. Further, only a small proportion (25%) of the study participants were followed longer than 5 years. Such follow-up cannot be considered long-term. In fact, it falls short of the 8-11 year average time to regret documented in prior research.
High non-participation rate threatens the validity of the results. A high number of the eligible patients (41%) did not participate in the study. Most studies finding “low regret” suffer from similarly high (20-60%) rates of loss to follow-up. High nonparticipation rates threaten the validity of the findings because those who opt not to participate in research likely have different levels of satisfaction and regret. The more the responder and non-responder groups differ on baseline or other characteristics, the greater the chance of non-participation bias.
In the table describing demographic characteristics of responders and non-responders (Table 1), the researchers provided data about some of the differences between the two groups. While some relevant comparisons are notably missing (e.g., age, timing of onset of gender dysphoria, prior treatment with cross-sex hormones), the authors did correctly call out two important differences: compared to non-responders, the responder group’s mastectomies were more recent (3.6 vs 4.6 years) and the responders had more anxiety and depression at baseline (70% vs 44%). Each of these differences could bias the results obtained from responders-only in an important way.
- Different time since surgery:
  The first several years following gender-reassignment procedures are known to be a “honeymoon” period, with quality of life and satisfaction rising for the first several years, but then starting to fall after 3-5 years. The fact that the responders had shorter time since surgery than the non-responders (3.6 years vs 4.6 years) suggests that their self-reported outcomes are likely to be more positive than the outcomes of non-responders.
- Different mental health status:
  The authors acknowledge an important difference in the baseline mental health status between the two groups: the responders had more mental illness at baseline. However, the authors minimize this difference by asserting that the two groups’ rates of psychiatric medication use were similar:
  
  Nonresponders also had lower rates than responders of diagnoses of depression (42 [44%] vs 94 [68%]; P < .001) and anxiety (42 [44%] vs 97 [70%]; P < .001) in the past medical history section of the medical record at the time of surgery (Table 1). However, the rates of medication use associated with anxiety and depression at the time of surgery did not differ between respondent groups (eTable 3 in Supplement 1).
  
  The assertion of no difference in medication use hinges on p-values that are not statistically significant (>0.05), reported in supplemental eTable 3, which is reproduced below.
  
  While p-values play a role in gauging whether differences between groups are attributable to chance, over-reliance on p-values has led to well-acknowledged problems in research. In this case, judging whether the two groups likely had the same rate of utilization of psychiatric medication (the null hypothesis) or whether the responders had greater reliance on such medication than the non-responders (the alternative hypothesis) requires looking beyond the simplistic analysis of whether the calculated p-value falls below the 0.05 “threshold.”
  
  As eTable 3 shows, 23.7% of the responders were taking an SSRI vs only 14.6% of the non-responders. This 63% increase is a large effect size. Further, the higher utilization rate of psychiatric medications by the responders is consistent across all drug classes reported. Finally, the higher recorded utilization of psychiatric drugs among the responders is congruent with the higher rate of diagnosed psychiatric illness in the same subgroup. All of these factors support the conclusion that the responders had higher use of psychiatric medication, and that it is not just a “chance” finding. The lack of statistical significance in this case is likely an artifact of a small underpowered sample and should not be used to assert that the two groups are similar. Had the investigators compared the use of any psychiatric medication between the groups, the difference would likely have been statistically significant.
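  The point about underpowered samples can be illustrated with a minimal sketch. It takes the two SSRI rates quoted above (23.7% and 14.6%) and asks how the p-value for that same difference would change with group size. The group sizes are hypothetical and chosen only for illustration, and the pooled two-proportion z-test is an assumption made here; this article does not state which test the study authors used.

```python
import math


def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# SSRI use rates quoted in the text above.
RESPONDER_SSRI_RATE = 0.237
NONRESPONDER_SSRI_RATE = 0.146


def p_values_by_group_size(sizes):
    """Same observed rates, different (hypothetical) numbers per group."""
    return {
        n: two_proportion_p_value(
            RESPONDER_SSRI_RATE, n, NONRESPONDER_SSRI_RATE, n
        )
        for n in sizes
    }


if __name__ == "__main__":
    # Hypothetical equal group sizes, not the study's actual sample sizes.
    for n, p in p_values_by_group_size([50, 100, 200, 400]).items():
        print(f"n per group = {n:>3}: p = {p:.3f}")
```

  Under these assumptions, the identical difference in rates is not statistically significant with about 100 patients per group but is significant with several hundred. A non-significant p-value in a small sample therefore says little about whether the two groups are truly similar.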
Contrary to the authors' claims, lack of reversal procedures is not a sign of low regret or high satisfaction. To mitigate the non-response bias from over 40% of the eligible participants, the study authors analyzed medical records of the “nonparticipants” at the same institution for the presence of mastectomy “reversal” procedures. Finding no such “reversals,” the authors concluded that regret rates among non-participants were similarly near zero. However, this assumption is fundamentally flawed.
Patients who are unhappy with their transition decision and seek to detransition are unlikely to return to the same clinicians who treated them originally; this has already been demonstrated in prior detransition research. Further, not all those who regret mastectomy will seek another invasive surgery. However, even more problematic is the authors’ assumption that masculinizing mastectomies are “reversible.”

The surgical technique employed in masculinizing mastectomies fundamentally differs from the one employed when women undergo mastectomy for breast cancer. In breast cancer reconstruction, aspects of native breast are typically preserved to help maintain a feminine breast shape. In contrast, masculinizing mastectomy’s goals are to masculinize the chest area, especially nipple position, lower breast pole shape, and scar placement. As a result, re-feminizing the chest in detransitioners poses unique surgical challenges. Scarring, a relative lack of usable soft tissue, and nipple position are limitations in achieving an aesthetically pleasing “reversal.” Another hurdle for detransitioners may be cost: while insurance companies are increasingly choosing to pay or are compelled to pay for “gender-affirming” mastectomies, they do not typically cover reversals. As a result, the process of medical detransitioning and attempts at reconstructive surgery may be cost prohibitive for many patients.

The above discussion only pertains to the cosmetic “reversal.” A breast’s cosmetic function is secondary to its function as a human organ capable of producing milk to feed offspring. No cosmetic reversal, however unlikely in the case of masculinizing mastectomy, can restore the functioning of a breast due to the loss of milk ducts.
The study’s outcomes suggest that improved mental health is no longer the target of "gender-affirming" interventions. The study did not attempt to investigate mental health or functional outcomes. Instead, the focus was on self-reported satisfaction. However, the central premise behind “gender-affirming” interventions to date has not been that they lead to satisfaction, but that they are critically important for optimal mental health functioning. According to the NHS England interim clinical policy regarding the practice of gender transition of minors released earlier this month, treatments should aim to “ameliorate the potentially negative impact of gender incongruence/dysphoria on general developmental processes, … support young people and their families in managing the uncertainties inherent in gender identity development and to provide ongoing opportunities for exploration of gender identity.”

Despite access to medical records and IRB approval to analyze the records, the study authors did not attempt to determine whether gender-dysphoric individuals who underwent mastectomy had better or worse mental health or overall functioning than similarly dysphoric individuals who did not seek or obtain mastectomy. Although helpful as secondary outcomes, “satisfaction” and “regret” patient-reported outcomes are inconsistent with the justifications put forth for the medical necessity of “gender-affirming” interventions, especially among youth.

The study also revealed a seemingly unexpected finding: achieving “congruence” with identity was not correlated with satisfaction or regret over treatment. The authors noted that for nearly 20% of the participants (27/139), gender identity shifted after surgery. The greatest outflow was from the category of “female-to-male” transitioners (n=95) into the vaguely defined category of “multiple” gender identity (See eTable 1 below).

The authors conducted an additional analysis which showed that despite the identity changes post-surgery, satisfaction and regret were not negatively impacted: both those whose gender identities remained consistent and those whose identities changed were similarly highly satisfied with mastectomy.

While robust patient satisfaction with mastectomy against the backdrop of shifting identities is an interesting finding, it raises a key question: when the goal of treatment is not improved mental health or functional status, nor is it achievement of “congruence” with identity, then what is the treatment goal—and how is it different from cosmetic plastic surgery?

Increasingly, proponents of gender-affirming care insist that the goal of gender-affirming care should be the achievement of “embodiment goals.” This study's findings support the notion that "gender-affirming care" can help meet individual embodiment goals in a satisfying way. Individuals are free to pursue body autonomy. However, it is considerably more challenging to insist that physicians must meet patients’ embodiment goals or that public and private payers must deem such procedures medically necessary and required for reimbursement.
The authors problematically conflate adults’ satisfaction with children’s and adolescents’ health outcomes. While minors were eligible for surgery at the hospital, they were excluded from participation in the study of the outcomes. Given the focus of the debate on minors, excluding minors who got mastectomies from research is an unusual decision; it is also surprising that the authors did not even report how many underwent mastectomy as minors. However, it is likely that the majority of those who provided responses to the survey were indeed age 18 or older at the time of the surgery (this follows from the median age of 27 and the interquartile range of 23-33).
Results of studies of adult participants cannot reasonably be extrapolated to children/adolescents due to the difference in decisional capacity regarding long-term invasive and irreversible interventions between children and adults. Since all the data on satisfaction and regret came from adults, it is inappropriate for the authors and the invited commentators to leverage the results of the study, whether they are reliable or untrustworthy, to advocate for a policy concerning minors.
Lack of comparator threatens the validity of the conclusion. The study’s findings of high satisfaction and low regret with their treatment approach are biased by the fact that we do not know what contributed to their satisfaction. Various other factors such as the passage of time, attention from medical professionals, counseling, better control of mental illness, or use of mood-enhancing drugs could be responsible for their overall feelings of high satisfaction and low regret with their chosen treatment. Further, we are unaware of the degree of satisfaction and quality of life those who did not receive gender-affirming mastectomy experience. This is the reason that research lacking a comparator is fundamentally unreliable.

SEGM Take-Aways

Although this study reports extremely high rates of satisfaction and low regret, the timeframe in which these outcomes were assessed is insufficient—just 3.6 years post-mastectomy on average. The sample is also highly skewed: 50% of the participants had mastectomies in the last 3.6 of the 30 years. This skewing of the length of time since surgery is expected, given the sharp rise in the number of people (especially adolescents and young adults) identifying as transgender and undergoing chest masculinization mastectomy. It is also a short time in which to assess regret, particularly since one quarter of study participants were younger than age 23 at time of surgery and the median age of first birth in the US is 30 years.

The conclusion of high satisfaction/low regret suffers from a critical risk of bias due to the high non-participation rate, important differences between participants and non-participants, and lack of control group. Problematically, the authors misuse the (critically-biased) results from adults to argue against regulations for irreversible body alterations for minors and do so with a decidedly politicized spin.

The only intellectually honest commentary is that we do not have good knowledge of the likely rates of detransition and regret following chest masculinization mastectomy, nor do we know how many people experience regret but remain transitioned. There is an urgent need for quality research in this area. Previously, detransition and regret rates were considered to be low: they may have indeed been low due to the much more rigorous screenings, or the results may have been biased by the notoriously high dropout rates that plague “regret” research. Regardless, there is now growing evidence of much higher rates of medical detransition.

A recent study from a comprehensive U.S. dataset with no loss to follow-up revealed a 36% medical detransition rate among females within just 4 years of starting hormonal transition. At least two recent studies suggest that average time to regret among recently-transitioned females is about 3-5 years, but there is a wide range. Much less is known about detransition among those who undergo surgery. A growing number of detransitioners now express regret associated with the loss of breastfeeding ability, with one case study detailing breastfeeding grief experienced some 15 years post-mastectomy.

The study and invited commentary exemplify three problematic trends that plague studies emerging from the gender clinics: problematic conflicts of interest of the authors; leveraging scientific journals to disguise politically-motivated pieces as quality research; and a conflicted stance by the gender medicine establishment on surgery for minors. We expand on each briefly below.

Conflicts of interest of study authors and commentators

The significant conflicts of interest of the gender clinicians who study and report on the outcomes of “gender-affirming” interventions cannot be overlooked. These clinicians are conflicted financially, since their practices specialize in “gender-affirming” interventions, as well as intellectually. While conflicts of interest among experts are common, such experts should still attempt to be balanced in their discussions and should acknowledge and reflect on their conflicts of interest.

The interpretation of the data in the study is neither rigorous nor balanced, and both the study and the invited commentary have a decidedly political spin. Further, the invited politicized commentary does not disclose that at least one of the authors is a key expert witness opposing states’ efforts to regulate “gender-affirming” surgeries for minors. This role alone precludes the ability to provide a balanced commentary.

There is a fundamental problem with research emerging from gender clinic settings. The same clinicians provide gender-transitioning treatments to individual patients in their practice; serve as primary investigators and custodians of data used in research informing population health policies; and increasingly, provide paid expert witness testimony in courts defending the unrestricted availability of hormonal and surgical interventions for minors.

As a result, such clinicians cannot express nuanced perspectives. Since any balanced statements may be used against them in a court of law when they serve as expert witnesses, they must resort to the lowest common denominator of the "winner-takes-all" adversarial approach. Such an approach does not tolerate nuance. Unfortunately, this approach contributes to the erosion of the quality of the published work in the arena of gender medicine and accelerates loss of trust in the integrity of the scientific process.

Misuse of scientific publications to promote politically-motivated articles disguised as scientific research

That prestigious medical journals now serve as platforms for promoting misleading, politically motivated research that aims to apply a veneer of misplaced confidence in highly invasive, irreversible treatment should worry everyone committed to evidence-based medicine and the integrity of science. Moreover, it impairs our ability to accurately assess and improve the long-term health outcomes of the rapidly growing numbers of gender-diverse and gender-distressed youths.

This is not the first time that a JAMA journal has been used as a platform for positioning advocacy for “gender-affirming” care as scientific research. In 2022, JAMA Pediatrics published a study that assessed bodily happiness in a group of subjects aged 14-24 three months after chest masculinization mastectomy. Despite the very short follow-up and dropout rate of 13%, the authors argued that their findings supported the premise that there was no evidence to suggest that young age should delay surgery. They also asserted that their research would help dispel the misconception that such surgeries are experimental. The editorial commissioned to bolster the authors’ claims was descriptively titled, “Top surgery in adolescents and young adults-effective and medically necessary.”

Another troubling trend is the misuse of statistical tools to reframe research findings that contradict the authors' own position. For example, a well-known study that claimed that access to puberty blockers reduces the risk of suicide disregarded the fact that individuals reporting use of puberty blockers had twice as many recent serious suicide attempts as their peers who did not use puberty blockers. Like the finding cited above, the doubling of suicide attempts was not statistically significant due to a small underpowered sample, but the magnitude of the effect was striking and should have tempered the authors’ enthusiastic conclusion that puberty blockers prevent suicides. Another recent gender clinic study, widely and positively covered by major media outlets, claimed that puberty blockers and cross-sex hormones led to plummeting rates of depression, even though the rate of depression among youth taking those medications remained demonstrably unchanged. More information about problems with research originating from gender clinics is detailed in this recent analysis.

Gender medicine’s stance on pediatric surgery

More generally, the gender medicine establishment is in a curious state of internal conflict about its stance on “gender-affirming” surgeries for minors. On the one hand, it has become common for advocates of “gender-affirmation” of minors to insist that surgeries for minors are not performed and anyone who suggests otherwise is spreading “scientific misinformation” and “science denialism.” On the other hand, gender clinicians publish mastectomy outcomes for minors in major medical journals, and laud surgeries for minors as “effective and medically necessary.” It is not uncommon for these opposing claims to be made by the same group of researchers and clinicians, as they test various arguments, searching for the "angle" that is most likely to convince judges and juries, and the public at large, that scrutiny of the practice of pediatric transitions, which is increasingly occurring in European countries, is not warranted in the United States.

Notably, none of the European countries that are enacting severe restrictions on the use of puberty blockers or cross-sex hormones for minors have ever allowed surgeries for youth under 18. That U.S. gender affirmation professionals continue to fight regulation of these problematic procedures speaks volumes about how far U.S. healthcare has drifted when it comes to "gender affirmation" of minors.

Final thoughts

While it is challenging to determine how best to reduce the temperature of the highly politicized nature of the debate in gender medicine, the editors of scientific journals can begin to restore balance by recognizing how far the field has drifted from the standards of quality scientific research, and begin to expand their circle of peer-reviewers to those with diverse views. Inviting those concerned with the state of gender medicine (and not just the practices’ advocates) into the peer-review and commentary process is the first essential step to improve the quality of research published in the field of gender medicine.