694 - Failing to test for sex differences can result in wrong conclusions: example of the Difference in Sex-Specific Significance error in a reanalysis of brain maturation during COVID-19 lockdowns
Friday, April 25, 2025
5:30pm – 7:45pm HST
Publication Number: 694.5003
Andrew W. Brown, University of Arkansas for Medical Sciences, Little Rock, AR, United States; Simon Chung, University of Arkansas for Medical Sciences College of Medicine, Little Rock, AR, United States; Tim R. Koscik, University of Arkansas for Medical Sciences College of Medicine, Little Rock, AR, United States; Colby Vorland, Indiana University School of Public Health-Bloomington, Bloomington, IN, United States; Donna L. Maney, Emory University, Atlanta, GA, United States
Associate Professor / Director of Biostatistics, University of Arkansas for Medical Sciences / Arkansas Children’s Research Institute, Little Rock, Arkansas, United States
Background: Consideration of sex as a variable is increasingly regarded as important for the replicability and generalizability of results. To maximize rigor, sex must be incorporated using valid statistical approaches. In a recent example, researchers tested for effects within each sex separately, then declared a sex difference wherever a statistically significant effect was found in one sex but not the other. This approach has been widely recognized as invalid for decades. When applied to sex differences, it has been called the “Difference in Sex-Specific Significance” (DISS) error, a specific case of the “Differences in Nominal Significance” (DINS) error. When two groups are considered, DISS can produce false positive findings of a difference up to 50% of the time, no better than flipping a coin.
Objective: To use appropriate between-sex comparisons to assess the associations between COVID-19 lockdowns and cortical thickness in female and male adolescents.
Design/Methods: In the original manuscript, the authors generated (n=87) and validated (n=22) normative cortical thickness curves in male and female adolescents, and tested differences for each of 68 brain regions (n=54). Using the shared data, code, and normed region values, we replicated the reported results: when the sexes were analyzed independently, cortical thinning deviated significantly from normative values in 30 regions of the “female brain” and in two regions of the “male brain.” We further replicated the overall age acceleration estimates, with confidence intervals including the null for males but not for females. We then directly compared cortical thickness for each of the 68 regions, and overall age acceleration, between females and males using two-sample t-tests and the same false discovery rate correction.
Results: Cortical thinning was significantly greater in females than males in only one region, not thirty (Fig. 1).
In the case of overall ‘age acceleration,’ the bootstrapped 95% confidence interval failed to exclude the null (Fig. 2), meaning that, based on the original authors’ threshold of statistical significance, the data do not convincingly support the claim of a sex difference.
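The between-sex comparison described above (one test per region, followed by false discovery rate correction) can be sketched as follows. This is an illustrative sketch, not the authors' code: the abstract reports two-sample t-tests, whereas this example swaps in a permutation test on the difference in means so that it runs with the Python standard library alone, and it assumes the correction is the Benjamini-Hochberg procedure, which the abstract does not state. The function names and parameters are hypothetical.

```python
import random
from statistics import mean


def permutation_p(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Stand-in for the two-sample t-test named in the abstract.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per test: True where the null is rejected at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected
```

In use, one p-value per region would be computed from the female and male values for that region, and the full list of 68 p-values passed to `benjamini_hochberg` in a single call, so that the correction spans all regions at once.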
Conclusion(s): The DISS error resulted in unsupported conclusions regarding the association between COVID-19 lockdowns and sex differences in cortical thickness. Appropriate between-sex tests are essential for making sex-specific comparisons. Thorough and transparent reporting of data and code permits reanalysis of results to come to rigorous conclusions.
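The 50% figure cited in the Background can be illustrated with a small simulation. The sketch below is built on stated assumptions rather than on the study data: both sexes are given the same true effect, each sex is tested separately with a z-test of known standard deviation, and a "sex difference" is declared whenever exactly one of the two tests is significant. All names and default values are hypothetical.

```python
import random
from statistics import NormalDist, mean


def one_sample_p(values, sigma=1.0):
    """Two-sided z-test p-value for mean == 0, assuming a known SD."""
    z = mean(values) / (sigma / len(values) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def diss_rate(true_effect, n=25, alpha=0.05, n_sims=4000, seed=1):
    """Share of simulations in which exactly one sex is 'significant',
    even though both sexes have the same true effect."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(n_sims):
        females = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        males = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        sig_f = one_sample_p(females) < alpha
        sig_m = one_sample_p(males) < alpha
        if sig_f != sig_m:  # the DISS rule would declare a sex difference
            discordant += 1
    return discordant / n_sims


# Effect size that gives each within-sex test roughly 50% power at n = 25.
half_power_effect = NormalDist().inv_cdf(0.975) / 25 ** 0.5
rate_half_power = diss_rate(half_power_effect)
rate_no_effect = diss_rate(0.0)
```

If each within-sex test has power p, the chance that exactly one is significant is 2p(1 - p). This peaks at 0.5 when p = 0.5, so `rate_half_power` should land near 0.5, while `rate_no_effect` should land near 2 × 0.05 × 0.95 = 0.095. In both scenarios there is no true sex difference.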
Figure 1. Only one of thirty sex differences in cortical thinning reported in a recent paper held up in a valid between-sex comparison. Point estimates represent the difference between males and females for each of 68 regions (named on the x-axis) with uncorrected 95% confidence intervals. After applying False Discovery Rate (FDR) correction as per the authors, only the difference in the left hemisphere insula (lh-insula, darker line on figure) remained significant. ***p = 0.0004.
Figure 2. When comparing estimated age acceleration between the sexes statistically, the 95% confidence interval includes the null. The original paper showed a significant change within females but not males separately. Here, we demonstrate that the between-sex comparison was not statistically significant.
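The kind of interval shown in Figure 2 can be sketched with a percentile bootstrap of the female-minus-male difference in means. The abstract does not describe the bootstrap procedure that was used, so the resampling scheme, the function name, and the defaults below are assumptions made for illustration only.

```python
import random
from statistics import mean


def bootstrap_diff_ci(females, males, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(females) - mean(males).

    Each group is resampled with replacement, independently of the other.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        f = rng.choices(females, k=len(females))
        m = rng.choices(males, k=len(males))
        diffs.append(mean(f) - mean(m))
    diffs.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]
```

A between-sex difference is supported at the chosen level only when the returned interval excludes zero. Two within-sex intervals, one excluding zero and one not, do not establish that.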