Sex and gender play an important role in health and disease, yet data collection methods for these demographics are outdated, insufficient, or nonexistent. The recent interest in sex-disaggregated data prompted by COVID-19 should be harnessed to reinvent the way we value, collect, and use sex and gender data.
Sex (a biological construct) and gender (a social construct) are not fixed binary options; they are better described as spectra that can shift over the course of a person’s life. Disappointingly, even under the obsolete binary view of sex and gender (male vs. female), there is insufficient sex-disaggregated public health data. The ability to analyze data through the lenses of sex and gender is critical to understanding the dynamics of disease and how people respond to interventions. Here we focus on the infrastructure and availability of data disaggregated by sex and gender; for further discussion of the intersectionality of sex and gender data, please read our recent Q&A with Dr. Sabra Klein.
As we have noted, in the absence of standards, there are significant limitations on how we can interpret and analyze COVID data. In fact, in the United States there is no requirement or standard for national public health data to be disaggregated by either sex or gender, despite calls to modernize these practices.[1] Because disaggregation is not mandatory, national public health data from the CDC do not include that demographic breakdown. Other countries, including New Zealand,[2] require national surveillance data as well as clinical data to be broken down and analyzed through what they call a gender lens.
When it comes to academic research, as much as we want to think of ourselves as thought leaders on this topic, we too are hindered by biases rooted in historical norms for collecting sex and gender data. Few researchers ask whether the pathways they see activated, or the drugs they test, work as well and to the same degree in males and females. The National Institutes of Health requires NIH-funded research to consider sex as a biological variable where relevant,[3] but few researchers investigate sex as an independent, influential variable.
Even in hospitals, metadata on sex and gender are unreliable or inconsistent because of antiquated naming conventions and incompatible electronic health records (EHRs). For example, building the unified JH-CROWN database from all hospitals in the Johns Hopkins Hospital System required unprecedented work by Dr. Brian Garibaldi, Dr. Scott Zeger, and a team of dedicated students, fellows, and junior faculty. Each EHR system has its own drop-down menus with differing definitions and categories, which makes cross-hospital comparisons impossible. The amount of work required just to reconcile Johns Hopkins’s own systems makes unifying sex and gender data across all of healthcare seem out of reach.
The world has made some improvements in sex and gender metadata reporting under the pressures of the COVID-19 pandemic.[4] Global Health 50/50 is a nonprofit organization committed to better understanding differences between males and females in very broad terms. It runs its own COVID-19 tracker (shown above, as of August 10th), but the sex-disaggregated data it reports are primarily binary. Even so, it has documented important information, such as the ratio of male to female hospitalizations and deaths, which was roughly 3:1 and has fallen to about 2:1 as vaccines rolled out.
This nonprofit filled a gap left by the CDC and state governments, whose collection and reporting of sex metadata were inconsistent or lacking. As we stated earlier, the demographic categories attached to COVID-19 data are unnecessarily convoluted. As the interactive visual below shows, sex categories are inconsistent between states, and only a few states allow for transgender or non-binary identification in their data collection.
This interactive chart lets you select a category of COVID-19 data (e.g., vaccinations) and see which sex and gender labels are recorded in the United States and how many states use each one. Ideally, there would be multiple categories, including options beyond binary male/female, and all bars would be the same height, showing that the full range of sex and gender labels is available and used for every COVID-19 data stream in every state.
There is clearly room for improvement, and we should begin pursuing several solutions on all fronts and across all disciplines.
Mandate that national public health data be disaggregated by sex. This would bring us in line with our peers abroad and should be considered a floor, not a ceiling. The government has already recognized the importance of sex and gender data during the COVID-19 pandemic, as major differences have emerged in how the disease progresses and how patients recover. That understanding should translate into better federal data policy going forward. At a minimum, the data should be disaggregated into male and female, but a better choice would be to settle on a list of categories that represents the spectrum of sex, which could be instituted through Office of Management and Budget (OMB) standards, as is done for race and ethnicity.[5]
Invest in data governance and establish consistent terms and options for defining sex and gender. Male and female options are not sufficient for research and analysis. However, a list of 20 different options for identifying sex and gender can exhaust people completing questionnaires and reduce physician efficiency. The diverse communities invested in public health and medical research, as well as those most affected by sex and gender naming conventions, need to organize, create a governance process, and agree on a set of terms that allows EHRs to be unified and datasets to be interconnected. These options need to respect the spectra of sex and gender, be revisited routinely, and not become overly complicated.
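To make the governance idea a little more concrete, here is a minimal Python sketch of what an agreed set of terms enables: each hospital's local drop-down labels are mapped to one shared vocabulary, and any label without an agreed mapping is set aside for review instead of being guessed. Everything in it is hypothetical and for illustration only: the system names (`hospital_a`, `hospital_b`, `hospital_c`), the local labels, and the shared terms are invented and are not an existing standard. It also handles a single gender-identity field; since sex and gender are distinct constructs, a real system would keep them as separate fields, each with its own vocabulary.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# HYPOTHETICAL shared vocabulary, standing in for terms agreed on through a
# governance process. These terms are illustrative only, not a real standard.
SHARED_TERMS = {"female", "male", "nonbinary", "another_identity", "unknown", "declined"}

# HYPOTHETICAL per-system mappings from local EHR drop-down labels to shared terms.
LOCAL_TO_SHARED = {
    "hospital_a": {"F": "female", "M": "male", "X": "nonbinary", "U": "unknown"},
    "hospital_b": {
        "Female": "female",
        "Male": "male",
        "Non-binary": "nonbinary",
        "Other": "another_identity",
        "Prefer not to answer": "declined",
    },
}


def harmonize(system: str, label: str) -> Optional[str]:
    """Return the shared term for a local label, or None if no mapping is agreed."""
    term = LOCAL_TO_SHARED.get(system, {}).get(label.strip())
    # Guard against a mapping that points outside the agreed vocabulary.
    if term is not None and term not in SHARED_TERMS:
        raise ValueError(f"{system!r} maps {label!r} to unknown term {term!r}")
    return term


def harmonize_records(records: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Tally records by shared term; collect unmapped (system, label) pairs for review."""
    counts: Counter = Counter()
    unmapped: Counter = Counter()
    for system, label in records:
        term = harmonize(system, label)
        if term is None:
            # Do not force unknown labels into a catch-all; flag them for the
            # governance process to decide on.
            unmapped[(system, label)] += 1
        else:
            counts[term] += 1
    return counts, unmapped


if __name__ == "__main__":
    records = [
        ("hospital_a", "F"),
        ("hospital_b", "Female"),
        ("hospital_b", "Non-binary"),
        ("hospital_a", "X"),
        ("hospital_c", "Woman"),  # a system with no agreed mapping yet
    ]
    counts, unmapped = harmonize_records(records)
    print(counts)    # Counter({'female': 2, 'nonbinary': 2})
    print(unmapped)  # Counter({('hospital_c', 'Woman'): 1})
```

The design choice worth noting is that unmapped labels are counted separately rather than lumped into "other": that keeps decisions about new terms with the governance process and makes gaps between systems visible instead of hiding them in the totals.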
Incorporate sex into preclinical and clinical studies. Sex is not just a control variable satisfied by using equal numbers of male and female mice. Sex has a significant biological impact on almost every cell of the body and should be considered when designing a study. Papers should no longer be published in which diseases more common in females, such as depression, are studied exclusively in male mice and rats. This will require funders, journals, and peer reviewers to modify their requirements and perform more due diligence.
Work to regain the trust of sexual minorities and classify their data appropriately. We are still figuring out how to track data on sexual minorities. Many do not trust the medical community and often do not want to be tracked. Without sufficient data to analyze, studies will lack the statistical power to determine effects in non-binary groups and draw conclusions. The burden is not on sexual minorities to participate in more studies; it is on medical researchers to reach out to these groups and reestablish trust to encourage participation.
Sex and gender are key to public health and lie at the intersection of race, age, and other key demographics. Mending our systems of sex and gender data collection is the low-hanging fruit we should aim for first, before tackling the rest of the United States’s complex issues with demographic data.
1. R. Rubin, Trans health care in the USA: a long way to go, Lancet 386(9995) (2015) 727-8.
2. F. Pega, S.L. Reisner, R.L. Sell, J.F. Veale, Transgender Health: New Zealand's Innovative Statistical Standard for Gender Identity, American Journal of Public Health 107(2) (2017) 217-221.
3. NOT-OD-15-102, Consideration of Sex as a Biological Variable in NIH-funded Research, National Institutes of Health, 09 June 2015.
4. J.N. Newton, C. Griffiths, J. Fitzpatrick, T. Lamagni, I. Campos-Matos, Sex-disaggregated data is reported by Public Health England, The Lancet Global Health 9(8) (2021) e1059.
5. Office of Management and Budget, Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity, Federal Register, 30 October 1997, pp. 58782-58790.
The title, meta, and map images were modified from Global Health 50/50 under a Creative Commons CC BY-NC 4.0 license.