Pandemic Data Outlook

Filling in the Map of COVID-19 Demographic Data

Many states are releasing demographic data on COVID-19 cases, deaths, tests, and vaccinations to the public, but inconsistent, incomplete, and missing information persists.

Beth Blauer, Associate Vice Provost, JHU
September 20, 2021

Last week, the Johns Hopkins Coronavirus Resource Center (CRC) released a new data visualization that disaggregates COVID-19 case, death, testing, and vaccination data by age, ethnicity, sex, and race. While demographics alone may not determine COVID-19 outcomes, they are often strong indicators of the social determinants of health that shape a group's or person's well-being.1, 2 Access to detailed demographic data during this pandemic is critical to understanding how this novel disease behaves, identifying vulnerable populations, and guiding the nation's response.3 Aggregating these data, sourced from each state's individual reporting mechanisms, required an unprecedented amount of labor and manual data collection, yet the data remain incomplete and inconsistent.

We have previously reported that states do not apply demographic naming conventions consistently even within their own data, so it is unsurprising that the vocabulary for categorizing people varies from state to state. There has been no national mandate for specific terms to identify demographic groups. And even a federal mandate would not fully solve the problem: the categories defined by the U.S. Office of Management and Budget (OMB) guidelines on race and ethnicity4 have not evolved to accurately reflect the diversity of the American people. Still, adherence to the OMB guidelines would at least make COVID-19 data consistent with census data.


Complete data for any single demographic in any single data category do not exist for all 50 states (as shown above for "race"). This is not primarily due to bad-actor states, like Nebraska, that currently provide no public COVID-19 data, but rather to inconsistent reporting policies across states. For instance, West Virginia reports vaccination data by age, race, and sex, but not by ethnicity.

There are also gaps within individual demographics. Sex and gender data are unnecessarily limited to binary male/female options in most cases. When states do include "other" as a gender category, it is unclear whether that means "non-binary," "gender fluid," or "transgender," or simply indicates missing information. Race is similarly constrained, and, as discussed in our Q&A with Drs. Darrell Gaskin and Janice Bowie, bi-racial groups are becoming increasingly important in the United States. Only a few states let patients choose "two or more races," and even that label provides no actionable information about which races are involved.

Age category boundaries also vary between states: someone aged 62 could fall in the 60-70 group in one state but in the 55+ group in another. Fortunately, the CRC team was able to address this by using statistics to "rebin" the raw age data from each state into the consistent 10-year groups displayed on the visualization.
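The CRC's exact rebinning method is not described in this post. As a rough illustration only, here is a minimal sketch that assumes counts are spread evenly across each year of age within a state's reported bin, and that open-ended bins such as "55+" are capped at a hypothetical maximum age of 100. The function name `rebin_ages` and the `(low, high)` bin format are illustrative assumptions.

```python
from collections import defaultdict


def rebin_ages(counts, width=10, max_age=100):
    """Redistribute counts from arbitrary age bins into fixed-width bins.

    counts: dict mapping (low, high) -> count, where high is exclusive and
            high=None marks an open-ended bin such as "55+".
    Assumption (illustrative, not the CRC's documented method): counts are
    spread uniformly over each single year of age inside a source bin, and
    open-ended bins stop at a hypothetical max_age.
    """
    out = defaultdict(float)
    for (low, high), n in counts.items():
        if high is None:  # open-ended bin: cap at the assumed maximum age
            high = max_age
        span = high - low
        if span <= 0:
            raise ValueError(f"empty age bin ({low}, {high})")
        for age in range(low, high):
            target = (age // width) * width  # start of the 10-year group
            out[target] += n / span          # equal share per year of age
    return dict(sorted(out.items()))
```

Because each state's counts are spread evenly, this approach preserves totals but can blur the real age structure inside wide or open-ended bins; a production pipeline might instead weight each year of age by census population.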

Almost all states use a "missing," "pending," "redacted," or "unknown" category for each demographic, which we have grouped as "unknown" for comparison purposes. Offering these categories is understandable, as people should be able to choose not to provide demographic information. However, in many cases "unknown" accounts for a large proportion of individuals. In Idaho, for example, over 20% of vaccinations went to people whose race was reported as "unknown," compared with less than 2% in Louisiana. Were the data never requested? Did thousands of people choose not to answer? Or do the data exist but remain unreleased?
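The grouping step can be sketched in a few lines. This assumes each state's counts for one demographic field arrive as a mapping from raw label to count; the label set, function names, and sample numbers are hypothetical, and a real pipeline would need each state's exact spellings.

```python
# Labels grouped together as "unknown" for comparison purposes.
UNKNOWN_LABELS = {"missing", "pending", "redacted", "unknown"}


def normalize_category(label):
    """Map a state's raw category label onto a comparable, lowercase value."""
    cleaned = label.strip().lower()
    return "unknown" if cleaned in UNKNOWN_LABELS else cleaned


def unknown_share(counts):
    """Fraction of records whose category is unknown after normalization.

    counts: dict mapping raw label -> count for one demographic field.
    """
    total = sum(counts.values())
    unknown = sum(n for label, n in counts.items()
                  if normalize_category(label) == "unknown")
    return unknown / total if total else 0.0


# Hypothetical counts for a single state's race field.
sample = {"White": 700, "Black": 80, "Pending": 150, "Redacted": 70}
print(unknown_share(sample))  # 0.22
```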


Regardless of why records are "unknown," the fact that so many cases, deaths, tests, and vaccinations cannot be attributed to a specific demographic group limits the power of the data. The problem recurs across many states, from Pennsylvania, where over 60% of cases have an unknown ethnicity, to Rhode Island, where over 20% of tests were performed on people labeled under the "unknown" sex category. There is no way of knowing whether the unknown records follow the trends in the known data or mostly belong to a single demographic group, which would significantly skew the results.

The following example highlights the potential impact of "unknown" data. About 20% of COVID-19 vaccinations in South Carolina were given to Black people, even though they make up about 27% of the state's population.5 This could lead to conclusions that Black people in South Carolina are "vaccine hesitant" or that a "health disparity" is preventing them from accessing the vaccine. However, 7.6% of South Carolina's vaccinations are labeled as "unknown" race; if nearly all of those went to Black individuals, there might be no racial "disproportionality" in vaccine distribution in South Carolina at all. We cannot know without complete data. More importantly, South Carolina's leaders do not know whether they need to modify or improve their outreach to a specific population.
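The arithmetic behind this example can be made explicit. Taking the approximate figures above at face value, and assuming the 20% share is measured against all vaccinations (including those with unknown race), Black residents' true share lies somewhere between 20% (no unknown-race vaccination went to a Black resident) and 27.6% (every one did). The helper names below are hypothetical.

```python
def share_bounds(known_share, unknown_share):
    """Bounds on a group's true share when some records lack the attribute.

    known_share: fraction of all records labeled as the group.
    unknown_share: fraction of all records with an unknown label.
    Lower bound: no unknown record belongs to the group.
    Upper bound: every unknown record belongs to the group.
    """
    return known_share, known_share + unknown_share


def unknown_fraction_needed(known_share, unknown_share, target_share):
    """Fraction of unknown records that would have to belong to the group
    for its share to reach target_share; None if even all are not enough."""
    gap = target_share - known_share
    if gap <= 0:
        return 0.0
    if gap > unknown_share:
        return None
    return gap / unknown_share


# Approximate South Carolina figures cited in the text.
low, high = share_bounds(0.20, 0.076)
needed = unknown_fraction_needed(0.20, 0.076, 0.27)
print(f"true share between {low:.1%} and {high:.1%}")  # 20.0% and 27.6%
print(f"unknowns needed to reach parity: {needed:.0%}")  # 92%
```

Under these assumptions, roughly 92% of the unknown-race vaccinations would have to belong to Black residents for their share to match their population share, so the published data can neither confirm nor rule out a disparity.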


Demographic data are meant to help identify and aid all members of our society, especially those who are vulnerable. Labels like "disproportionate," "health disparities," and "hesitancy" are common, but without complete data to support those claims, we risk using the data to spread and substantiate prejudice and distrust, as Dr. Alexandre White has explained. "Disproportionate" simply means that more (or fewer) people fall into a category than would be expected from the population census. Census data are thorough and reliable, but directly comparing them with COVID-19 data, which are riddled with gaps and unknowns, can lead to inappropriate conclusions about "disproportionate" impacts and hesitancy.
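The census comparison described here can be written as a ratio: a group's share of cases, tests, or vaccinations divided by its share of the population, where 1.0 means proportional. The sketch below is illustrative; the function names are hypothetical, and the inputs reuse the approximate South Carolina figures cited earlier in this article (about 20% of vaccinations, 7.6% unknown race, about 27% of the population). It shows how the same published numbers give different ratios depending on how unknown records are treated.

```python
def disproportionality_ratio(observed_share, census_share):
    """Observed share divided by the share expected from the census.

    1.0 means proportional; below 1.0 means fewer than expected,
    above 1.0 means more than expected.
    """
    return observed_share / census_share


def ratio_scenarios(known_share, unknown_share, census_share):
    """Ratios for one group under three treatments of unknown records."""
    return {
        # Unknowns stay in the denominator but none count for this group.
        "none_of_unknowns": disproportionality_ratio(
            known_share, census_share),
        # Unknowns dropped and shares renormalized over known records,
        # i.e. assuming unknowns are distributed like the known records.
        "unknowns_like_known": disproportionality_ratio(
            known_share / (1 - unknown_share), census_share),
        # Every unknown record assigned to this group (upper bound).
        "all_unknowns": disproportionality_ratio(
            known_share + unknown_share, census_share),
    }


for name, ratio in ratio_scenarios(0.20, 0.076, 0.27).items():
    print(f"{name}: {ratio:.2f}")
# none_of_unknowns: 0.74
# unknowns_like_known: 0.80
# all_unknowns: 1.02
```

Under these assumptions, the same figures support ratios from about 0.74 to about 1.02, which is the difference between reporting a marked disparity and reporting none.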

Because of its finer granularity, demographic data is meant to provide a better understanding of both COVID-19 spread and the public health response. States that are releasing demographic data deserve to be applauded for committing additional resources and energy to these efforts during an ongoing crisis, while states that are not yet collecting and reporting complete demographic data should begin doing so immediately. Complete demographic data may hold the keys to combating COVID-19, and those data are now more crucial than ever.

1. G.K. Singh, G.P. Daus, M. Allender, C.T. Ramey, E.K. Martin, C. Perry, A.A.D.L. Reyes, I.P. Vedamuthu, Social Determinants of Health in the United States: Addressing Major Health Inequality Trends for the Nation, 1935-2016, Int J MCH AIDS 6(2) (2017) 139-164.
2. R. Song, H.I. Hall, K.M. Harrison, T.T. Sharpe, L.S. Lin, H.D. Dean, Identifying the impact of social determinants of health on disease rates using correlation analysis of area-based summary information, Public Health Rep 126 Suppl 3 (2011) 70-80.
3. D. McPhillips, Black, Hispanic people miss out on Covid-19 testing and vaccinations, 14 September 2021. (Accessed 14 September 2021).
4. Office of Management and Budget, Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity, Federal Register, 30 October 1997, pp. 58782-58790.
5. South Carolina, 1 July 2019. (Accessed 15 September 2021).


Beth Blauer is the Associate Vice Provost for Public Sector Innovation and Executive Director of the Centers for Civic Impact at Johns Hopkins. Blauer and her team transform raw COVID-19 data into clear and compelling visualizations that help policymakers and the public understand the pandemic and make evidence-based decisions about health and safety.