Expert Insight

Q&A: Data May Be Universal, but Context and Format Are Not

Dr. Sara Bertran de Lis explains the evolution of COVID-19 data collection efforts, applauding states for their work while recommending changes needed to improve policy making and public communication.

Authors:
Joshua E. Porterfield, PhD
June 16, 2021

Dr. Sara Bertran de Lis is the head data scientist at the Coronavirus Resource Center. She has previously led efforts at the Centers for Civic Impact to help local governments better understand, collect, and utilize data. She now brings her expertise to U.S. and international COVID-19 data management efforts.

How have the CRC’s data collection efforts evolved during the pandemic?

Our data collection has evolved with each subsequent effort. While the process itself has stayed consistent, we have learned to predict how the state dashboards are most likely to change and to accommodate those changes in our data structure in advance. This has made our data collection flexible, yet robust enough that we don’t have to redo everything each time a state makes an update.

We are now focusing on collecting and analyzing demographic data across states. Without standardized demographic definitions and granular data, you cannot identify and locate vulnerable populations. There are so many variables involved in crafting hypotheses about demographics that, unless you have very detailed, non-aggregated local data, you're never going to be able to answer any questions. High-level aggregated numbers can reveal disparities, but they prevent us from analyzing those disparities further to determine their causes, extent, and potential solutions.

“Lack of standardization within a single state is unacceptable and needs to be addressed before we can discuss national standards.”

What can states change to improve data collection and reporting?

First of all, data should be scrapable, or machine-readable, meaning that we can develop code to find the updated data online, read it, and record it, removing the need for individuals to search through state websites one by one. Downloadable data tables, numbers embedded in the web page, or anything in a dashboard whose source code we can access are great. Even structured PDFs that use the same format in every iteration are useful to the absolutely brilliant data team at the Johns Hopkins Applied Physics Laboratory (APL). However, some states publish data only in press releases, images, or graphics. These formats really hinder data collection and analysis: they cannot be scraped automatically by the APL software, and the data they contain are often incomplete or aggregated, lacking all critical detail.
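To illustrate why a downloadable table is so much easier to work with than a press release or an image, here is a minimal sketch in Python. It assumes a hypothetical state CSV with `date`, `county`, `cases`, and `deaths` columns; the layout, values, and function names are invented for illustration and do not represent the APL software or any real state feed.

```python
import csv
import io

# Hypothetical downloadable table from a state website. Column names and
# values are invented for illustration; real state feeds differ.
SAMPLE_CSV = """date,county,cases,deaths
2021-06-14,Adams,12,0
2021-06-14,Baker,7,1
2021-06-15,Adams,9,0
"""


def parse_daily_counts(csv_text):
    """Read a machine-readable table and return one record per row,
    converting the count columns to integers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append({
            "date": row["date"],
            "county": row["county"],
            "cases": int(row["cases"]),
            "deaths": int(row["deaths"]),
        })
    return records


def total_cases_by_date(records):
    """Sum case counts across counties for each reporting date."""
    totals = {}
    for rec in records:
        totals[rec["date"]] = totals.get(rec["date"], 0) + rec["cases"]
    return totals


if __name__ == "__main__":
    rows = parse_daily_counts(SAMPLE_CSV)
    print(total_cases_by_date(rows))  # {'2021-06-14': 19, '2021-06-15': 9}
```

In a real pipeline the text would be downloaded (for example with Python's `urllib.request`) instead of stored in a string constant. The point is that the same few lines keep working on every update as long as the state keeps the format stable; numbers locked inside an image or a press release offer no such structure to read.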

Standardization is also a major issue, but the tricky question is, “who is responsible for it?” The states can obviously play a role by using standards when they exist, but sometimes there are several competing standards or none at all. Even more concerning than differences between states is that divisions within the same state health department sometimes use different standards. We have observed some states, such as Georgia, that use different demographic categories for vaccination data than for cases and deaths.
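When two datasets from the same state use different demographic categories, they have to be mapped onto a shared scheme before they can be compared, and any category without a counterpart cannot be compared at all. The sketch below shows one simple way to do that mapping while keeping unmappable categories visible instead of silently dropping them. All category labels, counts, and function names are hypothetical and are not taken from Georgia's or any other state's data.

```python
# Hypothetical counts from two divisions of the same state health
# department, each using its own demographic labels (invented examples).
CASE_COUNTS = {"Black": 80, "White": 210, "Asian": 25, "Other": 30}
VACCINE_COUNTS = {"Black or African American": 120, "White": 300,
                  "Asian": 40, "Multiracial": 15}

# Mapping from each dataset's labels onto a shared scheme. Labels with
# no clear counterpart ("Other", "Multiracial") are left out on purpose.
CASE_MAP = {"Black": "Black", "White": "White", "Asian": "Asian"}
VACCINE_MAP = {"Black or African American": "Black", "White": "White",
               "Asian": "Asian"}


def harmonize(counts, label_map):
    """Re-key counts onto the shared scheme.

    Returns (harmonized, unmapped): labels found in label_map are summed
    under their shared category; the rest are returned separately so the
    loss of comparability is explicit rather than hidden.
    """
    harmonized = {}
    unmapped = {}
    for label, n in counts.items():
        common = label_map.get(label)
        if common is None:
            unmapped[label] = n
        else:
            harmonized[common] = harmonized.get(common, 0) + n
    return harmonized, unmapped


if __name__ == "__main__":
    print(harmonize(CASE_COUNTS, CASE_MAP))
    # ({'Black': 80, 'White': 210, 'Asian': 25}, {'Other': 30})
    print(harmonize(VACCINE_COUNTS, VACCINE_MAP))
    # ({'Black': 120, 'White': 300, 'Asian': 40}, {'Multiracial': 15})
```

Even in this toy case, the "Other" and "Multiracial" groups cannot be lined up with each other. That is the kind of gap a single standard within a state would avoid.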

“States have never before had to collect this much data, analyze it, and release it in real-time. They have done an incredible job.”

Why should states keep collecting and reporting public health data?

While it is easy to criticize, the truth is that every time a state publishes a piece of data there is an enormous amount of effort that goes into it. They have to develop a system to collect individual raw data points, validate the data, clean it up, perform quality checks, validate some more, and then actually publish the information.

It would be a pity to abandon these data systems now. Local governments are constantly making decisions, implementing actions, and improving policies that can be so much better informed by near real-time, high quality data. Public data also empowers individual people to make informed decisions about their personal lives. Individual interaction with the data has resulted in improved data quality and some incredible inventions throughout the pandemic. States should foster that level of public trust and engagement in the future.

What have you learned about how individual Americans consume public health data?

We sometimes doubt the public’s interest in data, but the pandemic has shown us that people are interested in data and they are demanding more. I have learned not to underestimate the ability of people to read and interpret a visualization, notice a mistake, contact you, and thereby improve the overall data quality. I also think my ability to create data visualizations that the public can understand has increased dramatically.

The process of deciding what questions a visualization will answer is very important. Once that is decided, we start sketching mock-ups and doing rounds of reviews. After many, many rounds of review among people with diverse backgrounds, we agree on a final design. I would say that at least 30 people have given feedback incorporated into each visualization on the CRC site.

“Numbers are a universal language, but they need context, and that context is not universally accessible or understood.”

What do you see as the role of the CRC as we enter a later stage of the pandemic characterized by fewer cases and plateauing vaccinations?

The pandemic is not over, even though we in the United States have administered more vaccinations than other countries. We never know if a new vaccine-resistant variant of the virus is going to appear somewhere in the world; if it does, it won't matter how quickly a country has vaccinated its population. International monitoring is going to play a much more important role now.

Luckily, we are less demanding of international data. For example, we are probably only going to be interested in the percentage of the population that is fully vaccinated, not which type of vaccine was given or where it was administered. These essential data are more consistently available across countries than the data we have been trying to get from states, but there are other challenges. The health systems in other countries are totally different, and we have to navigate many languages, reporting methods, and web pages.

Joshua E. Porterfield, PhD

Dr. Joshua E. Porterfield, Pandemic Data Initiative content lead, is a writer with the Centers for Civic Impact. He is using his PhD in Chemical and Biomolecular Engineering to give an informed perspective on public health data issues.