Expert Insight

Q&A: Building and Maintaining Public Health Data Pipelines

The Johns Hopkins Applied Physics Laboratory has been a key player in the international public health data field for over 20 years. APL was responsible for teaching local, state, and federal health agencies how to collect and use data for future planning, and is now a key contributor to delivering real-time COVID-19 data to the world.

Joshua E. Porterfield, PhD
September 29, 2021

The first 15 years of Sheri Lewis’s career at the Johns Hopkins Applied Physics Laboratory (APL) focused, in part, on public health surveillance. That has included working closely with U.S. and international public health communities on the development and implementation of electronic disease surveillance systems, mobile health applications for data collection and analysis, and advanced analytics for the prediction and modeling of emerging infectious diseases.

Now, Lewis serves on the Johns Hopkins Coronavirus Resource Center’s executive team as the point person overseeing APL’s critical role in data procurement and analysis. Her perspective on the evolution of public health data infrastructure shows that local, state, and federal government agencies still have far to go to ensure better data practices to prevent and address future public health crises.

What has been APL’s role in the public health data space?

In the last 20 years, APL has revolutionized electronic surveillance of diseases. Back when there was little utilization of information technology in public health, we worked with health departments, starting in the National Capital Region, to help them understand the value of utilizing data and how it would enhance their situational awareness in near real-time. Looking back from where we are now in 2021, that seems like a simple thing, but at the time it was a complete shift. Public health practitioners were really using data in a retrospective manner – when there was an outbreak, they’d dive into the data. We suggested a culture shift: looking at data prospectively and using it to make public health decisions on the front end. We helped with system development and, in parallel, we did a lot of community engagement across the U.S., trying to get public health officials to understand the value of this approach.

ESSENCE, for example, which is the Electronic Surveillance System for the Early Notification of Community-based Epidemics, fundamentally changed the way public health professionals access and utilize data in monitoring established patterns of disease progression in their community. APL began developing that in 1997 and we’ve seen the profound impact of that tool over and over again – of course most recently in the COVID-19 pandemic – and we are continuously improving it to best serve public health.

“While much of APL’s early work was tech-heavy, our main challenge was simply getting people to understand the value of collecting and utilizing data closer to real-time.”

After we started to effect change in the U.S., we began working in the global space with partner nations through the U.S. Department of Defense. We had to get them thinking about how they collected data to enable effective investigations during disease outbreaks. At times, that meant shifting not just from using data retrospectively, but from not collecting or using data at all. Globally, people were moving data around monthly at best. Speeding up the way data were collected and utilized was a huge change on the global level.

How has APL’s role evolved during the COVID-19 pandemic?

One thing the COVID-19 pandemic put center stage was public health experts and public health decisions. The U.S. public also now understands a bit more about what public health does and, unique to this pandemic, has an expectation of being able to utilize data to make decisions for themselves on an individual level. The Johns Hopkins Coronavirus Resource Center is a great example of that.

For many years the general public had very little appreciation for what public health does on a daily basis. With the CRC, we’ve brought data into everyone's living room and they are able to see the data and start to make decisions for themselves, which is really cool. I think the public has a renewed comfort level in how to view, manipulate, and use data.

For APL, that fits us. At its fundamental core, the CRC is a health surveillance initiative. APL is an expert in understanding and using all kinds of data sources. When we stood up ESSENCE, we were using anything we could get our hands on, any type of data. The COVID-19 dashboard, especially in its early days, was somewhat similar — while we were literally only looking at case-based data, it was still an issue of being able to take the data in whatever form it’s provided. That is a uniquely APL capability, and one we were honored to assist with in the dashboard effort.

Unfortunately, one thing the pandemic has also borne out is the manipulation of that data, and it has underscored the importance of trust in data sourcing and analysis. Just as people can access data and use it to make decisions for themselves, many are also choosing to take it and skew it to tell the story they want to tell. I try to put myself in someone else’s shoes, and if I were not in the field, I would look at all of the different pieces of information available now and wonder what to believe and what not to believe. I do think that we need to be mindful of that problem and figure out a way to counter it moving forward.

What data are available to local and state health departments?

That really depends on what data would be most helpful to them. At this point, there is the potential to have an enormous amount of data at public health agencies’ fingertips, so the question is not, “What is available to them?” It’s really: “What is the fundamental question we’re trying to answer?” If we define that and only collect and look at data that we think will help answer that question, it will be much more useful. That’s a double-edged sword and it gives a lot of us in public health pause because nobody likes to throw away data, but unless there’s unlimited bandwidth and storage capacity for data, you have to make tradeoffs. Additionally, if you have so much data that you’re not able to properly analyze it, there’s also the question of how helpful the data alone can be.

In the early days of building health surveillance tools like ESSENCE, our fundamental question was, “What do people do when they get sick?” Do they go to the pharmacy? Do they miss school? Do they see their doctor or go to urgent care? Are they appearing at emergency rooms – and with what symptoms?

We started exploring all of the different data streams that would represent what people do when they get sick. If state and local public health had a good relationship with their school system, they could potentially get anonymized school absentee and nurse’s office data. We got data from poison control. We looked at point-of-sale transactions for over-the-counter pharmaceuticals. Then there were data sources that health departments had in-house such as mortality data that we were able to incorporate. From there, we got to thinking about what other sources could complement this effort that were freely available, such as weather, air quality, allergens and pollen count. When we strategically added to the data set, we were able to enhance public health practitioners’ ability to have a more complete picture of the health of their community.
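As a rough illustration of what adding data streams to one picture can look like, the sketch below merges per-day counts from several sources into a single record per day. It is not how ESSENCE works; the function name, the stream names, and every number are hypothetical placeholders chosen only to show the idea.

```python
from collections import defaultdict


def combine_streams(streams):
    """Merge per-day counts from several named streams into one record per day.

    `streams` maps a stream name to a {iso_date: count} dict. A day that a
    stream did not report on simply has no entry for that stream.
    """
    combined = defaultdict(dict)
    for name, daily_counts in streams.items():
        for day, count in daily_counts.items():
            combined[day][name] = count
    # ISO dates sort chronologically as plain strings.
    return {day: combined[day] for day in sorted(combined)}


# Hypothetical toy values, not real surveillance data.
streams = {
    "school_absences": {"2021-01-04": 12, "2021-01-05": 30},
    "otc_sales": {"2021-01-04": 40, "2021-01-05": 95},
    "poison_control_calls": {"2021-01-05": 3},
}

picture = combine_streams(streams)
for day, signals in picture.items():
    print(day, signals)
```

Keeping a missing stream absent, rather than filling in zero, preserves the difference between "nothing happened" and "nothing was reported."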

“There's an amazing amount of data that people don't even realize exists.”

How would you like to see public health data change moving forward?

Twenty years ago, after the 9/11 terrorist attacks, there was an appetite for investment in data systems. That pushed the field of public health into new areas, but it didn’t have quite the staying power to sustain what was needed to keep up with rapidly-evolving technology and data availability.

Given the immense role public health data collection and analysis has played in the nation and the world’s response to the COVID-19 pandemic, I hope this time that enthusiasm has more staying power and the need for continuous investment into this field is clear.

There are more states that are now collecting and utilizing data than there were previously. That's good, and we want that to continue. We want to see more consistency across the country (and globally) in terms of valuing the continuous collection and utilization of data, and we would like there to be greater consistency across states and between the states and the federal government. The idea of a global digital health ecosystem is one we need to be thinking about, putting the pieces together to operationalize it in the very near future.

One thing I think is vitally important to state is that we don’t just need these data pipelines in a crisis – they need to be a constant tool to understand a community’s health during all times, and the ability and desire to share information must endure. We could augment data and data information sharing during surge situations to the levels we’ve seen during the pandemic, but it would be nice if we kept all the plumbing intact and did some standardization to ensure every house had the same type of pipes. Then we could increase the flow of information on an as-needed basis.
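One way to read the "same type of pipes" idea in code is a shared record format that every source is converted into before the data are combined. The sketch below assumes two made-up source layouts; the `CaseRecord` type, the field names, and the sample rows are all hypothetical and do not describe any real agency's format.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical common schema that every source is normalized into."""
    region: str
    report_date: date
    cases: int


def from_source_a(row):
    # Hypothetical source A: ISO dates, counts stored as strings.
    return CaseRecord(
        region=row["county"].strip().title(),
        report_date=date.fromisoformat(row["date"]),
        cases=int(row["cases"]),
    )


def from_source_b(row):
    # Hypothetical source B: US-style dates, different field names.
    return CaseRecord(
        region=row["Region"].strip().title(),
        report_date=datetime.strptime(row["Reported"], "%m/%d/%Y").date(),
        cases=int(row["Count"]),
    )


# Made-up rows describing the same report in two different shapes.
record_a = from_source_a({"county": " example county", "date": "2021-09-29", "cases": "17"})
record_b = from_source_b({"Region": "EXAMPLE COUNTY", "Reported": "09/29/2021", "Count": 17})

print(record_a == record_b)
```

Once every source lands in the same shape, increasing the flow during a surge means adding more records, not writing new conversion logic for each house.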

Joshua E. Porterfield, PhD

Dr. Joshua E. Porterfield, Pandemic Data Initiative content lead, is a writer with the Centers for Civic Impact. He is using his PhD in Chemical and Biomolecular Engineering to give an informed perspective on public health data issues.