Evaluating the Effectiveness of COVID-19 Policies: A Q&A with Dr. Elizabeth Stuart

Elizabeth Stuart, Bloomberg Professor of American Health and Professor of Biostatistics, Mental Health, and Health Policy and Management at the Johns Hopkins Bloomberg School of Public Health

In response to the coronavirus pandemic, governments have enacted a host of policies to combat the spread of the virus. These efforts have, in turn, spurred important questions: how do we know these policies are working? And how effective are these policies at stopping the spread of the coronavirus? To learn more about how researchers measure the impact and effectiveness of policy decisions like these, we spoke with Elizabeth Stuart, Bloomberg Professor of American Health and Professor of Biostatistics, Mental Health, and Health Policy and Management at the Johns Hopkins Bloomberg School of Public Health. This conversation has been edited for length and clarity.

What is evidence-based policy, and what does it mean in the context of Covid-19?

In general, evidence-based policy means using data and evidence to inform policy decisions. This can cover a variety of different things in a number of different fields. Evidence-based policy often looks to inform interventions. Essentially, it asks the question, “If we implement this policy, will it lead to a better future than if it were not implemented?” Outside of Covid-19, evidence-based policy can also include monitoring, such as using data to track issues that are impacting a community, like violence or unemployment, and identify which issues need attention. With Covid-19, it can mean using epidemiologic data – essentially data about how a disease is spreading in a community – to inform policies intended to stop that spread. In this moment, we see virus case counts going up in some places, so evidence-based policy might ask, “If we implement widespread testing or contact tracing, is that going to have an impact on what will happen in the future?” Right now, we’re still in the middle of the pandemic.

What challenges arise for researchers who evaluate Covid-related policies?

In terms of evaluation, one big-picture challenge is that it’s hard to separate the true impact of a policy from natural variation that you can’t attribute to that policy. For Covid-19, a significant challenge has been that it’s a very dynamic situation. The easiest policy evaluations are in scenarios where things were “flat” before and, after a new policy is implemented, we want to determine whether there’s a big, measurable change, relative to what we see in places without the policy change. With Covid-19, we don’t have this sort of natural state or a stable situation of cases over time to compare things to. So, disentangling policy impacts from these natural trends has been tricky.

With policy evaluation in general, the easiest policies to assess are those that have a quick, immediate effect. Take seatbelt laws: once they were implemented, they had a quick impact on motor vehicle deaths. That is very measurable. Covid-19 is much more dynamic, with a long lag time between exposure and outcomes such as hospitalization; this makes it hard to find the real impact of a policy.

The other complication is data. As has been widely reported in the media, we don’t have uniform standards for consistently reported data at the moment. Ideally, for policy evaluation, we’d have high-quality data showing us the complete number of cases and deaths over time. But because of the limitations of testing, and of data collection even at the local level, it’s hard to know whether the data we have access to gives us an accurate picture. One small example is death certificates. If someone dies at home and their death was due to Covid-19, it might not be labeled as such on the certificate, which is just one thing that’s hard to account for.

What would a robust evaluation of Covid-19 policies look like?

Ideally, we would want to have multiple communities that adopted the policy under study and multiple communities that did not, which would give us a good comparison for what would have happened with and without the policy. What’s key is collecting good longitudinal data from all of those places, which basically means data collected repeatedly over time in each place. For example, this could be something like different communities’ daily Covid-19 death counts over a certain period of time. Repeated data points let us see changes over time, like an increase or decrease in new cases, and differences across locations, say fewer deaths in a community with the policy than in one without it.

I’d also look for studies that incorporate epidemiological knowledge about the disease. That means considering features of the virus that can influence the way a policy works, for instance the incubation period: it can take up to fourteen days for people to develop symptoms. A stronger evaluation would take this into account.

For the strongest design, we’d like to consider communities that are similar to each other before the policy change was made. For example, two cities (but ideally, many cities) that look similar to one another in terms of trends in the disease and demographics, where one (or some) implemented the policy but others didn’t.
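One common way to formalize this kind of comparison is a difference-in-differences calculation: take the change over time in the community that adopted the policy and subtract the change over the same period in a similar community that did not. The sketch below is only an illustration of that idea, not necessarily the method Dr. Stuart has in mind. The function name `did_estimate` and all of the daily death counts are hypothetical, made up for the example.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, comparison_pre, comparison_post):
    """Difference-in-differences on average daily counts.

    Returns (change in the policy community) minus
    (change in the comparison community).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    comparison_change = mean(comparison_post) - mean(comparison_pre)
    return treated_change - comparison_change


# Hypothetical daily death counts (invented numbers, not real data).
# The two cities follow similar upward trends before the policy date.
policy_city_pre = [10, 12, 14, 16]
policy_city_post = [15, 14, 13, 12]
comparison_city_pre = [9, 11, 13, 15]
comparison_city_post = [17, 19, 21, 23]

effect = did_estimate(
    policy_city_pre, policy_city_post,
    comparison_city_pre, comparison_city_post,
)
print(effect)  # -7.5
```

With these made-up numbers, the policy city averages 7.5 fewer daily deaths than it would have if it had followed the comparison city's trend. That reading holds only under the assumption that the two cities would have moved in parallel without the policy, which is why similarity before the policy change matters so much.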

It’s important, too, for evaluations to include sensitivity analysis, which is the way researchers explore how robust their findings are to the model’s assumptions and choices. Those assumptions shape the model, and sensitivity analysis lets you see how much your results change when you change them. In a medical setting, the gold standard is a randomized trial, where patients are randomly assigned to one of two treatments; it gives a really strong result. With policy evaluation, we’re not in a clinic but out in the world. It’s a non-experimental setting, and it’s not randomized: some communities take an action and some communities don’t. In any non-experimental setting there are assumptions you have to make.
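As a rough illustration of what a sensitivity analysis can look like, the sketch below recomputes a simple policy-effect estimate under several different assumptions about how many days pass before a policy can show up in the outcome data. Everything here is hypothetical: the function name `lagged_estimate`, the choice of lags, and the daily counts are invented for the example and do not come from any real evaluation.

```python
from statistics import mean


def lagged_estimate(policy_series, comparison_series, policy_day, lag_days):
    """Change in the policy community minus change in the comparison
    community, counting 'after' only from policy_day + lag_days onward."""
    start = policy_day + lag_days
    policy_change = mean(policy_series[start:]) - mean(policy_series[:policy_day])
    comparison_change = (
        mean(comparison_series[start:]) - mean(comparison_series[:policy_day])
    )
    return policy_change - comparison_change


# Hypothetical daily counts (invented numbers); the policy starts on day index 4.
policy_series = [10, 11, 12, 13, 14, 15, 15, 14, 13, 12, 11, 10]
comparison_series = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
policy_day = 4

# Re-run the same estimate under different assumed lags.
estimates = {
    lag: lagged_estimate(policy_series, comparison_series, policy_day, lag)
    for lag in (0, 2, 4)
}
for lag, value in estimates.items():
    print(f"assumed lag of {lag} days: estimate {value}")
```

With these made-up numbers the estimate moves from -4.5 to -6.0 to -8.0 as the assumed lag grows from 0 to 2 to 4 days. The sign stays the same but the size nearly doubles, which is the kind of dependence on an assumption that a sensitivity analysis is meant to expose.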

Public consumption of data has become a hallmark of Covid-19. Is the media doing a good job of helping the public to understand Covid-related data and modeling?

We recently wrote an article about 10 tips for better understanding Covid-19 models. But generally, I think there are some examples of really high-quality reporting and thoughtful discussion of the challenges.

Statisticians are generally concerned about the challenges of understanding and conveying uncertainty. The current crisis has heightened uncertainty for all of us, so stories that can highlight, in a very clear way, what is known and isn’t known, as well as the underlying facts that inform our policies, will help people understand the moment better. It will show that we don’t have all the answers right now, and not overstating what we know is really important.

Do you think Covid-19 has changed the way the public thinks about public health and data? 

It’s been a remarkable time to be a statistician in public health. Never before have I had friends and family asking me questions about statistics and the way we measure policy effectiveness. While it’s an unprecedented time for the field, you can’t overstate the scale of tragedy the pandemic has caused. But it’s also shown the public the value of public health, and many institutions, including Hopkins, have set a nice example of using evidence-based science to inform decision-making at all levels, from the individual to the national.

I really hope that Covid-19 research is an area where different fields can work together. A lot of fields have different tools to aid in policy evaluation, from statistics to economics to political science. I’d love to see these groups working in partnership with our colleagues who are studying the disease itself, because we’ll need all these groups to contribute for us to have a response to the virus that is valid, meaningful, and useful.