Data collected from social media and search history can provide a rare perspective on public reaction to government policies. This data could be influential in decision-making when it’s fully incorporated into public health and policymaking infrastructure.
Online data from social media posts and search engine history is one of the newest pieces of the information mosaic that public health officials and policymakers should rely upon. Dr. Mark Dredze, the John C. Malone Associate Professor of Computer Science, researches how best to define the practices and pitfalls of online data collection to help improve its incorporation into existing decision-making mechanisms. He has led teams investigating many aspects of the COVID-19 pandemic through social media, including public reaction to policy, the spread of misinformation, and vaccine hesitancy. While the data is compelling, significant changes to our public health data infrastructure are needed to make it actionable.
We’ve built infrastructure over many years that allows us to collect and manage terabytes of data across a variety of applications. We build data collectors that are fully automated and require a robust support infrastructure. How we collect the actual data depends on the platform and its policies. Platforms like Reddit and Twitter are very open, whereas Facebook has placed many limitations on the data it makes available. The scale of the research also plays a role: the type of research I’m involved in often encompasses millions of posts, which requires a sophisticated computational approach.
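To make the idea of an automated collector concrete, here is a minimal sketch of what one might look like, assuming Python, the `requests` library, and Reddit's public JSON listing endpoint. The subreddit, polling interval, and JSON-lines storage are illustrative assumptions, not a description of the team's actual pipeline.

```python
# Hypothetical sketch of a minimal automated collector polling Reddit's
# public JSON listing endpoint; the subreddit, interval, and storage format
# are illustrative assumptions rather than any team's real infrastructure.
import json
import time
import requests

HEADERS = {"User-Agent": "research-collector-sketch/0.1"}

def fetch_new_posts(subreddit: str, limit: int = 100) -> list[dict]:
    """Fetch the newest posts from a subreddit's public JSON listing."""
    url = f"https://www.reddit.com/r/{subreddit}/new.json"
    resp = requests.get(url, headers=HEADERS, params={"limit": limit}, timeout=30)
    resp.raise_for_status()
    return [child["data"] for child in resp.json()["data"]["children"]]

def run_collector(subreddit: str, out_path: str, interval_s: int = 300) -> None:
    """Poll on a fixed interval and append unseen posts as JSON lines."""
    seen: set[str] = set()
    while True:
        for post in fetch_new_posts(subreddit):
            if post["id"] in seen:
                continue
            seen.add(post["id"])
            with open(out_path, "a") as f:
                f.write(json.dumps(post) + "\n")
        time.sleep(interval_s)

if __name__ == "__main__":
    run_collector("Coronavirus", "posts.jsonl")
```

A production collector would also need rate limiting, error recovery, and durable storage; at the scale of millions of posts, that support infrastructure is where most of the engineering effort goes.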
Online data allows us to quantify things that previously were unquantifiable, namely public reaction to information. We have not had the tools to develop a good understanding of the beliefs of a population, how different populations react to information, and what kind of actions people are taking outside of the healthcare setting to address their health. The use of online data has changed what we know, especially because it gives us information in such a timely manner. That's been particularly important during the COVID-19 pandemic. Online data has allowed us to get a nuanced understanding of the differences in how people seek out information and which information they believe.
During the pandemic we had a number of circumstances where prominent individuals promoted specific therapies for which there was no clinical evidence. Historically this didn't happen to the same extent because communications platforms weren't as far-reaching, and we also didn't have any way to quantify how the population reacted to the information. We now have that ability. When a celebrity says they took a drug that cured their COVID-19 infection and everyone should take it too, we can measure the change in interest in that drug across the population and how many people are looking to purchase it. That's been a significant change in what we can do.
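As a rough illustration of the kind of measurement this enables, the sketch below compares average daily mention volume of a term before and after a public statement. It assumes a pandas DataFrame with `created` and `text` columns and an analyst-chosen event date; the column names, term, and window length are hypothetical choices for the example, not a published methodology.

```python
# Illustrative sketch: quantify the shift in interest in a term around an
# event date by comparing mean daily mention counts before and after it.
# The "created"/"text" column names and the 7-day window are assumptions.
import pandas as pd

def interest_shift(posts: pd.DataFrame, term: str, event_date: str,
                   window_days: int = 7) -> float:
    """Return the ratio of mean daily mentions after vs. before the event."""
    posts = posts.copy()
    posts["date"] = pd.to_datetime(posts["created"]).dt.normalize()
    mentions = posts[posts["text"].str.contains(term, case=False, na=False)]
    daily = mentions.groupby("date").size()

    event = pd.Timestamp(event_date)
    window = pd.Timedelta(days=window_days)
    before = daily[(daily.index >= event - window) & (daily.index < event)]
    after = daily[(daily.index > event) & (daily.index <= event + window)]
    # A ratio above 1 indicates mentions rose after the statement; this
    # assumes the collected data covers both sides of the event date.
    return after.mean() / before.mean()

# Hypothetical usage, with df holding collected posts:
# print(interest_shift(df, "some-drug-name", "2020-04-01"))
```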
Online data should be viewed as complementary to traditional methods, not more or less accurate, better or worse. We should be careful not to give priority to certain methods because they have been around longer. Just because we understand something better doesn't mean it is better. This is often the argument concerning telephone surveys. We understand so much more about how they work. Okay, but that doesn't mean that they're actually better. It just means we understand more about the failure cases. We need to develop a better understanding about the failure cases for online data. That's part of the goal of my research, to develop the science around online data so we can have the level of confidence that we have with more traditional methods like telephone surveys.
One of the biggest challenges is integrating these types of information with our existing systems. Oftentimes, there's an existing system or process in place for decision-making. That process may not include the ability to understand or factor in online data. We're still trying to figure that out. During the pandemic, public health agencies were trying to make decisions based mostly on case counts, vaccine uptake, etc., but they had a lot of online information coming in like people’s reactions to policies on social media. We don’t fully understand how to integrate that kind of information into the decision-making process.
A couple of weeks ago the CDC changed the guidelines for quarantine after exposure. There was a huge backlash. That policy change certainly considered scientific data, but another key aspect of policy is the reaction of the audience. What is the actual effect that change in policy is going to have? Are people going to be receptive to it? Are people going to continue to trust the decision-makers? Are people going to discount that recommendation? That's something that historically has been very hard to measure, so it hasn't been factored into decisions as hard data. Now we have a lot more information about that, but we don't know how to make use of it.
COVID-19 has revealed the problem of misinformation to many people. A lot of us knew this was going on, but COVID-19 has highlighted the extent to which health misinformation is an issue. These problems are latent and affect all areas of health. This is a very complex problem because what it comes down to, fundamentally, is how humans process information as social creatures that rely on each other to learn and make decisions. The pandemic has taught us that we need to work at the intersection of multiple disciplines: psychology, journalism, public health, computer science, and more. That's how we're going to have any chance at stemming this tide of misinformation. This is not a problem we can tackle from just one angle.