Expert Insight

Q&A: Sequencing Data is Key Component of the Digital Immune System

Sequencing data is essential both to the development of tests and vaccines and to ongoing disease surveillance. Scientific advances have enabled rapid, affordable sequencing of SARS-CoV-2, the virus that causes COVID-19, and its variants, but now we must learn how to communicate these findings and establish lasting sequencing efforts to protect us from novel viral and bacterial infections.

Authors:
Joshua E. Porterfield, PhD
July 14, 2021

Dr. Winston Timp is a professor of biomedical engineering in the Johns Hopkins Whiting School of Engineering and the co-lead of the Johns Hopkins viral genomics effort. His work focuses on utilizing the power of genetic sequencing data to better defend against bacteria, viruses, and other infectious diseases.

Why is it important to continue sequencing COVID-19 data?

Sequencing is essential to respond to and design tests for novel infectious diseases, but it's also important for surveillance. First, if the virus mutates so that current tests fail because the sequence we’re probing against has changed, sequencing should still be able to detect the virus and provide the data we need to adapt and design new assays. Second, the mutations that we see in the virus can be used to track how and from where the virus is spreading. Finally, it's also useful to identify variants of concern.

How do you communicate sequencing data about variants without sparking panic?

I want to emphasize that the vaccines in the U.S. are still very effective at producing a strong immune response to all known COVID-19 variants. People who are fully vaccinated have a much lower risk of getting infected by even the Delta variant, and they seem to have a much lower risk of severe COVID-19 requiring ICU hospitalization. Yet, the data still suggests that the Delta variant is more transmissible. People should be aware of that so that, if needed, we can reintroduce mitigation strategies such as masking. However, if your area’s vaccination rate is high and the case count is low, there shouldn't be a lot of concern. Getting this across is difficult because it requires a clear science communication strategy that conveys the differences and what the actual risk is.

“We need to explain the significance of variants clearly so people are aware and informed, but this should be driven by data, not fear.”

It's not just science communication to the layperson; it's also science communication to clinical and public health folks and vice versa. We have to bridge the gap between researchers and clinicians, so that data can be interpreted, leveraged, and used by all. On the science and academic side there is plenty of data on COVID-19, but we have to make sure that information is flowing out in a useful and organized manner. We need to know what clinicians need, so we can develop the right tools to answer the right questions and address the most critical needs of patients.

How has the availability of sequencing data changed public health?

It's a combination of technological and computational development that prepared us to deliver data rapidly. The first reported cases of COVID-19 were in December 2019. The first genome for SARS-CoV-2 was shared publicly in January 2020. This is an unprecedented speed of data analysis. From that genome, public health agencies across the world were able to design testing for the virus and start vaccine development.

The challenges we ran into were generally not related to the sequencing itself, but to logistics organization and meta-information wrangling from human samples. The sequencing itself was relatively trivial and actually quite boring. The cost of sequencing has dropped substantially, and the technology is solidly established. That's why this is different than before.

“We have the tools, skill, and scientific community to acquire sequencing data rapidly and easily, but we didn't get here easily.”

Is genetic sequencing part of a robust public health data infrastructure?

Dr. Michael Schatz described standing public health data infrastructure as a “digital immune system.” We should be monitoring cases coming in and watching for red flags. That infrastructure should be set up and continuously funded so public health agencies can watch for not just COVID-19 variants and new viruses, but also bacteria and other infections. If we find cases of bacterial infections resistant to third-line antibiotics, we should be sequencing them to figure out where the antimicrobial resistance is coming from and how to shut it down. The more methods we have to analyze what's making us sick, the better we can respond.

Why have U.S. sequencing data collection efforts lagged behind other nations?

There were three things standing in the way: having the logistics set up, having the political will to do it, and establishing clear communication between all the invested parties with differing backgrounds, including the public. At the beginning of the pandemic, individual institutions were interested in getting sequencing to happen in the United States, but there was no will at high government levels to engage in it. That has changed, and I think it's still important for the United States to have this setup so we're ready for the next crisis, but also so that we can monitor for variants of concern.

Efforts from the CDC in recent months have been instrumental in turning the focus to sequencing and organizing the systems to continue that work and give public health departments access to this technology. Most state public health departments weren't set up to collect sequencing data, so now with the help of the CDC they're getting paired with either commercial providers or academic core providers to help with sequencing. They could also be involved in the creation and maintenance of a centralized database with well-regulated, well-curated meta-information fields, but how the data is being provided and used will have to be very transparent.

What obstacles prevent cross-correlation analyses based on sequencing data?

There are privacy concerns even though viral sequencing data is less personal than human genomic sequencing data. Meta-information such as the date a patient went to the hospital or their sex, zip code, or age could serve to dox them in the case of a variant outbreak that leads back to them. Privacy is important, but it is also a hindrance to analysis. What if I want to know if this variant only affects a subset of the population? I can’t figure it out without meta-information.

“As a scientist, I prefer to have all data freely accessible to everyone, but I know that's not always the right answer.”

This is outside of my purview to solve. It comes down to ethics. Our problems could be overcome by a system that balances individual privacy and improved public health, but I don’t know what that system would look like. It is an immense challenge that we should be coming together to solve, and better communication will be necessary.

Joshua E. Porterfield, PhD

Dr. Joshua E. Porterfield, Pandemic Data Initiative content lead, is a writer with the Centers for Civic Impact. He is using his PhD in Chemical and Biomolecular Engineering to give an informed perspective on public health data issues.