Expert Insight

Q&A: Establishing Global Genomic Surveillance to Prevent and Respond to Pandemics

For many nations, genomic sequencing has been at the core of their responses to the COVID-19 pandemic. But the world needs to expand upon and connect those independent efforts into a global network to better predict and manage future disease outbreaks.

Joshua E. Porterfield, PhD
October 20, 2021

Genomic sequencing data from the SARS-CoV-2 virus has been essential to all aspects of the pandemic response: testing design, vaccine development, and variant tracking. Dr. Michael Schatz, Bloomberg Distinguished Professor of Computer Science, Biology & Oncology, says that these are just the first steps toward establishing a global digital immune network, where genomic sequencing will play a major role in preventing and responding to future pandemics and disease outbreaks. For additional background on SARS-CoV-2 sequencing data, please read our previous Q&A with Dr. Winston Timp.

What does a global digital immune network look like?

At this stage it’s just a concept, although pieces of it are coming into place. I give a lot of credit to a colleague and friend of mine, Adam Phillippy, who's now an investigator at the NIH's National Human Genome Research Institute (NHGRI), for developing the concept. Coronavirus and flu are the primary examples where a worldwide network has been established to track them. Adam, many others, and I have been arguing that this idea needs to be expanded quite extensively. Given how important those global phenomena are, we should engineer systematic networks that perform constant sampling globally, coupled with a free exchange of information. That way we can anticipate viral and other biological events, see trends, and observe the rise and fall of different species and variants over time.

“Even if you're primarily concerned about what's happening in your local environment, the way you get good information is by looking globally.”

The specific components must be broad. SARS-CoV-2 is the premier example of the importance of sequencing for tracking, with outbreaks and variants originating on the other side of the world having so much impact on our lives at home. There are also environmental contexts, such as monitoring microbes around us like E. coli and anthrax. In response to the anthrax attacks in the early 2000s, many major transportation centers now have devices that are constantly sniffing the air for pathogenic species. There's a lot of interest in knowing more about the microbes, viruses, and everything else all around us. Now that we can do sequencing so cheaply and remotely, it opens up many new application areas. For example, cancer diagnostics has been revolutionized by sequencing. At Hopkins and many leading cancer centers, one of the first things often done when new patients are admitted is genome sequencing to see what mutations have accumulated in the cancer, because the treatments we give for one class of mutations can be totally different from those for other classes.

How should sequencing data be stored and made available to researchers and the public?

This is definitely not true in every discipline, but in genomics there's a great culture and tradition of sharing data. These data are important to everyone. The only way we can make progress with genetic diseases and other genetic associations is by compiling huge numbers of genomes together, where researchers can look at patterns and trends. The NIH supports massive databases that record genetic information, the biggest of which is called the Sequence Read Archive. It stores raw genetic information collected from people, plants, animals, and lots of other species. It's growing very rapidly. It's an extremely valuable data set, simply because it lets us look at all this genetic information in one place.

An increasingly important dimension is data sharing and analysis in the cloud. This genomics data is reaching the scale of other huge databases. We did an analysis a few years ago comparing genomics data to YouTube, Twitter, and other massive databases, and they are all growing on the same scale. In the same way that Facebook and Google are increasingly relying on cloud computing resources to support their huge datasets, so it goes with genomics data as well. In fact, I'm one of the co-leads of a big initiative from NHGRI to have a cloud resource for genomics data; it's called the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL). We currently have about 300,000 human genomes stored inside the AnVIL so researchers can perform disease association and other genetic analysis entirely in the cloud.

What other infrastructure changes are needed to support the growth of sequencing data?

At the very base of the analysis is the installation of the instruments themselves. When we're talking about a global network of genomic surveillance, you can imagine that every hospital, school, and daycare has some sort of sequencing capacity that's constantly running, sniffing the air to see what sort of viruses and microbes are being passed around. Those data then have to go somewhere, and there needs to be data sharing so that we can look at patterns. The way that you see the Delta, Gamma, or any future wave is by observing a huge increase in transmissions and infections. That’s where cloud computing comes in.
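
As a rough illustration of spotting a wave in shared data, the sketch below counts variant labels per week and flags any variant whose share of samples grows sharply. It is only a minimal sketch under stated assumptions: the function names, the `(week, variant)` record format, the 20-point threshold, and the sample records are all hypothetical and are not taken from any real surveillance system.

```python
from collections import Counter, defaultdict


def weekly_variant_shares(samples):
    """Turn (week, variant) records into {week: {variant: share of samples}}."""
    counts = defaultdict(Counter)
    for week, variant in samples:
        counts[week][variant] += 1
    shares = {}
    for week in sorted(counts):
        total = sum(counts[week].values())
        shares[week] = {v: n / total for v, n in counts[week].items()}
    return shares


def rising_variants(shares, min_gain=0.2):
    """Variants whose share grew by at least min_gain from first to last week."""
    weeks = sorted(shares)
    first, last = shares[weeks[0]], shares[weeks[-1]]
    return sorted(v for v in last if last[v] - first.get(v, 0.0) >= min_gain)


# Hypothetical, made-up records: (week number, variant label).
samples = [
    (1, "alpha"), (1, "alpha"), (1, "alpha"), (1, "delta"),
    (2, "alpha"), (2, "alpha"), (2, "delta"), (2, "delta"),
    (3, "alpha"), (3, "delta"), (3, "delta"), (3, "delta"),
]
shares = weekly_variant_shares(samples)
print(rising_variants(shares))
```

A real network would also need to account for uneven sampling across regions and for reporting delays, which this toy example ignores.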

“Making use of these data doesn't come from one single advance. We need lots of technologies and lots of people: an interdisciplinary group all coming together.”

We also need to have a sustained effort. With the ongoing COVID-19 pandemic, people are deeply interested in viral sequencing data. I am optimistic that a day will come when the rate of COVID-19 transmission will be low enough to return to normalcy. My fear is that when we get there, people are going to lose that sense of urgency. It's really important that we continue on. I'm proud of all the resources and effort that have been put into this across governments and institutions, especially at JHU. I hope that level of engagement will continue because, sadly, the next pandemic is not that far away, although it's hard to say when. It could be many years, but the only way we will be ready for it is if we keep up the efforts we have started now.

How can we determine actionable results from sequencing data?

This is where computational biology and computer science really play important roles. We need to be able to do many comparisons really quickly. That capability comes from software, written primarily by computer scientists, that was developed at Hopkins and other leading institutions. We also need to look for patterns in the data. Sometimes that will involve creating detailed visualizations that show the rises and falls of variants geographically, just like the COVID-19 global map developed at Johns Hopkins by Dr. Lauren Gardner. In addition to those high-level views, we need very sophisticated methods to tease out associations, such as why certain people are susceptible and others are not. That draws on statistics and machine learning, which come largely from computer science and biostatistics. Then, of course, we need biologists to help interpret data. We need public policy makers to think about how to respond effectively, since we have to be strategic about the responses and interventions to disease outbreaks.
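
To give a flavor of what a fast sequence comparison can look like, the sketch below scores two sequences by the overlap of their short substrings (k-mers) using Jaccard similarity. This is one simple, commonly used idea and is not necessarily the method used by the software mentioned above. The function names, the choice of k = 4, and the toy sequences are hypothetical.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=4):
    """Jaccard similarity of two sequences' k-mer sets (1.0 means identical sets)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 1.0
    return len(ka & kb) / len(ka | kb)


def closest_reference(query, references, k=4):
    """Name of the reference whose k-mer set best matches the query."""
    return max(references, key=lambda name: jaccard(query, references[name], k))


# Toy, made-up sequences for illustration only.
references = {"ref_a": "ACGTACGTTGCA", "ref_b": "TTTTGGGGCCCC"}
query = "ACGTACGTTGCC"
print(closest_reference(query, references))
```

Comparing sets of k-mers avoids a full alignment, which is part of why approaches in this family can scale to very large collections. Production tools typically add sketching or indexing on top of the basic idea.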

Dr. Joshua E. Porterfield, Pandemic Data Initiative content lead, is a writer with the Centers for Civic Impact. He is using his PhD in Chemical and Biomolecular Engineering to give an informed perspective on public health data issues.