Pandemic Data Outlook

If All Data Are Health Data, They Should Be Protected as Such

Policies protecting medical data privacy are well established and sacrosanct to healthcare providers. But there are few, if any, policies governing the mining and sale of other personal data, which often reveal health information. This must change.

Beth Blauer, Associate Vice Provost, JHU
Joshua E. Porterfield, PhD
April 26, 2022

Individual privacy is an incredibly complex policy issue that I will not be able to solve in one blog post. In fact, a recent episode of Last Week Tonight with John Oliver covered data privacy without even touching on health data, which will be my focus here. It is important to reiterate that the Johns Hopkins Coronavirus Resource Center strongly supports the concept of better data for better outcomes, and the increase in data availability over the past few years has been a useful development. However, it is understandable to be concerned about who holds your data and how they are using it. This post concerns only legal data use under current law, not data obtained by cybercriminals. Policy around medical data is far more established, but as we discussed with Dr. Itay Fainmesser, all data are health data, and policies governing non-medical data are lacking.

Most people in the United States are familiar with some of the laws that protect patient privacy, particularly the Health Insurance Portability and Accountability Act (HIPAA). Congress enacted HIPAA in 1996 to protect individual patients’ sensitive health data, and it limits how that data can be disclosed or shared without patient authorization. It covers protected health information, which the U.S. Department of Health and Human Services defines as any information, “including demographic data, that relates to the individual’s past, present, or future physical or mental health or condition, the provision of health care to the individual, or the past, present, or future payment for the provision of health care to the individual.” HIPAA regulates healthcare providers, health insurers, and the businesses that handle health data on their behalf, such as for billing and analysis. While there are certain circumstances1 in which health data can be released without patient approval (such as judicial proceedings, cadaveric organ donation, and compliance with local, state, or federal laws), HIPAA is strict and carries severe penalties for violations.2

This policy framework may seem prohibitive to the use of patient data, but medical research has been remarkably successful working within HIPAA rules. Patient data may be used for research when patients sign authorization forms, or when an institutional review board (IRB) or privacy board determines that the research could not practicably be conducted without the data and that its use poses no more than minimal risk to patients’ privacy. The National Institutes of Health (NIH), which funds much of American medical research, offers additional guidance to help researchers remain HIPAA compliant while protecting patient data.3

Your health data are well protected within a medical or research institution thanks to these policies. When discoveries or research projects need to be shared across institutions, exciting technological advances now reduce the need for data sharing agreements and additional IRB approvals. Using a common data model across institutions, the Observational Medical Outcomes Partnership (OMOP) Common Data Model, allows researchers to share code, not data. (For a more detailed review, please read our interview with Dr. Paul Nagy, who has led that effort here at Johns Hopkins.) When every institution structures its data the same way, the code behind a model developed at one institution can be run on another institution’s data behind that institution’s own firewall, with no need to centralize the data. No individual-level information needs to leave the host institution; only the analysis code and the aggregate results are exchanged with peer institutions. Identifying individuals is very difficult when data are provided in aggregate, and essentially impossible when only code is shared.
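To make the “share code, not data” idea concrete, here is a minimal Python sketch. It assumes two institutions store records in a table shaped like the OMOP `condition_occurrence` table (only its `person_id` and `condition_concept_id` columns are used); the concept ID, site names, records, and helper functions are hypothetical and chosen only for illustration. In practice each institution would run the shared function on its own servers; here the sites are simulated in one process.

```python
from typing import Iterable

# Hypothetical concept ID for illustration only; real OMOP concept IDs
# come from the OMOP standardized vocabularies.
EXAMPLE_CONDITION_CONCEPT_ID = 12345

def count_patients_with_condition(condition_occurrence: Iterable[dict], concept_id: int) -> int:
    """Shared analysis code: runs inside one institution's firewall and
    returns only an aggregate count of distinct patients, never rows."""
    return len({row["person_id"] for row in condition_occurrence
                if row["condition_concept_id"] == concept_id})

def federated_count(sites: dict[str, list[dict]], concept_id: int) -> dict[str, int]:
    """Simulates sending the same code to each site; only the per-site
    counts come back to the coordinating researcher."""
    return {name: count_patients_with_condition(rows, concept_id)
            for name, rows in sites.items()}

# Made-up records standing in for two institutions' local tables
site_a = [
    {"person_id": 1, "condition_concept_id": EXAMPLE_CONDITION_CONCEPT_ID},
    {"person_id": 1, "condition_concept_id": EXAMPLE_CONDITION_CONCEPT_ID},  # same patient twice
    {"person_id": 2, "condition_concept_id": 99999},
]
site_b = [
    {"person_id": 7, "condition_concept_id": EXAMPLE_CONDITION_CONCEPT_ID},
    {"person_id": 8, "condition_concept_id": EXAMPLE_CONDITION_CONCEPT_ID},
]

results = federated_count({"site_a": site_a, "site_b": site_b}, EXAMPLE_CONDITION_CONCEPT_ID)
print(results)                # {'site_a': 1, 'site_b': 2}
print(sum(results.values()))  # 3
```

Only the per-site counts in `results` would cross institutional boundaries; the row-level records stay where they are. A real analysis would query each site's database rather than in-memory lists.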

Our policy framework for health data within medical systems works well. But HIPAA does not cover “other” data, such as cell phone usage, internet search history, and location tracking, that private industry uses to infer health information. Our phones and computers constantly record enormous amounts of data about our lifestyles, locations, and consumption choices. A medical institution may not be able to tell a private company that you were hospitalized for three days, but that company can buy your location history from a data broker and see that your phone spent three days at the hospital when it is normally elsewhere. More detailed information, such as your IP address and search history, could even reveal that you were specifically in the cardiac unit. The company now knows to target you with ads for blood pressure medication, exercise equipment, and life insurance. While deeply unsettling, this is perfectly legal and permitted under most apps’ and websites’ user agreements.
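The location inference described above takes very little computation. This minimal Python sketch assumes a broker's data arrive as (timestamp, latitude, longitude) pings for one device and that a hospital can be approximated by a rectangular bounding box; the coordinates, pings, and function names are all made up for illustration. It simply counts the distinct days on which the device was seen inside the box.

```python
from datetime import date, datetime

# Made-up bounding box standing in for a hospital campus:
# (lat_min, lat_max, lon_min, lon_max). Illustrative only.
HOSPITAL_BOX = (39.295, 39.300, -76.595, -76.590)

def inside(lat: float, lon: float, box: tuple[float, float, float, float]) -> bool:
    """True if a point falls within the rectangular bounding box."""
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

def days_at_location(pings: list[tuple[datetime, float, float]],
                     box: tuple[float, float, float, float]) -> list[date]:
    """Distinct calendar days on which any ping from the device lands in the box."""
    return sorted({ts.date() for ts, lat, lon in pings if inside(lat, lon, box)})

# Hypothetical pings for one device: one night elsewhere, then three mornings at the hospital
pings = [
    (datetime(2022, 4, 4, 22, 0), 39.310, -76.620),
    (datetime(2022, 4, 5, 8, 0), 39.297, -76.592),
    (datetime(2022, 4, 6, 8, 0), 39.297, -76.592),
    (datetime(2022, 4, 7, 8, 0), 39.297, -76.592),
]

stay = days_at_location(pings, HOSPITAL_BOX)
print(len(stay))  # 3
```

Nothing in this sketch touches a medical record, which is exactly why HIPAA does not reach this kind of inference.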

Predatory monetization aside, these data are still important and useful to researchers. We recently discussed with Dr. Michael Darden the role of cell phone data in improving the epidemiological and economic models used to save lives and inform policy throughout the pandemic. Data generated by individual devices are far more accurate and up to date than the annual surveys many researchers typically rely on. Few people want to answer a paper survey once a year, but almost everyone carries a smartphone every day. However, we cannot sacrifice individual privacy entirely simply to improve research outcomes.

We need to accept that these data exist and must be regulated. That regulation could reasonably be as strict as HIPAA, given that companies can infer health information nearly as accurately as if they had access to electronic health records. Outside the U.S., there has been more work on this front, such as the European Union’s General Data Protection Regulation (GDPR), whose “right to be forgotten” gives individuals the right to ask organizations to delete their personal data in certain circumstances.4 It could also be feasible to adopt an approach similar to the OMOP Common Data Model, in which cell phone companies and internet providers may sell or share consumer data only in aggregate, or share only the code for consumer-outreach models developed with those data. Being part of an aggregate data set is much safer, because each data point (location, time, usage, etc.) is decoupled from the individual and pooled with many others. At minimum, people need to be told who has their data and what they are doing with it. More than 25 years ago, we acknowledged the importance of protecting medical data amid rapidly evolving technology. Now that technology has advanced even further than anticipated, policy must evolve as well to cover all health data, not just data that originate in a doctor’s office.
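An aggregate-only release like the one suggested above might look something like the following minimal Python sketch. It assumes raw records of the form (device ID, region, hour), and both that record layout and the `MIN_CELL_SIZE` threshold are hypothetical; real disclosure-control rules set their own thresholds. Distinct devices are counted per (region, hour) cell, device IDs are discarded, and cells too small to hide an individual are suppressed.

```python
# Hypothetical minimum number of distinct devices a cell must contain before
# it can be published; real disclosure-control rules choose their own value.
MIN_CELL_SIZE = 3

def aggregate_visits(records: list[tuple[str, str, int]],
                     min_cell_size: int = MIN_CELL_SIZE) -> dict[tuple[str, int], int]:
    """Count distinct devices per (region, hour) and drop cells below min_cell_size.

    Device IDs are used only to de-duplicate within a cell; the returned
    dictionary contains counts, never identifiers.
    """
    devices_per_cell: dict[tuple[str, int], set[str]] = {}
    for device_id, region, hour in records:
        devices_per_cell.setdefault((region, hour), set()).add(device_id)
    return {cell: len(devices) for cell, devices in devices_per_cell.items()
            if len(devices) >= min_cell_size}

# Made-up raw records: (device_id, region, hour of day)
records = [
    ("d1", "north", 9),
    ("d2", "north", 9),
    ("d3", "north", 9),
    ("d1", "north", 9),   # repeat ping from d1 is counted once
    ("d4", "south", 9),   # a lone device in this cell is suppressed
]

published = aggregate_visits(records)
print(published)  # {('north', 9): 3}
```

Suppressing small cells reduces, but does not eliminate, re-identification risk; formal approaches such as differential privacy add calibrated noise to the counts for stronger guarantees.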

1. U.S. Department of Health and Human Services, Office for Civil Rights, Summary of the HIPAA Privacy Rule, 26 July 2013 (accessed 16 April 2022).
2. HIPAA Violation Fines (accessed 16 April 2022).
3. National Institutes of Health, Protecting Personal Health Information in Research: Understanding the HIPAA Privacy Rule, U.S. Department of Health and Human Services.
4. B. Wolford, Everything you need to know about the “Right to be forgotten” (accessed 16 April 2022).

Title image from the Blogtrepreneur Flickr Photostream.

Beth Blauer, Associate Vice Provost, JHU

Beth Blauer is the Associate Vice Provost for Public Sector Innovation and Executive Director of the Centers for Civic Impact at Johns Hopkins. Blauer and her team transform raw COVID-19 data into clear and compelling visualizations that help policymakers and the public understand the pandemic and make evidence-based decisions about health and safety.

Joshua E. Porterfield, PhD

Dr. Joshua E. Porterfield, Pandemic Data Initiative content lead, is a writer with the Centers for Civic Impact. He is using his PhD in Chemical and Biomolecular Engineering to give an informed perspective on public health data issues.