Real-world data (RWD) is becoming increasingly critical to clinical research. The FDA has put forth definitions of RWD and issued guidance on its use in research, emphasizing the principle that data be “fit-for-purpose”: selected to answer the question at hand. Meanwhile, stakeholders engaged in clinical development have increasingly recognized that RWD can enable them to conduct studies faster, at lower cost, and often with a more representative and diverse population.
However, not all RWD is fit-for-purpose; that is, not all of it is captured and stored in a way that makes it ready to address the question at hand. To use RWD efficiently and effectively, we need to build a shared understanding of the different types of data under the broad RWD umbrella and make clear which type is fit-for-purpose for a specific question.
The longest-used and most widely recognized form of RWD is claims data. Although claims data can help identify important information, such as healthcare utilization and total cost of care, it also has limitations.
By working with claims data, we can understand when a person checks into the hospital, initiates treatment, or switches treatment. But what isn’t available in claims data are the layers of context and meaning, a true understanding of the patient’s journey. What led them to check into the hospital? If they were having a mental health crisis, did they have suicidal ideation or intent? Did they switch treatment because it wasn’t working, because it wasn’t tolerable, or for another reason? Another limitation of claims data is the lack of outcomes assessment. The data typically collected within an insurance claim does not provide information on disease severity, symptomatology, or changes in either over time, all of which are insights that can be gained through analysis of electronic health record (EHR) data.
In many situations, claims data can be an important piece of the puzzle, but I want to emphasize that it is only one piece, and often not the critical piece that finally brings the image into focus. As a leader at a company that works primarily with de-identified EHR data, I propose that the behavioral health field needs a paradigm shift. A portion of the focus that has traditionally centered on claims data should be reallocated to EHR data, especially when EHR data can answer our questions in a more targeted way than claims data can.
EHR data provides the clearest, most complete picture of what is actually happening when a patient receives care. Holmusk’s NeuroBlu Database contains EHR data from more than 1.5 million patients across more than 30 health systems in the U.S. Many of these health systems use different EHR platforms, meaning the data is captured in a variety of ways.
Our teams ingest data from these disparate sources and map it to a common data model, using standards set by the Observational Medical Outcomes Partnership (OMOP). Once these processes are complete, the data is harmonized and ready to be used for research.
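To make the idea of mapping to a common data model concrete, here is a minimal, illustrative sketch of how one source diagnosis record might be translated into an OMOP-style row. The field names, lookup table, and concept IDs are hypothetical simplifications for illustration only, not Holmusk’s actual pipeline or the real OMOP vocabulary values.

```python
# Illustrative sketch: mapping one source EHR diagnosis record into an
# OMOP-style condition_occurrence row. Field names and concept IDs are
# hypothetical placeholders, not actual NeuroBlu or OMOP vocabulary values.
from datetime import date

# A raw record as it might arrive from one health system's EHR export.
source_record = {
    "patient_ref": "A-1029",
    "dx_code": "F33.1",      # ICD-10-CM: major depressive disorder, recurrent, moderate
    "dx_system": "ICD10CM",
    "visit_date": "2023-04-17",
}

# Site-specific codes are mapped to standard concepts in a shared vocabulary.
# Real pipelines use the OMOP vocabulary tables; this lookup is a stand-in.
concept_lookup = {
    ("ICD10CM", "F33.1"): 999001,  # placeholder standard concept_id
}

def to_condition_occurrence(record: dict, person_id: int) -> dict:
    """Translate a source diagnosis record into an OMOP-style row."""
    concept_id = concept_lookup.get((record["dx_system"], record["dx_code"]))
    return {
        "person_id": person_id,
        "condition_concept_id": concept_id,
        "condition_start_date": date.fromisoformat(record["visit_date"]),
        "condition_source_value": record["dx_code"],
    }

print(to_condition_occurrence(source_record, person_id=1029))
```

Once every source system’s records have been translated into the same set of tables and standard concepts, analyses can be written once and run across all of them.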
At a baseline, EHR data provides information on diagnoses, treatments prescribed, and any behavioral health assessments or other structured measures taken during each clinical encounter. Holmusk’s NeuroBlu Database extends far beyond this baseline. Our data science teams have developed natural language processing models to unlock key insights from unstructured data like clinical notes. These clinical notes are recorded each time a patient visits the clinic, far more frequently than structured behavioral health assessments are administered and recorded.
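As a rough illustration of what “turning free text into a structured variable” means, the sketch below uses simple pattern matching with naive negation handling. It is a deliberately simplified stand-in; the actual natural language processing models behind NeuroBlu are far more sophisticated, and the phrases and window size here are arbitrary choices for the example.

```python
# Simplified, rule-based stand-in for NLP over clinical notes: flag mentions
# of suicidal ideation and check for simple negation in the preceding text.
# This only illustrates the general idea of deriving a structured variable
# from unstructured clinical narrative.
import re

NEGATIONS = ("denies", "denied", "no evidence of", "without")

def flag_suicidal_ideation(note: str) -> bool:
    """Return True if the note appears to affirm suicidal ideation."""
    for match in re.finditer(r"suicidal ideation|\bSI\b", note, re.IGNORECASE):
        # Look at a short window of text before the mention for negation cues.
        window = note[max(0, match.start() - 40):match.start()].lower()
        if not any(neg in window for neg in NEGATIONS):
            return True
    return False

print(flag_suicidal_ideation("Patient denies suicidal ideation at this time."))   # False
print(flag_suicidal_ideation("Reports passive suicidal ideation without plan."))  # True
```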
The ability to pull insights from these clinical notes equips the NeuroBlu Database with dense, robust data on each of its patients, enabling the creation of research cohorts with very specific inclusion/exclusion criteria. This deep context and granularity are especially important in behavioral health, a field that has long relied on subjective judgment and lacks a standardized way to measure its conditions.
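To show how such inclusion/exclusion criteria translate into practice, here is a minimal sketch of cohort selection over harmonized, table-like data. The column names, scores, and thresholds are hypothetical examples, not actual NeuroBlu fields or study criteria.

```python
# Minimal sketch of cohort selection over harmonized EHR-derived data.
# Column names and thresholds are hypothetical; real criteria would be
# defined against the OMOP tables and standard concept IDs.
import pandas as pd

patients = pd.DataFrame({
    "person_id":     [1, 2, 3, 4],
    "age":           [34, 17, 52, 41],
    "mdd_diagnosis": [True, True, True, False],    # major depressive disorder
    "phq9_baseline": [18, 21, 9, 15],              # symptom severity score
    "psychosis_hx":  [False, False, True, False],  # example exclusion flag
})

# Inclusion: adults with an MDD diagnosis and at least moderately severe symptoms.
# Exclusion: history of a psychotic disorder.
cohort = patients[
    (patients["age"] >= 18)
    & patients["mdd_diagnosis"]
    & (patients["phq9_baseline"] >= 15)
    & ~patients["psychosis_hx"]
]

print(cohort["person_id"].tolist())  # [1]
```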
EHR data can answer the same questions that have traditionally been addressed solely with clinical trials, and it can also provide benefits that are extremely difficult to achieve in a trial setting. Traditional recruitment methods for a clinical trial may turn up several hundred patients for a study, while filtering on the same inclusion/exclusion criteria in the NeuroBlu Database can produce thousands. In addition, EHR data captures a more representative population, while clinical trials often disqualify certain groups, such as patients with comorbidities.
However, EHR data is often dismissed because of a widely used term known as “data missingness.” The idea is that because clinical practice is a less controlled environment than a clinical trial, data collection may be inconsistent and values may be missing, which can affect research down the road. Some researchers favor clinical trial measures not traditionally found in EHRs, leading them to treat the absence of these measures as “missingness.”
As an RWD expert and a mental health clinician, I’m here to say that “missingness” is a misnomer. The measures often collected in traditional clinical trials simply will not be found in the EHR, because it is not feasible to administer an hour-long assessment during a 45-minute patient appointment, especially when you also want to provide the best care for your patient.
Though efforts to increase how frequently clinicians use these assessments and psychometric measures are helpful, there is a severe overreliance on them as a panacea for “data missingness” in RWD, especially given the paucity of resources and the myriad theoretical orientations present in behavioral health settings. I propose that a much more cost-effective way to address this issue is increased investment in technologies and quantitative science solutions that make the best use of existing data and the measures already in place across systems.
Where fit-for-purpose can be established using EHR-derived RWD, one need not adhere blindly to a preferred clinical trial measure when a number of other validated instruments assess nearly identical constructs; for example, relying on the antiquated though validated MADRS and HAM-D over the PHQ-9, when numerous studies have demonstrated robust validity of all three measures for assessing changes in depressive symptomatology. The FDA’s call to choose a data source that is “fit-for-purpose” is a call to find data that answers the question at hand, not to strategize about how to systematically change data sources to satisfy preferences that have little basis in data.
All of this is to say: EHR data isn’t suffering from “missingness.” It is simply different from clinical trial data, and many of these differences are beneficial. Studies that leverage EHR data are more cost-effective, can be completed more quickly, and do not ask patients to shoulder the burden of conducting research. In many cases, EHR data also rises above its RWD counterparts, such as in the availability of outcomes data. Across the field, behavioral health stakeholders would benefit from further investment in systems that make EHR data readily accessible and fit-for-purpose.