Boosting the Odds of Creating Value from Data Integration
When we bring together large sets of data from different sources, our goal clearly is to expose an important insight that surpasses the sum of information in those various parts. In our case, that insight hopefully improves people’s health and lives and the ways we deliver health care. Certainly, data integration in any field requires resources – time, money, technology, even intellectual capital. In our experience, increasing the likelihood that those investments pay off lies in a few key considerations.
First, the most valuable insights may come from data that is disparate, not simply different. We need to be able to answer not only questions that can be probed by bringing together more of the same sorts of data, but also questions that require data that stretches the context from which the answer arises. In health care, this means reaching beyond electronic health records and claims data. Humans act as filters of the many contexts – genomic, psychological, socio-economic and others – in which they exist; the state of their health is the result. If we consider only data created by their interactions with the health care system, we risk missing the actual determinants of health. Typically, medical researchers drill deeper and deeper into one facet of health – we may know everything about the genomics of a tumor. That knowledge may not, however, tell us all we need to know to improve a patient’s health. We may also need to understand how likely patients are to use different potential medications based on their household income. As the sources of data grow to include patient-reported outcomes and mobile devices, the same high standards with which science judges evidence must still apply.
Second, there is much to be gained when we contemplate the need to protect people’s privacy in a broader context. We need to strike the appropriate balance between potential benefit and data security that affords privacy, which can never be absolute. As in the rest of our lives, we should weigh risks. If we wanted driving to be essentially risk-free, we might limit cars to traveling no more than 5 miles per hour. As a society, we have agreed that it is worth a far greater risk to be able to travel at a more reasonable speed. One way to further this balancing is to move beyond the view that data security only comes with physical proximity. Keeping data only on the premises in an organization’s servers is no longer the only way to maximize privacy. Cloud-based computing already offers the best tools for compliance and security. Beyond that advantage, the cloud offers other improved experiences to enhance data integration – less legacy technology to overcome and more opportunity to scale at a marginal cost. As cloud-based solutions continue to improve – lowering costs even further, for example – they should facilitate reconsideration of the castle mentality’s pertinence in a digital world. That in turn will unlock value in data integration by increasing data liquidity. The steady movement toward a value-based compensation model in health care, which creates far more incentives to know more about patients than the fee-for-service model historically has, adds further impetus for change.
Finally, data integration should start from a counterintuitive point – from the use case, not from the need to understand the composition and complexity of the data sets. It is easy to get lost in the artistry of an algorithm and construct what is most apparent, rather than what is needed. It is easy to be distracted by a classifier that, while accurate, has no practical value. At Duke, we work to avoid this through what we have dubbed the “grand fusion” – an intentional melding that brings together teams of content experts, such as clinicians, and experts with quantitative and technical expertise, and that marshals a comprehensive toolbox of frequentist and Bayesian biostatistics along with machine and deep learning. This is no small task given differences ranging from training to vocabulary. But in the end, it is this kind of collaborative approach that creates opportunity. Much like the disparate data itself, it is the integration of people with specific perspectives that exposes the most valuable insights in exploring it.