Data Sharing in Healthcare
Driven by worldwide digitalization, hospitals and other medical institutions generate, store, and use a growing flow of data for the treatment of patients. It is now generally accepted that Artificial Intelligence can lead to improvements in medical care. As a society, we face the challenge of using this data sensibly while at the same time protecting the privacy of patients.
New regulations like the European GDPR (known in Germany as the DSGVO) have been introduced to prevent sensitive information from leaking or falling into the wrong hands. On the one hand, these regulations protect the privacy of patients; on the other hand, they severely hinder the integration of new AI processes in healthcare, which ultimately reduces the benefit to society.
This is because the use of such data goes beyond the individual patient. Especially in Research & Development, data analysis can lead to valuable new insights that can be used to improve patient treatment. Patients who authorize scientists to work with their data therefore help not only themselves but future patients as well. The necessity of using this valuable data becomes apparent.
New data strategies increase medical value
A successful data strategy at hospitals and medical institutions can lead not only to better treatment of diseases, for example through precision medicine or better decision-making by medical personnel, but also to the prevention of diseases. With the help of new Machine Learning algorithms, conspicuous birthmarks in skin cancer screenings or cardiological diseases can be recognized at an early stage. Early detection of diseases ultimately yields monetary benefits for society, since treatment, hospital, and insurance costs can be reduced. Research in particular benefits strongly from the process-based secondary use of data.
The secondary use of data must be ensured
As of today, the secondary use of data in healthcare occurs too rarely. The barriers are too high, and the personnel required to establish the necessary processes are not available; hospitals usually do not employ their own data analytics teams. Previous approaches of anonymizing or pseudonymizing data are error-prone and cannot be long-term solutions. Because of these problems, we need new interfaces that allow data to be shared safely between medical institutions and external analysts or other data pools. In the following, we present an example use case and suggest a solution to these issues.
Problem statement: sharing medical data from inside an institution with external service providers
We assume that various medical data are generated inside an institution, e.g. a hospital: radiological findings, scans, or laboratory parameters. The medical personnel want to use this data for research and development purposes. Because the institution lacks the necessary know-how and workforce, an external service provider (e.g. data scientists) is brought in to help.
Two essential problems arise:
- how the data is transferred from the hospital to the data scientist, and
- how the institution can ensure that the data remains fully ‘anonymous’, i.e. that the privacy of its patients is thoroughly protected.
Regarding the first problem: in the past, the usual process in data analysis was to transfer the data from its origin to the place of analysis. To ensure that no private patient information leaked, institutions either tried to anonymize or pseudonymize the data (which reduced its usability and credibility and was prone to error) or refrained from data analysis altogether. A minimal sketch of why such ad-hoc pseudonymization is fragile follows below.
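To illustrate the fragility, here is a minimal, purely hypothetical Python sketch of naive pseudonymization; the salt, field names, and record layout are assumptions for illustration, not any real hospital schema:

```python
import hashlib

# Hypothetical sketch: naive pseudonymization by hashing the patient ID.
SALT = "static-project-salt"  # a fixed salt reused across exports (a common mistake)

def pseudonymize(patient_id: str) -> str:
    """Replace the direct identifier with a salted hash."""
    return hashlib.sha256((SALT + patient_id).encode()).hexdigest()

record = {
    "patient_id": pseudonymize("12345"),
    "zip_code": "1010",          # quasi-identifiers like these stay in the clear ...
    "birth_date": "1971-03-02",  # ... and can re-identify a patient when linked
}                                # with external data sources
```

The direct identifier is hidden, but quasi-identifiers such as zip code and birth date remain in the clear and can be linked with external data to re-identify individuals, which is one reason such approaches cannot be long-term solutions.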
A development platform with principles of Ethical AI as an interface between Data Owner and Data Analyst
Because of new AI regulations and laws, companies and AI developers need processes and tools that can be used to assess and support compliance and to develop truly trustworthy applications. A possible approach is to use a software platform as an interface between data owner and data user.
A platform as an interface can solve several problems at once. Robust and certified processes are needed to securely build trusted applications and apply Machine Learning. The platform should implement guard rails and strict technical measures to ensure the highest level of data protection and data privacy, and in particular a clear understanding of responsibilities, roles, scopes, and requirements. A software platform can reduce the effort involved in tracking access rules, task assignments, and scoped development installments. Principles of Ethical AI are implemented directly into the development process itself. Backed by constant updates and trustworthy enterprise support, a safe long-term data strategy is ensured.
Following this proposal, we apply the idea to our case: the hospital shares the data with the data scientist via a secure, closed instance – the development platform. The analyst receives dedicated user rights and access rules from the data owner and can work with the data without the data itself ever moving. The information remains stored in a data quarantine inside the hospital's IT network. The analyst can then perform analytics, train Machine Learning algorithms, and evaluate the results.
The previous course of action is reversed: the data is not transferred to the analyst; rather, the analyst is brought to the data via the platform. A minimal sketch of what such a workflow could look like is shown below.
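The following toy Python sketch illustrates the reversed workflow. All names here (QuarantinePlatform, submit_job, and so on) are illustrative assumptions, not the API of DQ0 or any other specific product:

```python
import uuid

class QuarantinePlatform:
    """Toy model of a platform running inside the hospital network.

    Raw records stay in the quarantine; the analyst only ships code in
    and receives results out.
    """

    def __init__(self, records):
        self._records = records   # sensitive data never leaves this object
        self._results = {}

    def submit_job(self, analysis_fn) -> str:
        """The analyst sends code to the data, not the data to the analyst."""
        job_id = str(uuid.uuid4())
        # The analysis runs next to the data; only its return value is stored.
        self._results[job_id] = analysis_fn(self._records)
        return job_id

    def fetch_result(self, job_id: str):
        """Only the (ideally vetted, aggregate) result is released."""
        return self._results[job_id]

# Analyst side: define an aggregate query without ever seeing the raw rows.
platform = QuarantinePlatform([{"age": 54}, {"age": 61}, {"age": 47}])
job = platform.submit_job(lambda rows: sum(r["age"] for r in rows) / len(rows))
print(platform.fetch_result(job))  # mean age, not individual records
```

In a real deployment, the release step would additionally vet or perturb the results before they leave the quarantine, which is exactly where Differential Privacy comes in.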
Differential Privacy provides mathematically guaranteed privacy
To achieve a truly new standard of data privacy, innovative approaches must be used and implemented jointly. For example, Differential Privacy is a strong, mathematical definition of privacy for data analysis and machine learning. It “mathematically guarantees that anyone seeing the result of a differentially private analysis will essentially make the same inference about any individual’s private information, whether or not that individual’s private information is included in the input to the analysis.”
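For context, the standard formal definition reads as follows (this is the textbook condition for $\varepsilon$-differential privacy, not specific to any product): a randomized mechanism $M$ is $\varepsilon$-differentially private if, for all pairs of datasets $D$ and $D'$ differing in a single individual's record and for every set of outputs $S$,

$$\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S].$$

The parameter $\varepsilon$ quantifies the privacy loss: the smaller $\varepsilon$, the less any single person's data can influence what an observer sees.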
By using Differential Privacy, we as individuals (or as patients inside the hospital network) gain the ability to actually control and govern how much information about us is leaked to external parties. Because the privacy loss is measurable, the data owner can properly oversee and control both the data and the insights derived from it, as sketched below.
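The classic building block for this is the Laplace mechanism combined with a privacy budget. The sketch below assumes a simple count query with L1-sensitivity 1 and a toy budget accountant (sequential composition by summing epsilons); it is an illustration, not a production implementation:

```python
import numpy as np

class PrivateCounter:
    """Toy epsilon-budgeted counter using the Laplace mechanism."""

    def __init__(self, values, total_budget: float):
        self._values = values
        self._budget = total_budget  # total privacy loss the data owner allows

    def noisy_count(self, predicate, epsilon: float) -> float:
        if epsilon > self._budget:
            raise RuntimeError("privacy budget exhausted")
        self._budget -= epsilon  # each query consumes a measurable privacy loss
        true_count = sum(1 for v in self._values if predicate(v))
        # A count query has sensitivity 1, so Laplace noise with scale
        # 1/epsilon makes this single answer epsilon-differentially private.
        return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [54, 61, 47, 70, 38]
counter = PrivateCounter(ages, total_budget=1.0)
print(counter.noisy_count(lambda a: a > 50, epsilon=0.5))  # noisy, private answer
```

Once the budget is spent, no further queries are answered, which gives the data owner a concrete, enforceable handle on the cumulative privacy loss.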
If we combine the principles of Differential Privacy with the idea of a holistic development platform as an interface for data analytics and Machine Learning, we can solve both of the problems outlined above at once and create a secure and robust foundation for Research & Development in healthcare.
Gradient Zero is currently working with the Clinic of Radiology and Nuclear Medicine of a large University Clinic on the integration of our AI development platform DQ0. With DQ0 as the secure data quarantine and interface between sensitive patient data and external data analysts, we ensure the highest privacy standards and compliance, and offer a secure environment for the development of Machine Learning and advanced analytics.