On November 17, 2021, after eight months of work and a consultation in which some forty stakeholders participated, the French supervisory authority (CNIL) finally delivered its guidelines on health data warehouses, in line with the French Health Data Hub.
Because it is essential for those involved in health research and innovation to be fully aware of the regulations on health data processing, and because those rules are evolving, Ventio, your ultra high-tech specialist in the processing of sensitive health data, has dissected these new guidelines for you and can advise you on how to apply them.
We will first review what this type of warehouse represents for health research and innovation, before presenting the essential elements of the guidelines.
I – Health-data warehouses
- What are health data warehouses for?
Every day in France, health data are collected on patients, primarily for healthcare purposes. Health data warehouses serve other purposes, notably research. The data must therefore be reusable for a purpose different from the one for which they were initially collected.
A warehouse of this type can be broadly defined as a digitized space storing data that are accessible according to strict rules.
As a patient – user of the French healthcare system, you may have already come to the hospital and been informed that your data would be reused for research and put into a data warehouse.
And if you are a caregiver, the data from your work (reports, prescriptions, …) are also likely to be placed in the warehouse and associated with information about you.
What happens to these data, how are they processed, how do they allow research to progress, what are your rights, what are the risks? These are eminently sensitive issues combining the right to privacy, cyber risks, ethics in health and integrity in research, sovereignty and innovation. The challenge of this CNIL reference framework is to create a space of trust around this sensitive data in order to innovate in health while respecting the rights of data subjects, patients and caregivers.
- Who are the CNIL guidelines intended for and what is their scope?
These guidelines are intended for organizations (called data controllers) that have public-interest missions and want to collect health data for reuse. This includes, for example, hospitals that collect data initially for care, but may wish to reuse it for other purposes.
Reuse purposes include (but are not limited to):
- Exclusive use by the data controller (that is, internal use) to operate medical diagnostic tools, as well as to conduct feasibility studies.
- Reuse for health research, which requires compliance with a specific framework (such as research authorization or a CNIL reference methodology), and which may therefore be open to third parties.
It is therefore possible to develop or test innovative tools, for example based on artificial intelligence for diagnostic assistance, and to carry out massive data mining on entire populations for health research, seeking links between the development and evolution of diseases, and genetic, biological, behavioral or environmental factors…
In order to be able to reuse health data, the organization will have to declare that it is compliant, and therefore implement technical and organizational measures to ensure the security of these sensitive data and respect the rights of the data subjects (first and foremost the patients, but also the health professionals involved). The CNIL has decided to publish these guidelines in order to specify these compliance rules.
II – What’s inside?
- What personal information can be stored in a health data warehouse, for how long and who can access it?
The warehouse can only contain data from medical and administrative files, as well as data collected during research projects.
- Directly identifying data
Healthcare professionals, who for example produce examination reports, have their professional contact information integrated into the warehouse.
All information that could directly identify the patient (name, telephone, address, social security number, …) must be stored separately from other sensitive data.
This identifying data is only accessible to a limited number of authorized persons and only in specific cases, for example to manage the warehouse, to recontact patients to offer to participate in research projects or in case of incidental findings involving their health.
- Non-Directly Identifying Data
Sensitive non-directly identifiable patient data can contain everything else, for example weight, height, biology, medical imaging, genetics, sex life, drug use, travel, lifestyle habits… They are pseudonymized and in theory do not allow identification without additional information.
This second group of non-directly identifying data is accessible upon request to the warehouse’s governance body, after evaluation of the scientific and ethical relevance of the request. Access can be granted to authorized research teams, whether internal or external.
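The separation described above, with directly identifying data kept apart from pseudonymized data, could be sketched as follows. This is a minimal illustration only: the field names, the `split_record` helper and the choice of keyed hashing (HMAC-SHA256) are assumptions made for the example, not techniques prescribed by the CNIL guidelines.

```python
import hashlib
import hmac

# Hypothetical field names; a real warehouse schema will differ.
DIRECT_IDENTIFIERS = {"name", "phone", "address", "social_security_number"}


def pseudonym_for(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym with HMAC-SHA256.

    The hash is keyed, so the pseudonym cannot be recomputed from the
    patient identifier without the secret key.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def split_record(record: dict, secret_key: bytes) -> tuple[dict, dict]:
    """Split one source record into an identity row and a pseudonymized row.

    The two rows are meant to be stored in separate stores; they share only
    the pseudonym.
    """
    pseudonym = pseudonym_for(record["patient_id"], secret_key)
    identity_row = {"pseudonym": pseudonym, "patient_id": record["patient_id"]}
    clinical_row = {"pseudonym": pseudonym}
    for field, value in record.items():
        if field == "patient_id":
            continue  # the source identifier never reaches the clinical row
        if field in DIRECT_IDENTIFIERS:
            identity_row[field] = value
        else:
            clinical_row[field] = value
    return identity_row, clinical_row
```

In such a design, the identity rows and the secret key would be reachable only by the few authorized persons mentioned above, while research teams would work on the clinical rows alone.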
- Duration
The maximum storage period for directly identifying (nominative) data and pseudonymized data is 20 years.
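As an illustration, a retention check for this 20-year limit might look like the sketch below. It assumes that the period runs from the date of collection, which is an assumption made for the example; the actual starting point should be taken from the guidelines themselves. The function names are hypothetical.

```python
from datetime import date

MAX_RETENTION_YEARS = 20  # storage limit stated in this article


def retention_deadline(collected_on: date) -> date:
    """Return the last day on which the record may still be stored.

    Assumption: the retention period starts on the collection date.
    """
    target_year = collected_on.year + MAX_RETENTION_YEARS
    try:
        return collected_on.replace(year=target_year)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February.
        return collected_on.replace(year=target_year, day=28)


def is_past_retention(collected_on: date, today: date) -> bool:
    """True when the record has exceeded the maximum storage period."""
    return today > retention_deadline(collected_on)
```

A scheduled job could run such a check over the warehouse and flag records for deletion or anonymization.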
- Anonymized data
Finally, be aware that data can be anonymized, which is in principle an irreversible process, and can then be published or transmitted to any recipient. This is particularly relevant to the current trend towards open science, which involves sharing and publishing research data for reproducibility and reuse.
So expect that your data, once anonymized, will be made public, and that in theory it will be impossible to trace back to you.
- What is the duty of information of organizations managing a health data warehouse and their obligations to respect the rights of the data subjects (patients and health professionals)?
Throughout the collection process, as well as for each reuse for research, the right to information must be respected. The CNIL’s guidelines cover data previously collected and therefore already present in medical files, as well as data that will be collected in the future. The principles are those of the GDPR, with a duty to inform unless the organization is able to demonstrate that this requires a disproportionate effort. The CNIL sets out the conditions for relying on this exception to the principle of information (too many people, data too old, too expensive to inform everyone individually), but with safeguards: in particular, the justification must be included in the privacy impact assessment, and the creation of the warehouse must be communicated publicly, for example in the media.
Health professionals, as employees or service providers, must be informed of the transfer of their personal data to the warehouse by means internal to the organization (e-mail, employment contract, posting, etc.).
With regard to the other rights (access, rectification, erasure, restriction, objection), since the organization is able to identify the patients and health professionals whose pseudonymized data are processed, it must allow them to exercise these rights and provide clear information on how to do so.
- What are the technical and organizational measures to be implemented?
The sensitive health data stored in a warehouse concerns a large number of people, potentially all the health data of a region or country. The guidelines do not skimp on security to guarantee the confidentiality, integrity and availability of the data and provide a non-exhaustive list of some 50 technical and organizational security measures. Although it would be tedious to list them here, let us nevertheless cite the main categories:
- Physical, logical and cryptographic partitioning: filtering and encryption of communications, backups, specific encryption policy depending on the type of data, …
- Access management, authentication, logging: limiting access according to the roles of each authorized user, individualizing and tracing accesses.
- Requirements on pseudonymization and anonymization measures: no improvisation in this area, measures must comply with best practices and take into account their evolution, with the need to document and be able to demonstrate compliance.
- Awareness-raising of users on medical secrecy and on the risks and obligations of health data processing, with a charter to be signed by users.
- Securing the workstations accessing the data, with legal supervision if these workstations are not directly under the administration of the data controller.
- An incident management procedure to be put in place. Data breaches must be documented and notified to the CNIL and, depending on the risk, to the data subjects.
This set of very strict measures should be seen in the current context of strong cyber-pressure on the information systems of organizations, particularly hospitals. In particular, the guidelines require that warehouses be separated from the organization’s main information system.
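To make the access management category more concrete, here is a minimal sketch of role-based access with individual logging of every attempt. The roles, data categories and the `check_access` function are hypothetical names chosen for illustration; the guidelines list the requirements, not an implementation.

```python
import logging

logger = logging.getLogger("warehouse.access")

# Hypothetical roles and the data categories each one may read.
ROLE_PERMISSIONS = {
    "warehouse_admin": {"identifying", "pseudonymized"},
    "researcher": {"pseudonymized"},
}


def check_access(user_id: str, role: str, category: str) -> bool:
    """Decide whether a user may read a data category, and trace the attempt.

    Every attempt is logged with the individual user identifier, whether it
    is granted or refused, so that accesses can be audited later.
    """
    allowed = category in ROLE_PERMISSIONS.get(role, set())
    logger.info(
        "user=%s role=%s category=%s allowed=%s", user_id, role, category, allowed
    )
    return allowed
```

Unknown roles receive no permissions by default, which keeps the sketch on the deny-by-default side.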
- Is it possible to outsource the creation and management of my health warehouse and what are the constraints?
Building a digital space with such constraints is a complex project for which data controllers do not necessarily have the resources and skills in-house. Such warehouses are also called upon to be interoperable with multiple digital services. Organizations can therefore call on processors (e.g. digital service companies, hosting companies, resource or cloud service providers, tech giants, etc.), with a number of constraints illustrated by the case of the Health Data Hub. In particular, the CNIL considers that remote access from outside the European territory is a transfer, which requires a detailed analysis of the data flows and has consequences on the measures to be implemented.
Processors must be under a European jurisdiction or from an “adequate” country, i.e. one that offers legal guarantees deemed to be of an equivalent level. The data processing agreement must allocate responsibilities related to security measures and incident management.
Finally, data may not be transferred outside the European Union, unless the destination country has an adequate level of protection. For hosting, storage or retention, the chosen service provider must also be a certified health data host or equivalent.
III – Health data warehouses, an opportunity and new challenges
By aiming to allow the reuse of data initially collected for care and in the framework of internal clinical research projects, the health data warehouses constitute a real opportunity for research and innovation in healthcare. Nevertheless, this reuse raises a certain number of questions, particularly from the point of view of the accumulation of data with potential biases and their use, which may impact on public interest purposes:
- Health professionals, whose professional responsibilities are increasingly constrained, will have to be reassured if they are to provide objective and complete expert data. Will they embrace the new purpose? Will the perceived risk of future liability, or the fear of being challenged individually by an AI, cause bias? The perverse effect would be a warehouse in which the expert data contributed are too poor and de facto not very usable for research. Health data warehouses will have to prioritize quality over quantity, which will require a strong commitment from the governance bodies, and first and foremost from the health professionals and researchers who will be members.
- Pseudonymization in compliance with the guidelines is not an easy task for certain types of data. The case of unstructured documents (reports, prescriptions, digitized documents, free comments, biomedical analysis and imaging) can be complex. Anonymization according to the recommendations of the CNIL for publication and transfer to any recipient may be even more difficult. The case of medical imaging data, for example brain imaging used in research on neurodegenerative diseases, raises questions, since anonymization requires measures that may reduce the usefulness of the data. Just as a photo or video of a patient submitted to the warehouse would have to be blurred, the equivalent of blurring brain images directly affects their integrity and reduces their scope and their usefulness for research. This is a challenge, but also an opportunity to develop innovative solutions for pseudonymization and anonymization.
- Contracting with processors who are able to implement the technical measures recommended by the guidelines for the protection of this sensitive data calls for careful anticipation of the risks. Future cyber-attacks on these warehouses will make it possible to refine the contractual clauses allocating responsibilities between the stakeholders: data controllers, processors and their insurance companies.
These guidelines are equal to the challenges of security and protection of individuals, given the state of the art in technical and legal terms. Let’s hope that health data warehouses will effectively and rapidly provide massive data for health research and innovation at the level of similar initiatives from other countries already well advanced on the subject.
Do you have a project to create a health data warehouse? Do you already have such a digital space but are unsure of its technical and legal compliance with these new CNIL guidelines? Are you looking to assess the compliance of a data reuse project and need to carry out a privacy impact assessment? Ventio offers you its expertise in information security and GDPR, as well as in the fields of biomedical research and new digital technologies.