What makes a Safe Haven safe?
We have recently compiled a toolkit that enables us to create additional Safe Haven services as part of our work building the Edinburgh International Data Facility.
For many years EPCC has operated the technical end of the Scottish National Safe Haven (NSH), the Trusted Research Environment that underpins nationwide public-benefit research with sensitive data in Scotland. Our Safe Haven services follow two sets of defining principles: the Five Safes model [1] designed by the Office for National Statistics, and the Scottish Government Charter for Safe Havens [2]. Both sets of guidelines highlight something absolutely central: a Safe Haven service comes in two separate parts, a secure system (run by one organisation, perhaps us) and an information governance function (IG, performed by a different organisation). In other words, a “gate” and a “gatekeeper”.
EPCC provides the secure systems, but we don’t mark our own homework when it comes to data security. Information governance does that. For the Scottish NSH service, we provide the system and the eDRIS team within Public Health Scotland provide the IG. They are the service owners, we are the operators.
Providing Safe Havens to enable research with public data is all about information risk management and assurance. It is essentially a records management exercise, with sufficient detail logged at appropriate stages of the data life cycle. The Five Safes model provides a way to balance risks of one kind (eg technical) against risks of another (eg “bad actors”). The model breaks down the decisions surrounding data access and use into five related but separate principles, usually framed as questions, as follows.
Safe people
Access to sensitive data is usually restricted to “approved researchers”; how do we define an “approved researcher”?
What are the appropriate skills and credentials required for researchers to access, manage and process your data? What about the system’s administrators?
What evidence is needed to demonstrate these skills, and do they need to be refreshed or renewed on a regular basis?
Safe projects
Who approves the kinds of research that can and can’t be done with each data set?
What constraints have the data providers placed on data use?
Are there specific contractual obligations regarding data use?
What has been done to ensure that the project’s proposed use of the data complies with these constraints?
Are there specific requirements on transparency, citation, publishing and access to aggregations of the data?
Can management of the data be delegated to the project or must it be actively managed by a dedicated independent data administrator?
Where is this all written down and approved!?
Safe settings
What are the security concerns and specific requirements for storing and processing the data? What do the data providers need?
Are access restriction controls and specific login processes required?
Can the data potentially be downloaded to a local device or uploaded to a remote location? For instance, upload and download of data from the Scottish NSH by researchers is not only prohibited, it is technically disabled.
Safe data
Are the data accessible to the researchers proportionate to the approved project’s requirements, in line with GDPR?
Where personal data are involved, have the data been treated so as to minimise risk of disclosure of any individual’s information?
Safe outputs
How will output from the project research be published and tracked?
What approvals and controls are required for project output to be extracted from the safe environment?
How will Freedom of Information (FOI) requests regarding the project be received, tracked and fulfilled?
Is a public or private website or a Virtual Data Room [3] required?
Safe Havens at EPCC
The systems we operate as part of our Safe Haven services are safe settings in the sense above, with defined procedures. These procedures also support elements of the other “safes”, for instance safe people (who has administrator permission to which part of the system, and have they been suitably trained?). But the security of the system alone is not enough to implement a Safe Haven service; it must be supported by additional IG procedures (the gatekeeper). IG is typically responsible for safe data, safe projects and safe outputs, and for the user side of safe people.
The Scottish Government Charter for Safe Havens provides a useful template that builds on the Five Safes model, covering division of responsibilities, Service Provider authority, data safety, data sensitivity reduction, data separation, staff training and managed collaborations. This last point refers to collaborations with commercial entities: firms with ideas for new products and services that could offer public benefit.
Scotland doesn’t sell public data, but neither does it want to lock out innovators from creating valuable products. Managed collaborations are best, with approved researchers taking the lead and working closely with firms to maintain the public trust that is absolutely vital to any kind of research with public data.
[1] The Five Safes model was devised in 2003 by Felix Ritchie at the UK Office for National Statistics (ONS).
[2] A Charter for Safe Havens in Scotland, 2015, ISBN: 978-1-78544-496-8 (web only), https://www.gov.scot/publications/charter-safe-havens-scotland-handling-unconsented-data-national-health-service-patient-records-support-research-statistics/pages/1/
[3] See https://www.investopedia.com/terms/v/virtual-data-room-vdr.asp for a definition of a Virtual Data Room.
Homepage image: MCCAIG via Getty Images