EIDF: a unique service for academia and industry

The Edinburgh International Data Facility, which is being developed by EPCC, will facilitate new products, services, and scientific studies by bringing together regional, national and international datasets.

When it comes into service next year, the Edinburgh International Data Facility (EIDF) will be a place to store, find and work with data of all kinds.

Part laboratory and part repository, EIDF is the underpinning data and computing infrastructure of the Data Driven Innovation (DDI) Programme. On the repository side it provides long-term hosting and curation of datasets for a wide range of stakeholders. On the laboratory side it offers both cloud-like and high-performance computing environments for researchers and innovators to work with data (their own or EIDF-hosted, open or restricted, large or small).

Launched at the end of 2018, the DDI Programme is one of six funded within the Edinburgh & South-East Scotland City Region Deal. It has ambitious targets to support, enhance and improve talent, research, commercial adoption and entrepreneurship across the region through better use of data.

The Programme targets 10 industry sectors, with interactions managed through five DDI Hubs: the Bayes Centre, the Usher Institute, Edinburgh Futures Institute, the National Robotarium, and Easter Bush. The activities of these Hubs are underpinned by EIDF.

EIDF will grow and mature with the DDI Programme, expanding in capacity and capability, responding to the needs of the innovation Hubs and, through them, to learners, researchers, innovators and entrepreneurs from across the region and beyond.

What will it look like?

Most users of EIDF will work in the Data Service Cloud, which will offer a rich set of data science and analytics tools, from browser-based notebooks to full desktop environments. We aim to create ready-to-use environments for data analysts, scientists and engineers, with pre-installed, pre-configured toolsets backed by the CPU, GPU and storage resources needed to get the job done.

The Data Service Cloud will sit on top of an Analytics-Ready Data Layer (ARD Layer), where EIDF data can be shared and re-used for science and innovation. This ARD Layer will grow over time as we collect more and more data and make it available. Innovators and researchers looking for data can search and browse through the Data Catalogue to discover what analytics-ready data EIDF has, and how they can get access.

EIDF data managers will work with data depositors at the Data Ingest Gateway, ensuring that incoming data are safely stored in the Data Lake Archive Layer, and well described in the Data Catalogue. Data in the Data Lake will be stored for the long term, following best practices in digital preservation.

EIDF data wranglers will work in the Data Preparation Layer, often in collaboration with data depositors and others, to turn archived data from the Data Lake into analytics-ready data products in the ARD Layer, so completing what we hope will be a virtuous circle of innovation.

Safe Haven services

EIDF will also offer Safe Haven services to health and government users, following best practice in independent governance and supporting the linkage of complex personal data for public benefit research and policy-making under national and regional safeguards.

Building on EPCC’s expertise in operating the National Safe Haven for NHS Scotland, we will offer Safe Haven services for organisations wishing to host and govern access to their data assets in a highly secure environment. Safe Havens will be isolated from the rest of EIDF, with user approvals, data ingress and egress, and permitted software all controlled by information governance bodies independent of the infrastructure itself.

How will it evolve?

Apart from getting bigger, the most noticeable change will be in the richness and variety of the datasets that will be available. Our goal is to collect and curate a large number of interesting datasets and make them “analytics ready”. Some of these datasets might be small; we hope that many will be truly large, demanding the petabyte scales of the underlying hardware. We’ll be working on making them as useful as possible – easily findable, accessible, linkable and interoperable.

Where are we now?

Work on the software foundations has been proceeding well since late last year, and a Phase 1 development system is already in place at EPCC’s Advanced Computing Facility (ACF). Procurement for Phase 2, the first “proper” piece of EIDF, got underway in April this year.

The last few months have been a challenge, of course. Building work on EIDF’s new home at the ACF was suspended in March, casting doubt on our timescales for first service in January 2021. Nevertheless, contingencies sprang into operation and we’re working with our contractors and procurement partners to keep things on track as much as we can. Building work on CR4 is starting up again slowly, after a five-week hiatus. The contractors will be working under social distancing rules, and thus progress will be slower, but work is at least underway. We do expect supply-chain challenges in sourcing new kit, but we are working with our principal vendor partner on minimising delays as far as possible.

We’ll keep you informed through the EIDF website and mailing list: please subscribe for updates.

Author

Rob Baxter, EIDF Programme Manager and EPCC Group Manager

Image: Getty Images/iStockphoto