Enhancing the Data Infrastructure

Research use of health and social care data is constrained by its current reliance on data recorded in rigidly structured formats, whereas qualitative health and social care data is often recorded as free text. Building on our £4.3M UoE/City Deal investment in the regional DataLoch™, we will develop and deploy tailored Natural Language Processing and Artificial Intelligence methods to enhance existing routine data with data extracted from free-text clinical records. The enhanced data will have a wide range of potential applications in research and in health and social care.

What are our intentions?

To develop, evaluate and routinely implement processing of free-text health and social care records to obtain a complete and deep understanding of people’s medical profiles and circumstances (including diagnoses, social and family history, the presence of geriatric syndromes, functional deficits and frailty markers), place of residence (home, extra-care housing, care home) and household composition (living alone, fitness and frailty of other members of the household).

Specifically, we have the following objectives:

Understand requirements and datasets

Develop standardised terminologies, a geriatric syndrome ontology and computable phenotypes (parameters)

Analyse deep data using natural language processing and machine learning

Build a collaborative community with academics, geriatricians, primary care physicians, palliative care physicians, nurses, allied health professionals, social carers, and regional and national health data initiatives

The results of this work-package will be published in international peer-reviewed conferences and journals.

Why is this important?

We have access to world-class linked routine health care data in Scotland and in other UK countries through our leadership role in Health Data Research UK. The £4.3M City Deal investment in the ‘DataLoch’ will enhance access, linkage, data security and the core analytical platform for the regional population of 1.3 million people. This provides a superb foundation to harness the potential of quantitative data to inform our understanding of health and care, to underpin new prediction tools, and to support the implementation and evaluation of new models of care with the potential to spread across the UK. However, even in centres of excellence, existing routine data is not ideal for research on later life, because data that is critically important in this context is often recorded only in free-text fields. This is a problem in NHS data, and particularly so in social care data.

Figure 1. WP3 Architecture of Research Design

How will we achieve this?

This work-package will provide a data infrastructure to support the various data-driven research activities in ACRC. It comprises four task areas, as depicted in Figure 1. Lower components provide the essential basis for the upper ones, and components on the right provide key support to those on the left.

Task 1

The first task, and probably the most important at the initial stage of this work-package, is to understand the ‘data infrastructure’ requirements for realising ACRC goals. This will be achieved via a forum that brings together all stakeholders.

It will have two deliverables:

Data infrastructure requirement specifications.
Dataset identification and access.

Task 2

Task 2 is to work on terminology standardisation and computable phenotypes. Standardisation is essential in health and social care data research, and multiple levels of data standardisation are relevant to the ACRC data infrastructure.

It will deliver:

A terminology for late life health and social care.
A geriatric syndrome ontology (standardised classifications).
A phenotype (standardised definitions) library for geriatric syndrome and frailty.

Task 3

Task 3 is to use Natural Language Processing (NLP) to analyse deep data from various unstructured data sources to complement structured datasets. This task builds on the team’s current NLP work.

It will deliver:

Adapted NLP models on structured reports of medical imaging data for geriatric medicine.
New NLP models for late life health and social care.
Transfer learning of NLP and machine learning models for ACRC research.

Task 4

Task 4 is to establish an active Research Community for co-design and collaboration on the technical work in this work-package. The community will comprise the leads of other ACRC work-packages, health and social care professionals, NLP research groups, national health data initiatives (HDR UK), regional health data initiatives and the biomedical AI / MRC precision medicine CDTs.

Who are we working with?

We work closely with the UK clinical NLP groups under the HDR UK text analytics project, including King’s College London, University College London, University of Birmingham, Cambridge University, Swansea University, Manchester University and University of Sheffield.
As part of Edinburgh Clinical NLP Group’s collaborations, we work closely with the Mayo Clinic.
We will seek collaborations with other leading NLP groups, such as the Stanford NLP Group, and in particular establish connections with industry players in clinical NLP such as DeepMind, Facebook and Amazon.


This article was published on 16 Mar, 2021