Clinical Natural Language Processing Research Group

HDR UK Human Phenome Project on Deriving and Applying Health-Related Phenotypes at Scale

We aim to extend the phenotyping potential of disease status algorithms with linked health data and unstructured data from electronic medical records.

This project has two aims. The first is to adapt and extend disease status algorithms developed for UK Biobank for application to large, UK-based middle- to older-age cohorts and clinical trials, such as Generation Scotland (n=25,000) and SHARE (n>270,000). The second is to extend the deep phenotyping potential of existing algorithms by incorporating structured, coded information from additional linked datasets (Scottish prescribing/dispensing data and laboratory test results) as well as unstructured data from electronic medical records (including correspondence and radiology reports).

Generation Scotland


Project title

Deriving and applying health-related phenotypes at scale


Phenomics, applied analytics (NLP), understanding the causes of disease

Research team (University of Edinburgh)

Catherine Sudlow, Honghan Wu, William Whiteley, Hang Dong, and others


Health Data Research UK