Usher Institute

Developing and validating a risk prediction model for Long COVID

Developing and validating a risk prediction model for Long COVID using the EAVE II study in Scotland.

Summary (research in a nutshell)

Most people infected with Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) recover within a few weeks. Some people, however, continue to have symptoms that last for weeks or months. These long-term symptoms are commonly referred to as “Long COVID” and can involve different body systems. We do not yet know how many people have Long COVID or who is at the highest risk of developing it.

In December 2020, the UK’s National Institute for Health and Care Excellence (NICE) introduced rapid guidance (see 'Other relevant links') to manage the long-term effects of COVID-19, in collaboration with the Scottish Intercollegiate Guidelines Network (SIGN) and the Royal College of General Practitioners (RCGP). This guidance set out two working definitions:

  1. Ongoing symptomatic COVID-19: individuals with signs and symptoms of COVID-19 from four weeks up to 12 weeks.  

  2. Post-COVID-19 syndrome: individuals with signs and symptoms that develop during or following an infection consistent with COVID-19, continue for more than 12 weeks and are not explained by an alternative diagnosis.

The term ‘Long COVID’ therefore commonly refers to those who continue to have signs and symptoms four weeks or more after acute COVID-19 infection, i.e. both ongoing symptomatic COVID-19 and post-COVID-19 syndrome. Diagnostic codes reflecting these working definitions were introduced in the Scottish GP electronic system and the hospital diagnoses system in early 2021.
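The two working definitions above amount to a simple rule based on how long symptoms last. The sketch below is only an illustration of that rule: the function names, the returned labels and the treatment of the exact four-week and 12-week boundaries are assumptions, not part of the NICE guidance or of the diagnostic codes used in Scotland.

```python
def classify_by_duration(weeks_with_symptoms: float,
                         alternative_diagnosis: bool = False) -> str:
    """Map symptom duration onto the working definitions (illustrative only)."""
    if weeks_with_symptoms < 4:
        # Symptoms lasting less than four weeks fall outside both definitions.
        return "acute COVID-19"
    if weeks_with_symptoms <= 12:
        # Assumption: exactly 12 weeks is counted as "up to 12 weeks".
        return "ongoing symptomatic COVID-19"
    if alternative_diagnosis:
        # Post-COVID-19 syndrome requires no alternative explanation.
        return "explained by an alternative diagnosis"
    return "post-COVID-19 syndrome"


def is_long_covid(weeks_with_symptoms: float,
                  alternative_diagnosis: bool = False) -> bool:
    """'Long COVID' covers both working definitions."""
    label = classify_by_duration(weeks_with_symptoms, alternative_diagnosis)
    return label in {"ongoing symptomatic COVID-19", "post-COVID-19 syndrome"}
```

For example, under these assumptions `classify_by_duration(6)` gives "ongoing symptomatic COVID-19" and `classify_by_duration(20)` gives "post-COVID-19 syndrome"; both count as Long COVID.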

Who are we and what do we want to do? 

We are a group of health data analysts and researchers who aim to develop and validate a risk prediction model to identify who is at greatest risk of developing Long COVID.  

To do this, we will build on research using the Early Pandemic Evaluation and Enhanced Surveillance of COVID-19 (EAVE II) cohort. This is a study which uses patient data to track the COVID-19 pandemic in Scotland.  

How will we do this? 

We will create an operational definition of Long COVID by observing the long-term healthcare activity of individuals with COVID-19 to investigate potential indicators for Long COVID. For comparison, we will look at two control groups: 

  1. individuals with a negative PCR test  
  2. the general population (i.e. everyone without a positive test)

We will observe a wide range of healthcare datasets, including data from General Practices (GPs), hospital data, GP out of hours data, outpatient data, NHS 24 data, medication data and mortality data. Examples of potential Long COVID indicators we will investigate include the number of encounters/consultations, new diagnoses, clinical area or care specialty, new prescriptions, and more severe outcomes such as hospitalisation and death. 

For more detailed information on potential signs and symptoms of Long COVID in GP records, we will investigate GP free text data. These are written records used to capture more granular details of a patient’s GP encounter. We will develop a text analytics tool with EAVE II’s trusted third party, Albasoft Ltd, to create codes to extract relevant Long COVID information from the free text. This means we will not access the free text itself, but only the derived codes.

We will then investigate any Long COVID indicators which present together in clusters. These clusters will be used as our operational definition for Long COVID. We will test our Long COVID definition using different follow-up periods, groups, and calendar time-periods to make sure our operational definition is robust. 
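One simple way to picture indicators "presenting together" is to count how often each pair of indicators appears in the same person's record. The sketch below does only that. It is not the study's clustering method, and the indicator names and toy records are hypothetical and invented purely for illustration.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_counts(records):
    """Count how many records contain each pair of indicators."""
    counts = Counter()
    for indicators in records:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(indicators)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: one set of indicators per person.
toy_records = [
    {"fatigue_code", "new_inhaler_prescription"},
    {"fatigue_code", "new_inhaler_prescription", "gp_consultation"},
    {"gp_consultation"},
    {"fatigue_code", "gp_consultation"},
]

pair_counts = co_occurrence_counts(toy_records)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Pairs that co-occur far more often than expected by chance would be candidates for a cluster; a real analysis would also need to compare against the control groups and use a formal clustering technique.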

Using our operational definition, we will then create a statistical model to calculate the probability of an individual developing Long COVID based on their characteristics. Characteristics of interest (or potential risk factors) include information on socio-demographics, location, underlying health conditions, and severity of acute COVID-19 infection (e.g. if they were hospitalised or admitted to ICU with COVID-19). We will experiment with different machine learning techniques to enhance our model and ensure our probabilities are as accurate as possible.
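To show how a statistical model can turn characteristics into a probability, the sketch below uses a logistic regression formula. This is an assumption made for illustration: the text above does not say which type of model will be used. The characteristic names and coefficient values are invented and carry no clinical meaning.

```python
import math


def predicted_probability(intercept, coefficients, characteristics):
    """Logistic model: probability = 1 / (1 + exp(-(intercept + sum(b * x))))."""
    linear_score = intercept + sum(
        coefficients[name] * value for name, value in characteristics.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_score))


# Hypothetical coefficients, NOT estimates from the EAVE II data.
ILLUSTRATIVE_COEFFICIENTS = {
    "age_decades": 0.1,
    "hospitalised": 0.8,
    "icu_admission": 0.5,
}

person = {"age_decades": 5, "hospitalised": 1, "icu_admission": 0}
risk = predicted_probability(-3.0, ILLUSTRATIVE_COEFFICIENTS, person)
print(f"Illustrative predicted probability: {risk:.3f}")
```

In practice the coefficients would be estimated from the data and the model's predictions checked against observed outcomes in a separate validation step.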

How are data handled? 

All patient data are pseudonymised or ‘de-personalised’, meaning they do not directly identify an individual because identifiers have been removed or encrypted. Because the data still relate to individual people, some possibility of re-identification remains. However, all data are stored on Public Health Scotland’s secure network, with access strictly controlled. All outputs will undergo rigorous statistical disclosure control measures to ensure no identifiable or confidential information is released.
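The two ideas above, replacing identifiers and checking outputs before release, can be sketched in a few lines. This is a generic illustration and not the procedure used by Public Health Scotland or EAVE II: the keyed-hash approach, the function names and the threshold of 5 are all assumptions.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash that cannot be reversed
    without the secret key (hypothetical approach)."""
    return hmac.new(secret_key, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def suppress_small_counts(table, threshold=5):
    """Mask counts below the threshold so that small groups are not released.
    The threshold value is an assumption for illustration."""
    return {group: (count if count >= threshold else None)
            for group, count in table.items()}


key = b"example-key-kept-by-the-data-controller"
print(pseudonymise("patient-0001", key)[:12])
print(suppress_small_counts({"group A": 120, "group B": 3}))
```

The same identifier always maps to the same pseudonym under a given key, which allows records from different datasets to be linked without revealing who the person is.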

Name | Role
Professor Aziz Sheikh | Principal Investigator, Professor of Primary Care Research and Development, University of Edinburgh
Professor Chris Robertson | Professor of Statistics, University of Strathclyde and Public Health Scotland
Dr Vicky Hammersley | Project Manager, University of Edinburgh
Dr Luke Daines | Project Lead, post-doctoral researcher
Professor Colin Simpson | Associate Dean at Victoria University of Wellington and Honorary Research Fellow, University of Edinburgh
Dr Karen Jeffrey | Postdoctoral Data Analyst
Dr Lana Woolford | Patient and Public Involvement (PPI) Coordinator
David Weatherill | Patient and Public Involvement (PPI) member


Vicky Hammersley (Project Manager):

Key Collaborations 

EAVE II - Early Pandemic Evaluation and Enhanced Surveillance of COVID-19 study 

Albasoft Ltd 

University of Strathclyde 

Public Health Scotland 

BREATHE – Health Data Research Hub for Respiratory Health 

Long COVID Scotland 

Partners and Funders 

The EAVE II Long COVID project is funded by the Chief Scientist Office (CSO) at the Scottish Government (COV/LTE/20/15).

This research also used data assets made available as part of the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (grant ref MC_PC_20058).

Other relevant links

National Institute for Health and Care Excellence (NICE)

NICE: COVID-19 rapid guideline: managing the long-term effects of COVID-19 [NG188]

Scottish Intercollegiate Guidelines Network (SIGN)

Royal College of General Practitioners (RCGP)

Scottish GP electronic system 

Hospital diagnoses system


March 2021 - March 2023 

Scientific themes (keywords) 

Long COVID-19, Ongoing symptomatic COVID-19, Post COVID-19 syndrome 

Methodology keywords 

Real-world evidence, Surveillance