About Long COVID

Find out more about the research of the Long COVID project: what we are doing, why we are doing it and who we are doing it with.

What is the Long COVID project?

What is long COVID (condition)?

Long COVID is a term used to describe when a person continues to show signs of, or experience symptoms relating to, COVID-19 more than four weeks after an acute infection.

Find out more about Long COVID (condition)

Currently, there are two working definitions of long COVID:

Ongoing symptomatic COVID-19: which describes individuals who show signs and symptoms of COVID-19 for between 4 and 12 weeks.
Post-COVID-19 syndrome: which describes individuals with signs and symptoms that develop during or following an infection consistent with COVID-19, continue for more than 12 weeks and are not explained by an alternative diagnosis.

These definitions were first described in December 2020, when the UK’s National Institute for Health and Care Excellence (NICE) introduced rapid guidance to manage the long-term effects of COVID-19, in collaboration with the Scottish Intercollegiate Guidelines Network (SIGN) and the Royal College of General Practitioners (RCGP).

Read the NICE: COVID-19 rapid guideline: managing the long-term effects of COVID-19 [NG188]

Diagnostic codes reflecting these working definitions were introduced in the Scottish GP electronic system and the hospital diagnoses system in early 2021.

What are we doing? What are our aims?

The Long COVID project aims to develop and validate a risk prediction model to identify which groups of people are at greatest risk of developing long COVID.

Why is our work important?

Often people who are infected with the SARS-CoV-2 coronavirus and develop COVID-19 recover within a few weeks.

Sadly, there are some people who continue to have symptoms for several weeks, months or longer. These long-term symptoms are now commonly known as “Long COVID” and can involve different body systems. Unfortunately, it is still unknown exactly how many people have long COVID and who is at the highest risk of developing it.

Our work will help provide policy makers and health professionals with a better understanding of who is at greatest risk of developing long COVID, which in turn will help to inform the development of targeted treatment and prevention strategies.

How will our operational definition be different to existing definitions of long COVID?

The operational definition will provide us with clear criteria that we can use to identify which individuals in our dataset are living with long COVID.

Existing definitions are designed to provide guidance on which signs and symptoms are characteristic of long COVID.

In contrast, our operational definition will allow us to identify cases of long COVID based on whether the clinical codes for these specific symptoms, as well as related tests and treatments, are recorded in a person's medical records within the 4 to 26 weeks following a positive test for COVID-19.

Unfortunately, this also means that patients who do not have this information recorded will not be identified by our operational definition.
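As a rough illustration of the time window described above, the sketch below checks whether a coded event falls 4 to 26 weeks after a positive test. The function names, the inclusive window boundaries and the dates are assumptions made for this example; they are not taken from the project's own code or data.

```python
from datetime import date, timedelta

# Window bounds taken from the text: 4 to 26 weeks after a positive test.
# Treating both ends as inclusive is an assumption of this sketch.
WINDOW_START = timedelta(weeks=4)
WINDOW_END = timedelta(weeks=26)


def in_follow_up_window(positive_test: date, event: date) -> bool:
    """Return True if the coded event falls 4-26 weeks after the positive test."""
    elapsed = event - positive_test
    return WINDOW_START <= elapsed <= WINDOW_END


def has_indicator(positive_test: date, coded_events: list[date]) -> bool:
    """Hypothetical rule: at least one relevant code recorded inside the window."""
    return any(in_follow_up_window(positive_test, e) for e in coded_events)


# Made-up example: one code at 2 weeks (too early) and one at 10 weeks (in window).
test_date = date(2021, 1, 4)
events = [date(2021, 1, 18), date(2021, 3, 15)]
print(has_indicator(test_date, events))  # True
```

A person with no coded events at all returns False here, which mirrors the limitation noted above: people without recorded information cannot be identified.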

How are we doing this?

Creating an operational definition

Before we create our prediction model, we will first need to develop an operational definition for Long COVID.

To do this, we will examine the long-term healthcare activity of individuals with a positive PCR test for COVID-19 and investigate potential indicators for Long COVID. Examples of potential indicators include:

Codes used by GPs to record symptoms (such as fatigue, breathlessness, or loss of taste and smell), investigations (such as chest x-rays, echocardiograms, and blood tests) and sick lines
The number of interactions with the health system, including:
- GP visits;
- Hospital admissions;
- Outpatient attendances for respiratory conditions;
- A&E visits;
- Out of hours encounters;
- Intensive care unit (ICU) admissions; and
- NHS 24 telehealth interactions.

For comparison, we will look at two control groups:

Individuals with a negative PCR test
Everyone who has not yet taken a PCR test
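One way to picture the comparison between people with a positive test and the two control groups is to compute the average number of recorded interactions per person in each group. The sketch below does this on made-up records; the identifiers, group labels and interaction types are hypothetical, and the project's actual analysis may differ.

```python
from collections import defaultdict
from statistics import mean

# Synthetic records: (person_id, interaction_type). Not real data.
records = [
    ("p1", "gp_visit"),
    ("p1", "gp_visit"),
    ("p1", "ae_visit"),
    ("p2", "nhs24_call"),
    ("p3", "gp_visit"),
    ("p4", "gp_visit"),
]

# Everyone in the cohort with their group, so that people with
# zero recorded interactions are still counted in the averages.
cohort = {
    "p1": "positive",
    "p2": "positive",
    "p3": "negative",
    "p4": "untested",
    "p5": "untested",
}


def mean_interactions_by_group(records, cohort):
    """Average number of recorded interactions per person in each group."""
    per_person = {pid: 0 for pid in cohort}
    for pid, _kind in records:
        per_person[pid] += 1

    by_group = defaultdict(list)
    for pid, group in cohort.items():
        by_group[group].append(per_person[pid])

    return {group: mean(counts) for group, counts in by_group.items()}


print(mean_interactions_by_group(records, cohort))
```

Starting from the full cohort rather than from the interaction records matters: otherwise people who never contacted the health system would drop out and the averages would be inflated.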

Find out more about: What is a PCR test?

Which data will we use?

To do this, we will build on research previously done using the Early Pandemic Evaluation and Enhanced Surveillance of COVID-19 (EAVE II) cohort, which uses patient data to track the COVID-19 pandemic in Scotland. 

Find out more about EAVE II

This will involve examining a range of healthcare datasets, including data on:

General Practices (GPs) - including out of hours;
Hospital and outpatient care;
Telehealth and telecare;
Medication;
Mortality

To gather additional information on potential signs and symptoms of Long COVID in GP records, we will also investigate GP free text data. These are the written records (e.g. notes) used to capture more in-depth details of a patient's encounters and consultations with a GP.

Examining the free text

To identify long COVID using free text contained in health records, we will work with EAVE II and Public Health Scotland's trusted third party, Albasoft Ltd.

Our analysts will not physically access or see any free text records themselves.

Instead, Albasoft will use a specially designed tool to identify a range of terms commonly used by GPs to indicate long COVID from the records' underlying code.

Cluster analysis

We will then use cluster analysis to investigate which long COVID indicators tend to occur together (i.e. occur in 'clusters').

These clusters will be used as our operational definition for Long COVID.
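To give a sense of what "occurring together" can mean in practice, the sketch below groups indicators whose sets of patients overlap strongly. It uses Jaccard similarity with a fixed threshold and simple single-linkage merging, chosen here purely for illustration; the project does not state which clustering method it uses, and the data, threshold and function names are all hypothetical.

```python
from itertools import combinations

# Synthetic data: which indicators were recorded for each person. Not real data.
people = {
    "p1": {"fatigue", "breathlessness", "chest_xray"},
    "p2": {"fatigue", "breathlessness"},
    "p3": {"breathlessness", "chest_xray", "fatigue"},
    "p4": {"loss_of_smell", "loss_of_taste"},
    "p5": {"loss_of_smell", "loss_of_taste"},
    "p6": {"fatigue", "loss_of_taste"},
}


def jaccard(a: set, b: set) -> float:
    """Share of people with either indicator who have both."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_indicators(people: dict, threshold: float = 0.5) -> list:
    """Group indicators linked by a Jaccard similarity of at least `threshold`."""
    # Invert the data: indicator -> set of people who have it recorded.
    carriers = {}
    for pid, indicators in people.items():
        for ind in indicators:
            carriers.setdefault(ind, set()).add(pid)

    # Start with each indicator in its own cluster, then merge linked pairs.
    cluster_of = {ind: {ind} for ind in carriers}
    for a, b in combinations(sorted(carriers), 2):
        if jaccard(carriers[a], carriers[b]) >= threshold:
            merged = cluster_of[a] | cluster_of[b]
            for ind in merged:
                cluster_of[ind] = merged

    # Deduplicate the shared cluster sets and return them in a stable order.
    unique = {frozenset(c) for c in cluster_of.values()}
    return sorted(sorted(c) for c in unique)


print(cluster_indicators(people))
# [['breathlessness', 'chest_xray', 'fatigue'], ['loss_of_smell', 'loss_of_taste']]
```

In this toy data, fatigue and loss of taste appear together in only one person, so the two groups stay separate at the assumed threshold of 0.5.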

We will test our Long COVID definition using different follow-up periods, groups, and calendar time-periods to make sure our operational definition is robust.

Developing a statistical model

Once our operational definition has been developed, we will then move on to creating a statistical model to calculate the probability of an individual developing Long COVID based on their characteristics.

Characteristics of interest (or potential risk factors) include information on:

Socio-demographics (e.g. sex, age, ethnicity);
Location;
Underlying health conditions; and
Severity of acute COVID-19 infection (e.g. if they were hospitalised or admitted to ICU with COVID-19).

We will experiment with different machine learning techniques to enhance our model and ensure our probabilities are as accurate as possible.
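As an illustration of how a model can turn characteristics into a probability, the sketch below fits a small logistic regression by gradient descent. Logistic regression is assumed here only because it is a common starting point for risk prediction; the project does not name its model, and the two features, the outcome values and the training settings are invented for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Predicted probability for one person; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (intercept first) by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Synthetic data. Features: [hospitalised, has_underlying_condition] (0 or 1).
# Outcome: 1 if the person met a hypothetical long COVID definition.
X = [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1],
     [1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1], [1, 1]]
y = [0, 0, 0, 1, 0, 1,
     0, 1, 1, 1, 1, 1, 0]

weights = fit_logistic(X, y)
low_risk = predict_proba(weights, [0, 0])
high_risk = predict_proba(weights, [1, 1])
print(f"no risk factors: {low_risk:.2f}, both risk factors: {high_risk:.2f}")
```

A real model would be validated on data it was not trained on, which this sketch does not attempt.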

How is the data kept safe?

All patient data are pseudonymised or ‘de-personalised’, meaning they will not identify an individual because identifiers have been removed or encrypted.
All data are stored on Public Health Scotland’s secure network, with access strictly controlled.
All outputs will undergo rigorous statistical disclosure control measures, to ensure no identifiable or confidential information is released.
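For readers curious about what pseudonymisation can look like, the sketch below replaces an identifier with a keyed hash, so that records for the same person can still be linked without revealing the original identifier. This is one common approach and is an assumption of the example; the project does not describe its own method, and the key, field names and identifier shown are made up.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and record, for illustration only.
key = b"example-key-held-separately-from-the-data"
record = {"patient_id": "0000000000", "age_band": "40-49"}

safe_record = {**record, "patient_id": pseudonymise(record["patient_id"], key)}

# The same input and key always give the same pseudonym, so records still link up.
print(safe_record["patient_id"] == pseudonymise("0000000000", key))  # True
```

Using a secret key rather than a plain hash means someone without the key cannot rebuild the pseudonym from a known identifier.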

Who are we working with?

Find out more about the Long COVID project's funders and partners

Find out more about the Long COVID team

How long will the project last?

March 2021 - August 2023

This article was published on 27 Jun, 2023