Data Harmonisation
A project comparing antibody levels after COVID-19 vaccination across multiple datasets from the National Core Studies Immunity programme.
Project Summary
People's immune systems can respond differently to COVID-19 vaccines and SARS-CoV-2 viral infections. For example, individuals with a compromised immune system (such as blood cancer patients) have lower-than-average responses to the vaccines.
The antibodies (proteins that help the body recognise and remove the virus) produced in response to vaccination or infection come in different forms, which appear in the body at different times. For example, some forms may appear soon after vaccination or infection but disappear within days, whilst others may take weeks to reach peak or optimum levels but last much longer.
How various antibodies rise and fall after vaccination has not previously been compared across multiple large datasets. In particular, no comparisons have been made between small, controlled, cohort-based serology datasets and large, population-level national datasets.
The aim of this project is to understand how long (in days) it takes for different immune responses to reach peak or adequate antibody levels after vaccination, and how different factors affect this response time.
The main factors we will examine are:
- Age
- Clinical risk factors (e.g. blood cancer, diabetes or being immunocompromised)
- The type of vaccine and the dose administered at the time antibody levels were measured (e.g. first, second or third dose)
- Sex
We also aim to produce a synthetic dataset that can be used for training and for exploring further antibody analyses on datasets that have been harmonised into the OMOP Common Data Model (CDM).
Each cohort, database or dataset has its own method or structure for recording and organising its data. This can make it difficult for researchers to find or use information across multiple cohorts, as the same type of information might be recorded in different ways (such as different coding systems) or stored in different locations.
Using a Common Data Model (CDM) is one way researchers overcome this problem. A CDM is a standardised structure for organising data; together with the software tools that convert data into it, it helps pool data from various data sources (such as the cohorts of CO-CONNECT's data partners).
In a sense, converting data to a CDM is a translation step: the different coding systems used in each cohort are read, and their information is rewritten in one standardised vocabulary that researchers can search more easily.
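The translation idea can be illustrated with a minimal sketch, assuming each cohort's local codes are looked up in a shared mapping table and replaced by a standard concept ID. The cohort names, local codes and concept IDs below are hypothetical placeholders, not real OMOP concept IDs, and this is not the CO-CONNECT tooling itself.

```python
from dataclasses import dataclass

# Hypothetical lookup: (source cohort, local code) -> standard concept ID.
# These IDs are placeholders for illustration, not real OMOP concept IDs.
CONCEPT_MAP = {
    ("cohort_a", "SEX_M"): 1,
    ("cohort_a", "SEX_F"): 2,
    ("cohort_b", "male"): 1,
    ("cohort_b", "female"): 2,
}


@dataclass
class SourceRecord:
    cohort: str      # which dataset the record came from
    person_id: str
    field: str       # e.g. "sex"
    local_code: str  # the cohort's own code for the value


@dataclass
class StandardRecord:
    person_id: str
    field: str
    concept_id: int  # shared, standardised code


def to_standard(record: SourceRecord) -> StandardRecord:
    """Rewrite one cohort-specific record using the shared vocabulary."""
    key = (record.cohort, record.local_code)
    if key not in CONCEPT_MAP:
        # Unmapped codes are flagged rather than silently guessed.
        raise KeyError(f"No mapping for {key}; needs manual review")
    return StandardRecord(record.person_id, record.field, CONCEPT_MAP[key])
```

Here `"SEX_M"` from one cohort and `"male"` from another end up with the same concept ID, so a single query on the standardised records finds both.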
The work will involve bringing together datasets of experimental results from different groups funded by the National Core Studies Immunity (NCSi) programme, as well as two national cohorts from Scotland that are part of the EAVE Surveillance Platform with Public Health Scotland (PHS). These datasets have focussed on different demographic populations, such as blood donors, people who attended primary care and had a blood sample taken, healthcare workers, older people attending hospital and people with suppressed immune systems.
We are converting all data into the OMOP CDM standard, using software and methodology developed for the CO-CONNECT project. Several groups used a common laboratory test (known as an assay), whilst others used different assays. Some of the work will therefore involve normalising these results so that they can be compared.
In practice, this means translating test data from many different sources into the common data model, so that people tested in different settings and studies can be compared within the same model.
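One common way to put results from different assays on a comparable scale is to standardise them within each assay. The sketch below assumes that approach (log10 transform, then a z-score per assay) purely for illustration; it is not necessarily the normalisation this project uses, and it assumes all values are strictly positive and that the populations tested with each assay are broadly comparable.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def normalise_by_assay(results):
    """Return a z-score for each (assay, value) pair, computed within its assay.

    Values are log10-transformed first because antibody levels are often
    right-skewed; this assumes every value is strictly positive.
    """
    logged = [(assay, math.log10(value)) for assay, value in results]

    # Group the log-values by assay so each assay gets its own mean and spread.
    by_assay = defaultdict(list)
    for assay, value in logged:
        by_assay[assay].append(value)
    stats = {assay: (mean(vals), pstdev(vals)) for assay, vals in by_assay.items()}

    z_scores = []
    for assay, value in logged:
        mu, sd = stats[assay]
        # An assay with a single result (or identical results) has no spread.
        z_scores.append(0.0 if sd == 0 else (value - mu) / sd)
    return z_scores
```

With made-up inputs such as `[("A", 10), ("A", 100), ("A", 1000), ("B", 1), ("B", 2), ("B", 4)]`, both assays map to the same three z-scores, even though their raw scales differ by orders of magnitude.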
We will perform a harmonised analysis, running the same analysis script, written against the CDM, on both the immunological COVID-19 datasets from the NCSi programme and the national cohorts from EAVE/PHS that have been converted to the CDM.
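Because every dataset shares one structure after conversion, a single function can run unchanged on each of them. The sketch below assumes a simplified view of the data: each person has a vaccine dose date and a list of (measurement date, value) pairs, loosely modelled on the `measurement_date` and `value_as_number` columns of the OMOP MEASUREMENT table. The grouping labels and the "days to peak" definition are illustrative assumptions, not the project's actual analysis script.

```python
from datetime import date
from statistics import median
from typing import Dict, List, Optional, Tuple


def days_to_peak(dose_date: date, measurements: List[Tuple[date, float]]) -> Optional[int]:
    """Days from a vaccine dose to the highest antibody measurement taken on or after it."""
    post_dose = [(d, v) for d, v in measurements if d >= dose_date]
    if not post_dose:
        return None  # no measurement after this dose, so no estimate
    peak_date, _ = max(post_dose, key=lambda m: m[1])
    return (peak_date - dose_date).days


def median_days_to_peak_by_group(people: List[dict]) -> Dict[str, float]:
    """Median days-to-peak per group (e.g. an age band), skipping people without post-dose data.

    Each person is a dict with hypothetical keys 'group', 'dose_date' and 'measurements'.
    """
    by_group: Dict[str, List[int]] = {}
    for person in people:
        days = days_to_peak(person["dose_date"], person["measurements"])
        if days is not None:
            by_group.setdefault(person["group"], []).append(days)
    return {group: median(values) for group, values in by_group.items()}
```

The same call could then be pointed at each converted dataset in turn, and the per-group medians compared across cohorts.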
The datasets include:
- Birmingham Elderly Cohort
- Two national cohorts from Public Health Scotland:
  - People who attended primary care
  - Blood donations
By looking at harmonised data from a larger group of people, we can provide more accurate evidence about the way people's immune systems respond to COVID-19 vaccines. Integrating these results will improve our ability to determine who is at greatest risk and how that risk might be addressed.
Our aim is to show how data analysis can be performed more efficiently and made readily repeatable by adhering to the same data standard (OMOP CDM). If additional datasets are onboarded into the SAIL Databank and converted into the OMOP CDM, the same analysis routines can be applied to them to generate further results for comparison.
By also generating a synthetic dataset we can provide a teaching resource for these types of harmonised analyses.
Team members
Name | Project Role | Institute / organisation |
Aziz Sheikh | Principal Investigator | Usher Institute, The University of Edinburgh |
Calum Macdonald | Project Lead | Usher Institute, The University of Edinburgh |
Andrew Boyle | Project Administrator | Usher Institute, The University of Edinburgh |
Chris Orton | Operations Manager (SAIL Databank) | SAIL Databank / Swansea University |
Debs Smith | Patient & Public Involvement Lead | Public contributor |
Gabriella Linning | Communications & Engagement Officer | Usher Institute, The University of Edinburgh |
Helen Parry | Contact for Birmingham Elderly Cohort | University of Birmingham |
Jim McMenamin | Expert Consultant | Public Health Scotland |
Lana Woolford | Patient & Public Involvement Coordinator | Usher Institute, The University of Edinburgh |
Lynn Laidlaw | Patient & Public Involvement Lead | Public contributor |
Natalia Reglinska-Matveyev | Project Manager | Usher Institute, The University of Edinburgh |
Paul Moss | Principal Investigator (National Core Studies Immunity Programme) | University of Birmingham |
Sarah Beard | Project Manager (National Core Studies Immunity Programme) | University of Birmingham |
Key Collaborators
University of Edinburgh (Usher Institute)
Funding
This work is funded by UK Research and Innovation (UKRI) through the COVID-19 National Core Studies Immunity programme [MC_PC_20060].