Long-read DNA analysis can give rise to errors
Advanced technologies that read long strings of DNA can produce flawed data that could affect genetic studies, experts warn.
New methods that can read lengthy sections of genetic material – represented by a series of letters – are up to 99.8 per cent accurate. However, in a genome of more than 3 billion letters, this may equate to millions of mistakes in the results.
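The scale of the problem can be checked with a quick back-of-the-envelope calculation. The sketch below is only an illustration: it assumes that every letter is read once and that errors occur independently at a uniform rate, which real sequencing data does not strictly follow. The function name `expected_errors` is hypothetical and is not taken from the commentary.

```python
def expected_errors(genome_length: int, accuracy: float) -> int:
    """Expected number of misread letters, assuming a uniform per-letter error rate."""
    return round(genome_length * (1.0 - accuracy))


# Figures quoted in the article; the uniform-error model is an assumption.
GENOME_LENGTH = 3_000_000_000  # "more than 3 billion letters"
ACCURACY = 0.998               # "up to 99.8 per cent accurate"

print(expected_errors(GENOME_LENGTH, ACCURACY))
```

Under these assumptions, a 0.2 per cent error rate across 3 billion letters works out at roughly 6 million misread letters, in line with the "millions of mistakes" described above.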
Experts from The Roslin Institute examined three recent studies reporting human genome sequences from long-read technologies. The data contained thousands of errors even after corrective software was used, they found.
Researchers say data produced by these technologies should be interpreted with caution, as it may create problems for analysing genetic information from people and animals.
These errors may falsely indicate that an individual has a genetic difference that heightens their risk of a particular disease. Such mistakes could have major implications if these technologies are used in clinical studies to diagnose patients, the team suggests.
Previously, genetic sequencing technologies focused on reading short strings of DNA. These sequences then had to be patched together, which is time-consuming and labour-intensive. This approach is useful for reading individual genes but is ill-suited to reading the entire genome of an organism.
Long-read technologies are incredibly powerful but it is clear that we can't rely on software tools to correct errors in the data – some hands-on expertise may still be required. This is important as we increasingly use genomic technologies to understand the world around us.
The findings are reported in a commentary in the journal "Nature Biotechnology". The Roslin Institute receives strategic funding from the Biotechnology and Biological Sciences Research Council.