Predicting effects of protein variants

Predictors of effect benchmarked using deep mutational scanning data - July 2020

Schematic of scientist comparing effects of protein variants

All humans (except identical twins!) have unique genomes containing thousands of mutations, most of which are harmless. However, when a harmful mutation causes a disease, it can be very difficult to tell which mutation is responsible. This problem has plagued genetic studies for years. Many groups today use a ‘computational variant effect predictor’, a computer program that predicts how damaging each mutation is likely to be. These predictors are faster, simpler and cheaper than exhaustively testing each mutant in a lab. However, measurements of their accuracy tend to be biased by the data they are assessed on. And when different predictors are tested on different datasets, their accuracies cannot be compared directly.

MRC HGU researchers Ben Livesey and Joe Marsh wished to understand which of the dozens of predictors, all of which claim to be the best, actually work well at predicting disease-causing mutations. They collected data from 31 deep mutational scanning (DMS) experiments. DMS is a relatively new lab technique that allows us to measure the effects of all possible mutations within a protein. Results from an accurate predictor should correlate well with the results from a DMS experiment. Because the DMS results are completely independent of the data used to train and test the predictors, they also offer a far less biased way to compare them.
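The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: it assumes Spearman rank correlation as the measure of agreement (the article does not name one), and the variant names, scores and predictor names are invented. Because DMS scores often rise with protein function while predictor scores often rise with predicted damage, the sketch compares the absolute value of the correlation.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: DMS fitness scores (higher = more functional) and
# two made-up predictors (higher = more damaging) for the same variants.
variants = ["A2V", "G10D", "L15P", "K20R", "S33F"]
dms_scores = [0.95, 0.10, 0.05, 0.90, 0.40]
predictors = {
    "predictor_a": [0.1, 0.8, 0.9, 0.2, 0.5],
    "predictor_b": [0.3, 0.4, 0.7, 0.6, 0.2],
}

# Score each predictor by |rho| against the DMS data, best first.
ranking = sorted(
    ((name, abs(spearman(scores, dms_scores))) for name, scores in predictors.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, rho in ranking:
    print(f"{name}: |rho| = {rho:.2f}")
```

A rank-based measure is used here because predictor scores and DMS measurements sit on different, often non-linear scales; only the ordering of variants needs to agree.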

They found that DeepSequence, an unsupervised machine-learning method, outperformed all others for predicting the effect of mutations in human proteins. Several others, including SNAP2, DEOGEN2, VEST4 and REVEL, also performed well. In addition, the researchers investigated how effective the DMS results themselves were for predicting human disease. The DMS results outperformed all of the predictors assessed. DMS holds great promise for directly identifying damaging mutations in the near future, although the computational predictors remain faster and cheaper.
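One common way to ask how well any set of scores separates disease-causing from harmless mutations is the area under the ROC curve (AUC). The sketch below assumes that metric, which the article does not specify, and uses invented scores in which higher means more damaging. It is meant only to show the kind of comparison involved, not to reproduce the study.

```python
def roc_auc(pathogenic_scores, benign_scores):
    """Probability that a random pathogenic variant scores higher than a
    random benign one; ties count as half. Equals the area under the ROC curve."""
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))


# Hypothetical damage scores (higher = more damaging) for known
# pathogenic and benign variants, from two illustrative sources.
dms_auc = roc_auc([0.90, 0.95, 0.60], [0.10, 0.05, 0.30])
predictor_auc = roc_auc([0.80, 0.40, 0.70], [0.50, 0.20, 0.40])

print(f"DMS-derived scores: AUC = {dms_auc:.2f}")
print(f"Example predictor:  AUC = {predictor_auc:.2f}")
```

An AUC of 1.0 means every pathogenic variant outscores every benign one, while 0.5 is no better than chance.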

Links

Marsh Research Group

Original article: https://doi.org/10.15252/msb.20199380

This article was published on 8 Jul, 2020