The Roslin Institute
Computer machine learning to fight disease

Scientists find new way to forecast bacterial health risk to humans


Bacteria are everywhere: on us, in us and in the environments where we live. While the majority are beneficial for animal and human health, a tiny subset can cause serious, life-threatening infections. Bacteria can cause disease by infecting us and by producing toxins. Fortunately, basic hand-washing and cooking of food can prevent the majority of foodborne illnesses.

Here at The Roslin Institute, we have found a new way to use computer modelling to accurately distinguish bacterial strains that can infect humans from those that cannot. This has many public health and livestock welfare benefits and has the potential to change how we interpret, monitor and manage microbial disease.

Our findings allow us both to work out the likely source of the bacteria causing an outbreak and to determine which bacterial strains are the most dangerous. For example, if we consider E. coli isolates that can produce Shiga toxins, our research indicates that only a minority of these strains, even of serotype O157, are a serious threat to human health. We can use machine learning to target treatments and interventions against the most dangerous strains.

Furthermore, we hope that by investigating the genes the software uses to identify the dangerous strains, we can learn why these particular isolates are more of a threat to human health.

Professor David Gally, The Roslin Institute


What is machine learning?

The study of genetics often involves using computers to process vast amounts of data. These computer programs can be used to find and predict patterns in the data, leading to useful genetic forecasts. Scientists can now design computer programs that can learn, improving their forecasting skills; the more data they interact with, the more powerful the software. This self-improvement of computer models is called machine learning.

In one study, E. coli genetic information from both human and bovine strains was given to the computer program so that it could learn the properties of strains that allow them to infect cattle or humans. Ten per cent of the samples from cattle were categorized as containing genetic information fitting with human isolates. The scientists interpreted this to mean that this subset of bacterial strains from cattle has a greater capacity to infect humans. The test was repeated using samples from the UK and the USA to see if the computer could still sort them, and it did. Additionally, the computer identified more bacterial genes that allow E. coli to make people sick than were previously known.
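The idea of training on isolates of known host and then sorting cattle isolates can be illustrated with a small sketch. This is not the program used in the study: the article does not name the algorithm, so a simple Bernoulli naive Bayes classifier stands in for it here, and the four "genes" and all of the isolates are made-up toy data chosen only to show the workflow.

```python
import math

# Hypothetical toy data (not from the study): each isolate is a tuple of
# presence (1) / absence (0) flags for four made-up genes, plus its host.
TRAINING = [
    ((1, 1, 0, 1), "human"),
    ((1, 1, 1, 1), "human"),
    ((1, 0, 0, 1), "human"),
    ((0, 0, 1, 0), "cattle"),
    ((0, 1, 1, 0), "cattle"),
    ((0, 0, 0, 0), "cattle"),
]


def train(examples):
    """Fit a Bernoulli naive Bayes model with add-one smoothing."""
    n_genes = len(examples[0][0])
    model = {}
    for label in sorted({lab for _, lab in examples}):
        rows = [genes for genes, lab in examples if lab == label]
        prior = len(rows) / len(examples)
        # Smoothed probability that each gene is present in this host group.
        present = [
            (sum(row[i] for row in rows) + 1) / (len(rows) + 2)
            for i in range(n_genes)
        ]
        model[label] = (prior, present)
    return model


def predict(model, genes):
    """Return the host label with the highest log-probability."""
    def log_score(label):
        prior, present = model[label]
        score = math.log(prior)
        for flag, p in zip(genes, present):
            score += math.log(p if flag else 1.0 - p)
        return score
    return max(model, key=log_score)


model = train(TRAINING)

# Hypothetical new isolates, all sampled from cattle.
cattle_isolates = [(1, 1, 0, 1), (0, 0, 1, 0), (0, 1, 1, 0), (0, 0, 1, 1)]

# Cattle isolates whose gene profile looks more like the human group.
human_like = [g for g in cattle_isolates if predict(model, g) == "human"]
fraction = len(human_like) / len(cattle_isolates)
print(f"{fraction:.0%} of cattle isolates classified as human-like")
```

In this toy setting, the cattle isolates that the model labels "human" play the role of the subset the scientists interpreted as having more capacity to infect humans. Real analyses work with thousands of genes and many more isolates, and would check the predictions on independent samples, as the study did with UK and USA isolates.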

Why E. coli?

E. coli are common in humans and many animals. Most are harmless, but certain types can cause serious disease. Trying to eliminate all E. coli would be futile, but developing a system to eradicate, or at least better control, the strains that cause the worst human health outcomes may be possible. One such harmful type is E. coli O157, which lives in cattle without making them ill but produces toxins that can make people ill. Humans are exposed to the bacteria through contaminated food or drink, usually leafy greens or undercooked or processed meat and dairy products. While such illnesses are rare in the UK, at about 1000 cases a year, a small number of these can result in bloody diarrhoea, kidney damage and sometimes death. Not all E. coli O157 from cattle can cause such serious illness. Some of these E. coli strains potentially cannot infect humans at all. If we are trying to prevent the spread of E. coli O157, it is important to know which strains to target. If we can identify bacterial strains that pose little health risk to humans or animals, we can decide not to treat against them, reserving antibiotics and other interventions so that they still work against bacteria when we really need them to.

What can we use this for?

For the first time, scientists have used machine learning to forecast, from the genetic code of bacteria, whether they can infect humans and which isolates represent more of a threat to human health. Beyond forecasting whether a particular strain of E. coli can infect humans, this method of machine learning can be used to predict any ‘feature’ of the bacteria or bacterial population based on the genetic code. Examples include predicting antibiotic resistance and the animal or environmental source of bacteria. The value of this technique increases as we sequence more bacteria and bacterial communities. The data we collect include infection severity or treatment success. As we get more data, the predictions should increase in accuracy. Also, there are many bacteria that we cannot study directly because they are difficult to grow in laboratory settings. If we can predict their behaviour using machine learning, we can improve our understanding of which bacteria are harmless, healthy or harmful on a scale that has not been possible until now.

Further Reading

Predicting the potential of E. coli transmission from cattle to humans

Original Publication