Bayes Centre

AI Inside

OpportunityMatch demonstrates how AI algorithms widely available today can be used “under the hood” to deliver value – but also what their limitations are.

OpportunityMatch is a good example of the kind of software that developers can build nowadays using readily available “off-the-shelf” AI algorithms. Behind the scenes, it uses thousands of text documents from the University’s database of research outputs to train a “semantic model” with machine learning methods. This model allows us to compare searches (which are also simply documents) to these documents.

This is achieved by converting every document to a so-called “embedding”, which is a point in a multi-dimensional space, in such a way that the more similar two documents are in terms of their “content”, the smaller the distance between their points. Of course, the algorithm has no real “understanding” of what the documents or the words in them mean – it simply assumes that, in statistical terms, closely related words tend to appear near each other, and alongside the same other words, in pieces of text. For example, “cat” would be closer to “dog” than to “house”, simply because other words like “animal” appear more often in documents that talk about cats and dogs than they do in documents about houses.
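
The “cat”, “dog” and “house” example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the model OpportunityMatch uses: the five-sentence corpus, the stop-word list and the function names are all made up for the example, and each word is represented by simple counts of the words it shares a document with rather than by a learned embedding. Closeness is measured with cosine similarity, where a higher value means the two points lie closer together.

```python
import math
from collections import Counter

# Tiny made-up corpus (an assumption for illustration only).
DOCS = [
    "the cat is an animal that purrs",
    "the dog is an animal that barks",
    "the house has a roof and a door",
    "a cat and a dog are each a pet animal",
    "the house has a door and a window",
]

# Very common words carry little meaning, so we ignore them.
STOP_WORDS = {"the", "a", "an", "is", "and", "that", "are", "each", "has"}


def context_vector(word, docs):
    """Count the words that appear in the same documents as `word`."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        if word in tokens:
            counts.update(
                t for t in tokens if t != word and t not in STOP_WORDS
            )
    return counts


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


cat = context_vector("cat", DOCS)
dog = context_vector("dog", DOCS)
house = context_vector("house", DOCS)

print("cat vs dog:  ", cosine_similarity(cat, dog))
print("cat vs house:", cosine_similarity(cat, house))
```

In this toy corpus “cat” and “dog” both co-occur with “animal” and “pet”, so their vectors point in similar directions, while “house” shares no context words with “cat” at all.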

Like many other AI-based systems used to process language and text, the similarity model (the mapping from documents to points) is learned from data using deep neural networks. These are networks of simple mathematical functions whose output depends on the weights between them, and these weights are adjusted during training with data samples until the network produces the desired output. A good analogy for this is a network of interconnected water pipes with valves on them, where you have several streams of water with different pressure coming in, and you want to re-distribute the water to several output channels at desired pressures – the weights are the settings of the valves that you need to get right to produce the correct “output”.
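
To make the idea of “adjusting weights until the output is right” concrete, here is a minimal sketch of a single unit with two inputs, trained by gradient descent. It is a deliberately small stand-in: a deep neural network stacks many such units in layers with non-linear functions between them, and real systems rely on dedicated libraries. The training samples, the target rule and the function names are assumptions chosen for the example.

```python
# Each sample pairs two input "pressures" with the desired output pressure.
# The samples follow a made-up rule: output = 2.0 * x1 - 1.0 * x2 + 0.5
SAMPLES = [
    ((0.0, 0.0), 0.5),
    ((0.0, 1.0), -0.5),
    ((1.0, 0.0), 2.5),
    ((1.0, 1.0), 1.5),
]


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus a constant offset."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def train(samples, learning_rate=0.1, epochs=1000):
    """Nudge the weights (the "valve settings") to reduce the output error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = predict(weights, bias, inputs) - target
            # Gradient step on the squared error 0.5 * error ** 2.
            weights = [
                w - learning_rate * error * x
                for w, x in zip(weights, inputs)
            ]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(SAMPLES)
print("learned weights:", weights, "bias:", bias)
print("prediction for (1.0, 1.0):", predict(weights, bias, (1.0, 1.0)))
```

The training loop never sees the rule itself, only the samples. The weights start at zero and drift towards values that reproduce the desired outputs, much like repeatedly turning the valves a little and checking the pressure at the outlet.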

No real human understanding of the documents is fed into OpportunityMatch – its models are trained simply from the documents using a statistical approach, i.e. it is purely data-driven. Like many other contemporary AI systems, it is therefore not capable of answering more complex questions or of explaining its results – it is simply trained to perform a single function. Nonetheless, this provides a substantial advantage over a simple word-by-word comparison of documents, which would only find documents that contain exact matches for the keywords entered in a search.

One can argue that this bears some (very limited and small) resemblance to human intelligence – a human would certainly be able to tell that a publication that talks about “rare feline diseases” is a relevant result for somebody looking for “unusual cat illness”.
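
The difference between the two approaches can be sketched with the phrases above. The word vectors below are hand-picked purely for illustration (a real system learns them from data, and they would have many more dimensions), and the unrelated comparison document “cheap house prices” and all function names are likewise assumptions made for this example.

```python
import math

# Hand-picked toy vectors, NOT learned from data. The four dimensions
# loosely stand for: rarity, cats, illness, property.
TOY_VECTORS = {
    "rare": (0.9, 0.1, 0.0, 0.0),
    "unusual": (0.8, 0.2, 0.0, 0.0),
    "feline": (0.1, 0.9, 0.0, 0.0),
    "cat": (0.0, 1.0, 0.1, 0.0),
    "diseases": (0.0, 0.1, 0.9, 0.0),
    "illness": (0.1, 0.0, 0.9, 0.0),
    "cheap": (0.1, 0.0, 0.0, 0.8),
    "house": (0.0, 0.0, 0.0, 1.0),
    "prices": (0.0, 0.0, 0.0, 0.9),
}


def keyword_overlap(query, document):
    """Exact word-by-word comparison: the set of shared words."""
    return set(query.lower().split()) & set(document.lower().split())


def embed(text):
    """Average the toy vectors of the words in the text."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split()]
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


query = "unusual cat illness"
relevant = "rare feline diseases"
unrelated = "cheap house prices"

print("shared keywords:", keyword_overlap(query, relevant))
print("similarity to relevant: ", cosine_similarity(embed(query), embed(relevant)))
print("similarity to unrelated:", cosine_similarity(embed(query), embed(unrelated)))
```

The keyword comparison finds no shared words between the query and the relevant publication, whereas the embedding comparison scores the relevant publication far higher than the unrelated one.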

OpportunityMatch also demonstrates a common pattern among software apps that use AI nowadays, in that a lot of non-AI functionality is needed to power the application, and only some elements of it are powered by AI. As in the case of our application, this sometimes involves no development of new AI methods. In fact, we built the entire system within a month using only open-source software. The fact that so many of these software components are now readily available explains the flurry of AI-based tools appearing for all kinds of applications.