Although artificial intelligence (AI) has come a long way, its developers are still struggling with models that sometimes spew out nonsense, invented on the spot when the model is unable to provide a specific, correct answer to a query.
Indeed, AI can deliver perfect answers and clear rubbish with the same ease, reflecting a major challenge in how the large language models (LLMs) underlying AI deal with uncertainty. A team of researchers has now addressed this challenge with a newly developed method, per a Tech Xplore report on April 24.
Specifically, the researchers from the Institute for Machine Learning of the Department of Computer Science at ETH Zurich have created an algorithm that can reduce the uncertainty of LLMs designed for text processing and generation.
As Jonas Hübotter from the Learning & Adaptive Systems Group, who developed the new method as part of his Ph.D. studies, explained:
“Our algorithm can enrich the general language model of the AI with additional data from the relevant subject area of a question. In combination with the specific question, we can then extract from the depths of the model and from the enrichment data precisely those connections that are most likely to generate a correct answer.”
For instance, users can feed their locally stored data into an LLM such as Llama, after which the so-called SIFT (Selecting Informative data for Fine-Tuning) algorithm, developed by the ETH computer scientists, uses the additional information provided to select the data most relevant to the query.
How the algorithm to reduce AI nonsense works
SIFT builds on the way the LLM organizes information: it sections the language in the training data into word parts and arranges the semantic and syntactic relationships between them as connecting arrows, or vectors, in a multidimensional space. The dimensions of this space arise from the relationship parameters that the LLM identifies on its own during training on general data.
In this space, relationship arrows pointing in the same direction indicate a strong correlation; the wider the angle between two vectors, the less the two units of information relate to one another.
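As a toy illustration of that geometry, here is a minimal NumPy sketch; the three-dimensional vectors are hand-made stand-ins, since real LLM embeddings have thousands of dimensions:

```python
import numpy as np

def angle_between(u, v):
    """Angle in radians between two vectors, derived from cosine similarity."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Hand-made 3-D vectors standing in for real, high-dimensional embeddings.
paris = np.array([1.0, 0.0, 0.0])
france = np.array([0.9, 0.1, 0.0])   # nearly parallel: strongly related
cheese = np.array([0.0, 1.0, 0.0])   # orthogonal: largely unrelated

print(angle_between(paris, france))  # small angle
print(angle_between(paris, cheese))  # ~1.571 radians, i.e. 90 degrees
```

A small angle means the two pieces of information are closely related; an angle near 90 degrees means they carry largely independent content.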
The SIFT algorithm, in turn, uses the direction of the prompt's relationship vector to identify pieces of information that are closely related to the question yet complement each other in content. As Hübotter pointed out:
“The angle between the vectors corresponds to the relevance of the content, and we can use the angles to select specific data that reduces uncertainty.”
The traditional method for selecting information suitable for the answer, by contrast, is the 'nearest neighbor' method, which tends to accumulate redundant, widely available information and can overlook certain parts of the query in favor of the more common information.
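A minimal sketch of that nearest-neighbor selection, assuming cosine similarity as the relevance score and tiny hand-made vectors in place of real embeddings; note how the two top picks end up being near-duplicates:

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def nearest_neighbors(query, docs, k):
    """Pick the k documents whose vectors are most similar to the query."""
    ranked = sorted(range(len(docs)), key=lambda i: -cosine(query, docs[i]))
    return ranked[:k]

query = np.array([1.0, 1.0])
docs = [np.array([1.0, 0.95]),   # doc 0: close to the query
        np.array([1.0, 0.94]),   # doc 1: near-duplicate of doc 0
        np.array([0.2, 1.0])]    # doc 2: covers the query's other aspect

print(nearest_neighbors(query, docs, 2))  # → [0, 1]: redundant picks
```

Because docs 0 and 1 point in almost the same direction, plain nearest-neighbor retrieval selects both and ignores doc 2, even though doc 2 covers an aspect of the query the other two miss.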
SIFT, however, takes into account the extent to which the included pieces of information complement one another, that is, whether their vectors point in different directions, which makes it possible to identify relevant information for every aspect of the question.
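One way to sketch that idea is a greedy selection that trades relevance against redundancy, close in spirit to classic maximal marginal relevance. This is only an illustration under those assumptions, not the actual SIFT algorithm:

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def diverse_select(query, docs, k):
    """Greedy pick: relevance to the query minus redundancy with picks so far."""
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(len(docs)):
            if i in selected:
                continue
            relevance = cosine(query, docs[i])
            # penalize documents that point the same way as one already chosen
            redundancy = max((cosine(docs[i], docs[j]) for j in selected),
                             default=0.0)
            score = relevance - redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

query = np.array([1.0, 1.0])
docs = [np.array([1.0, 0.95]),   # doc 0: close to the query
        np.array([1.0, 0.94]),   # doc 1: near-duplicate of doc 0
        np.array([0.2, 1.0])]    # doc 2: covers the query's other aspect

print(diverse_select(query, docs, 2))  # → [0, 2]
```

Unlike plain nearest neighbors, the redundancy penalty makes the complementary doc 2 beat the near-duplicate doc 1, so both aspects of the query are covered.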
Additionally, targeted information selection can reduce the demand for computing power: the algorithm indirectly measures uncertainty and decides for itself how much more data it needs to provide a sufficient answer, scaling the computational overhead with the complexity of the query and the availability of relevant data.
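That stop-when-confident behavior can be mimicked in a simple loop. The `uncertainty` callable below is a hypothetical stand-in, since the report does not spell out how the algorithm measures uncertainty:

```python
import numpy as np

def retrieve_until_confident(query, docs, uncertainty, threshold, max_docs):
    """Keep adding data until the uncertainty estimate falls below `threshold`.
    `uncertainty` is a hypothetical stand-in for the model's own measure."""
    selected = []
    while len(selected) < max_docs:
        if uncertainty(query, selected) < threshold:
            break  # confident enough: stop spending compute on retrieval
        remaining = [i for i in range(len(docs)) if i not in selected]
        if not remaining:
            break
        # add the as-yet-unused document most aligned with the query
        best = max(remaining, key=lambda i: float(np.dot(query, docs[i])))
        selected.append(best)
    return selected

# Toy demo: assume uncertainty simply shrinks as more data is added.
query = np.array([1.0, 0.0])
docs = [np.array([1.0, 0.1]), np.array([0.9, 0.2]),
        np.array([0.5, 0.5]), np.array([0.1, 1.0])]
shrinking = lambda q, sel: 1.0 / (1 + len(sel))
picked = retrieve_until_confident(query, docs, shrinking,
                                  threshold=0.3, max_docs=4)
print(picked)  # → [0, 1, 2]: stops before exhausting the data
```

The point of the sketch is the control flow: an easy query crosses the threshold after little data, while a hard one keeps pulling in more, so compute scales with difficulty.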
SIFT achieves this by continuously adapting the weighting of the arrow directions in its calculations during data retrieval, making the enriched model more reliable the more it is used. This approach, known as test-time training, can also reach the same output performance with smaller models.