Preventing AI embellishments and hallucinations

Neil McKechnie • Mar 26, 2024

Significant progress on accurate I&R recommendations

A recurring theme from Yanzio is the need for trustworthy AI, especially when the data is used to help vulnerable people in need. That's why we devote considerable time and attention to tools and approaches that keep us improving, such as the RAG Triad and AI Safety with your data.


When AI "hallucinates", it embellishes or outright invents information, delivered in a confident manner even when it is utterly incorrect. This is one of the primary reasons people have reservations about using AI, and the source of humorous and even concerning news stories. But as one analyst noted, everything generative AI produces is a hallucination; it just happens to be correct most of the time.


We had been seeing this occasionally in Yanzio results, so we made a concerted effort to investigate, detect, and reduce it. The "Fees" field seemed to suffer the most, so we rolled up our sleeves and took a close look.


Tackling frequent hallucination in the Fees field


By analyzing large sets of AI recommendations from multiple I&R organizations, we found hallucination rates of up to 21% in the Fees field - meaning about 1 in 5 records contained information embellished or invented by the AI. For example, if a record originally said the fees were "Sliding scale", the AI might recommend a revised version reading "Sliding scale. $50 for the first visit and $25 for subsequent visits." Not only were the dollar amounts invented, but fixed amounts aren't really a "sliding scale", which depends on the help seeker's ability to pay for any visit.
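One simple way to catch embellishments like the example above is to check whether the revised text contains specifics - here, dollar amounts - that never appear in the original. This is a hypothetical heuristic sketch, not a description of Yanzio's actual detection pipeline:

```python
import re

# Matches dollar amounts like "$50" or "$25.00"
DOLLAR = re.compile(r"\$\d+(?:\.\d{2})?")

def invented_amounts(original: str, revised: str) -> list[str]:
    """Dollar amounts in the AI revision that are absent from the
    original record are likely embellishments."""
    return [m for m in DOLLAR.findall(revised) if m not in original]

orig = "Sliding scale"
rev = "Sliding scale. $50 for the first visit and $25 for subsequent visits."
print(invented_amounts(orig, rev))  # ['$50', '$25']
```

A rules-based check like this only covers one kind of embellishment, which is why a more general evaluation step (described below in the post) is still needed.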


Over the course of several weeks, we reduced the hallucination rate in the Fees field to just 5% by changing the AI instructions and by using different combinations of Large Language Models (LLMs) for different field types. (GPT, which powers ChatGPT, is an example of an LLM.)


Even better, we have a different LLM evaluate the recommendations, and it flags nearly all of those remaining 5% of instances before a person ever sees them.
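Using a second, independent model to review another model's output is often called "LLM-as-judge". A minimal sketch of the idea follows; the prompt wording is illustrative, and `ask_llm` stands in for a real LLM call:

```python
def judge(original: str, revised: str, ask_llm) -> bool:
    """Ask an independent evaluator LLM whether the revision adds
    information not supported by the original. Returns True if the
    recommendation should be flagged before a person sees it."""
    prompt = (
        "Does the REVISED text contain facts absent from the ORIGINAL? "
        "Answer YES or NO.\n"
        f"ORIGINAL: {original}\n"
        f"REVISED: {revised}"
    )
    return ask_llm(prompt).strip().upper().startswith("YES")

# Stub standing in for a real LLM API call:
flagged = judge("Sliding scale",
                "Sliding scale. $50 for the first visit.",
                lambda prompt: "YES")
print(flagged)  # True
```

Because the evaluator sees only the source record and the recommendation, it can catch embellishments regardless of which model produced them.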


Hallucination rates are now below 1% for the Fees field


As a result, the hallucination rate in the AI recommendations a person actually sees for the Fees field is now far below 1%. A fantastic improvement!
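The arithmetic behind that figure is straightforward. The evaluator's catch rate below is an illustrative assumption (the post says only "nearly all"), not a measured number:

```python
post_mitigation = 0.05   # residual Fees hallucination rate after prompt/model changes
judge_catch_rate = 0.90  # illustrative assumption for "nearly all" flagged

# Only hallucinations that slip past the evaluator reach a person.
seen_by_person = post_mitigation * (1 - judge_catch_rate)
print(f"{seen_by_person:.1%}")  # prints "0.5%"
```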


We are applying these techniques to nearly all fields now and seeing similar improvements. We are actively working on further improvements, and new tools and methodologies are emerging in the industry that promise even more progress.


Stay tuned...things are moving fast but Yanzio continues to keep pace with the evolution of AI.

