OpenAI introduces new benchmark to ensure language models produce more accurate answers
OpenAI has introduced a new measurement benchmark to ensure that language models provide more accurate answers based on verified facts.
In an announcement on 30 October, the company said the new benchmark, known as SimpleQA, will help measure the factuality of language models by testing whether they correctly answer short, fact-seeking questions.
Solving the problem of factuality
Anyone who has used generative AI chatbots such as ChatGPT knows that they often give inaccurate or factually incorrect answers.
This is because training models to produce factually correct responses remains an open challenge in AI.
As a result, current language models often produce false outputs or answers unsubstantiated by evidence, a problem known as “hallucinations”.
It is also difficult to measure the factuality of such outputs, and OpenAI seeks to address this with SimpleQA. The company is focusing on measuring the factuality of answers to short, fact-seeking questions rather than long ones.
While this narrows the scope of the new benchmark, it makes the factuality of responses easier to track.
According to OpenAI, the dataset is designed for high correctness and diversity, to be challenging for frontier models, and to offer a good researcher UX.
The process
To build SimpleQA, OpenAI hired AI trainers to browse the web and create short, fact-seeking questions and corresponding answers.
Questions included in the dataset had to meet a strict set of criteria. One was that a second, independent AI trainer answered each question without seeing the original response. Only questions where both AI trainers’ answers agreed were included.
As a final quality check, a third AI trainer answered a random sample of 1,000 questions from the dataset. Those answers matched the original agreed answers 94.4% of the time, a 5.6% disagreement rate, which points to a high level of correctness in the reference answers.
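The two checks described above (keeping only questions where two trainers agree, then spot-checking a random sample with a third trainer) can be illustrated with a minimal Python sketch. This is not OpenAI's code: the `Question` class, the `normalize` rule (case-insensitive exact match), and both function names are assumptions made purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    first_answer: str   # answer from the trainer who wrote the question
    second_answer: str  # answer from the independent second trainer


def normalize(answer: str) -> str:
    """Assumed matching rule: ignore case and surrounding whitespace."""
    return answer.strip().lower()


def keep_agreed(questions):
    """Keep only questions where both trainers' answers agree."""
    return [
        q for q in questions
        if normalize(q.first_answer) == normalize(q.second_answer)
    ]


def sample_agreement_rate(dataset, third_answers, sample_size, seed=0):
    """Fraction of a random sample where a third trainer's answer
    matches the agreed answer. `third_answers` maps question text
    to the third trainer's answer (a hypothetical data layout)."""
    rng = random.Random(seed)
    sample = rng.sample(dataset, min(sample_size, len(dataset)))
    if not sample:
        raise ValueError("dataset is empty; nothing to sample")
    matches = sum(
        normalize(third_answers[q.text]) == normalize(q.first_answer)
        for q in sample
    )
    return matches / len(sample)
```

On a sample of 1,000 questions, the reported 94.4% match rate corresponds to 944 matching answers and 56 disagreements.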