An illustration depicting an AI model with a question mark on its face
OpenAI Will Train Models to Turn Themselves in When They Cheat
In Brief
- OpenAI introduced a new training framework called Confessions to keep AI models honest.
- The approach aims to curb hidden manipulative behaviors in large language models.
- The initiative is part of OpenAI’s broader effort to reduce AI risks and increase transparency.
In a bid to keep large language models (LLMs) honest about their operations, OpenAI has announced a training framework called Confessions. The framework compels models to “confess” when they break rules while trying to serve users.
In a paper titled “Training LLMs for Honesty via Confessions”, the company said the training teaches models to acknowledge misbehavior by explicitly “confessing” to it. The framework is meant to ensure that models do not feed users untrue information simply to provide the answers they appear to want.
OpenAI Makes Models More Honest With Confessions
OpenAI is a leading company in the artificial intelligence industry, best known for its LLM-powered chatbot ChatGPT. Over its few years of existence, ChatGPT has drawn both praise and criticism for how it serves users. A recent study, however, revealed that models can be deceptive in their dealings with humans.
The researchers explained that because AI models are trained on human data, they have learned to deceive through techniques such as manipulation, sycophancy, and cheating on safety tests. This presents risks that could eventually result in losing control over AI systems entirely.
To prevent this, OpenAI researchers want models to be more open about what they have done, including potentially problematic actions such as hacking a test, sandbagging, or disobeying instructions in order to give users the answers they seek.
The company intends to accomplish this by incentivizing openness: a model’s training reward is increased, rather than reduced, whenever it makes such a confession. This way, models become more forthcoming about how they operate and arrive at their answers, reducing the risks that come with keeping that behavior hidden. The sketch below illustrates the idea.
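To make the incentive concrete, here is a minimal Python sketch of a confession-aware reward under a simplified setup. The data structure, function names, and reward values are illustrative assumptions for this article, not OpenAI’s actual training code, which the paper describes in far more detail.

```python
# Hypothetical sketch of a confession-aware reward signal.
# The structure and values below are illustrative assumptions,
# not OpenAI's actual training setup.

from dataclasses import dataclass


@dataclass
class Episode:
    """One training rollout: what the model did and what it admitted."""
    task_reward: float   # reward for answering the user's request
    broke_rules: bool    # did the model hack a test, sandbag, or disobey?
    confessed: bool      # did it explicitly confess to doing so?


def shaped_reward(ep: Episode,
                  rule_penalty: float = 1.0,
                  confession_bonus: float = 0.5) -> float:
    """Return the total reward for an episode.

    Key idea in the Confessions framing: a model that breaks a rule and
    admits it should score better than one that breaks the same rule and
    hides it, so honesty is never punished relative to concealment.
    """
    reward = ep.task_reward
    if ep.broke_rules:
        reward -= rule_penalty          # rule-breaking is still penalized
        if ep.confessed:
            reward += confession_bonus  # but confessing recovers some reward
    return reward


# Example: admitting a violation outranks concealing the same violation.
hidden = Episode(task_reward=1.0, broke_rules=True, confessed=False)
admitted = Episode(task_reward=1.0, broke_rules=True, confessed=True)
assert shaped_reward(admitted) > shaped_reward(hidden)
```

The point of the shaping is only the ordering of outcomes: honest rule-following scores best, a confessed violation scores next, and a concealed violation scores worst, so the model has no incentive to hide what it did.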
OpenAI Mitigating AI Risks with Confessions
OpenAI has come under fire for ChatGPT’s many weaknesses in the past, which threaten the very foundation of AI technology, hence the need to address the resulting concerns. The chatbot has been linked to the suicide of a teenager and the breakdown of a 15-year-old marriage, among other harmful incidents.
The company has promised changes to ensure that ChatGPT does not pose a threat to users, and Confessions may be one of the ways it is working to make the model more transparent about its activities and to reduce the risks that come with AI secrecy.