Amid mounting concerns over the safety of artificial intelligence (AI), OpenAI has just introduced a novel approach to AI oversight called chain-of-thought (CoT) monitoring, aiming at improving the transparency and reliability of AI reasoning.
As it happens, the CoT monitoring allows for tracking how AI models break down complex tasks into sequential thoughts (or the ‘chain of thought’), providing deeper insights into their decision-making processes and detecting exploits, the company said in an article published on March 10.
Specifically, CoT reasoning models are able to ‘think’ in natural language that is understandable by humans, and monitoring the said ‘thinking’ has facilitated the detection of misbehavior. This includes subverting tests in coding tasks, deceiving users, or giving up when a problem is too difficult – like humans may sometimes do.
Chain-of-thought monitoring for transparency and security
OpenAI has found that other large language models (LLMs) can be effective in performing this monitoring. Using GPT-4o as a monitor for a frontier reasoning model as it trains has shown its capability to flag attempts at reward hacking in programming environments.
By examining the step-by-step reasoning of AI models through CoT monitoring, researchers can identify instances where the system may be acting deceptively To put a stop to it, they may introduce specific criteria (like not thinking about reward hacking), a CoT summarizer or sanitizer model, or even possible penalties.
That said, penalizing undesirable thoughts could have certain downsides.
In fact, it could encourage AI models to conceal their reasoning instead of eliminating harmful behavior. Therefore, OpenAI has warned that going overboard with optimization on CoT reasoning could push models to manipulate their responses to align with human expectations while still pursuing their own goals behind the scenes.
The concern is not without foundation, as some AI models have indeed exhibited deceptive behaviors alongside becoming more intelligent and developing more advanced reasoning abilities, emphasizing the need for stronger monitoring and extreme caution.
Elsewhere, the use of AI is becoming more widespread, including among government agencies, in performing household tasks, taking passengers between destinations, generating videos based on text, image, voice, and video inputs, discovering and developing new medicines, and more.