As artificial intelligence (AI) continues to proliferate, it has become a popular tool in many areas, from answering everyday questions to supporting scientific research, thanks to its ability to pull up-to-date, relevant information from the internet. But there’s a problem.
Indeed, generative AI search tools analyze and repackage information themselves, cutting off traffic to original sources, whereas traditional search engines usually operate as intermediaries, guiding users to news websites and other quality content.
As a result, AI chatbots’ conversational outputs can obscure serious underlying issues with information quality, raising the question of how these systems access, present, and cite content produced by news publishers.
With this issue in mind, Columbia University’s Tow Center for Digital Journalism analyzed the ability of eight generative AI search tools with live search features to accurately retrieve and cite news content, and how they behave when they cannot, publishing the results on March 6.
‘Confidently wrong’
The findings were staggering. As it turns out, chatbots were generally bad at declining to answer questions they couldn’t answer accurately, offering incorrect or speculative answers instead. Premium chatbots were even more confident in offering incorrect answers than their free counterparts.
Furthermore, multiple chatbots appeared to circumvent Robots Exclusion Protocol preferences; generative search tools fabricated links and cited syndicated or copied versions of articles; and content licensing deals with news sources provided no guarantee of accurate citation in responses.
Specifically, the chatbots collectively answered more than 60% of the research team’s queries incorrectly. Perplexity had the lowest error rate, answering 37% of queries incorrectly, while Grok 3 had the highest, getting an astounding 94% wrong.
Most of them displayed an alarming level of confidence, rarely using qualifying phrases like ‘it appears,’ ‘it’s possible,’ or ‘might,’ or admitting knowledge gaps and limitations with statements like ‘I couldn’t locate the exact article.’
Alarmingly, this confidence in providing incorrect answers was even more pronounced in premium models such as Perplexity Pro ($20/month) and Grok 3 ($40/month). In the tests, both answered more prompts correctly than their free counterparts but, paradoxically, also demonstrated higher error rates.
According to the researchers, this stems primarily from their proclivity to provide definitive (even when wrong) answers rather than declining to answer the question directly. Moreover, their authoritative conversational tone makes it difficult for users to distinguish between accurate and inaccurate information.
Problem with citations
On top of that, crawler access revealed another contradiction. Some publishers disallow chatbots’ crawlers in their robots.txt files, and five of the eight chatbots tested (ChatGPT, Perplexity, Perplexity Pro, Copilot, and Gemini) have made their crawlers’ names public, allowing publishers to block them. The other three (DeepSeek, Grok 2, and Grok 3) have not disclosed their crawlers’ names.
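To make the mechanics concrete, here is a minimal sketch using Python’s standard-library robots.txt parser. The robots.txt rules and the article URL below are hypothetical examples, not taken from the study; GPTBot and PerplexityBot are the publicly documented crawler names for OpenAI and Perplexity. The point it illustrates: a crawler can only be blocked by name, so an undisclosed crawler matches no rule and is allowed by default.

```python
# Sketch: how robots.txt directives interact with named AI crawlers.
# The robots.txt content and URL are hypothetical, for illustration only.
from urllib.robotparser import RobotFileParser

# A hypothetical publisher's robots.txt that blocks two disclosed AI
# crawlers while leaving all other crawlers unaffected.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

article = "https://example-publisher.com/news/some-article"
for crawler in ("GPTBot", "PerplexityBot", "SomeUndisclosedBot"):
    allowed = parser.can_fetch(crawler, article)
    print(f"{crawler}: {'allowed' if allowed else 'blocked'}")

# Output: GPTBot and PerplexityBot are blocked, but the undisclosed
# crawler matches no rule and is allowed by default -- which is why
# publishers cannot opt out of crawlers whose names aren't public.
```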
In some instances, the chatbots incorrectly answered or declined to answer queries about publishers that permitted them to access their content, while in others they correctly answered queries about publishers whose content they should not have been able to access.
Additionally, the generative AI search tools had a common tendency to cite the wrong article. For example, DeepSeek misattributed the source of the excerpts provided in the researchers’ queries 115 out of 200 times. In other words, it most often credited news publishers’ content to the wrong source.
And even when they got the source right, the chatbots often failed to link to the original article. As a result, publishers seeking visibility in search results weren’t getting it, while the content of those wishing to opt out remained visible against their wishes.
Finally, generative AI search tools had a propensity to fabricate URLs, further undermining users’ ability to verify information sources: over half of the responses from Gemini and Grok 3 cited made-up or broken URLs that led to error pages.
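For readers who want to spot-check a chatbot’s citations themselves, the sketch below, using only Python’s standard library, issues a HEAD request to a cited URL and reports whether it resolves. The URL is a placeholder, and this is a rough heuristic, not the study’s methodology; note that some servers reject HEAD requests, so a fallback GET may be needed in practice.

```python
# Sketch: check whether a chatbot-cited URL actually resolves.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def check_citation(url: str, timeout: float = 10.0) -> str:
    """Return a short verdict on whether a cited URL resolves."""
    request = Request(url, method="HEAD")  # fetch headers only, no body
    try:
        with urlopen(request, timeout=timeout) as response:
            return f"OK ({response.status})"
    except HTTPError as err:    # server answered with an error page (e.g., 404)
        return f"broken ({err.code})"
    except URLError as err:     # DNS failure, refused connection, etc.
        return f"unreachable ({err.reason})"

# Placeholder URL, not one from the study.
print(check_citation("https://example.com/made-up-article"))
```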
Can AI replace search engines?
All things considered, AI can be a solid tool for answering simple queries, but for the time being it cannot replace traditional search engines, which take the user directly to the original source of information. On top of that, fact-checking remains necessary when using AI search for critical tasks.
So, if you’re wondering “Can I use AI to search?”, the answer is yes, you can. However, always double-check the information it gives you against original and/or alternative sources, and give credit where credit is due, i.e., ensure all sources are properly cited if used in content creation.