
Latest Generation A.I. Systems Show Rising Hallucination Rates, Raising Concerns for Reliability


A new wave of powerful artificial intelligence systems from leading global tech companies, including OpenAI, Google, and DeepSeek, is increasingly generating factual errors despite their advanced capabilities, sparking growing concern among users, researchers, and businesses worldwide. As these A.I. bots become more capable at tasks such as complex reasoning and mathematics, their tendency to produce incorrect or entirely fabricated information, known as "hallucinations," is not only persisting but worsening, as revealed in a recent investigative report by The New York Times (nytimes.com).

For Thai readers, whose lives, work, and even education are being rapidly transformed by automated chatbots and digital assistants, this trend highlights critical risks. When A.I. systems are relied upon for important decisions or information—like medical advice, legal guidance, or sensitive business operations—these hallucinations can lead to costly errors and loss of trust.

Recent incidents shed light on the severity of the issue. A case involving Cursor, a programming tool, saw an A.I.-powered tech support bot falsely notify customers of a policy change, prompting confusion, backlash, and cancelled subscriptions. The company’s leadership clarified later that the supposed policy never existed—a textbook A.I. hallucination. Similar stories are becoming commonplace on Thai online forums, especially as young professionals and students rely on chatbots for research, translation, or exam preparation.

The fundamental problem, experts agree, is rooted in how these A.I. systems are trained and how they "decide" what to say. Modern chatbots, including those tied to internet search engines such as Google and Bing, are trained on vast datasets and use statistical models to predict the most likely next words, effectively guessing at the "right" answer rather than applying a strict rule set. As a result, mistakes are inevitable. Amr Awadallah, chief executive of Vectara and a former Google executive, put it simply: "Despite our best efforts, they will always hallucinate. That will never go away."
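To make the "statistical guessing" concrete, here is a deliberately simplified sketch in Python. The candidate words and their probabilities are invented for illustration; real systems choose among tens of thousands of tokens at every step, but the basic mechanism of sampling from a probability distribution rather than consulting a store of verified facts is the same.

```python
import random

# Toy illustration (not any vendor's actual model): a language model assigns
# probabilities to candidate next words and samples from them, rather than
# looking the answer up in a database of verified facts.
next_word_probs = {
    "1997": 0.46,   # plausible but factually wrong continuation
    "1999": 0.41,   # the factually correct continuation
    "2001": 0.13,
}

def sample_next_word(probs: dict[str, float]) -> str:
    """Pick one candidate word in proportion to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

prompt = "The policy was introduced in"
print(prompt, sample_next_word(next_word_probs))
# Roughly half the time this toy model asserts the wrong year, with the
# same confidence as when it happens to be right.
```

The point of the sketch is that nothing in the sampling step distinguishes a true continuation from a merely plausible one, which is why confident-sounding errors are a built-in risk rather than an occasional glitch.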

Recent research is painting an even starker picture. OpenAI's most advanced models, referred to as o3 and o4-mini, have shown dramatically increased hallucination rates. On one benchmark test (PersonQA), the latest OpenAI system hallucinated 33% of the time, more than twice the rate of its predecessor. On a more general question-answering benchmark (SimpleQA), its error rate soared as high as 79%. Tests on Google's reasoning models and DeepSeek's R1 have shown similar or greater error frequencies.
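For readers unfamiliar with how such percentages are produced, the sketch below shows the basic idea: a hallucination rate is the share of model answers that fail to match a reference answer. The questions, answers, and grading here are invented; real evaluations such as PersonQA and SimpleQA use far larger question sets and more careful answer matching.

```python
# Conceptual sketch of a question-answering hallucination rate:
# the fraction of model answers that do not match the reference answer.
benchmark = [
    {"question": "Which university did the subject attend?", "reference": "Chulalongkorn University"},
    {"question": "In which year was the subject born?",      "reference": "1968"},
    {"question": "What was the subject's first profession?", "reference": "journalist"},
]

model_answers = ["Chulalongkorn University", "1972", "lawyer"]  # hypothetical outputs

def hallucination_rate(items, answers):
    wrong = sum(
        1 for item, ans in zip(items, answers)
        if ans.strip().lower() != item["reference"].strip().lower()
    )
    return wrong / len(items)

print(f"Hallucination rate: {hallucination_rate(benchmark, model_answers):.0%}")
# 67% in this toy example: two of the three answers are fabricated.
```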

For Thai educators and students, especially those embracing digital learning platforms or smart classroom assistants, these findings are particularly troubling. The widespread roll-out of A.I. in Thailand’s education sector—from language tutoring apps to grading software—has been touted as a leap forward, but if these tools cannot reliably distinguish fact from fiction, there is a risk that misinformation could quietly become embedded in the learning process.

The underlying challenge is compounded by the industry's growing reliance on a training approach called "reinforcement learning." In this setup, A.I. learns primarily through trial and error, adjusting its actions to maximize rewards, which produces surges in skills like math but instability in factual accuracy. As highlighted by Laura Perez-Beltrachini, a University of Edinburgh researcher focusing on the hallucination problem, "The way these systems are trained, they will start focusing on one task—and start forgetting about others."
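The trial-and-error dynamic can be illustrated with a minimal sketch, assuming a toy agent that repeatedly chooses among three fixed behaviours and shifts toward whichever one the reward signal favours. The actions and reward values are invented for illustration; real chatbot training operates over generated text, not three fixed choices, but the sketch shows how behaviour that the reward does not measure (such as careful sourcing) can quietly be de-emphasised.

```python
import random

# Minimal trial-and-error sketch of reinforcement learning: try actions,
# observe a numeric reward, and drift toward whatever scored highest.
actions = ["show_worked_solution", "give_short_answer", "hedge_and_cite_source"]
value_estimates = {a: 0.0 for a in actions}
counts = {a: 0 for a in actions}

def reward(action: str) -> float:
    # Hypothetical reward: only correctness of the final answer is scored,
    # so hedging and citing sources earns nothing extra.
    base = {"show_worked_solution": 1.0,
            "give_short_answer": 0.7,
            "hedge_and_cite_source": 0.4}
    return base[action] + random.uniform(-0.1, 0.1)

for step in range(1000):
    # Epsilon-greedy: mostly exploit the best-looking action, sometimes explore.
    if random.random() < 0.1:
        a = random.choice(actions)
    else:
        a = max(value_estimates, key=value_estimates.get)
    r = reward(a)
    counts[a] += 1
    value_estimates[a] += (r - value_estimates[a]) / counts[a]  # running average

print(value_estimates)  # the agent concentrates on whatever the reward favours
```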

The complexity of these deep neural networks is such that even leading scientists don't fully understand how or why hallucinations happen. Professor Hannaneh Hajishirzi of the University of Washington, part of a team attempting to trace A.I. behavior back to its training data, admitted that the scale and mystery remain daunting. "We still don't know how these models work exactly," she said, underlining the urgent need for additional research and increased transparency from tech companies.

As global companies near the limits of available English-language internet data for feeding their A.I. systems, the push for improvement now increasingly relies on these less-predictable training tactics. The result has been a notable decline in factual reliability at the very moment when people are starting to trust and use A.I. assistants for more consequential work.

According to internal data released by leading technology companies and independently verified by firms like Vectara, newer reasoning models fabricate information in as little as 1-2% of complex summarization tasks, but this figure can spike to 27% or more depending on the system and task. For business leaders, medical professionals, policy-makers, and educators in Thailand, this puts a spotlight on the need for rigorous verification and oversight.

The issue has implications in legal and ethical spheres, too. The New York Times is currently pursuing legal action against OpenAI and partner Microsoft, alleging copyright infringement relating to news content used in A.I. training—raising broader questions about intellectual property, data privacy, and the responsibilities of A.I. developers (see: nytimes.com). These debates are particularly relevant in Thailand, where digital literacy and legal frameworks around A.I. remain works-in-progress and where new A.I.-powered services are proliferating faster than effective regulation.

Looking ahead, the proliferation of flawed yet powerful A.I. systems will require a more cautious and pragmatic approach. Industry leaders are promising improvements. OpenAI, for example, says it is actively researching ways to reduce hallucinations, with spokesperson Gaby Raila stating, “We’ll continue our research on hallucinations across all models to improve accuracy and reliability.” However, as history shows, technical fixes may not be a panacea.

Practical recommendations for Thai readers, whether individuals, educators, or business stakeholders, include: never relying solely on A.I. for critical decisions; double-checking all facts, figures, and source citations produced by chatbots; and staying updated on the latest guidance from reputable institutions. Institutions like the Ministry of Digital Economy and Society, as well as leading Thai universities, are well positioned to issue clear usage guidelines, promote digital literacy in schools, and encourage responsible deployment of A.I. in both public and private sectors. Media literacy campaigns, similar to those used to combat online scams and fake news, should now include reference to A.I.-generated errors.

In summary, while the promise of A.I. in revolutionizing work, study, and daily life in Thailand is considerable, the risk of hallucination is here to stay—at least for now. As a community, Thais must adapt by combining the power of advanced digital tools with a healthy dose of traditional scepticism and fact-checking habits, ensuring that human judgment and cultural context guide our embrace of new technology.


