As AI Gets Smarter, Its Hallucinations Get Worse: New Research Raises Industry Alarms

Artificial intelligence systems, particularly the large language models that drive today’s chatbots and virtual assistants, are experiencing a troubling twist in their evolution: the more advanced and “intelligent” they become, the more likely they are to fabricate convincing but false information, a phenomenon known as AI hallucination. New research and industry reporting reveal that the latest generation of “reasoning” AI models, despite appearing more capable and articulate, is showing a dramatic increase in these errors, raising serious concerns for everyday users and global industries alike.

For Thai readers who increasingly rely on tools like ChatGPT and Google’s AI for work, education, and daily life, the stakes are high. As these AI systems become enmeshed in banking, healthcare, media, and government, their propensity to invent facts threatens not just individual credibility but also the accuracy of public information, business decision-making, and potentially national security.

AI hallucination, as defined by researchers and the technical community, occurs when a machine learning model produces output that sounds plausible but is false or misleading, often without any warning to the end user. This is not simply a technical hiccup; it is a systemic limitation in how large language models (LLMs) gather, synthesize, and generate language from vast datasets. According to recent coverage by The New York Times and Futurism, the industry’s hope that smarter models would mean fewer errors has proven misplaced: OpenAI’s latest “reasoning” models, o3 and o4-mini, have been found to hallucinate significantly more than previous versions. Internal accuracy benchmarks reportedly show o4-mini hallucinating nearly half the time (48%), while o3 does so at a 33% rate, both sharp rises from earlier models (Futurism; The New York Times).

This alarming trend is not unique to OpenAI’s technology. Competing state-of-the-art models developed by Google and DeepSeek have displayed similar, sometimes higher, hallucination rates. Pratik Verma, cofounder of Okahu, whose consultancy helps Thai and regional businesses implement AI, warns: “Not dealing with these errors properly basically eliminates the value of AI systems” (Futurism). Vectara’s chief executive agrees that the problem is here to stay: “Despite our best efforts, they will always hallucinate. That will never go away.”

A fix remains elusive. Even as the world’s most advanced companies pour tens of billions of US dollars into larger, more sophisticated AI architectures, experts admit there is still a fundamental mystery at play: not even the creators of these systems fully understand why the models hallucinate more as they scale up. One key theory involves the increasing reliance on “synthetic data,” that is, data generated by AIs themselves to supplement or replace human-generated examples as high-quality real-world data becomes scarce. Training models on this feedback loop of machine output may compound errors and push AI further from factual accuracy, according to current research discussed in TechCrunch and InfoSecurity Magazine.
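
To see why a feedback loop of machine-generated data is worrying, consider the following toy simulation. It is purely illustrative and not drawn from the cited research: a simple Gaussian stands in for a trained model, and in each “generation” it is re-fit to samples it produced itself rather than to the original human data.

```python
import numpy as np

# Illustrative toy only: a Gaussian "model" repeatedly re-fit to its own samples,
# mimicking training on synthetic data instead of fresh human-generated data.
rng = np.random.default_rng(seed=42)

real_data = rng.normal(loc=0.0, scale=1.0, size=1_000)  # stand-in for human-written data
mean, std = real_data.mean(), real_data.std()            # initial fit to real data

for generation in range(1, 31):
    synthetic = rng.normal(loc=mean, scale=std, size=200)  # data the "model" generates
    mean, std = synthetic.mean(), synthetic.std()           # re-fit on its own output
    print(f"generation {generation:2d}: mean={mean:+.3f}, std={std:.3f}")
```

With each pass the fitted parameters wander further from the original data, and small sampling quirks get baked in and amplified. It is a crude analogue, but it shows how recycling machine output can compound a model’s errors instead of correcting them.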

While the hallucination problem has existed since the earliest chatbots, its scale and danger have grown. In 2023, estimates suggested that about 27% of chatbot outputs contained hallucinations and that nearly half of all outputs carried factual inaccuracies. Now, with models like o4-mini scoring a 48% hallucination rate on OpenAI’s own benchmarks, the reliability of AI-powered learning, research, and automated services is under greater scrutiny than ever before (Wikipedia).

The implications for Thai society are particularly acute, given Thailand’s accelerating embrace of digital technology in classrooms, medical consultations, and content production. Thai educators and healthcare professionals have already begun to feel the impact. According to a 2025 review in the journal “Current Review of Generative AI in Medicine,” medical practitioners must remain vigilant and always verify AI-generated information with trusted sources due to “core limitations” that remain unresolved despite rapid advances in generative AI models (PubMed, 2025).

Within classrooms, language models such as ChatGPT and Gemini Advanced score high marks on knowledge tests but underperform on nuanced or open-ended questions, which means Thai teachers and students must treat AI-provided “facts” with caution. Leading education researchers in Thailand, drawing on experience with science and math curricula, caution that “AI should be considered an assistant, not an authority. We must always check its work to avoid propagating dangerous errors,” according to a recent roundtable at Chulalongkorn University and coverage in The Bangkok Post.

Culturally, Thailand’s emphasis on respect, seniority, and deference to expertise creates unique risks when AI models hallucinate. Many Thais defer to perceived experts or to technology, meaning unrecognized AI errors could spread rapidly and widely, especially on social media and in community settings. Tech literacy campaigns, long promoted by Thailand’s Ministry of Digital Economy and Society, now face the added challenge of “AI literacy”: teaching citizens that even the most authoritative-seeming AI can confidently get things wrong.

Internationally, some of the world’s leading AI theorists predict that the rush toward ever-bigger, more sophisticated models may be reaching a point of diminishing returns. As one Oxford researcher wrote in a recent Science commentary, “scale alone no longer guarantees improvement—deeper understanding and novel safeguards are required to keep AIs meaningful and safe” (Mashable). AI’s habit of inventing quotations or references to nonexistent folk sayings has become a persistent theme in Western and Asian news coverage, with Google’s chatbot notoriously fabricating plausible Thai idioms that no one has ever used.

So what does the future hold? Industry insiders expect continued investment in this space, hoping that new breakthroughs in model interpretability and safeguards will eventually limit hallucinations. Researchers recommend a multipronged response: user interfaces that expose confidence levels, fact-checking layers built into consumer products, and more transparent reporting of model limitations.
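
The “fact-checking layer” idea can be sketched in a few lines of Python. Everything below is an assumption for illustration: generate and verify_claim are hypothetical stand-ins for a real language-model call and a real lookup against trusted sources, and the sentence-level confidence score is only a placeholder heuristic, not a description of any vendor’s actual product.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CheckedAnswer:
    text: str                     # the model's draft answer
    confidence: float             # share of claims the verifier accepted (0.0-1.0)
    unverified_claims: List[str]  # sentences the verifier could not confirm

def answer_with_checks(
    question: str,
    generate: Callable[[str], str],       # hypothetical LLM call
    verify_claim: Callable[[str], bool],  # hypothetical check against trusted sources
) -> CheckedAnswer:
    """Draft an answer, verify each sentence, and surface a confidence score
    instead of presenting the text as settled fact."""
    draft = generate(question)
    claims = [s.strip() for s in draft.split(".") if s.strip()]
    unverified = [c for c in claims if not verify_claim(c)]
    confidence = (1.0 - len(unverified) / len(claims)) if claims else 0.0
    return CheckedAnswer(draft, confidence, unverified)

# Example with stand-in functions; a real deployment would call an actual model
# and a curated knowledge base.
result = answer_with_checks(
    "When was Chulalongkorn University founded?",
    generate=lambda q: "Chulalongkorn University was founded in 1917.",
    verify_claim=lambda claim: "1917" in claim,
)
print(result.confidence, result.unverified_claims)
```

Even a rough layer like this changes what the user sees: instead of a confident wall of text, the interface can flag which statements were checked and which were not.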

For Thailand’s government, businesses, educators, and the public, some practical steps are advisable for the moment:

  • Treat all AI-generated information as provisional, particularly in sensitive sectors such as health, law, and education;
  • Cross-check important outputs with trusted human experts or primary sources;
  • Encourage digital and AI literacy campaigns at all levels of schooling;
  • Demand transparency and clear disclaimers from companies marketing AI-based services; and
  • Continue supporting research on AI safety and model validation within Thai universities and think-tanks.

Moving forward, it is crucial that Thailand—alongside partners in ASEAN and the global community—remains vigilant, informed, and critical as artificial intelligence becomes ever more present in daily life. As the world’s smartest machines become increasingly inclined to “hallucinate,” human skepticism, collaborative research, and government oversight are likely to be our strongest safeguards.

Sources: Futurism, New York Times, Mashable, TechCrunch, InfoSecurity Magazine, Wikipedia, PubMed, Digital Trends, TechSpot

