KPMG has withdrawn a report examining how organizations are using artificial intelligence, after concerns were raised that the document contained apparent “hallucinations”—a term that, in the AI world, usually means outputs that sound confident and coherent but are factually wrong or unsupported. The episode is notable not only because it involves one of the most recognizable names in professional services, but because it underscores a problem that has become increasingly familiar across industries: even when AI is used to summarize, analyze, or draft, the result can still drift away from reality.
What makes this story resonate is that it isn’t about a fringe experiment or a small pilot that never left the lab. It’s about a report intended for real-world decision-making—work that, by its nature, should be grounded in evidence, methodology, and verifiable claims. When errors appear in a deliverable like this, the question quickly becomes less “how could this happen?” and more “what does this mean for how we trust AI-assisted work going forward?”
The immediate takeaway is straightforward: AI outputs can be unreliable, even when they are presented as informative. But the deeper implications are more complex. This incident highlights the gap between how AI systems generate language and how organizations evaluate truth. It also raises uncomfortable questions about workflow design—specifically, where verification happens, who owns the final accuracy, and what safeguards exist when AI is involved in producing analysis.
A report meant to clarify AI usage, derailed by credibility issues
According to the reporting around the withdrawal, KPMG pulled the document after issues were identified that were described as hallucinations. While the details of the specific inaccuracies weren’t fully laid out in the public discussion, the pattern is familiar: an AI-assisted report can include statements that appear plausible, cite information incorrectly, misinterpret context, or blend separate facts into a single narrative. Sometimes these errors are subtle—an incorrect figure, a misattributed quote, a claim that doesn’t match the underlying data. Other times they’re more obvious, but still difficult to catch without careful review.
In practice, hallucinations aren’t just “wrong answers.” They’re a failure mode of generative systems that produce text by predicting what should come next, based on patterns learned during training. That means the system can generate content that reads like it belongs in a report even if it lacks a factual basis. The language may be polished; the reasoning may sound structured; the conclusion may align with what readers expect. Yet the underlying support can be missing.
For a professional services firm, that mismatch between fluent writing and verifiable evidence is precisely what creates risk. A report about AI usage is not merely descriptive—it implies a level of rigor. Readers assume that claims are backed by sources, that definitions are consistent, and that the analysis reflects actual organizational behavior rather than a generalized guess.
Why this matters now: AI adoption is accelerating faster than governance
The timing of this incident is significant. Many organizations are moving from experimentation to deployment, and from deployment to internal standardization. As AI becomes embedded in workflows—summarizing documents, drafting policy language, generating marketing copy, analyzing customer interactions—the demand for “AI literacy” grows. But literacy isn’t just knowing what AI can do. It’s knowing how to validate what it produces.
This is where governance often lags. Teams may adopt AI tools quickly because they reduce time-to-draft and lower the cost of producing first versions. But the processes required to ensure accuracy—source checking, data validation, audit trails, and clear accountability—can take longer to implement. The result is a mismatch: speed increases, but verification capacity doesn’t always scale at the same pace.
Even organizations with mature quality assurance practices can be caught off guard when AI changes the nature of the work. Traditional review processes are designed for human-authored drafts where the author’s knowledge and citations can be traced. AI-generated drafts introduce additional uncertainty: the text may be derived from patterns rather than from specific sources, and the system may not provide transparent evidence for each claim. That forces reviewers to do more than check grammar or coherence—they must verify substance.
And verification is expensive. It takes time, expertise, and access to underlying data. When deadlines are tight, the temptation is to treat AI output as “good enough” until proven otherwise. The KPMG withdrawal is a reminder that “until proven otherwise” is not a safe standard for reports that influence decisions.
The hidden complexity of “AI usage” reporting
There’s another reason this kind of error can be particularly damaging: reporting about AI usage is inherently slippery. Organizations don’t all define “AI” the same way. Some consider basic automation to be AI; others reserve the term for machine learning models. Some count internal tools built on top of large language models; others include vendor-provided features. Even within a single company, functions such as customer support, HR screening, fraud detection, software development, analytics, and compliance use AI differently, each with its own workflows and risk profile.
So when a report tries to describe how AI is being used across a landscape, it must make choices about definitions, scope, and methodology. If those choices are unclear—or if the report inadvertently invents details—readers may not immediately detect the problem. The narrative can still feel right because it matches the general direction of industry trends. But the specifics can be wrong.
That’s why hallucinations in this context are more than embarrassing mistakes. They can distort the perceived maturity of the market, mischaracterize adoption rates, and lead organizations to benchmark themselves against inaccurate baselines. In other words, the harm isn’t only reputational; it can be strategic.
The “trust problem”: language fluency is not evidence
One of the most important lessons from incidents like this is that language fluency is not evidence. AI systems are optimized to produce text that is statistically likely to follow a prompt. That optimization can create a false sense of authority. A report can read like it was carefully researched even when it wasn’t. It can include structured sections, confident phrasing, and a tone of measured expertise. But none of that guarantees that the claims are grounded in verifiable inputs.
This is why the conversation about AI reliability often gets stuck at the level of “the model hallucinates.” That framing is true, but incomplete. The more useful question is: what does an organization do differently when it knows that the output is not inherently trustworthy?
The answer is not simply “don’t use AI.” Many teams will continue to use AI because it provides real value—drafting, summarization, brainstorming, and even assisting with analysis. The real shift is in how AI is integrated into the workflow. Instead of treating AI as a source of truth, organizations should treat it as a generator of hypotheses, drafts, and candidate narratives that must be validated.
In a well-designed system, AI can accelerate the early stages of work while humans and data systems handle verification. But that requires explicit design choices: what inputs the AI is allowed to use, what outputs require citation, what claims must be backed by data, and what happens when verification fails.
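To make those design choices concrete, here is a minimal sketch in Python of a verification gate that sits between an AI draft and publication. Everything in it is an assumption made for illustration, not a description of any firm's actual process: the `Claim` structure, the `APPROVED_SOURCES` allowlist, the source IDs, the sample sentences, and the crude rule that any sentence containing a digit counts as a claim that must be backed by data.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical allowlist: the only sources the draft is permitted to cite.
APPROVED_SOURCES = {"survey-2024", "interview-notes"}

# Crude, assumed heuristic: a sentence containing a digit is treated as a
# factual claim that must be backed by an approved source.
HIGH_RISK = re.compile(r"\d")


@dataclass
class Claim:
    text: str
    source_ids: List[str] = field(default_factory=list)


def review(claims: List[Claim]) -> Tuple[List[Claim], List[Tuple[Claim, str]]]:
    """Split claims into accepted ones and (claim, reason) pairs needing human review."""
    accepted, flagged = [], []
    for claim in claims:
        unknown = [s for s in claim.source_ids if s not in APPROVED_SOURCES]
        if unknown:
            # A citation outside the allowlist may have been invented.
            flagged.append((claim, "cites unapproved source"))
        elif HIGH_RISK.search(claim.text) and not claim.source_ids:
            flagged.append((claim, "numeric claim without an approved source"))
        else:
            accepted.append(claim)
    return accepted, flagged


# Made-up draft sentences, used only to exercise the gate.
draft = [
    Claim("42% of teams use AI daily."),
    Claim("Adoption is growing across departments."),
    Claim("3 of 5 departments piloted a tool.", ["survey-2024"]),
    Claim("Usage doubled last year.", ["unvetted-blog"]),
]

accepted, flagged = review(draft)
for claim, reason in flagged:
    print(f"{reason}: {claim.text}")
```

The gate does not decide whether a claim is true. It only decides which sentences may not ship until a person has checked them, which is the "what happens when verification fails" step made explicit. A real workflow would need a far better test for high-risk claims than "contains a digit".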
What “withdrawal” signals about accountability
When a major firm withdraws a report, it sends a message about accountability. It suggests that the errors were significant enough to undermine confidence in the deliverable. It also indicates either that the firm’s internal review process did not catch the issues before publication, or that new information emerged after release that made the problems impossible to ignore.
Either scenario is instructive. If the issues slipped through, it points to gaps in review coverage—perhaps insufficient fact-checking, inadequate source validation, or reliance on AI-generated text without robust verification. If the issues were discovered later, it suggests that post-publication monitoring and correction mechanisms are active, which is a positive sign. But the withdrawal still highlights that the cost of getting it wrong can be high, even for organizations with strong reputations.
For readers, the withdrawal should not be interpreted as proof that AI is useless. It should be interpreted as proof that quality control matters—and that quality control must be adapted to AI-assisted production.
How teams can reduce hallucination risk in real workflows
While the full internal details of KPMG’s process aren’t public, the broader industry has learned some practical strategies to reduce hallucination risk. These strategies are not magic; they are discipline.
First, require traceability for factual claims. If a report includes statistics, named entities, or specific assertions, those claims should be linked to sources that can be reviewed independently. AI can help draft the narrative, but it should not be allowed to “invent” citations. Where possible, use retrieval-based approaches that pull from approved documents or datasets, rather than relying on the model to generate facts from memory.
Second, separate drafting from verification. A common failure mode is blending the two. Teams may ask AI to produce both the draft and the “supporting evidence” in one step. That increases the chance that the evidence is also unreliable. A better approach is to let AI draft structure and wording, then run a separate verification pass where humans confirm facts and where automated checks validate consistency with known sources.
Third, define what counts as a “must-verify” claim. Not every sentence needs the same level of scrutiny. But claims about adoption rates, regulatory requirements, performance metrics, or case studies should be treated as high-risk. If a report is about AI usage, then the definition of “usage” and the numbers behind it should be treated as must-verify.
Fourth, use adversarial review. Hallucinations can be hard to spot because they are fluent. One technique is to have reviewers challenge the report aggressively: ask “what would disprove this?” and “where could this be wrong?” Another is to cross-check key claims against multiple independent sources. If the report relies on a single dataset or a single interpretation, any error in that one source passes straight into the conclusions with nothing to contradict it.
Fifth, keep an audit trail. If AI is used, organizations should record prompts, versions, and sources. That doesn’t eliminate hallucinations, but it makes debugging possible. When errors are found, you need to know whether the issue came from
