Police forces in England and Wales have been told to pause the use of artificial intelligence in court statements unless specific safeguards are in place, according to the head of Police.AI, the organisation that has been at the centre of efforts to bring AI into policing.
The instruction is significant not because it suggests a blanket retreat from technology, but because it draws a bright line around one of the most sensitive parts of the criminal justice process: the production of statements that may be relied upon by courts. In other words, the message is less “stop using AI” and more “do not automate justice-critical outputs until you can prove the system is safe, accurate, and properly governed.”
For police leaders, this is a practical warning. Court statements are not routine administrative documents. They are evidence. They can shape bail decisions, influence charging choices, affect how cases are presented, and ultimately determine outcomes for defendants, victims, and witnesses. If an AI tool introduces errors—whether through misinterpretation, hallucination, biased summarisation, or inconsistent formatting—those mistakes can become embedded in the record. The harm is not theoretical; it can be procedural, evidential, and personal.
What makes the guidance notable is the emphasis on safeguards before automation. That framing matters because it implies that the debate is not simply about whether AI can help with drafting or analysis, but about what level of human control is required, what checks must exist, and how accountability should work when something goes wrong.
In recent years, policing has experimented with AI in multiple ways: triaging reports, assisting with translation, supporting investigations, and helping officers search large volumes of data. Some uses are relatively low-risk—tools that suggest, organise, or highlight information for human review. Others are higher-risk, particularly where the output is treated as authoritative. Court statements sit firmly in the latter category.
The instruction to halt AI use in court statements until safeguards are in place suggests that, at least for now, the threshold for “acceptable” AI assistance is higher than many forces may have assumed. It also indicates that regulators, oversight bodies, and legal stakeholders are increasingly focused on the end-to-end chain of custody for information: how data is collected, how it is processed, who reviews it, and how the final product can be defended in court.
A key question is what “safeguards” means in practice. While the details of the instruction may vary depending on the force and the specific AI workflow, safeguards in this context typically include several overlapping elements.
First, there must be accuracy and reliability testing that reflects real-world conditions. It is not enough for an AI system to perform well on curated examples. Court statements are built from messy inputs: incomplete notes, audio transcripts with background noise, witness accounts that evolve over time, and case files that contain contradictions. A safeguard approach would require testing that measures error rates, identifies failure modes, and demonstrates that the system behaves consistently across different types of cases.
Second, there must be transparency about how the AI was used. Courts and defence teams need to understand whether AI contributed to drafting, what it was trained on (or how it was configured), and what steps were taken to verify the content. If an AI tool is used as a “ghostwriter,” even partially, the provenance of the statement becomes harder to explain. Safeguards therefore often include documentation requirements and clear audit trails.
Third, there must be robust human oversight. The instruction to pause implies that some current workflows may not ensure that a qualified person meaningfully checks every claim, every quote, and every factual assertion. Human oversight cannot be a rubber stamp. It needs to be structured: officers must know what the AI did, what it might get wrong, and how to verify the output against primary sources.
Fourth, there must be governance: policies that define acceptable use, training for staff, and escalation routes when issues arise. Without governance, AI tools can drift into “normal” use faster than the organisation’s ability to manage risk. Safeguards are meant to prevent that drift.
Fifth, there must be accountability mechanisms. If an AI-assisted statement contains an error, who is responsible—the officer who signed it, the force that deployed the tool, the vendor that supplied it, or the system itself? Safeguards should clarify responsibility and ensure that errors can be investigated and corrected quickly.
This is where the instruction becomes more than a technical adjustment. It is a signal that policing is moving toward a model where AI is treated like a high-impact system rather than a productivity feature. That shift aligns with broader trends in public sector technology: the recognition that systems affecting rights and legal processes require stronger controls than systems that merely assist with internal tasks.
The legal stakes are also part of the story. In England and Wales, criminal proceedings rely heavily on the integrity of evidence and the fairness of disclosure. If AI is used to draft or refine statements, questions can arise about whether the statement accurately reflects what was observed or reported. Even if the AI does not fabricate new facts, it can still distort meaning through paraphrasing, omit nuance, or introduce subtle changes that alter interpretation. Those risks are precisely the kind that defence counsel may challenge, especially if they suspect that the statement is not a faithful representation of the underlying material.
There is also the issue of consistency. If different forces or even different teams within a force use AI tools in different ways, the style and structure of statements may vary. That variation can be manageable, but it can also create confusion about what is “standard” and what is “AI-generated.” Safeguards likely aim to standardise the approach so that the court receives predictable, verifiable outputs.
Another dimension is the relationship between AI and officer confidence. When AI produces a polished draft, it can create an illusion of correctness. Officers may be more likely to accept the output quickly, especially under time pressure. Safeguards must therefore address not only the system’s performance but also the human factors: how people interact with AI, how they verify outputs, and how they avoid over-trusting the tool.
The deeper point is that the instruction to pause AI use in court statements is not simply about preventing errors; it is about preventing a particular kind of institutional failure, one where the organisation’s processes evolve faster than its ability to supervise them. Technology can make workflows faster, but speed is not the same as reliability. In justice settings, reliability is the currency.
That forces have been told to halt AI use until safeguards are in place suggests that some existing workflows may have been too optimistic about what “assistance” means. In everyday language, “AI-assisted drafting” can sound harmless. But in legal contexts, even small changes can matter. A sentence that reads smoothly may still be wrong. A summary that sounds plausible may still omit critical detail. A rephrasing that improves clarity may still change emphasis. Safeguards are designed to ensure that the final statement remains anchored to primary evidence and that any AI contribution is fully accountable.
It is also worth noting that this guidance may reshape how forces think about AI procurement and deployment. If court statements are considered high-risk outputs, then vendors and internal teams will likely face stricter requirements: clearer documentation, better logging, more controllable generation, and stronger validation processes. Forces may demand features such as citation to source material, constraints that prevent the system from inventing details, and interfaces that make verification easier rather than harder.
In practical terms, the pause could mean several things. Some forces may stop using AI tools entirely for statement drafting. Others may restrict AI to non-evidential tasks—such as formatting, grammar suggestions, or translation—while keeping factual content strictly manual. Some may require additional sign-off steps, such as mandatory review by a supervisor or a specialist unit before any AI-influenced statement reaches court.
There is also the possibility of a phased approach. Instead of a permanent ban, forces may be allowed to resume AI use once they demonstrate compliance with agreed safeguards. That would align with the idea that the goal is responsible deployment rather than technological rejection.
However, even a phased approach carries challenges. Safeguards take time to implement: training staff, updating policies, configuring systems, and building audit trails. During that period, forces may experience friction in workflows. Yet that friction may be the point. Justice systems have historically been cautious about introducing tools that can change the nature of evidence. The pause is a reminder that caution is not bureaucracy—it is protection.
The instruction also highlights a broader tension in public sector AI: the desire to modernise and the obligation to preserve trust. AI can improve efficiency, but it can also erode confidence if people believe the system is opaque or unaccountable. In policing, trust is not abstract. It affects cooperation from communities, perceptions of fairness, and the legitimacy of outcomes.
By insisting on safeguards before automation, the guidance attempts to preserve that trust. It tells the public and the courts that police forces are not treating AI as a shortcut around legal standards. Instead, they are treating it as a tool that must earn its place through demonstrable safety and accountability.
There is another subtle implication: the instruction may encourage a shift from “automation” to “augmentation.” Automation suggests the system produces the output with minimal human involvement. Augmentation suggests the system supports human work while leaving the final responsibility clearly with people. In court statements, augmentation is likely the safer direction. Officers can still benefit from AI for tasks like structuring drafts, improving readability, or highlighting inconsistencies—but the factual core must remain grounded in verified evidence.
That distinction matters because it changes how success is measured. Under an augmentation model, the question becomes: does the tool help humans produce more accurate statements without increasing risk? Under an automation model, the question becomes: can the system reliably produce statements on its own? The instruction to pause until safeguards are in place suggests that, for now, the bar for automation in this domain has not been met.
For readers trying to understand what happens next, the most realistic expectation is that forces will conduct internal reviews of their AI workflows
