arXiv to Ban Authors for a Year Over AI Slop Papers

arXiv is moving from “trust, but verify” to something closer to “verify, or don’t publish.” In a policy shift aimed squarely at the growing problem of low-quality, AI-generated research writeups, the platform’s computer science leadership says it will ban authors for a year when there is clear evidence that LLM-generated material was not actually checked.

The announcement, attributed to Thomas Dietterich—an arXiv section chair for computer science—signals that arXiv is treating certain kinds of AI slop not as a vague quality issue, but as a conduct and accountability problem. The trigger is specific: if a submission contains “incontrovertible evidence” that the authors did not check results produced by large language models, then the authors can be barred from arXiv for a year. While the details of how “incontrovertible” will be determined are not fully spelled out in the public summary, the examples Dietterich pointed to are telling. They include hallucinated references—citations that look plausible but do not exist or do not support the claims being made—and “meta-comments” left by an LLM, such as telltale editorial remarks that reveal the text was generated without being properly cleaned up and verified.

This is not the first time arXiv has faced concerns about paper quality. But the nature of the current wave is different. For years, preprints have varied widely in rigor, and the platform has always been explicit that posting a preprint is not the same as passing peer review. Yet the new concern is not merely that some papers are weak; it’s that some papers may be built on unverified generative output: text that can sound authoritative while being wrong in ways that are difficult for readers to detect quickly. In other words, the risk is shifting from “bad science” to “unverified machine-generated scaffolding.”

To understand why this matters, it helps to consider what LLMs change in the publication pipeline. A human author can make mistakes, but those mistakes often leave traces: a flawed experiment, a misapplied method, a statistical error, or a reasoning gap that can be challenged. With LLM-assisted writing, however, the failure mode can be more subtle. References can be fabricated. Explanations can be coherent while still being disconnected from the underlying work. Even when the core technical content is real, the surrounding narrative—what the paper claims it did, why it matters, and how it relates to prior work—can be generated or polished in ways that are not grounded in verification. That creates a new kind of credibility problem: the paper may read like scholarship, but parts of it may be closer to a confident draft than a documented study.

Dietterich’s framing suggests arXiv is trying to draw a line between acceptable use of generative tools and unacceptable submission behavior. The key phrase is not “AI was used,” but “the results of LLM generation were not checked.” That distinction matters because it implies arXiv is not aiming to ban AI assistance outright. Instead, it is targeting a specific breach: submitting content that bears the hallmarks of being generated and then treated as if it were validated, without the necessary verification steps.

The examples—hallucinated references and “meta-comments”—are also important because they are relatively concrete. Hallucinated references are a classic symptom of LLM unreliability: the model can produce citations that match the style of real papers while being entirely fabricated. Meta-comments are another symptom: LLMs sometimes generate internal-sounding notes, transitions, or commentary that should never appear in a final academic manuscript. When these artifacts show up in a submission, they indicate not just poor editing, but a deeper failure to treat the output as something requiring human review.
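To make that concrete, here is a minimal sketch of the kind of heuristic scan a moderator, or an author doing due diligence, could run over a manuscript before submission. The phrase list is purely illustrative (arXiv has not published its screening criteria), and any real screen would need far more care to avoid false positives, for instance in papers that deliberately quote LLM output.

```python
import re

# Illustrative phrases that sometimes survive in unedited LLM drafts.
# This list is hypothetical; it is not arXiv's actual screening criteria.
META_COMMENT_PATTERNS = [
    r"as an ai language model",
    r"certainly! here is",
    r"i hope this helps",
    r"feel free to adjust",
    r"\[insert citation( here)?\]",
    r"regenerate (the )?response",
]

def find_meta_comments(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that match a known meta-comment pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        if any(re.search(p, lowered) for p in META_COMMENT_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits

if __name__ == "__main__":
    sample = (
        "Certainly! Here is a revised related-work section.\n"
        "Prior work [insert citation] shows strong results on this benchmark."
    )
    for lineno, line in find_meta_comments(sample):
        print(f"line {lineno}: {line}")
```

A scan like this catches only the most blatant artifacts, which is exactly the point of the “incontrovertible evidence” standard: these strings have essentially no innocent explanation in a finished manuscript.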

From a policy perspective, this approach is both pragmatic and risky. Pragmatic because it gives arXiv a basis for action that doesn’t require subjective judgments about novelty or taste. Risky because “incontrovertible evidence” can be interpreted narrowly or broadly depending on enforcement. If enforcement is too strict, it could punish authors for honest mistakes—especially in cases where references are wrong due to citation formatting errors, version mismatches, or incomplete bibliographic data. If enforcement is too lenient, it may fail to deter the worst offenders. The balance will likely depend on how arXiv operationalizes the standard: what counts as incontrovertible, who decides, and what appeals process exists.

There is also a second part to the shift: future arXiv submissions will need to be accepted at a “reputable peer-reviewed venue.” That requirement changes the incentives around preprints. Historically, arXiv has been a place where researchers share early results, get feedback, and establish precedence. Requiring peer-reviewed acceptance for future submissions, at least for those under the policy umbrella, would reduce the ability to use arXiv as a rapid dissemination channel. It also raises a question that many researchers will immediately ask: does this apply only to banned authors, or is it a broader rule for all submissions? The public summary ties the requirement to the same policy direction but does not spell out its exact scope. Still, the message is clear: arXiv is signaling that it wants a higher bar for certain categories of content, and that it is willing to restrict access when authors repeatedly cross the line.

This is where the story becomes more than a simple “ban bad papers” headline. The deeper issue is trust infrastructure. Preprint servers are a kind of public commons: they lower barriers to sharing and accelerate scientific communication. But commons systems degrade when participants exploit the lack of gatekeeping. In the past, exploitation might have looked like plagiarism, duplicate submissions, or deliberate fraud. Today, exploitation can look like generating plausible text and packaging it as research, even when the underlying claims are not verified. The result is a flood of documents that consume attention and computational resources—both human and automated—without contributing reliable knowledge.

That attention cost is not theoretical. Researchers increasingly rely on search, recommendation systems, and automated literature mining. If the corpus becomes saturated with low-quality or unverified content, downstream systems can learn the wrong patterns. Even if a reader eventually spots the problem, the damage may already be done: the paper may be cited, summarized, or used as training data for other models. In that sense, AI slop is not just an aesthetic problem; it is a contamination problem.

arXiv’s move can be read as an attempt to protect the integrity of the preprint ecosystem by making accountability explicit. The one-year ban is a deterrent, but it also communicates a principle: authors are responsible for verifying what they submit, including content that originates from generative tools. This aligns with arXiv’s Code of Conduct language referenced in the public discussion. By tying enforcement to conduct rather than to “quality” alone, arXiv is effectively saying that the platform is not merely a library of documents; it is a community with obligations.

There is also a cultural dimension. Many researchers are experimenting with LLMs for legitimate tasks: drafting introductions, rephrasing for clarity, summarizing related work, or helping with code explanations. Those uses can be beneficial, especially when paired with careful verification. But the existence of a tool that can produce fluent text at scale creates a temptation to treat fluency as evidence. The policy shift is a direct response to that temptation. It tells authors: if you use an LLM to generate claims, you must verify them as you would any other source. If you don’t, you’re not just making a mistake—you’re violating the expectations of scholarly communication.

One unique angle in this story is how it reframes “AI slop” as a reproducibility and verification issue rather than a writing issue. Hallucinated references are not merely embarrassing; they break the chain of evidence. Meta-comments are not merely unprofessional; they suggest the manuscript was not treated as a final artifact. Both are signals that the paper’s content may not have been grounded in the authors’ own checking. That is why the policy focuses on whether authors checked LLM-generated results. It’s about whether the paper can be trusted as a record of work.

Still, the policy raises practical questions that will matter to authors and reviewers alike. How will arXiv detect hallucinated references reliably? Will it rely on reports from the community, automated checks, or both? Hallucinated references can sometimes be caught by bibliographic databases, DOI resolvers, or cross-checking against known corpora. But not all errors are easily detectable. Some references may exist but be irrelevant. Some may be real but incorrectly cited. Some may be missing due to incomplete metadata. The policy’s “incontrovertible evidence” standard suggests that enforcement will likely focus on cases where the mismatch is undeniable—such as references that do not correspond to any real publication, or meta-comments that clearly indicate unedited LLM output.
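As an illustration of what one of those automated checks might look like, here is a minimal sketch that verifies DOIs against the public Crossref REST API. It is a first-pass filter under stated assumptions: it covers only references that carry a DOI, and a DOI that resolves can still be miscited, so a hit is evidence that a work exists, not that it was cited correctly.

```python
import requests

def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if the public Crossref API has a record for this DOI.

    A 404 from Crossref is the typical signature of a fabricated citation.
    A DOI that resolves can still be miscited (wrong year, wrong claims),
    so a True result is evidence of existence, not of correctness.
    """
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=timeout)
    return resp.status_code == 200

if __name__ == "__main__":
    # A well-known real DOI: LeCun, Bengio, and Hinton, "Deep learning", Nature 2015.
    print(doi_resolves("10.1038/nature14539"))           # expected: True
    # A made-up DOI, used here purely as a negative example.
    print(doi_resolves("10.9999/fabricated.2024.0001"))  # expected: False
```

References without DOIs would need fuzzy title matching against bibliographic corpora such as DBLP or Semantic Scholar, which is harder and more error-prone. That gap is one reason community reports are likely to remain part of enforcement alongside any automated checks.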

Another question is how this interacts with the reality of academic writing. Even without LLMs, references can be wrong. People cite the wrong year, misremember a title, or copy a citation template incorrectly. The difference is that LLM-driven errors may be more systematic and more frequent, especially when authors ask models to “add relevant citations” or “improve the related work section.” If enforcement is based on clear artifacts, it may avoid punishing ordinary citation mistakes. But the community will watch closely to see whether the policy becomes a blunt instrument or a targeted one.

The “reputable peer-reviewed venue” requirement also has implications for how researchers plan their workflows. Many fields use arXiv as a staging ground before journal submission. If certain authors are required to go through peer review before returning to arXiv, it could slow their dissemination. But it could also encourage better internal review before posting. In effect, it pushes authors toward a more disciplined pipeline: verify claims, validate references, and ensure the manuscript reflects actual work rather than generated prose.

There is also a broader policy conversation lurking behind this move: what should