Last summer, Peter Degen’s postdoctoral supervisor walked into his office with a question that sounded, at first, like a compliment gone strange. One of their papers—published in 2017—was suddenly being cited far more than anyone expected. Citations are supposed to be slow, cumulative signals: a paper gains traction when other researchers find it useful, and the number of references rises as the work becomes part of a field’s shared toolkit. This one had followed that pattern for years, earning a respectable handful of citations.
Then the curve broke.
Instead of a steady climb, the paper began to spike. Hundreds of citations appeared in short bursts, showing up every few days as if something had switched on. Within a short span, the paper was no longer just cited; it was among the most cited works of the supervisor's career. For a scientist, that kind of sudden visibility can feel like vindication. But it also raises an uncomfortable possibility: what if the numbers are real, but the mechanism producing them isn't scholarly?
Degen was asked to investigate, not because the paper was necessarily wrong, but because the citation surge itself didn’t behave like normal scholarly uptake. The key issue wasn’t whether the research mattered; it was how the citation system was being used, and what that might mean for the reliability of metrics scientists increasingly depend on to understand what’s happening in research.
This is where the story stops being only about one paper and starts becoming about the infrastructure of modern science.
In many fields, citations have become more than a record of intellectual influence. They’re used as proxies for quality, impact, and even momentum. Funding decisions, hiring, promotions, and institutional rankings often rely on citation counts or citation-derived indicators. Even when people claim they don’t “just chase metrics,” the reality is that metrics shape behavior. Researchers learn what gets rewarded. Journals learn what gets attention. Institutions learn what looks good on dashboards.
So when citations behave oddly—especially when the oddness looks synchronized, automated, or disproportionate—it doesn’t just distort individual careers. It can distort the collective map of knowledge.
What makes this moment different is the speed and scale at which research-adjacent content can now be produced and circulated. Over the last few years, AI tools have made it easier to draft text quickly, rephrase existing material, generate literature reviews, and even produce “related work” sections that sound plausible without necessarily adding new understanding. That capability doesn’t automatically create fraud. But it does change the environment in which scholarly communication happens. When the volume of generated academic-like writing increases, the pathways by which citations spread can change too.
A citation spike can occur for legitimate reasons—new applications, replication efforts, policy relevance, or a wave of follow-up studies. But the pattern described in Degen’s case resembles something else: a surge that arrives too quickly and too consistently to be explained by organic scholarly discovery alone.
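What "too quickly and too consistently" looks like can be made concrete. As a minimal sketch, assume you have a paper's citations bucketed by month; the twelve-month window and three-sigma threshold below are illustrative choices of this article, not a standard from the reporting. The idea is simply to flag months that sit far above the paper's own trailing baseline:

```python
from statistics import mean, stdev

def flag_citation_bursts(monthly_counts, window=12, threshold=3.0):
    """Flag months whose citation count far exceeds the trailing baseline.

    monthly_counts: citations received per month, oldest first.
    window: number of prior months that form the baseline.
    threshold: standard deviations above baseline that count as a burst.
    """
    bursts = []
    for i in range(window, len(monthly_counts)):
        baseline = monthly_counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline) or 1.0  # guard against a zero-variance baseline
        if monthly_counts[i] > mu + threshold * sigma:
            bursts.append((i, monthly_counts[i]))
    return bursts

# Years of slow uptake, then a sudden surge in the final months.
counts = [1, 0, 2, 1, 1, 2, 1, 1, 0, 2, 1, 1, 2, 1, 40, 85, 120]
print(flag_citation_bursts(counts))  # the last three months are flagged
```

A paper with years of slow uptake produces a flat baseline, so even a modest burst stands out. The harder problem, as the rest of this story shows, is deciding what the burst means.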
To understand why, it helps to look at what citations actually represent in practice. A citation is not a vote for truth. It’s a reference to a source that an author claims is relevant. In an ideal world, authors cite because they read and used the work. In the real world, citations can be influenced by convenience, habit, and the structure of academic writing. Literature reviews often cite foundational papers, sometimes repeatedly, sometimes without deep engagement. Methods sections may cite standard approaches even when the authors have not fully verified the underlying assumptions. And in some cases, citations can be inserted because they make a manuscript look better aligned with prior work.
When AI-generated or AI-assisted writing enters the pipeline, those tendencies can intensify. If a system can quickly produce a coherent paragraph that includes a citation, it can also quickly produce many paragraphs across many manuscripts. The result is not necessarily a single fraudulent paper. It can be a distributed phenomenon: lots of papers that cite the same target because the citation appears in the training data, because it’s suggested by a tool, because it’s included in a template, or because it’s been identified as “important” by some automated process.
That’s the unique twist in this kind of citation anomaly. It’s not just that low-quality work might be getting published. It’s that the citation graph—the network of references that researchers use to navigate the literature—can start to reflect the behavior of the publishing ecosystem rather than the behavior of scientific understanding.
Degen’s investigation, as described in reporting around the case, centers on the mechanism behind the spike. The paper in question evaluated the accuracy of a specific statistical analysis approach applied to epidemiological data. Epidemiology is a domain where statistical choices matter enormously. Small differences in assumptions can lead to large differences in conclusions, especially when data are noisy, incomplete, or biased by sampling. A paper that clarifies the accuracy of a method could be genuinely valuable, and it could become widely cited if the method is adopted broadly.
But the timing and intensity of the surge suggest that something else may have been at play. If the paper was being cited "every few days," hundreds of times over, the question becomes: were those citing papers actually using the method? Were they testing it? Were they applying it to new datasets? Or were they citing it as part of a generic methods narrative, one that could be generated quickly and repeatedly?
This distinction matters because it separates two kinds of impact. One is epistemic impact: the work changes how researchers think or analyze. The other is bibliometric impact: the work becomes visible in citation databases regardless of whether it’s truly being used.
Scientists care about epistemic impact. But the systems around them often measure bibliometric impact. When those diverge, the metric becomes misleading.
The broader challenge, reported across AI and research communities, is that the academic ecosystem is under stress from the sheer volume of content that can now be produced. Peer review is designed to evaluate manuscripts, not to continuously audit the integrity of the citation graph. Citation databases index what gets published. Search engines and recommendation systems amplify what gets indexed. And once a paper is flagged as “highly cited,” it can attract even more citations simply because it appears prominent.
This creates feedback loops. A paper that becomes highly cited can become more likely to be recommended by tools that rank sources by popularity. Those tools then influence what authors include. Authors then include those citations, sometimes without reading deeply, because the citation is already “validated” by its prominence. The loop can accelerate quickly, especially when the publishing pipeline is producing many manuscripts at once.
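How fast that loop can run is easy to see in miniature. The toy model below is an assumption of this piece, not anything documented in Degen's case: each new manuscript cites one existing paper with probability proportional to its current citation count, and a popularity-ranked recommender gives whatever it currently ranks highest an extra multiplicative boost.

```python
import random

def simulate_feedback(n_papers=50, n_rounds=2000, boost=2.0, top_k=5, seed=0):
    """Toy model of the citation feedback loop.

    Each round a new manuscript cites one existing paper. The chance of
    being cited is proportional to current citations (+1 so every paper
    has some chance), multiplied by `boost` for papers a popularity-ranked
    recommender currently places in its top_k.
    """
    rng = random.Random(seed)
    citations = [0] * n_papers
    for _ in range(n_rounds):
        ranked = sorted(range(n_papers), key=lambda p: -citations[p])
        top = set(ranked[:top_k])
        weights = [(citations[p] + 1) * (boost if p in top else 1.0)
                   for p in range(n_papers)]
        cited = rng.choices(range(n_papers), weights=weights)[0]
        citations[cited] += 1
    return sorted(citations, reverse=True)

# The top few papers absorb a disproportionate share of all citations.
print(simulate_feedback()[:5])
```

With the boost set to 1.0 the model is plain preferential attachment; raising it even slightly concentrates citations faster. That is the loop in miniature: prominence feeds recommendation, and recommendation feeds prominence.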
In that environment, a citation spike can be both a symptom and a cause. It’s a symptom of a system that is generating and circulating academic-like text at scale. And it can become a cause of further distortion, because the spike changes how other researchers discover and select sources.
There’s also a subtler issue: citations are often used as a proxy for quality, but quality is not the same as usefulness. A paper can be cited because it is foundational, because it is controversial, because it is frequently misused, or because it is referenced as a cautionary example. A citation count doesn’t tell you which of those is happening. It tells you that someone pointed to it.
When the citation graph is stable, those ambiguities average out. When the graph is being perturbed by automation, the ambiguities can become systematic. If many citing papers are generated with minimal engagement, the citation count becomes less a measure of influence and more a measure of distribution.
That’s why the Degen case resonates beyond one lab. It highlights a failure mode in how science tracks itself.
If citations are becoming harder to interpret, what should scientists do?
One answer is to treat citation metrics as weaker signals than they used to be. That sounds obvious, but it’s difficult to implement because institutions still need numbers. Another answer is to improve the citation ecosystem itself—by making it harder to publish low-quality work, by strengthening peer review, and by developing tools that detect citation manipulation or citation laundering. But detection is hard. Citation patterns can be legitimate, and false positives can harm real scholarship.
A more practical approach is to shift emphasis from raw counts to context-aware evaluation. Instead of asking “How many citations does this paper have?” researchers can ask “How are those citations being used?” Are citing papers applying the method correctly? Are they replicating results? Are they extending the work? Or are they citing it as background while doing something else entirely?
Some of this can be done manually, but at scale it requires computational support. Bibliometric analysis can examine citation contexts, co-citation networks, and the relationship between citing papers’ topics and the cited paper’s subject. If a paper is being cited in contexts that don’t match its scope, that’s a red flag. If citations cluster around certain templates or certain sections of papers, that’s another clue. If the citing papers share unusual characteristics—similar phrasing, similar structure, or similar reference lists—that can indicate coordinated behavior.
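The first of those checks, scope mismatch, is simple to prototype. The sketch below assumes you have the cited paper's abstract and the sentences surrounding each inbound citation; the TF-IDF pipeline and the 0.05 cutoff are illustrative choices, not an established detector.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def scope_mismatch(cited_abstract, citing_contexts, cutoff=0.05):
    """Flag citation contexts whose vocabulary barely overlaps the cited paper.

    cited_abstract: abstract of the paper receiving the citations.
    citing_contexts: the sentence(s) surrounding each inbound citation.
    Returns (index, similarity) pairs for contexts below the cutoff.
    """
    vec = TfidfVectorizer(stop_words="english")
    matrix = vec.fit_transform([cited_abstract] + list(citing_contexts))
    sims = cosine_similarity(matrix[0], matrix[1:])[0]
    return [(i, round(float(s), 3)) for i, s in enumerate(sims) if s < cutoff]

contexts = [
    "We assessed the accuracy of this statistical approach on epidemiological data.",
    "Deep learning has transformed many fields [1].",  # boilerplate, off-topic
]
print(scope_mismatch(
    "Evaluating the accuracy of a statistical analysis method for epidemiology.",
    contexts,
))  # the boilerplate context is flagged
```

Real systems would use richer text representations and calibrated thresholds, but the shape of the check is the same: compare what a paper is about with how it is being cited.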
However, even these approaches face limitations. AI-generated writing can be diverse enough to evade simple similarity checks. And legitimate interdisciplinary work can produce citations that look “off” at first glance. The goal isn’t to punish citations; it’s to understand when citation behavior is decoupled from scientific engagement.
There’s also a cultural dimension. Academia has long treated citations as a kind of social proof. When a paper is highly cited, it feels safer to cite it. That safety can be rational—high citation counts often correlate with genuine influence—but it can also become a trap. In a world where citation counts can be inflated by automated or low-engagement publishing, social proof becomes less reliable.
The Degen story underscores that the problem isn’t only “AI slop” in the sense of low-quality writing. It’s the way low-quality writing can contaminate the signals that scientists use to navigate the literature. Even if the underlying research is correct, the surrounding ecosystem can make it harder to know what’s happening.
And that affects more than individual reputations. It affects how young researchers choose what to study. It affects what gets funded. And, ultimately, it affects the collective map of knowledge that everyone, inside science and out, relies on.
