New AI Training Method Achieves Breakthrough with Just 78 Examples, Outperforming Traditional Models

A recent study from researchers at Shanghai Jiao Tong University and the SII Generative AI Research Lab (GAIR) unveils a new approach to training large language models (LLMs) for complex, autonomous tasks. The work challenges the long-standing belief that vast amounts of data are essential for developing effective AI agents. Instead, the study introduces a framework called LIMI, short for “Less Is More for Intelligent Agency.” The findings suggest that the quality of data, rather than its quantity, is the critical factor in achieving high levels of machine autonomy.

The LIMI framework builds on previous research indicating that models trained on smaller, carefully curated datasets can outperform those trained on extensive data. In their experiments, the researchers demonstrated that they could train LLMs using just 78 meticulously selected examples, enabling these models to outperform counterparts trained on thousands of examples by a significant margin on key industry benchmarks.

This discovery holds profound implications for enterprise applications, particularly in scenarios where data is scarce or costly to collect. As organizations increasingly seek to leverage AI for various tasks, the ability to develop powerful AI agents with minimal data could transform the landscape of artificial intelligence.

Defining Agency in AI

At the core of this research is the concept of agency, which the researchers define as the emergent capacity of AI systems to function as autonomous agents. These agents actively discover problems, formulate hypotheses, and execute solutions through self-directed engagement with their environments and tools. In essence, agency refers to AI systems that do not merely think but also work effectively in real-world scenarios.

Traditionally, training frameworks have operated under the assumption that higher levels of agentic intelligence necessitate large volumes of data. This perspective aligns with the classic scaling laws of language modeling, which suggest that more data leads to better performance. However, the researchers argue that this approach often results in increasingly complex training pipelines and substantial resource requirements. Moreover, in many fields, data is not only scarce but also challenging and expensive to obtain.

The LIMI framework seeks to overturn this paradigm by demonstrating that sophisticated agentic intelligence can emerge from minimal yet strategically curated demonstrations of autonomous behavior. By focusing on the quality of the training data, the researchers aim to simplify the training process while enhancing the effectiveness of AI agents.

How LIMI Works

The LIMI framework operates through a systematic pipeline designed to collect high-quality demonstrations of agentic tasks. Each demonstration consists of two primary components: a query and a trajectory. A query represents a natural language request from a user, such as a software development requirement or a scientific research goal. The trajectory, on the other hand, encompasses the series of steps the AI takes to address the query, including its internal reasoning, interactions with external tools, and observations from the environment.

For instance, if the query is “build a simple chat application,” the trajectory would detail the agent’s internal thought processes, action plans, code it writes and executes, and the resulting outputs or errors. This trajectory may involve multiple iterations of planning, execution, and reflection until the desired objective is achieved.
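The query/trajectory structure described above can be sketched as a simple data model. This is an illustrative reconstruction, not the paper's actual schema; all field names here are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    reasoning: str    # the agent's internal thought for this step
    action: str       # e.g. a tool call or code the agent writes and runs
    observation: str  # result returned by the environment or tool

@dataclass
class Demonstration:
    query: str                             # natural-language user request
    trajectory: List[Step] = field(default_factory=list)

# Example: the first (truncated) step of a demonstration for a coding query.
demo = Demonstration(query="build a simple chat application")
demo.trajectory.append(Step(
    reasoning="Start with a minimal server before adding a client.",
    action="write server.py with a socket-based echo loop",
    observation="server.py created; starts without errors",
))

print(len(demo.trajectory))  # 1
```

In a real demonstration, the trajectory would accumulate many such steps, one per planning/execution/reflection round, until the objective is met.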

To construct their dataset, the researchers began with 60 queries derived from real-world scenarios encountered by professional developers and researchers. They then expanded this pool by utilizing GPT-5 to synthesize additional queries based on GitHub Pull Requests. A team of four computer science PhD students vetted the quality of these candidates, ultimately selecting 18 synthesized queries to combine with the original 60, yielding a high-quality set of 78 queries focused on software development and research workflows.

The generation of trajectories involved collaboration between the PhD students and a command-line interface (CLI) coding agent powered by GPT-5. Together, they completed the 78 tasks through an iterative process, capturing the entire interaction sequence until each task was successfully accomplished. This approach ensured that the models learned not only from successful outcomes but also from the complete problem-solving process, including how to adapt strategies and recover from failures during collaborative execution.
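The capture process described above, iterating until the task succeeds while logging every intermediate step, including failures, can be sketched as a loop. The function names and the toy agent below are illustrative placeholders, not the researchers' actual tooling.

```python
def capture_trajectory(query, attempt_step, is_done, max_steps=20):
    """Run plan/execute/reflect rounds, recording each step until success."""
    trajectory = []
    state = None
    for _ in range(max_steps):
        state, record = attempt_step(query, state)  # one plan+execute round
        trajectory.append(record)                   # log successes AND failures
        if is_done(state):
            break
    return trajectory

# Toy stand-in: the "agent" fails twice, then succeeds on the third attempt.
def toy_attempt(query, state):
    count = (state or 0) + 1
    outcome = "success" if count >= 3 else "error: tests failed"
    return count, {"step": count, "outcome": outcome}

traj = capture_trajectory("fix failing unit test", toy_attempt,
                          is_done=lambda s: s >= 3)
print([r["outcome"] for r in traj])
# ['error: tests failed', 'error: tests failed', 'success']
```

Because failed rounds are kept in the log, a model trained on such trajectories sees how strategies were adapted after errors, not only the final successful action.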

Evaluating LIMI’s Effectiveness

To assess the effectiveness of their framework, the research team evaluated the models on AgencyBench, a benchmark specifically designed to measure agentic skills, alongside other established benchmarks for tool use and coding. They fine-tuned GLM-4.5, a powerful open-source model, using their 78-sample dataset and compared its performance against several leading models, including the base GLM-4.5, Kimi-K2-Instruct, and DeepSeek-V3.1.

The results were striking. The LIMI-trained model achieved an average score of 73.5% on AgencyBench, significantly outperforming all baseline models, the highest of which (GLM-4.5) scored only 45.1%. This superiority extended to other benchmarks covering tool use, coding, and scientific computing, where LIMI consistently outperformed all competitors.

Perhaps most notably, the study revealed that the model trained on just 78 examples surpassed models trained with 10,000 samples from another dataset, delivering superior performance with roughly 1/128th of the data. This finding challenges the prevailing approach to developing autonomous AI systems, suggesting that mastering agency requires a deeper understanding of its essence rather than merely scaling training data.
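The data-efficiency figure follows directly from the two dataset sizes reported in the study:

```python
limi_samples = 78          # LIMI's curated dataset
baseline_samples = 10_000  # comparison dataset from the study

ratio = baseline_samples / limi_samples
print(round(ratio))  # 128
```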

Implications for Enterprise Applications

The implications of this research are profound, particularly for enterprises seeking to harness the power of AI. Traditionally, organizations have invested significant resources into collecting vast datasets to train their AI models. However, the LIMI framework offers a practical alternative. By focusing on creating small, high-quality datasets, companies can leverage their in-house talent and subject matter experts to develop bespoke agentic tasks without the need for massive data collection projects.

This shift in approach lowers the barrier to entry for organizations looking to build custom AI agents tailored to their specific workflows. Instead of relying on extensive datasets that may be difficult or expensive to obtain, businesses can now concentrate on curating a limited number of high-quality examples that align closely with their operational needs.

As industries transition from viewing AI as a mere tool for analysis to recognizing its potential as a working agent capable of executing tasks autonomously, the LIMI framework provides a sustainable pathway for cultivating truly agentic intelligence. This evolution in thinking about AI could lead to more efficient and effective applications across various sectors, from software development to scientific research.

Conclusion

The LIMI framework represents a significant advancement in the field of artificial intelligence, challenging conventional wisdom about the necessity of large datasets for training autonomous agents. By demonstrating that high-quality, strategically curated data can yield superior results, this research opens up new avenues for developing intelligent AI systems that can operate effectively in real-world scenarios.

As organizations increasingly seek to leverage AI for a wide range of applications, the insights gained from this study will undoubtedly influence future research and development efforts. The ability to create powerful AI agents with minimal data not only enhances efficiency but also democratizes access to advanced AI capabilities, empowering businesses of all sizes to innovate and thrive in an increasingly competitive landscape.

In summary, the LIMI framework underscores the importance of data quality over quantity in the pursuit of intelligent agency. As researchers continue to explore the boundaries of AI, the lessons learned from this study will play a crucial role in shaping the future of autonomous systems and their integration into everyday workflows.