OpenAGI Launches Lux, an AI Model Claiming to Outperform OpenAI and Anthropic in Computer Control

OpenAGI, a startup founded by MIT researcher Zengyi Qin, has emerged from stealth to unveil Lux, a foundation model designed to control computers autonomously by interpreting screenshots and executing actions across desktop applications. The company positions Lux as a direct competitor to computer-use agents from established players like OpenAI and Anthropic.

OpenAGI claims that Lux outperforms existing models, achieving an 83.6 percent success rate on the Online-Mind2Web benchmark, which has quickly become the industry standard for evaluating AI agents that operate in dynamic web environments. For context, OpenAI’s Operator, released earlier this year, scored 61.3 percent, while Anthropic’s Claude Computer Use scored 56.3 percent. Separately from the benchmark results, OpenAGI says Lux runs at approximately one-tenth the cost of leading models from OpenAI and Anthropic.
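
To put those figures side by side, the short Python sketch below computes Lux's claimed lead in percentage points and as a ratio. It uses only the three success rates quoted above; the function names are illustrative, and the numbers remain OpenAGI's claims rather than independently verified results.

```python
# Success rates (percent) on Online-Mind2Web, as quoted in this article.
scores = {
    "Lux": 83.6,
    "OpenAI Operator": 61.3,
    "Claude Computer Use": 56.3,
}

def lead_in_points(leader: str, other: str) -> float:
    """Gap between two quoted success rates, in percentage points."""
    return round(scores[leader] - scores[other], 1)

def relative_lead(leader: str, other: str) -> float:
    """Ratio of the leader's success rate to the other agent's."""
    return scores[leader] / scores[other]

for rival in ("OpenAI Operator", "Claude Computer Use"):
    print(
        f"Lux vs {rival}: +{lead_in_points('Lux', rival)} points, "
        f"{relative_lead('Lux', rival):.2f}x the success rate"
    )
```

On the quoted numbers, the gap is 22.3 percentage points over Operator and 27.3 over Claude Computer Use.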

The announcement of Lux comes at a pivotal moment for the AI industry, where technology giants and startups alike have invested billions into developing autonomous agents capable of navigating software, booking travel, filling out forms, and executing complex workflows. Companies such as OpenAI, Anthropic, Google, and Microsoft have all released or announced agent products in the past year, betting that computer-controlling AI will become as transformative as chatbots. However, independent research has raised questions about the actual capabilities of these agents, suggesting that many may not perform as well as their marketing claims imply.

The Online-Mind2Web benchmark, developed by researchers at Ohio State University and the University of California, Berkeley, was specifically designed to expose the gap between marketing claims and real-world performance. It comprises 300 diverse tasks across 136 real websites, ranging from booking flights to navigating complex e-commerce checkouts. Unlike earlier benchmarks that relied on cached parts of websites, Online-Mind2Web tests agents in live online environments where pages change dynamically and unexpected obstacles can arise. The results from this benchmark have painted a stark picture of the competency of current agents, revealing a level of over-optimism in previously reported results.

When the Ohio State team evaluated five leading web agents through careful human assessment, they found that many recent systems, despite substantial investment and marketing hype, did not outperform SeeAct, a relatively simple agent released in January 2024. Even OpenAI’s Operator, which was the best performer among commercial offerings in their study, achieved only 61 percent success. This raises critical questions about the reliability and effectiveness of current AI agents in practical applications.

OpenAGI’s claimed performance advantage with Lux stems from a novel training methodology known as “Agentic Active Pre-training.” This approach fundamentally differs from traditional large language model (LLM) training, which typically involves feeding vast amounts of text data into the model to learn to produce coherent text. In contrast, Lux is trained to produce actions. It learns from a large dataset of computer screenshots paired with action sequences, enabling it to interpret visual interfaces and determine the necessary clicks, keystrokes, and navigation steps to accomplish specific goals.
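
To make the idea of screenshots paired with action sequences more concrete, here is a minimal Python sketch of what one such training example might look like. The class names, fields, coordinates, and file names are all hypothetical; the article does not describe Lux's actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional

@dataclass
class Action:
    """One UI action (hypothetical schema, not Lux's real format)."""
    kind: Literal["click", "type", "key", "scroll"]
    x: Optional[int] = None      # screen coordinates, used by clicks
    y: Optional[int] = None
    text: Optional[str] = None   # typed text or key name

@dataclass
class Step:
    """A screenshot and the action taken while looking at it."""
    screenshot_path: str
    action: Action

@dataclass
class Trajectory:
    """A goal plus the ordered steps taken to reach it."""
    goal: str
    steps: List[Step] = field(default_factory=list)

    def to_training_pairs(self):
        """Return (goal, screenshot, action) tuples: input is what the
        model sees, target is the action it should produce."""
        return [(self.goal, s.screenshot_path, s.action) for s in self.steps]

# Illustrative example with made-up file names and coordinates.
demo = Trajectory(
    goal="Open the settings page",
    steps=[
        Step("shot_000.png", Action("click", x=1180, y=42)),
        Step("shot_001.png", Action("type", text="settings")),
        Step("shot_002.png", Action("key", text="Enter")),
    ],
)
pairs = demo.to_training_pairs()
print(len(pairs), "training pairs; first target:", pairs[0][2].kind)
```

The point of the sketch is the shape of the supervision: where a text model's target is the next token, the target here is the next action.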

Zengyi Qin, in an exclusive interview, explained that this action-oriented training allows the model to actively explore the computer environment. Such exploration generates new knowledge, which is then fed back into the model for further training. This self-reinforcing process could explain how a smaller team like OpenAGI might achieve results that larger organizations struggle to replicate. Instead of relying on ever-larger static datasets, Lux’s approach allows the model to continuously improve by generating its own training data through exploration.
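
The explore-then-retrain loop Qin describes can be illustrated with a deliberately tiny Python toy. Everything in it is an assumption made for illustration: the three-screen "application", the button names, and the lookup-table "model" are hypothetical stand-ins, not OpenAGI's method or architecture.

```python
import random

# Hypothetical toy "application": three screens, each with one button that
# advances to the next screen. Any other button leaves the screen unchanged.
CORRECT_BUTTON = ["b", "a", "c"]
BUTTONS = ("a", "b", "c")
MAX_STEPS = 6

def press(screen: int, button: str) -> int:
    """Environment step: return the screen shown after pressing a button."""
    return screen + 1 if button == CORRECT_BUTTON[screen] else screen

def explore(policy: dict, rng: random.Random) -> list:
    """Run one episode; return the (screen, button) pairs that changed the screen."""
    screen, discoveries = 0, []
    for _ in range(MAX_STEPS):
        if screen == len(CORRECT_BUTTON):  # task finished
            break
        # Use learned knowledge when available, otherwise try a random button.
        button = policy.get(screen) or rng.choice(BUTTONS)
        next_screen = press(screen, button)
        if next_screen != screen:
            discoveries.append((screen, button))
        screen = next_screen
    return discoveries

def train(policy: dict, data: list) -> None:
    """Stand-in for a training step: memorise actions that made progress."""
    for screen, button in data:
        policy[screen] = button

def explore_then_train(rounds: int = 5, episodes: int = 20, seed: int = 0) -> dict:
    rng = random.Random(seed)
    policy: dict = {}
    for _ in range(rounds):
        fresh = []
        for _ in range(episodes):
            fresh.extend(explore(policy, rng))  # exploration generates data
        train(policy, fresh)                    # data is fed back into the "model"
    return policy

learned = explore_then_train()
print(learned)
```

The toy starts with no dataset at all. Each round, the current policy acts in the environment, the transitions that changed the screen become new training data, and the updated policy explores from a better starting point in the next round.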

The implications of Lux’s capabilities extend beyond mere performance metrics. One of the critical distinctions highlighted by OpenAGI is that Lux can control applications across an entire desktop operating system, rather than being limited to web browser tasks. Most commercially available computer-use agents, including early versions of Anthropic’s Claude Computer Use, focus primarily on browser-based tasks. This limitation excludes vast categories of productivity work that occur in desktop applications, such as spreadsheets in Microsoft Excel, communications in Slack, design work in Adobe products, and code editing in development environments.

By enabling Lux to navigate these native applications, OpenAGI significantly expands the addressable market for computer-use agents. The company is also releasing a software development kit (SDK) alongside the model, allowing third-party developers to build applications on top of Lux. This move could foster an ecosystem around the model and lead to new use cases that leverage its capabilities.

In addition to its performance and versatility, OpenAGI claims that Lux offers significant cost advantages. The company asserts that Lux can execute tasks faster and more efficiently than its competitors, which would make it an attractive option for businesses looking to integrate AI agents into their workflows while keeping costs under control.

Computer-use agents like Lux also present novel safety challenges that do not arise with conventional chatbots. An AI system capable of clicking buttons, entering text, and navigating applications could, if misdirected, cause significant harm, such as transferring money, deleting files, or exfiltrating sensitive information. Recognizing these risks, OpenAGI has built safety mechanisms directly into Lux. When the model encounters requests that violate its safety policies, it refuses to proceed and alerts the user.

For instance, if a user were to ask Lux to “copy my bank details and paste it into a new Google doc,” the model would respond with an internal reasoning step, recognizing the sensitivity of the request. It would then issue a warning to the user instead of executing the potentially dangerous command. Such safeguards are crucial as the proliferation of computer-use agents raises concerns about security and misuse.
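
The refuse-and-warn behavior can be sketched as a gate that runs before any action is executed. The Python below is a hypothetical illustration only: according to OpenAGI, Lux reaches this decision through the model's own reasoning, whereas the keyword list, class, and function names here are invented as a crude stand-in.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy: phrases that mark a request as sensitive.
SENSITIVE_TERMS = ("bank details", "password", "credit card", "social security")

@dataclass
class Decision:
    allowed: bool
    message: str

def review_request(request: str) -> Decision:
    """Check a request against the (hypothetical) policy before acting."""
    lowered = request.lower()
    hits = [term for term in SENSITIVE_TERMS if term in lowered]
    if hits:
        return Decision(
            False,
            "Refused: the request involves sensitive data (" + ", ".join(hits) + ").",
        )
    return Decision(True, "OK to proceed.")

def run_agent(request: str, execute: Callable[[str], str]) -> str:
    """Execute the request only if the safety review allows it."""
    decision = review_request(request)
    if not decision.allowed:
        return decision.message  # warn the user; no action is executed
    return execute(request)

print(run_agent(
    "copy my bank details and paste it into a new Google doc",
    execute=lambda r: "executed: " + r,
))
```

The design point is ordering: the check sits in front of the executor, so a refused request never reaches the code that clicks or types.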

As OpenAGI continues to develop Lux, the company is also working with Intel to optimize the model for edge devices. This collaboration aims to enable Lux to run locally on laptops and workstations, alleviating concerns about sending sensitive screen data to external servers. By leveraging edge computing, OpenAGI hopes to enhance the security and privacy of its users while providing a robust AI solution.

Furthermore, OpenAGI has confirmed that it is in exploratory discussions with AMD and Microsoft regarding potential partnerships. These collaborations could further bolster Lux’s capabilities and integration into existing technology ecosystems, enhancing its appeal to businesses and developers alike.

Zengyi Qin, the founder of OpenAGI, brings a unique combination of academic credentials and entrepreneurial experience to the table. He completed his doctorate at the Massachusetts Institute of Technology in 2025, focusing on computer vision, robotics, and machine learning. His academic work has been published in prestigious venues, including the Conference on Computer Vision and Pattern Recognition, the International Conference on Learning Representations, and the International Conference on Machine Learning.

Before founding OpenAGI, Qin played a pivotal role in developing several widely adopted AI systems. Notably, he led the development of JetMoE, a large language model that demonstrated the feasibility of training high-performing models from scratch for under $100,000, a fraction of the tens of millions typically required. JetMoE outperformed Meta’s LLaMA2-7B on standard benchmarks, garnering attention from MIT’s Computer Science and Artificial Intelligence Laboratory.

Qin’s previous open-source projects have also achieved remarkable adoption. OpenVoice, a voice cloning model, accumulated approximately 35,000 stars on GitHub, ranking in the top 0.03 percent of open-source projects by popularity. Similarly, MeloTTS, a text-to-speech system, has been downloaded over 19 million times since its release in 2024, making it one of the most widely used audio AI models.

Additionally, Qin co-founded MyShell, an AI agent platform that has attracted six million users who have collectively built more than 200,000 AI agents. Users have engaged in over one billion interactions with agents on the platform, highlighting the growing interest in AI-driven solutions.

As the computer-use agent market continues to attract intense interest from investors and technology giants, OpenAGI’s emergence signals that innovation is not solely dependent on scale. The company’s focus on smarter architectures and novel training methodologies positions it as a compelling alternative to larger, well-funded rivals. Whether Lux can maintain its impressive performance in real-world scenarios remains to be seen, as the AI industry has a history of impressive demonstrations that falter in production.

The distance between controlled test environments and the complexities of real-world workflows filled with edge cases, exceptions, and surprises can be vast. However, if Lux performs in practical applications as it does in laboratory settings, the implications extend far beyond the success of a single startup. It could suggest that the path to capable AI agents lies not through the largest financial investments but through innovative approaches and clever architectures.

The technology industry has witnessed similar narratives before, where smaller teams with groundbreaking ideas have outmaneuvered larger competitors. As OpenAGI continues to refine Lux and expand its capabilities, the race to build AI that truly controls computers has become more competitive and intriguing than ever.

OpenAGI’s launch of Lux adds a serious new contender to the computer-use agent market. If its claimed benchmark performance, cost efficiency, and ability to navigate both web and desktop applications hold up in production, Lux could reset expectations for what AI agents can do and push competitors to respond. For now, the industry is watching closely.