Baidu has unveiled its next-generation foundation model, ERNIE 5.0, just hours after OpenAI released GPT-5.1, an update to its flagship model, underscoring the intensifying competition in artificial intelligence. The launch, at the Baidu World 2025 event, marks a pivotal moment for the Chinese tech giant as it seeks to establish itself as a formidable player in the global enterprise AI market.
ERNIE 5.0 is described as a proprietary, natively multimodal model capable of processing and generating content across various formats, including text, images, audio, and video. This capability positions ERNIE 5.0 as a versatile tool for enterprises looking to leverage AI for a wide range of applications, from automated document processing to complex data analysis.
Baidu’s announcement comes as demand for advanced AI surges, particularly in sectors that require sophisticated document understanding and multimodal reasoning. The company aims to capitalize on this trend with a model that it says not only competes with OpenAI’s GPT-5 and Google’s Gemini 2.5 Pro but outperforms them in several key areas.
One of the standout features of ERNIE 5.0 is its ability to handle joint inputs and outputs across different modalities. Many existing models rely on post-hoc modality fusion, in which separate models process each type of data before their results are combined. ERNIE 5.0 instead handles all modalities within a single architecture, a design Baidu expects to improve the model’s efficiency and effectiveness in real-world applications.
During the Baidu World 2025 event, the company shared benchmark results indicating that ERNIE 5.0 has achieved parity or near-parity with leading Western foundation models across a variety of tasks. In particular, the model reportedly outperformed or matched GPT-5-High and Gemini 2.5 Pro in areas such as multimodal reasoning, document understanding, and image-based question answering (QA). These capabilities are critical for enterprise applications, especially in fields like finance and legal services, where accurate document processing and comprehension are paramount.
Baidu highlighted ERNIE 5.0’s performance on specific benchmarks, including OCRBench, DocVQA, and ChartQA, which test document recognition, comprehension, and structured data reasoning. The company claims that ERNIE 5.0 surpassed both GPT-5-High and Gemini 2.5 Pro in these areas, emphasizing its potential for automating complex tasks that involve interpreting and analyzing large volumes of information.
In addition to its strengths in document processing, ERNIE 5.0 has demonstrated impressive capabilities in image generation. According to Baidu’s internal evaluations, the model either tied or exceeded Google’s Veo3 across various categories, including semantic alignment and image quality. This suggests that ERNIE 5.0 can generate and interpret visual content with a level of contextual awareness that may surpass that of models relying on modality-specific encoders.
The model’s audio and speech processing capabilities are also noteworthy. ERNIE 5.0 achieved competitive results on audio understanding benchmarks such as MM-AU and TUT2017, as well as in question answering from spoken language inputs. While audio performance was not the primary focus of the launch, its inclusion indicates Baidu’s intention to support a comprehensive suite of multimodal applications.
In terms of language tasks, ERNIE 5.0 has shown strong results in instruction following, factual question answering, and mathematical reasoning—core competencies that define the utility of large language models in enterprise settings. The introduction of ERNIE 5.0 Preview 1022, a variant optimized for text-intensive tasks, further enhances the model’s appeal for developers and businesses focused on language processing.
Baidu’s pricing strategy for ERNIE 5.0 positions it at the premium end of the market. The model is available through Baidu’s ERNIE Bot and the Qianfan cloud platform API, priced at $0.85 per 1 million input tokens and $3.40 per 1 million output tokens. This is in line with other top-tier offerings from Chinese competitors like Alibaba and undercuts U.S. alternatives such as OpenAI’s GPT-5.1, which costs $1.25 per 1 million input tokens and $10.00 per 1 million output tokens.
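To make the price gap concrete, the short Python sketch below estimates the bill for a single workload under both price lists. It assumes all four figures are quoted per 1 million tokens, the unit OpenAI uses for its GPT-5.1 list prices. The workload size and the names `Pricing`, `ERNIE_5`, and `GPT_5_1` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumption: every price below is quoted in USD per 1 million tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """List price for one model, in USD per 1 million tokens."""
    input_usd: float
    output_usd: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the total USD cost for the given token counts."""
        total = input_tokens * self.input_usd + output_tokens * self.output_usd
        return total / TOKENS_PER_PRICE_UNIT


# Prices as reported in the article.
ERNIE_5 = Pricing(input_usd=0.85, output_usd=3.40)
GPT_5_1 = Pricing(input_usd=1.25, output_usd=10.00)

# Hypothetical workload: 2M input tokens and 500K output tokens.
workload = {"input_tokens": 2_000_000, "output_tokens": 500_000}

ernie_cost = ERNIE_5.cost(**workload)
gpt_cost = GPT_5_1.cost(**workload)

print(f"ERNIE 5.0: ${ernie_cost:.2f}")
print(f"GPT-5.1:   ${gpt_cost:.2f}")
print(f"ERNIE 5.0 costs {ernie_cost / gpt_cost:.0%} of the GPT-5.1 bill")
```

For this particular mix the ERNIE 5.0 bill comes to about $3.40 against about $7.50 for GPT-5.1. Because the gap is widest on output tokens, the savings grow as a workload becomes more output-heavy.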
The contrast in cost between ERNIE 5.0 and earlier models, such as ERNIE 4.5 Turbo, highlights Baidu’s strategy to differentiate between high-volume, low-cost models and high-capability models designed for complex tasks and multimodal reasoning. This approach reflects a broader trend in the AI industry, where companies are increasingly segmenting their offerings to cater to diverse customer needs.
Baidu’s ambitions extend beyond the launch of ERNIE 5.0. The company is actively pursuing international expansion, with several initiatives aimed at broadening its AI footprint beyond China. Among these initiatives is GenFlow 3.0, Baidu’s largest general-purpose AI agent, which now boasts over 20 million users. This platform features enhanced memory and multimodal task handling capabilities, making it a valuable tool for businesses seeking to integrate AI into their operations.
Additionally, Baidu has launched MeDo, an international version of its no-code builder Miaoda, which is now available globally. The Oreate productivity workspace, which supports document, slide, image, video, and podcast creation, has also reached over 1.2 million users worldwide. These products reflect Baidu’s commitment to providing accessible AI solutions that empower users to harness the power of technology without requiring extensive technical expertise.
Baidu’s digital human platform, already deployed in Brazil, is another component of its global strategy. The platform has gained traction in livestreaming: according to reports, 83% of livestreamers during this year’s “Double 11” shopping event in China used Baidu’s digital human technology, contributing to a 91% increase in gross merchandise value (GMV).
Moreover, Baidu’s autonomous ride-hailing service, Apollo Go, has surpassed 17 million rides, operating driverless fleets in 22 cities and claiming the title of the world’s largest robotaxi network. This achievement underscores Baidu’s leadership in the autonomous driving sector and its ability to leverage AI technologies to transform transportation.
In conjunction with the launch of ERNIE 5.0, Baidu also introduced an open-source multimodal model, ERNIE-4.5-VL-28B-A3B-Thinking, under the Apache 2.0 license. The model activates just 3 billion of its 28 billion total parameters for each token, using a Mixture-of-Experts (MoE) architecture for efficient inference. Key innovations include dynamic zoom-based visual analysis, chart interpretation, document understanding, visual grounding, and temporal awareness in video.
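Activating 3 billion of 28 billion parameters means only about 11% of the weights do work on any given token. The article does not describe how ERNIE-4.5-VL-28B-A3B-Thinking selects its experts, so the Python sketch below shows only the generic top-k routing idea behind MoE layers, not Baidu’s implementation. The expert count, the value of `k`, the router scores, and the toy "experts" are all hypothetical.

```python
import math


def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and blend their outputs by weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy setup: 8 experts, each just scales its input by 1..8.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]

# Hypothetical router scores for one token (a real router computes these
# from the token's hidden state).
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(router_logits, k=2)
output = moe_forward(1.0, experts, router_logits, k=2)

print("experts used:", [index for index, _ in selected])
print("fraction of experts active:", 2 / len(experts))
print("blended output:", round(output, 3))
```

In this toy case only two of the eight experts run for the token, which is the source of the inference savings: compute scales with the active parameters rather than the total.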
The release of ERNIE-4.5-VL-28B-A3B-Thinking adds pressure on closed-source competitors by providing a viable foundation model for commercial applications without licensing restrictions. This open-source approach aligns with the growing trend of transparency and collaboration in the AI community, allowing developers and organizations to build upon Baidu’s advancements without the constraints typically associated with proprietary models.
Community feedback following the launch of ERNIE 5.0 has been mixed, with some developers reporting issues such as the model’s tendency to invoke tools excessively during SVG generation tasks. Baidu’s prompt response to these concerns, acknowledging the bug and committing to a fix, reflects the company’s increasing emphasis on developer communication and responsiveness as it seeks to attract international users.
As Baidu continues to navigate the competitive landscape of AI, the launch of ERNIE 5.0 represents a strategic escalation in the race for dominance in the foundation model arena. With performance claims that position it alongside the most advanced systems from OpenAI and Google, coupled with a dual strategy of premium APIs and open-source releases, Baidu is signaling its ambition to become not just a domestic leader but a credible global infrastructure provider.
The demand for scalable, multimodal AI solutions is on the rise, and Baidu’s two-track approach may broaden its appeal across both corporate and developer communities. Whether the company’s performance claims hold up under third-party testing remains to be seen, but the breadth of capabilities offered by ERNIE 5.0 and its supporting ecosystem positions Baidu favorably in the next wave of AI deployment.
In conclusion, Baidu’s ERNIE 5.0 launch is a bold statement in the rapidly evolving AI landscape. As enterprises increasingly seek advanced solutions to meet their complex needs, the competition among AI providers will only intensify. Baidu’s commitment to innovation, coupled with its strategic initiatives for global expansion, positions the company as a key player in shaping the future of artificial intelligence. The coming months will be crucial as the industry watches how ERNIE 5.0 performs in real-world applications and how Baidu navigates the challenges and opportunities that lie ahead.
