Baseten, a prominent player in the AI infrastructure landscape, recently valued at $2.15 billion, has made a significant leap forward with the launch of its new product, Baseten Training. This platform is designed to let enterprises fine-tune open-source AI models while alleviating the operational burdens typically associated with managing GPU clusters, multi-node orchestration, and cloud capacity planning. The introduction of Baseten Training marks a pivotal moment for the company as it seeks to reshape how businesses transition away from reliance on closed-source AI providers like OpenAI and Anthropic.
The announcement, made on Thursday, comes at a crucial juncture in the evolution of enterprise AI adoption. As open-source models developed by organizations such as Meta and Alibaba increasingly rival proprietary systems in terms of performance, companies are feeling the pressure to reduce their dependence on costly API calls to services like OpenAI’s GPT-5 or Anthropic’s Claude. However, the journey from utilizing off-the-shelf open-source models to deploying production-ready custom AI solutions remains fraught with challenges, requiring specialized expertise in machine learning operations, infrastructure management, and performance optimization.
In response to these challenges, Baseten aims to provide the necessary infrastructure while allowing companies to maintain full control over their training code, data, and model weights. This approach is a deliberate shift towards a low-level infrastructure model, informed by lessons learned from previous experiences.
Baseten’s foray into training is not entirely new; the company previously attempted to launch a product called Blueprints approximately two and a half years ago. However, this initial venture failed to gain traction, prompting CEO Amir Haghighat to reflect on the experience as a valuable learning opportunity. “We had created the abstraction layer a little too high,” he explained. The intention was to create a seamless user experience where customers could programmatically select a base model, input their data and hyperparameters, and receive a fully functional model. Unfortunately, users often lacked the intuition to make the right choices regarding base models, data quality, and hyperparameters. When their models underperformed, they attributed the failures to the product itself, leading Baseten to inadvertently find itself in the consulting business rather than focusing on infrastructure.
Recognizing the need for a more grounded approach, Baseten decided to pivot away from Blueprints and concentrate solely on inference, vowing to “earn the right” to expand into training once again. This moment has now arrived, driven by two key market realities: a substantial portion of Baseten’s inference revenue originates from custom models that clients train elsewhere, and competing training platforms have employed restrictive terms of service to lock customers into their inference products.
Haghighat noted, “Multiple companies building fine-tuning products included terms in their service agreements that prevented customers from taking the weights of their fine-tuned models elsewhere.” While he understands the rationale behind such restrictions, he believes that a sustainable business model cannot rely solely on training or fine-tuning. Instead, the true value lies in inference, where the potential for revenue generation exists.
In contrast to competitors, Baseten has adopted a customer-centric approach by allowing clients to retain ownership of their model weights, enabling them to download and utilize them freely. The company’s strategy hinges on the belief that superior inference performance will naturally encourage customers to remain on the platform.
The technical capabilities of Baseten Training set it apart from traditional hyperscalers. Operating at what Haghighat refers to as “the infrastructure layer,” the platform offers multi-node training support across clusters of NVIDIA H100 and B200 GPUs. Key features include automated checkpointing to safeguard against node failures, sub-minute job scheduling, and integration with Baseten’s proprietary Multi-Cloud Management (MCM) system. This MCM system allows Baseten to dynamically provision GPU capacity across multiple cloud providers and regions, providing cost savings to customers while avoiding the lengthy contracts typically associated with hyperscaler deals.
Haghighat emphasized the flexibility of Baseten’s approach, stating, “With hyperscalers, you don’t get to say, ‘Hey, give me three or four B200 nodes while my job is running, and then take it back from me and don’t charge me for it.’ They say, ‘No, you need to sign a three-year contract.’ We don’t do that.” This flexibility is particularly appealing to enterprises looking for agile solutions that can adapt to their evolving needs.
The broader trends in cloud infrastructure also favor Baseten’s model, as abstraction layers increasingly facilitate the fluid movement of workloads across providers. For instance, when AWS experienced a major outage recently, Baseten’s inference services remained operational by automatically rerouting traffic to other cloud providers, a capability that has now been extended to training workloads.
Baseten’s observability tooling further enhances its offering, providing per-GPU metrics for multi-node jobs, granular checkpoint tracking, and a refreshed user interface that highlights infrastructure-level events. Additionally, the company has introduced an “ML Cookbook” of open-source training recipes for popular models like Gemma, GPT OSS, and Qwen, aimed at helping users achieve “training success” more efficiently.
Early adopters of Baseten Training have reported impressive results, showcasing the platform’s potential impact on enterprise operations. For example, Oxen AI, a platform focused on dataset management and model fine-tuning, has partnered with Baseten to streamline its infrastructure. CEO Greg Schoeninger articulated the strategic rationale behind this collaboration, stating, “Whenever I’ve seen a platform try to do both hardware and software, they usually fail at one of them. That’s why partnering with Baseten to handle infrastructure was the obvious choice.”
Oxen built its customer experience entirely on top of Baseten’s infrastructure, utilizing the Baseten CLI to programmatically orchestrate training jobs. This system automatically provisions and deprovisions GPUs, effectively concealing Baseten’s interface behind Oxen’s own. One of Oxen’s customers, AlliumAI, a startup focused on organizing messy retail data, achieved an astounding 84% cost savings through this integration, reducing total inference costs from $46,800 to just $7,530.
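The reported AlliumAI numbers can be sanity-checked with a quick back-of-envelope calculation (a sketch using only the two dollar figures quoted above):

```python
# Back-of-envelope check of the reported AlliumAI cost reduction.
before_usd = 46_800  # total inference cost before the Oxen/Baseten integration
after_usd = 7_530    # total cost after the integration

# Fractional savings: (old - new) / old
savings = (before_usd - after_usd) / before_usd
print(f"{savings:.0%}")  # rounds to the 84% figure cited in the article
```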
Daniel Demillard, CEO of AlliumAI, remarked, “Training custom LoRAs has always been one of the most effective ways to leverage open-source models, but it often came with infrastructure headaches. With Oxen and Baseten, that complexity disappears. We can train and deploy models at massive scale without ever worrying about CUDA, which GPU to choose, or shutting down servers after training.”
Another early customer, Parsed, addresses a different challenge by helping enterprises reduce their reliance on OpenAI through the creation of specialized models that outperform generalist large language models (LLMs) on domain-specific tasks. Operating in mission-critical sectors such as healthcare, finance, and legal services, Parsed prioritizes model performance and reliability.
Charles O’Neill, co-founder and chief science officer of Parsed, shared insights into the company’s experience prior to switching to Baseten. “We were seeing repetitive and degraded performance on our fine-tuned models due to bugs with our previous training provider. On top of that, we were struggling to easily download and checkpoint weights after training runs.” Since transitioning to Baseten, Parsed has achieved a 50% reduction in end-to-end latency for transcription use cases, spun up HIPAA-compliant EU deployments for testing within 48 hours, and initiated over 500 training jobs. The company also leveraged Baseten’s modified vLLM inference framework and speculative decoding techniques to halve latency for custom models.
O’Neill emphasized the importance of continuous improvement in model performance, stating, “Fast models matter. But fast models that get better over time matter more. A model that’s 2x faster but static loses to one that’s slightly slower but improving 10% monthly. Baseten gives us both: the performance edge today and the infrastructure for continuous improvement.”
The interdependence between training and inference is a critical aspect of Baseten’s strategy. The company recognizes that the boundary between these two functions is blurrier than conventional wisdom suggests. For instance, Baseten’s model performance team extensively utilizes the training platform to create “draft models” for speculative decoding, a technique that can significantly accelerate inference. Recently, the company announced it achieved over 650 tokens per second on OpenAI’s GPT OSS 120B model, a 60% improvement over its initial performance, by employing EAGLE-3 speculative decoding, which necessitates training specialized small models to work alongside larger target models.
Haghighat noted, “Ultimately, inference and training plug in more ways than one might think. When you do speculative decoding in inference, you need to train the draft model. Our model performance team is a big customer of the training product to train these EAGLE heads on a continuous basis.” This technical interdependence reinforces Baseten’s thesis that owning both training and inference creates defensible value. The company can optimize the entire lifecycle: a model trained on Baseten can be deployed with a single click to inference endpoints pre-optimized for that architecture, complete with deployment-from-checkpoint support for chat completion and audio transcription workloads.
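The idea behind draft-model speculative decoding can be illustrated with a toy sketch. This is not Baseten’s EAGLE-3 implementation; the two stand-in “models” below are hypothetical deterministic functions used only to show the propose-then-verify loop: a cheap draft model guesses several tokens ahead, and the expensive target model checks them, keeping the longest agreeing prefix so it advances multiple tokens per verification pass.

```python
# Toy illustration of draft-model speculative decoding (conceptual sketch,
# not Baseten's implementation; draft_next/target_next are made-up models).

def draft_next(seq):
    # Stand-in for a small, cheap draft model.
    return (seq[-1] + 1) % 5

def target_next(seq):
    # Stand-in for the large target model; disagrees with the draft
    # whenever the last token is 3, so verification stops there.
    return (seq[-1] + 1) % 5 if seq[-1] != 3 else 0

def speculative_step(seq, k=4):
    # 1) Propose: the draft model generates k tokens autoregressively.
    proposal = list(seq)
    for _ in range(k):
        proposal.append(draft_next(proposal))
    drafted = proposal[len(seq):]

    # 2) Verify: the target model checks each drafted position and the
    #    longest agreeing prefix is accepted; on the first mismatch we
    #    take the target's token instead and stop.
    accepted, ctx = [], list(seq)
    for tok in drafted:
        t = target_next(ctx)
        accepted.append(t)
        ctx.append(t)
        if t != tok:
            break
    return seq + accepted

print(speculative_step([1]))  # draft proposes [2, 3, 4, 0]; target rejects 4 -> [1, 2, 3, 0]
```

In a real system the verification pass scores all k drafted positions in a single batched forward pass of the target model, which is why a well-matched draft model (like a trained EAGLE head) can yield large throughput gains without changing the target model’s outputs.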
As open-source AI models continue to improve, Baseten’s strategy is rooted in the belief that these advancements will unlock massive enterprise adoption through fine-tuning. Haghighat expressed confidence in the trajectory of both closed and open-source models, stating, “We don’t even need open source to surpass closed models, because as both of them are getting better, they unlock all these invisible lines of usefulness for different use cases.” He pointed to the proliferation of reinforcement learning and supervised fine-tuning techniques that enable companies to take an open-source model and tailor it to meet specific needs, making it “as good as the closed model, not at everything, but at this narrow band of capability that they want.”
This trend is already evident in Baseten’s Model APIs business, launched alongside Training earlier this year to provide production-grade access to open-source models. The company was the first provider to offer access to DeepSeek V3 and R1, and has since added models like Llama 4 and Qwen 3, optimized for performance and reliability. The Model APIs serve as a top-of
