Google Launches Ironwood AI Chips with 4X Performance Boost and Secures Anthropic’s Multi-Billion Dollar Commitment

In a significant advancement for artificial intelligence infrastructure, Google has unveiled its latest generation of Tensor Processing Units (TPUs), dubbed Ironwood, which promises a fourfold increase in performance over its predecessor. The announcement comes alongside a multi-year partnership with Anthropic, the AI safety company behind the Claude family of models, which will gain access to up to one million of the new TPU chips. Estimated to be worth tens of billions of dollars, the deal ranks among the largest known commitments to AI infrastructure to date.

The introduction of Ironwood TPUs marks a pivotal moment in the evolution of AI technology, as Google positions itself at the forefront of the shift from training AI models to deploying them at scale. This transition, referred to by Google executives as “the age of inference,” reflects a broader industry trend where the focus is increasingly on serving AI models to millions or even billions of users rather than merely training them. The implications of this shift are profound, necessitating infrastructure that can deliver low latency, high throughput, and unwavering reliability—qualities essential for applications such as chatbots, coding assistants, and other AI-driven services that require instantaneous responses.

Ironwood’s architecture is designed to meet these demands head-on. Each Ironwood pod integrates up to 9,216 individual TPU chips that function collectively as a single supercomputer, with each pod providing access to a shared pool of 1.77 petabytes of high-bandwidth memory. Google positions this pooled memory as a key advantage for inference at scale, since a single pod can hold model weights and working state that far exceed the capacity of any individual accelerator.
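
For a rough sense of scale, the pod-level figures imply roughly 192 GB of high-bandwidth memory per chip. A back-of-the-envelope check, using only the numbers quoted above rather than an official spec sheet:

```python
# Back-of-the-envelope check of the pod-level memory figures quoted above.
CHIPS_PER_POD = 9_216
POD_HBM_PB = 1.77  # petabytes of high-bandwidth memory per pod

pod_hbm_gb = POD_HBM_PB * 1_000_000  # decimal units: 1 PB = 1,000,000 GB
print(f"HBM per chip: {pod_hbm_gb / CHIPS_PER_POD:.0f} GB")  # ~192 GB
```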

One of the standout features of the Ironwood architecture is its use of Optical Circuit Switching (OCS) technology, which facilitates dynamic reconfiguration of data traffic. When individual components fail or require maintenance, a common occurrence at this scale, the system automatically reroutes traffic within milliseconds, maintaining operational continuity without noticeable disruption to users. This focus on reliability is underscored by Google’s track record: its liquid-cooled fleet has sustained approximately 99.999% availability since 2020, translating to less than six minutes of downtime per year.
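
That conversion is easy to verify. A short sketch of the arithmetic (illustrative only, not Google’s measurement methodology):

```python
# Convert an availability percentage into expected downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes per year a system at this availability is expected to be down."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

print(f"{downtime_minutes_per_year(99.999):.2f} min/year")  # ~5.26 minutes
print(f"{downtime_minutes_per_year(99.9):.0f} min/year")    # ~526 minutes, for contrast
```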

The partnership with Anthropic serves as a significant validation of Google’s custom silicon strategy. Anthropic’s commitment to utilizing one million TPU chips underscores the growing demand for robust AI infrastructure capable of supporting advanced models like Claude. Krishna Rao, Anthropic’s chief financial officer, emphasized the importance of this expanded capacity in meeting the exponentially growing demand from the company’s customers, ranging from Fortune 500 companies to AI-native startups. The deal also includes provisions for over one gigawatt of compute capacity coming online by 2026, enough to power a small city, further highlighting the scale of this collaboration.

Google’s approach to AI infrastructure is characterized by a vertical integration strategy that encompasses chip design, software development, and application deployment. By building custom silicon tailored specifically for AI workloads, Google aims to achieve superior economics and performance compared to relying solely on off-the-shelf components, such as Nvidia’s dominant GPU offerings. This strategy is not without its challenges, however. Custom chip development requires substantial upfront investment—often amounting to billions of dollars—and the software ecosystem for specialized accelerators lags behind established platforms like Nvidia’s CUDA, which has benefited from over 15 years of developer tools and community support.

Despite these hurdles, Google argues that its integrated approach offers unique advantages. The company points to its history of innovation, noting that the original TPU was instrumental in the development of the Transformer architecture, which has become foundational to modern AI. By fostering close collaboration between model research, software engineering, and hardware development, Google believes it can optimize performance in ways that are unattainable with generic components.

In addition to the Ironwood TPUs, Google has expanded its Axion processor family, introducing custom Arm-based CPUs designed for the general-purpose workloads that support AI applications without requiring specialized accelerators. The new N4A instance type, now entering preview, targets tasks such as microservices, containerized applications, and data analytics, and Google claims it delivers up to twice the price-performance of comparable x86-based virtual machines. Google is also previewing its first bare-metal Arm instance, C4A metal, which provides dedicated physical servers for specialized workloads such as Android development and automotive systems.

The Axion strategy reflects a growing recognition that the future of computing infrastructure must balance the need for specialized AI accelerators with highly efficient general-purpose processors. While TPUs excel at handling the computationally intensive tasks associated with running AI models, Axion-class processors manage essential functions such as data ingestion, preprocessing, application logic, and API serving. This dual approach is designed to optimize the overall performance of AI applications, ensuring that they can operate effectively in real-world environments.

To maximize the utilization of Ironwood and Axion, Google has integrated these technologies into what it calls the AI Hypercomputer—a comprehensive supercomputing system that combines compute, networking, storage, and software to enhance system-level performance and efficiency. According to a recent IDC Business Value Snapshot study, customers leveraging the AI Hypercomputer have reported an average three-year return on investment of 353%, along with a 28% reduction in IT costs and a 55% increase in IT team efficiency.

Software enhancements play a crucial role in harnessing the raw performance of Ironwood and Axion. For instance, Google Kubernetes Engine now offers advanced maintenance and topology awareness for TPU clusters, enabling intelligent scheduling and resilient deployments. Additionally, the open-source MaxText framework supports advanced training techniques, including Supervised Fine-Tuning (SFT) and Generative Reinforcement Policy Optimization (GRPO), further empowering developers to exploit the full potential of the new chips.
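
To make the first of those techniques concrete: supervised fine-tuning, at its core, minimizes cross-entropy on curated demonstration data while excluding prompt tokens from the loss. The sketch below illustrates that generic objective in plain NumPy; it is not MaxText’s API, and the function and variable names are hypothetical:

```python
import numpy as np

def sft_loss(token_logits: np.ndarray, targets: np.ndarray,
             loss_mask: np.ndarray) -> float:
    """Cross-entropy over response tokens only.

    token_logits: (seq_len, vocab) model outputs
    targets:      (seq_len,) ground-truth next-token ids
    loss_mask:    (seq_len,) 1.0 for response tokens, 0.0 for prompt tokens
    """
    # Numerically stable log-softmax.
    shifted = token_logits - token_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each target token.
    nll = -log_probs[np.arange(len(targets)), targets]
    # Average only over response positions, so the prompt is not trained on.
    return float((nll * loss_mask).sum() / loss_mask.sum())

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 32))       # 6 tokens, toy vocabulary of 32
targets = rng.integers(0, 32, size=6)
mask = np.array([0, 0, 1, 1, 1, 1.0])   # first two tokens are the prompt
print(f"SFT loss: {sft_loss(logits, targets, mask):.3f}")
```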

Perhaps most notably, Google’s Inference Gateway intelligently load-balances requests across model servers, optimizing critical metrics such as latency and cost. Google reports that the Inference Gateway can reduce time-to-first-token latency by up to 96% and serving costs by up to 30% through techniques such as prefix-cache-aware routing. This capability is particularly valuable for conversational AI applications, where successive requests often share a common prompt prefix, letting cached computation be reused rather than repeated.
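
The routing idea is straightforward to sketch: send each request to the replica whose cache already holds the longest matching prefix of the prompt. The following simplified illustration uses hypothetical replica names and a greedy longest-prefix policy that is an assumption, not Google’s implementation:

```python
# Simplified, hypothetical sketch of prefix-cache-aware routing: each
# replica advertises the prompt prefixes it has cached, and a request is
# sent to the replica with the longest matching prefix.

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt: str, servers: dict[str, list[str]]) -> str:
    """Pick a server name; `servers` maps name -> cached prompt prefixes."""
    best_server, best_len = None, 0
    for name, cached_prefixes in servers.items():
        for prefix in cached_prefixes:
            match = shared_prefix_len(prompt, prefix)
            if match > best_len:
                best_server, best_len = name, match
    if best_server is None:
        # No cache hit anywhere: fall back to the replica with the fewest
        # cached entries as a crude load signal.
        best_server = min(servers, key=lambda s: len(servers[s]))
    return best_server

servers = {
    "replica-a": ["You are a helpful coding assistant. User:"],
    "replica-b": [],
}
print(route("You are a helpful coding assistant. User: fix my bug", servers))
# -> replica-a, which already holds most of this prompt in its cache
```

A production router would also weigh queue depth, cache eviction, and fairness, but the core trade-off is the same: prefer reuse when a prefix match exists, and balance load when it does not.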

However, the ambitious scale of these developments brings significant infrastructure challenges. At the recent Open Compute Project EMEA Summit, Google revealed plans to implement ±400-volt direct current (VDC) power delivery systems capable of supporting up to one megawatt per rack, a tenfold increase over typical deployments. As demand for machine learning capacity continues to grow, Google anticipates that more than 500 kW per IT rack will be required before 2030.
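
The appeal of higher distribution voltages follows from basic electrical arithmetic: at a fixed power draw, the required current, and with it conductor size and resistive loss, scales inversely with voltage. A quick illustration using I = P / V:

```python
# Current needed to deliver one megawatt at different distribution
# voltages, from I = P / V.
POWER_WATTS = 1_000_000  # one megawatt per rack

for volts in (48, 400):
    amps = POWER_WATTS / volts
    print(f"{volts:>3} V -> {amps:>8,.0f} A")

#  48 V ->   20,833 A  (impractical conductor sizes at rack scale)
# 400 V ->    2,500 A  (still substantial, but far more tractable)
```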

To address these power requirements, Google is collaborating with industry leaders such as Meta and Microsoft to standardize electrical and mechanical interfaces for high-voltage DC distribution. The choice of 400 VDC is strategic, leveraging the supply chain established by electric vehicles to achieve greater economies of scale and improved manufacturing efficiency.

Cooling solutions are equally critical, especially as individual AI accelerator chips increasingly dissipate 1,000 watts or more. Google has deployed liquid cooling at gigawatt scale across more than 2,000 TPU Pods over the past seven years, achieving fleet-wide availability of approximately 99.999%. The company’s fifth-generation cooling distribution unit design will be contributed to the Open Compute Project, further advancing the industry’s capabilities in managing heat dissipation effectively.

As Google embarks on this ambitious journey, it faces a competitive landscape dominated by Nvidia, which currently holds an estimated 80-95% market share in AI accelerators. However, the increasing investment in custom silicon by cloud providers signals a shift towards differentiation in the market. Amazon Web Services has pioneered this approach with its Graviton Arm-based CPUs and Inferentia/Trainium AI chips, while Microsoft has developed its own Cobalt processors and is reportedly working on additional AI accelerators.

The long-term success of Google’s custom silicon strategy will depend on its ability to navigate the inherent challenges of chip development, including the substantial capital requirements and the rapid evolution of AI model architectures. As the industry continues to evolve, questions remain about whether custom silicon can prove economically superior to Nvidia’s GPUs and how emerging model architectures will shape the future of AI infrastructure.

In conclusion, Google’s unveiling of the Ironwood TPUs and the partnership with Anthropic represent a significant milestone in the evolution of AI infrastructure. As the industry transitions from research labs to production deployments serving billions of users, the importance of a robust infrastructure layer—encompassing silicon, software, networking, power, and cooling—cannot be overstated. With Anthropic’s commitment to accessing one million TPU chips, Google’s bet on custom silicon designed specifically for the age of inference appears poised to pay off just as demand reaches its inflection point. The coming months and years will reveal how these advancements will reshape the landscape of AI deployment and infrastructure, potentially setting new standards for performance and efficiency in the field.