Microsoft’s AI chief Mustafa Suleyman is making a familiar argument in an unfamiliar tone: superintelligence is coming soon enough to plan for now, but it won’t arrive as a sudden takeover of the economy, or as a clean, immediate “job apocalypse” that wipes out entire professions on a predictable schedule. In a wide-ranging interview tied to Microsoft Build, Suleyman framed Microsoft’s current AI strategy as a long-term bet on building frontier capability in-house while continuing to rely on OpenAI’s models where they remain best-in-class. He also pushed back on how the industry talks about consciousness, and he offered definitions of AGI, superintelligence, and the singularity that try to separate marketing language from milestones that can actually be measured.
The result is a message that sounds less like hype and more like systems engineering: build the capacity, define the targets, govern the deployment, and—crucially—judge the outcome by whether it makes people healthier, happier, smarter, and more capable.
A reorganization built around “the frontier”
Suleyman’s first major point was structural. Since joining Microsoft, he says, his role has shifted away from consumer product leadership and toward training frontier models and assembling the internal machinery required to do so. The centerpiece of that shift is what Microsoft calls its “Superintelligence team,” which he says was assembled after a long process of renegotiating and reestablishing Microsoft’s relationship with OpenAI.
According to Suleyman, the transition culminated in a new contract completed in October of last year. That agreement, he says, does two things at once: it extends and “cements” the partnership with OpenAI, and it also frees Microsoft to pursue superintelligence independently. In practice, that means Microsoft can keep “buying and licensing” OpenAI models while simultaneously investing in its own frontier training clusters and hiring teams focused specifically on superintelligence.
This is not presented as a break from OpenAI so much as a rebalancing. Suleyman characterizes the partnership as one that evolved naturally as both companies grew. OpenAI, he argues, began as a research lab but moved toward a full-stack posture as it gained traction: building data centers, creating chips, and taking models directly to market through products like ChatGPT Enterprise. Microsoft, meanwhile, has enormous enterprise distribution: Suleyman says 493 of the 500 largest companies use Microsoft’s systems and services, including Azure and Microsoft 365/Teams. Over a multi-decade horizon, he says, Microsoft cannot remain structurally dependent on third-party IP for what he calls the most valuable technology of all time.
That “sustainability” argument is the backbone of his explanation for why the relationship had to change. It’s not just about competition or pride; it’s about long-term control of the stack. If superintelligence is “just around the corner,” the logic goes, Microsoft needs to be able to stand on its own two feet rather than merely adapt someone else’s models for production.
The OpenAI split: not a divorce, but a reset
The interview also revisits the public narrative around Microsoft and OpenAI—especially the idea that Microsoft wanted to be the “product company” while OpenAI remained the “research lab.” Suleyman acknowledges that the original division of labor was real, but he argues that the relationship changed because both sides pursued opportunities across the stack.
OpenAI, he says, saw revenue and traction and naturally expanded into consumer products and infrastructure. Microsoft, he adds, is too large and too deeply embedded in enterprise workflows to remain a passive recipient of another lab’s IP forever. The partnership, in his telling, is now in a stage where it must be optimized for different objectives: consumer experiences, enterprise deployments, and the fundamental science mission of superintelligence.
He also addresses the “Intel vs. Microsoft” framing that appeared in coverage of the Musk/OpenAI/Altman trial. In that context, Microsoft CEO Satya Nadella made a remark implying that Microsoft didn’t want to end up as the underlying supplier while OpenAI captured the platform value. Suleyman says Nadella’s position reflects a broader internal realization, but he emphasizes that it grew out of slow-moving changes rather than a single meeting triggered by a board incident. He describes a gradual accumulation of tension across competitive fronts and a recognition that partnerships don’t last forever.
In other words: the shift wasn’t a sudden betrayal; it was a strategic adjustment to changing incentives and scale.
Frontier training as a budgeted mission
If the reorganization explains the “why,” Suleyman’s next focus is the “how,” and the money behind it. Training frontier models, he concedes, is expensive. But he says Microsoft made the decision early enough to shape the contract negotiations, which concluded in October. He doesn’t describe a dramatic CFO moment; instead, he frames it as a planned investment with a long runway.
To make the case concrete, he points to Microsoft’s Maia 200 chip, which he says Microsoft can manufacture and deploy in its own clusters at lower cost than competing accelerators. He also claims that co-designing silicon and models yields additional performance-per-watt gains. The implication is that Microsoft isn’t only buying compute; it’s trying to own the full optimization loop (hardware, model design, and training efficiency) so that it can iterate faster and deliver better results for priority use cases like agentic coding and developer tooling.
This is where Suleyman’s “self-sufficiency mission” becomes more than a slogan. It’s a claim that owning and controlling the stack enables end-to-end co-optimization, which then justifies the investment.
But he also insists that this doesn’t mean abandoning OpenAI. He rejects the idea that being “free to be on your own” means leaving the partnership behind. Instead, he says Microsoft will keep running OpenAI models well beyond 2030, and that those models still deliver leading performance. He cites GPT-5.5 as an outstanding model and points to Codex-like capabilities and cybersecurity models as powering much of what Microsoft does today.
So the picture is hybrid by design: Microsoft builds its own frontier capacity while continuing to deploy OpenAI models where they remain superior.
Decision-making cycles and squads: how you run a frontier lab
One of the more operational parts of the interview is Suleyman’s description of how Microsoft’s Superintelligence organization makes decisions. He says the company still uses a six-to-eight-week cycle rhythm, with an in-person one-week meetup at the end of each cycle for retrospectives and planning. He argues that quarterly planning becomes blurry and abstract, while six to eight weeks is the optimal window for “clear, fortifiable missions.”
He also describes a squad structure: mixed interdisciplinary subgroups focused on specific missions, run by a DRI (directly responsible individual) who is often an individual contributor rather than a traditional manager. Suleyman emphasizes the separation between the manager role (coaching, unblocking, career growth) and the DRI role (execution). He claims rotating DRIs every two or three cycles helps prevent burnout and keeps the organization nimble.
This matters because frontier model training is not just a technical challenge; it’s a coordination problem. If you’re trying to “hill climb” toward better models, you need a culture that can run experiments quickly without losing coherence.
Superintelligence: not here yet, but on a measurable trajectory
Suleyman’s discussion of superintelligence is where the interview becomes most philosophical, but he tries to anchor it in empirical trends. He doesn’t claim superintelligence is already here. Instead, he argues that what’s happening now is “log-linear hill climbing” across modalities: each order-of-magnitude increase in compute, together with incremental increases in data, correlates with benchmark improvements. Put differently, scores rise roughly linearly while compute grows exponentially.
He ties this to a broader observation: the same general-purpose architecture has been applied with vastly more compute over time, and it has worked across audio, image, text, code, and other prediction tasks. From there, he extrapolates that further orders of magnitude of compute will keep driving compounding progress in other environments.
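To make the “log-linear” pattern concrete, here is a minimal Python sketch, assuming the trend takes the simple form score = intercept + slope × log10(compute). The function names, the use of an ordinary least-squares fit, and the sample numbers are illustrative assumptions, not anything Suleyman or Microsoft described, and no real benchmark data is used.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(computes: Sequence[float], scores: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of score against log10(compute).

    Returns (intercept, slope_per_decade). The log-linear form is an
    illustrative assumption, not a published Microsoft model.
    """
    if len(computes) != len(scores) or len(computes) < 2:
        raise ValueError("need at least two (compute, score) pairs")
    # One unit on this axis equals one order of magnitude of compute.
    xs = [math.log10(c) for c in computes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("compute values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_score(compute: float, intercept: float, slope: float) -> float:
    """Extrapolate the fitted trend to a new compute budget (compute must be > 0)."""
    return intercept + slope * math.log10(compute)


if __name__ == "__main__":
    # Hypothetical, made-up points: the score rises by 5 per tenfold compute increase.
    computes = [1e21, 1e22, 1e23]
    scores = [5.0, 10.0, 15.0]
    b0, b1 = fit_log_linear(computes, scores)
    print(f"slope per 10x compute: {b1:.2f}")
    print(f"extrapolated score at 1e24: {predict_score(1e24, b0, b1):.2f}")
```

Under this assumed form, every tenfold increase in compute adds the same fixed number of points, which is why progress looks steady on a log-scaled axis even as the compute bill grows exponentially. Whether real frontier benchmarks keep following such a line is exactly the extrapolation Suleyman is making.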
But he also identifies two capabilities that would be necessary for the next leap beyond parity with humans:
First, models must be able to invent new knowledge rather than merely extrapolate from existing data.
Second, they must have the capacity to self-improve—accelerating the process of deciding which hypotheses to pursue, generating training data, and potentially even innovating on architecture.
He suggests that applying more compute may already be enough to reach parity on many tasks, citing recent progress in coding as evidence. Yet he treats full superintelligence as an open question, especially because it requires learning from scratch in novel, out-of-distribution domains, something current agents are not yet fully doing.
This is a key nuance in his argument. He’s bullish on near-term capability acceleration, but he’s not claiming that today’s systems already meet the strict definition of superintelligence.
How do you measure “human-level” in chat?
Suleyman’s approach to measurement is also revealing. He argues that coding is easier to validate because code either runs or fails. But he pushes back on the idea that other domains are impossible to evaluate. Even in coding, he says, quality involves nuance: extensibility, usefulness in production, and whether the output matches the intended app or website—not just whether it executes.
When the conversation turns to chat, Suleyman claims that many people are having long, meaningful conversations with AIs at human-level performance. He lists qualities like emotional intelligence, accuracy, minimized hallucinations, and grounding in real-world observations. He then offers his own personal metric: asking his assistant for a daily briefing summarizing Teams and email conversations, document updates, and recommended actions—something he says is better than what his chief of staff can produce.
He also argues that emotional support is one of the most popular use cases for chatbots, and he treats that as a robust indicator of usefulness. When pressed on whether “human-level” should mean more than functional helpfulness, he defines chat as an interactive exchange between two parties
