In a significant move that underscores the evolving landscape of artificial intelligence, Notion has undertaken a comprehensive overhaul of its technology stack to support the next generation of agentic AI. This transformation is not merely an upgrade; it represents a fundamental shift in how productivity software can harness advanced reasoning models to operate at enterprise scale. Released in September, Notion 3.0 marks a pivotal moment for the company as it seeks to redefine user interactions with AI-powered tools.
Traditionally, AI workflows have relied on explicit, step-by-step instructions, often requiring users to engage in few-shot learning to guide the system through tasks. However, the emergence of advanced reasoning models has changed the game. These models are capable of understanding tool definitions, identifying available resources, and planning subsequent actions autonomously. Sarah Sachs, Notion’s head of AI modeling, articulated this shift, stating, “Rather than trying to retrofit into what we were building, we wanted to play to the strengths of reasoning models. We’ve rebuilt a new architecture because workflows are different from agents.”
The decision to rebuild from the ground up was not taken lightly. Many organizations would hesitate to undertake such a monumental task, fearing the risks associated with overhauling their tech stack. Yet, Notion recognized that to effectively support agentic AI at scale, a fresh approach was necessary. The company has already seen widespread adoption of its platform, with 94% of Forbes AI 50 companies utilizing Notion, alongside a total user base of 100 million, which includes notable clients like OpenAI, Cursor, Figma, Ramp, and Vercel.
One of the most significant changes in Notion 3.0 is the transition from rigid prompt-based workflows to a unified orchestration model. This new architecture allows for modular sub-agents that can autonomously search, plan, and execute tasks across various platforms, including Notion itself, Slack, and the broader web. Each agent is designed to use tools contextually, determining whether to search within Notion or another platform based on the task at hand. This capability enables the model to perform successive searches until relevant information is found, allowing it to convert notes into proposals, create follow-up messages, track tasks, and update knowledge bases seamlessly.
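The orchestration pattern described here, an agent that picks a tool from its description and searches repeatedly until it has enough context, can be sketched in a few lines. Everything below is a hypothetical illustration: the tool names, the keyword-based routing, and the stop condition all stand in for decisions a reasoning model would make, and none of it reflects Notion's actual code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]

# Illustrative tool registry; real agents would call actual search backends.
TOOLS = [
    Tool("search_notion", "Search the connected workspace",
         lambda q: f"notion hits for: {q}"),
    Tool("search_web", "Search the public web",
         lambda q: f"web hits for: {q}"),
]

def pick_tool(task: str) -> Tool:
    # A reasoning model would choose based on the tool descriptions;
    # a keyword heuristic stands in for that decision here.
    return TOOLS[0] if "workspace" in task else TOOLS[1]

def run_agent(task: str, max_steps: int = 3) -> list[str]:
    """Search successively until the agent decides it has enough context."""
    findings = []
    for step in range(max_steps):
        tool = pick_tool(task)
        findings.append(tool.run(f"{task} (pass {step + 1})"))
        if "workspace" in task:  # stand-in for the model's own stop decision
            break
    return findings
```

The key design point the sketch tries to capture is that tool choice and the decision to keep searching live inside the loop, not in a fixed prompt written per scenario.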
In previous iterations, such as Notion 2.0, the focus was primarily on having AI perform specific tasks. This required the team to meticulously consider how to prompt the model for each scenario. However, with the introduction of version 3.0, users can assign tasks to agents, which can then take action and perform multiple tasks concurrently. This re-orchestration allows for a more fluid interaction between users and AI, where the system can self-select the appropriate tools rather than relying on explicit prompts for every scenario.
The evolution of reasoning models has been rapid, with these systems becoming significantly better at learning to utilize tools and following chain-of-thought (CoT) instructions. Sachs noted that this advancement allows agents to be “far more independent” and capable of making multiple decisions within a single workflow. The engineering implications of this shift were profound, necessitating a departure from traditional methods in favor of a more dynamic and adaptable architecture.
To ensure the accuracy and reliability of its AI outputs, Notion has adopted a rigorous evaluation framework that combines deterministic tests, human-annotated data, and LLM-as-a-judge scoring. This multifaceted approach allows the team to identify discrepancies and inaccuracies effectively. By keeping each evaluation method separate, Notion can isolate the sources of errors, minimizing hallucinations, instances where the AI generates incorrect or misleading information.
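A minimal harness for such a split evaluation might look like the following. The case format, the substring check, and the stubbed judge are assumptions made for illustration; a real pipeline would call an actual judge model and draw on Notion's own annotated data.

```python
from dataclasses import dataclass

# Hypothetical eval case format; the field names are illustrative.
@dataclass
class EvalCase:
    prompt: str
    output: str
    required_facts: list[str]

def deterministic_check(case: EvalCase) -> bool:
    # Pass only when every required fact appears verbatim in the output.
    return all(fact in case.output for fact in case.required_facts)

def llm_judge(case: EvalCase) -> bool:
    # Stand-in for sending the prompt/output pair to a judge model and
    # parsing its verdict; here any non-empty output "passes".
    return bool(case.output.strip())

def evaluate(cases: list[EvalCase]) -> dict[str, list[str]]:
    """Run each evaluator separately so a failure can be attributed to a source."""
    report = {"deterministic": [], "judge": []}
    for case in cases:
        if not deterministic_check(case):
            report["deterministic"].append(case.prompt)
        if not llm_judge(case):
            report["judge"].append(case.prompt)
    return report
```

Keeping the evaluators separate is what lets a team say whether a failure came from a hard factual check or from the judge's softer quality assessment.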
Sachs emphasized the importance of simplifying the architecture itself, which facilitates easier modifications as models and techniques continue to evolve. “We optimize latency and parallel thinking as much as possible,” she explained, noting that this leads to significantly improved accuracy. The models are grounded in data sourced from both the web and the Notion connected workspace, ensuring that the information provided is both relevant and trustworthy.
Latency, a critical factor in user experience, is treated as a contextual issue by Notion. The company recognizes that different types of queries warrant different response times. For instance, users may expect immediate answers for straightforward questions, such as basic arithmetic, while they might be more patient when engaging with complex reasoning tasks that require deeper analysis. In some cases, Notion’s agents can perform up to 20 minutes of autonomous work across numerous websites and files, allowing users to focus on other tasks while the AI executes in the background.
This nuanced understanding of user expectations regarding latency is crucial for product design. Sachs posed an intriguing question: “How slow can you go before people abandon the model?” This inquiry highlights the delicate balance between providing thorough, reasoned responses and maintaining user engagement through timely interactions.
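One way to make latency contextual rather than one-size-fits-all is to route each query to a response-time budget by class. The classes, thresholds, and keyword classifier below are illustrative assumptions, not Notion's implementation; only the 20-minute ceiling echoes the autonomous-work figure mentioned above.

```python
# Hypothetical latency budgets per query class, in seconds.
LATENCY_BUDGETS_S = {
    "arithmetic": 1,       # users expect an instant answer
    "lookup": 5,           # a quick workspace search
    "deep_research": 1200, # up to ~20 minutes of autonomous background work
}

def classify_query(query: str) -> str:
    # A production router would likely use a model; a crude keyword
    # heuristic stands in for that classification here.
    if any(op in query for op in "+-*/"):
        return "arithmetic"
    if len(query.split()) > 8:
        return "deep_research"
    return "lookup"

def budget_for(query: str) -> int:
    """Return the response-time budget, in seconds, for a query."""
    return LATENCY_BUDGETS_S[classify_query(query)]
```

A router like this lets the product answer "how slow can you go" differently per query: instantly for arithmetic, and with an explicit background-work affordance for long research tasks.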
Notion’s commitment to using its own product—often referred to as “dogfooding”—is another cornerstone of its development strategy. Employees actively engage with the platform, generating training and evaluation data while providing real-time feedback through a thumbs-up/thumbs-down mechanism. This internal feedback loop is invaluable, as it allows the team to quickly identify areas for improvement and iterate on features based on actual user experiences.
However, Sachs acknowledged that relying solely on internal feedback could lead to biases. To mitigate this risk, Notion collaborates with trusted design partners who are well-versed in AI and granted early access to new capabilities. This external perspective is essential for ensuring that the product meets the diverse needs of its user base and does not become overly tailored to the preferences of its internal teams.
Continuous internal testing is also vital for monitoring progress and ensuring that models do not regress over time. Sachs explained that maintaining fidelity in the development process is crucial, as it helps the team confirm that latency remains within acceptable bounds. Many companies fall into the trap of focusing too heavily on retrospective evaluations, which can obscure their understanding of ongoing improvements. Notion, by contrast, treats evaluations as a litmus test for development, using them to gauge forward progress and guard against regressions.
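The regression-guarding idea reduces to a simple comparison: re-run the eval suite on each model or prompt change and flag any metric that falls meaningfully below a stored baseline. The metric names and tolerance in this sketch are hypothetical.

```python
def check_regression(current_scores: dict[str, float],
                     baseline_scores: dict[str, float],
                     tolerance: float = 0.02) -> list[str]:
    """Flag every metric that fell more than `tolerance` below its baseline."""
    regressions = []
    for metric, baseline in baseline_scores.items():
        # A missing metric counts as a score of zero, i.e. a regression.
        if current_scores.get(metric, 0.0) < baseline - tolerance:
            regressions.append(metric)
    return regressions
```

Run as a gate in CI, a check like this turns evals into the forward-looking litmus test described above rather than a purely retrospective report.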
As Notion embarks on this ambitious journey, several key takeaways emerge for other tech leaders looking to navigate the complexities of integrating agentic AI into their operations. First and foremost, organizations should not shy away from rebuilding their systems when foundational capabilities change. Notion’s willingness to fully re-engineer its architecture demonstrates the potential rewards of embracing innovation.
Additionally, treating latency as a contextual issue rather than a one-size-fits-all metric can lead to more effective user experiences. By optimizing response times based on the specific use case, companies can enhance user satisfaction and engagement.
Finally, grounding AI outputs in trustworthy, curated enterprise data is essential for ensuring accuracy and fostering user trust. As Sachs advises, “Be willing to make the hard decisions. Be willing to sit at the top of the frontier, so to speak, on what you’re developing to build the best product you can for your customers.”
In conclusion, Notion’s bold decision to overhaul its tech stack and embrace agentic AI represents a significant milestone in the evolution of productivity software. By prioritizing autonomy, accuracy, and user experience, Notion is setting a new standard for how organizations can leverage AI to enhance their workflows. As the company continues to innovate and adapt to the rapidly changing landscape of artificial intelligence, it serves as a blueprint for others seeking to responsibly and dynamically operationalize agentic AI in connected, permissioned enterprise workspaces. The future of productivity tools is here, and Notion is leading the charge.
