In the early days of enterprise generative AI, the mood inside many companies was almost celebratory. Leadership teams urged employees to experiment broadly, to “use it everywhere,” and to treat AI like a new general-purpose tool rather than a tightly controlled project. The logic was straightforward: if you wait for perfect ROI models, you’ll miss the learning curve; if you move fast enough, value will reveal itself.
But as the novelty wore off, the bill arrived—often faster than expected. TechCrunch’s video conversation with NEA partner Tiffany Luck captures a pattern that has become increasingly familiar across the corporate landscape: enterprises are still figuring out how to measure AI ROI in a way that survives contact with real-world usage, procurement constraints, and operational governance. And in some cases, the internal enthusiasm that once fueled rapid adoption has been replaced by cost controls, license reductions, and a more cautious approach to deployment.
The story isn’t simply that AI is expensive. It’s that the early adoption model—encouraging broad usage without fully anticipating consumption behavior—creates a mismatch between ambition and budgeting. When AI tools are treated like a free-floating productivity layer, usage can scale in unpredictable ways. People don’t just run a few workflows; they test prompts, iterate on drafts, ask follow-up questions, and explore edge cases. Even when each individual interaction seems small, the aggregate can quickly exceed annual forecasts.
Luck points to this tension as a core reason why enterprises are struggling to translate AI experimentation into reliable financial outcomes. The challenge is not only technical. It’s organizational and economic: companies need to align incentives, governance, and measurement so that AI spending maps to business impact rather than curiosity.
What makes this moment especially instructive is that the fallout has been visible. Uber reportedly blew through its annual AI budget within a few months. Some organizations have reportedly cut Claude licenses for parts of their workforce, and Meta reportedly shut down its internal leaderboard. Each of these examples reflects a different facet of the same underlying problem: when AI adoption is driven by momentum rather than structured value creation, the system eventually hits constraints such as cost ceilings, access limitations, and internal pressure to justify spend.
To understand why this happens, it helps to look at how AI usage differs from traditional software adoption. With many enterprise tools, usage patterns are relatively stable. Seats are purchased, features are enabled, and the organization can estimate demand based on headcount and historical behavior. Generative AI is different. It behaves more like a usage-based service where the “unit” is not a seat but an interaction. That means the number and length of prompts, the length of outputs, the number of retries, and the degree of iterative refinement all influence cost.
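To make the interaction-as-unit idea concrete, here is a minimal Python sketch of per-call and per-conversation cost. The per-million-token prices are hypothetical placeholders, and the conversation model assumes a stateless chat API that re-sends the full history on every turn, which is common but not universal. Real vendor pricing, discounts, and caching behavior will differ.

```python
# Hypothetical prices in USD per million tokens; real prices vary by vendor and model.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def call_cost(prompt_tokens: int, output_tokens: int, attempts: int = 1) -> float:
    """Cost of one request, where each retry is billed as a full extra call."""
    per_attempt = (prompt_tokens * INPUT_PRICE_PER_M
                   + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    return per_attempt * attempts


def conversation_cost(turns: list[tuple[int, int]]) -> float:
    """Cost of a multi-turn session.

    `turns` holds (new_prompt_tokens, output_tokens) for each turn. Assumes the
    full history (earlier prompts and outputs) is re-sent as input every turn.
    """
    history_tokens = 0
    total = 0.0
    for new_prompt, output in turns:
        sent = history_tokens + new_prompt   # input grows with every follow-up
        total += call_cost(sent, output)
        history_tokens = sent + output
    return total


single = call_cost(200, 400)
session = conversation_cost([(200, 400)] * 5)
print(f"one question: ${single:.4f}, five-turn session: ${session:.4f}")
```

With these placeholder numbers, a five-turn session costs about 7.7 times a single question rather than 5 times, because each follow-up re-sends everything before it. That compounding is one reason small-looking interactions add up faster than seat-based forecasts suggest.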
In practice, this creates a new kind of budgeting problem. A company might forecast AI costs based on a limited pilot group, then expand access to broader teams. But expansion doesn’t just increase the number of users—it changes the nature of usage. Teams discover new ways to use the tool, and those uses often involve more back-and-forth than expected. A marketing team might start with simple copy drafts and then move into campaign variations, localization, and compliance checks. A customer support team might begin with summarization and then shift into multi-step resolution assistance. A developer team might start with code generation and then expand into debugging loops and refactoring suggestions. Each step increases token consumption.
This is why “AI enthusiasm” can be a misleading metric. Enthusiasm is real, but it doesn’t automatically produce measurable outcomes. In the early phase, the organization learns what the tool can do. In the later phase, it must decide what it should do, for whom, and under what constraints. That transition is where many enterprises are currently stuck.
Another way into the ROI question is to treat ROI not as a single number but as a portfolio. Some AI use cases reduce costs directly, such as automating repetitive tasks or improving efficiency in operations. Others increase revenue indirectly, such as improving conversion rates through better personalization or accelerating product development cycles. Still others reduce risk, such as improving compliance review or lowering the likelihood of errors. If a company tries to force every use case into one ROI framework too early, it may end up with a distorted picture: the most visible wins might be delayed, while the costs show up immediately.
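A minimal sketch of the portfolio view, assuming each use case can be given a monthly cost, an estimated monthly benefit, and a lag before that benefit appears. All names and figures are hypothetical, and putting a dollar value on risk reduction is itself an assumption (for example, an expected-loss estimate).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    category: str           # "cost", "revenue", or "risk" (hypothetical buckets)
    monthly_cost: float     # spend begins in month 1
    monthly_benefit: float  # estimated value per month once realized
    lag_months: int         # months of spend before any benefit appears


def net_by_category(cases: list[UseCase], months: int) -> dict[str, float]:
    """Cumulative net value (benefit minus cost) per category after `months` months."""
    totals: dict[str, float] = defaultdict(float)
    for c in cases:
        benefit_months = max(0, months - c.lag_months)
        totals[c.category] += c.monthly_benefit * benefit_months - c.monthly_cost * months
    return dict(totals)


# Illustrative portfolio; every figure here is made up for the example.
portfolio = [
    UseCase("ticket triage", "cost", 10_000, 25_000, lag_months=1),
    UseCase("personalization", "revenue", 20_000, 60_000, lag_months=6),
    UseCase("compliance review", "risk", 5_000, 15_000, lag_months=3),
]

for horizon in (3, 12):
    print(horizon, net_by_category(portfolio, horizon))
```

Evaluated at month 3, the revenue and risk categories show only cost; by month 12 all three are net positive. Keeping the categories separate makes that timing difference visible instead of averaging it into one misleading number.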
Luck’s framing suggests that enterprises are still building the muscle memory for this portfolio approach. They’re learning how to connect AI usage to outcomes that finance teams recognize. That includes defining success metrics that are specific enough to be actionable but broad enough to capture real value. For example, “time saved” is not always enough. A company needs to know whether time saved translates into throughput, quality improvements, reduced rework, faster cycle times, or better customer outcomes.
There’s also the question of attribution. When AI is introduced into a workflow, it rarely replaces a process in a clean, linear way. Instead, it changes how people work. A writer might use AI to draft faster, but the final output still requires human editing. A developer might use AI to generate code, but the team still runs tests, reviews changes, and handles integration. The value is distributed across steps, and it can be difficult to isolate what portion of performance improvement is attributable to AI versus other concurrent initiatives.
This is where governance becomes part of ROI. Without guardrails, AI usage can become chaotic: people experiment freely, outputs vary in quality, and the organization struggles to standardize evaluation. Governance isn’t just about compliance; it’s about creating consistent conditions under which measurement is possible. If every team uses different prompts, different models, and different evaluation methods, comparing results becomes nearly impossible. That makes it harder to justify spend and easier for leadership to revert to cost controls.
The reported examples of budget blowouts and license cuts illustrate how quickly governance can become a lever. When costs rise beyond expectations, companies often respond by tightening access or reducing licenses. That can be necessary, but it also risks undermining the very experimentation that produced early learning. If the organization clamps down too aggressively, it may stop discovering new high-value workflows. The result is a cycle: experiment broadly, overspend, restrict access, lose momentum, then restart with narrower pilots.
Meta’s reported decision to kill its internal leaderboard is a particularly telling signal. Leaderboards are often used to encourage adoption and reward usage. But they can also create perverse incentives. If the metric is “how much you use AI,” teams may optimize for activity rather than impact. People might generate more content, run more prompts, or participate in challenges without necessarily improving business outcomes. Over time, leadership may conclude that the internal incentive structure is misaligned with ROI.
Uber’s budget story points to another common issue: scaling without a consumption model. If a company expands AI access faster than it can model token usage and cost drivers, it can burn through budgets quickly. This is not necessarily a sign of poor planning; it can be a sign that the organization underestimated how quickly usage patterns would evolve once AI became part of everyday work.
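The gap between a flat forecast and compounding usage can be sketched in a few lines. This assumes monthly spend grows at a fixed percentage, which is a simplification; the budget and growth figures are illustrative and are not Uber's actual numbers.

```python
from typing import Optional


def month_budget_exhausted(annual_budget: float, first_month_spend: float,
                           monthly_growth: float, horizon: int = 12) -> Optional[int]:
    """Return the first month in which cumulative spend exceeds the budget, or None.

    Assumes spend compounds at a constant `monthly_growth` rate (0.3 means +30%/month).
    """
    spent = 0.0
    spend = first_month_spend
    for month in range(1, horizon + 1):
        spent += spend
        if spent > annual_budget:
            return month
        spend *= 1 + monthly_growth
    return None


# Illustrative figures: a $1.2M annual budget forecast at a flat $100k/month.
print(month_budget_exhausted(1_200_000, 100_000, 0.0))   # flat usage
print(month_budget_exhausted(1_200_000, 100_000, 0.30))  # usage compounding 30%/month
```

Under these assumptions, the flat forecast fits the budget exactly, while 30% month-over-month growth exhausts the same budget in month 6. Nothing about the pilot-based forecast was wrong at the time it was made; the usage pattern simply changed after access expanded.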
Similarly, cutting Claude licenses for parts of an organization suggests that procurement and vendor strategy are now intertwined with ROI. Enterprises are not only deciding whether AI works—they’re deciding which tools to fund, which teams to prioritize, and how to manage vendor spend across multiple models and providers. In a market where model capabilities differ and pricing structures vary, the “best” AI solution for one team may not be the best for another. That means ROI is also a function of fit: the right model for the right task, with the right guardrails and evaluation.
So what does “figuring out AI ROI” actually look like in practice? It usually involves several shifts that enterprises are still working through.
First, companies are moving from broad experimentation to use-case selection. Instead of encouraging everyone to use AI for everything, they identify a smaller set of workflows where AI can plausibly deliver measurable gains. These are often processes with clear inputs and outputs, such as document summarization, first-draft generation, ticket triage, knowledge base search, or code assistance. The key is that these workflows can be evaluated consistently.
Second, enterprises are adopting cost-aware architectures. This includes techniques like caching, limiting output lengths, using smaller models for simpler tasks, routing requests to different models based on complexity, and implementing guardrails that reduce unnecessary retries. In other words, ROI is increasingly engineered into the system rather than merely measured after the fact.
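Routing, caching, and output limits can be combined in one small wrapper. This is a minimal sketch: the model names, token limits, keyword-based complexity heuristic, and the `call_model` callable are all hypothetical stand-ins for whatever provider SDK and classifier a team actually uses, and the cache only catches exact repeats.

```python
import hashlib
from typing import Callable

# Hypothetical model tiers and output caps; not real product names or limits.
SMALL_MODEL = "small-model"
LARGE_MODEL = "large-model"
MAX_OUTPUT_TOKENS = {SMALL_MODEL: 256, LARGE_MODEL: 1024}


def estimate_complexity(prompt: str) -> float:
    """Crude placeholder heuristic: longer prompts and certain keywords score higher."""
    score = min(len(prompt) / 2000, 1.0)
    if any(word in prompt.lower() for word in ("analyze", "refactor", "multi-step")):
        score += 0.5
    return score


class CostAwareRouter:
    def __init__(self, call_model: Callable[[str, str, int], str], threshold: float = 0.5):
        self.call_model = call_model  # (model, prompt, max_tokens) -> completion text
        self.threshold = threshold
        self.cache: dict[str, str] = {}
        self.calls = 0                # billable calls actually made

    def route(self, prompt: str) -> str:
        """Send only prompts that look complex to the larger, pricier model."""
        if estimate_complexity(prompt) >= self.threshold:
            return LARGE_MODEL
        return SMALL_MODEL

    def complete(self, prompt: str) -> str:
        model = self.route(prompt)
        key = hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()
        if key in self.cache:         # exact-match cache hit: no new call, no new cost
            return self.cache[key]
        self.calls += 1
        result = self.call_model(model, prompt, MAX_OUTPUT_TOKENS[model])
        self.cache[key] = result
        return result


# Usage with a fake model that just echoes which tier and cap it received.
router = CostAwareRouter(lambda model, prompt, max_tokens: f"{model}:{max_tokens}")
print(router.complete("Summarize this ticket"))
print(router.complete("Summarize this ticket"))  # served from cache
print(router.complete("Refactor this module"))
print("billable calls:", router.calls)
```

Injecting `call_model` keeps the sketch provider-agnostic. In practice, the complexity check might be a small classifier rather than keywords, and a semantic cache could catch near-duplicate prompts that this exact-match version misses.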
Third, organizations are building evaluation frameworks that go beyond subjective impressions. Many early pilots relied on “it feels better” feedback. Now, teams are developing rubric-based assessments, human review sampling, automated quality checks, and outcome tracking. The goal is to determine not just whether AI produces plausible text, but whether it improves accuracy, reduces turnaround time, lowers error rates, or increases customer satisfaction.
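A rubric-based check with human review sampling might look like the following sketch. The criteria, weights, and pass threshold are hypothetical; the scores could come from human reviewers or automated checks, and the fixed random seed only makes the review sample reproducible for auditing.

```python
import random
from dataclasses import dataclass

# Hypothetical rubric: criterion -> weight (weights sum to 1.0).
RUBRIC = {"accuracy": 0.5, "completeness": 0.3, "tone": 0.2}
PASS_THRESHOLD = 0.7


@dataclass
class Evaluation:
    output_id: str
    scores: dict[str, float]  # criterion -> score in [0, 1]; missing criteria count as 0


def weighted_score(ev: Evaluation) -> float:
    return sum(weight * ev.scores.get(name, 0.0) for name, weight in RUBRIC.items())


def pass_rate(evals: list[Evaluation]) -> float:
    """Share of outputs whose weighted rubric score meets the threshold."""
    if not evals:
        return 0.0
    return sum(weighted_score(e) >= PASS_THRESHOLD for e in evals) / len(evals)


def sample_for_human_review(output_ids: list[str], rate: float, seed: int = 0) -> list[str]:
    """Pick a reproducible random subset (at least one item) for human spot checks."""
    if not output_ids:
        return []
    k = min(len(output_ids), max(1, round(len(output_ids) * rate)))
    return random.Random(seed).sample(output_ids, k)


e_good = Evaluation("draft-1", {"accuracy": 1.0, "completeness": 0.8, "tone": 0.5})
e_bad = Evaluation("draft-2", {"accuracy": 0.2, "completeness": 0.5, "tone": 1.0})
print(pass_rate([e_good, e_bad]))
print(sample_for_human_review([f"out-{i}" for i in range(100)], rate=0.1))
```

Weighting accuracy most heavily means a fluent but wrong answer (like `e_bad`) fails even with perfect tone, which is the distinction between plausible text and useful output that the evaluation is meant to capture.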
Fourth, companies are aligning AI usage with governance and training. If employees are not trained on effective prompting, verification practices, and escalation paths, the organization may see inconsistent results. Training can improve both quality and efficiency, which in turn affects cost. Governance also includes data handling policies—what can be sent to external models, what must remain internal, and how sensitive information is protected. These policies can slow adoption initially, but they reduce risk and make measurement more credible.
Fifth, enterprises are redefining ROI to include risk reduction and operational resilience. Some benefits are hard to quantify in immediate dollars but still matter. For example, improving compliance review can prevent costly incidents. Enhancing incident response with AI-assisted analysis can reduce downtime. Even if these benefits are not captured in a simple spreadsheet, they influence executive decisions and long-term strategy.
A key takeaway from Luck’s comments is that the ROI conversation is not just about proving AI pays off; it’s about proving AI can be managed. The early phase of adoption treated AI as a capability to deploy. The current phase treats AI as a system to operate. That shift changes what “success” means. It’s no longer enough to demonstrate that AI can generate good outputs. Companies must demonstrate that they can control costs, maintain quality, and deliver repeatable outcomes across teams.
This is why the tension between experimentation
