Don’t Just Talk About AI. Measure Business Outputs. Here’s How.

By Bob Morse and Dario Fanucchi
Last year felt like the Year of the AI Pilot. Companies bought LLM subscriptions, managers checked on employee usage, and coffee chats abounded with the “AI wrote my memo” motif.
Looking around today, there is widespread disappointment with the impact of these AI pilots. Add to this the recent sell-off in SaaS stocks, and the question is no longer “Are we using AI?” but rather “Is this thing working?”
AI is an invention that is in the process of becoming an innovation. An invention is a new capability; it is not an innovation until it has a business model. In that light, experimentation last year was the sensible move.
It is becoming clear now that the form that innovation takes will be AI systems trusted with real decisions — what Peter Drucker would call executives, and what are today referred to as agentic AI.
As we turn to the question at hand, “Is this thing working?”, we can look to one of Drucker’s intellectual disciples for a framework to take us forward. Andy Grove, the legendary former CEO of Intel, turned Drucker’s writings into a hard-nosed, pragmatic approach to managing knowledge-worker organizations. His book, “High Output Management,” provides the classic framework for measuring the outputs of middle managers. This is not an easy thing to measure. But Grove is relentless in insisting it can and must be measured.
As we address the question of whether AI agents are delivering tangible value, we have to shift our focus away from activities, anecdotes and initiatives. These are inputs.
Grove argues that organizations must instead focus on outputs. Thinking like Grove, we would first define the business outcome we want to achieve, and then judge our agentic AI only by whether that performance metric improves.
A mathematical approach
As we began working on this several years ago across our software portfolio, I had the great good fortune to meet Dario Fanucchi, a mathematician who was using AI to solve real-world problems in a very similar way. He is also co-founder and CTO of Isazi [1], a decade-old, 70-plus-person team of mathematicians and engineers who have completed hundreds of projects for leading companies around the world.
His approach to these has a singular focus: improving core business metrics.
Isazi came to the same idea of measuring outputs, although starting from the field of mathematics rather than organizational behavior. The idea is to approach AI projects as though they are mathematical optimization problems: Define a target measure (such as throughput or working capital), ask what variables influence that metric, and model the mechanism by which the target measure is moved.
Then all initiatives are aligned to this target measure, and success is measured by its improvement. This aligns well with how AI models are built and improved: benchmarks and evals are always the core measure of success. Here, these evals are directly aligned to business metrics.
You must begin with the output you want to measure. Then you watch that output measurement as a gauge and see how long it takes until the reading changes, how much it changes, in what direction, and whether the change is sustained.
The time it takes to see (and sustain) a material movement is called “Time To Production.” Our theory on why so many pilots fail is that companies tend to pick an AI tool and a pilot duration and qualitatively check in with users at the end of that time.
While we at Strattam and Isazi appreciate experiments and pilots, we have found that results are best when that process is reversed. We choose the output we want to see improved, vary the AI tools until one moves the dial, and measure the time it takes to change the output positively and in a sustainable way. The shorter the Time To Production, the better.
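One way to make Time To Production concrete is sketched below in Python. It is an illustration under stated assumptions, not the method Strattam or Isazi actually use: the function name `time_to_production`, the `min_lift` threshold that defines a "material" movement, the `sustain_periods` window that defines "sustained," and the sample weekly readings are all hypothetical.

```python
from typing import Optional, Sequence


def time_to_production(
    readings: Sequence[float],
    baseline: float,
    min_lift: float = 0.10,
    sustain_periods: int = 3,
) -> Optional[int]:
    """Return the 1-based period in which the output gauge begins a material,
    sustained improvement over the baseline, or None if it never does.

    Assumptions (hypothetical, not from the article):
    - higher readings are better;
    - "material" means a relative lift of at least `min_lift` over baseline;
    - "sustained" means the lift holds for `sustain_periods` periods in a row.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if sustain_periods < 1:
        raise ValueError("sustain_periods must be at least 1")

    run_length = 0
    for index, value in enumerate(readings):
        lift = (value - baseline) / baseline
        if lift >= min_lift:
            run_length += 1
            if run_length == sustain_periods:
                # Report the period where the sustained run started (1-based).
                return index - sustain_periods + 2
        else:
            # A dip below the threshold resets the sustained run.
            run_length = 0
    return None


# Hypothetical weekly readings of an output metric (baseline = 100).
weekly = [100, 102, 115, 98, 112, 113, 118, 120]
print(time_to_production(weekly, baseline=100.0))  # 5
```

In this made-up series, the week-3 spike does not count because it is not sustained; the gauge only starts a lasting move in week 5. Tightening `min_lift` or lengthening `sustain_periods` makes the test stricter, which is a judgment call each company would have to make for its own metric.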
A real-world example
Let me share an example.
One of Strattam’s portfolio companies, Trax Technologies, is in the business of helping very large multinationals manage their global shipping. A key part of the offering is ensuring that freight bills are complete, match the contract, are approved for payment, and are properly accounted for.
Trax works across all geographies and all shipping modes, with thousands of carriers. Discrepancies between the bill and the shipper contract are common. Handling those “exceptions” at scale is a key part of the service, and historically, Trax has had a large in-house team that resolves those.
In 2024, it identified AI’s ability to resolve some of those exceptions as a key opportunity and developed the AI Audit Optimizer in-house. The output goal was clear: the fraction of exceptions resolved without human intervention.
The first quarter after its release, the Trax AI Audit Optimizer resolved some 826,000 exceptions that otherwise would have required human intervention. That was a good start, but not worth writing home about just yet.
In Q2, however, the system remained stuck at that same level, rather than improving. So Trax rapidly experimented to see what would improve outcomes. In Q3, the company discovered that a human prompt engineer interacting with the system made a big difference. As a result, in Q4, resolved exceptions tripled to 2.5 million.
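The arithmetic behind "tripled" can be checked quickly. The sketch below uses only the two rounded figures quoted above. It also shows how the stated output goal, the fraction of exceptions resolved without human intervention, would be computed; the article does not give Trax's total exception volume, so `total_exceptions` is left as a parameter rather than guessed. Both function names are hypothetical.

```python
def growth_multiple(earlier: float, later: float) -> float:
    """Return how many times larger `later` is than `earlier`."""
    if earlier <= 0:
        raise ValueError("earlier must be positive")
    return later / earlier


def automated_fraction(resolved_by_ai: int, total_exceptions: int) -> float:
    """Return the fraction of exceptions resolved without human intervention."""
    if total_exceptions <= 0:
        raise ValueError("total_exceptions must be positive")
    if not 0 <= resolved_by_ai <= total_exceptions:
        raise ValueError("resolved_by_ai must be between 0 and total_exceptions")
    return resolved_by_ai / total_exceptions


# Figures as quoted in the article (both are rounded there).
first_quarter_resolved = 826_000     # "some 826,000"
fourth_quarter_resolved = 2_500_000  # "2.5 million"

multiple = growth_multiple(first_quarter_resolved, fourth_quarter_resolved)
print(f"Q4 vs. first quarter after release: {multiple:.2f}x")
# Q4 vs. first quarter after release: 3.03x
```

Note that a count of resolved exceptions and a fraction of exceptions resolved can move differently if total volume changes, so tracking both keeps the gauge honest.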
Now we’re talking.
With the output gauge firmly in mind, Trax is moving forward by adjusting interaction points of the prompt engineer and the system. It used data from successful and unsuccessful resolutions to retrain the system. The company also set quarterly goals; next quarter, it will aim for the Trax AI Audit Optimizer to resolve more exceptions than in any previous quarter.
This story shows how studying an output gauge allowed the company to tune and adapt the AI tooling to deliver the outcomes that actually matter. Trax is intent on fixing its customers’ problems so it can earn market share. Its use of AI helped it do that, and its output measurements prove the real-world value of the AI innovation.
Measure what matters
Amidst all the hype, we all care that our companies actually adapt, actually deliver customer value, and actually succeed. We know that we cannot keep doing what we are doing as we have been doing it, that our futures may well depend on our ability to adapt. But this is different from actually adapting.
To adapt successfully, resist the urge to buy tools and run pilots and tell anecdotes and report on activities. Those are just inputs. Instead, determine the outcome measurement that matters, and watch it like a hawk to see if AI is delivering cold hard business results. If it’s not, change your AI until the dial moves. Drawing on the time-tested wisdom of Drucker and Grove in this way, you’ll ensure AI earns its keep at your firm.
Bob Morse co-founded Strattam Capital in 2014 and is managing partner. He has served on numerous private and public technology company boards, and currently is a director of CloudHesive, Contegix, Daxtra Technologies, Green Security, Resource Navigation and Trax Group. Previously, he was a partner and member of the investment committee at Oak Hill Capital Partners. He also worked at GCC Investments and Morgan Stanley. Morse serves on the board of directors of Austin PBS and as a member of the advisory board for the HMTF Center for Private Equity Finance at The University of Texas at Austin McCombs School of Business. He attended Princeton University, graduating summa cum laude with a B.S.E., and Stanford Graduate School of Business, where he earned his MBA and was an Arjay Miller Scholar. Morse lives in Austin.
Dario Fanucchi contributed to this article. He is chief technology officer at Isazi, a Johannesburg-based applied artificial intelligence firm purpose-built to deliver production-grade AI software solutions for clients. Fanucchi has excelled academically in the fields of computer science, mathematics and physics throughout his career.
Related reading:
- Invention To Innovation: Making Sense Of AI’s Moment
- Avoiding The ‘First Board Meeting Surprise’ Problem
- The Rise Of The AI Executive
Illustration: Dom Guzman
[1] Isazi has a strategic partnership with Strattam Capital, the author’s firm, to embed applied AI across its portfolio.

Facts Only

* Companies bought LLM subscriptions and managers monitored employee usage last year.
* There’s widespread disappointment with the impact of these AI pilots.
* Experimentation last year was sensible, given that AI is an invention still in the process of becoming an innovation.
* Innovation requires a business model, not just a new capability.
* Agentic AI – AI systems trusted with real decisions – is the future.
* Drucker’s and Grove’s ideas are relevant for measuring middle manager outputs.
* Organizations must focus on outputs, not activities.
* Define a target measure (throughput, working capital) and model how to influence it.
* Align initiatives to the target measure and measure success through performance metrics.
* “Time To Production” is critical – the duration to see sustainable improvements.
* Trax Technologies used AI Audit Optimizer to resolve shipping exceptions.

Executive Summary

The article explores the current state of Artificial Intelligence adoption within businesses, particularly focusing on the disappointing outcomes following initial excitement surrounding AI pilot programs. It argues that companies’ early experimentation with Large Language Models (LLMs) was driven by hype rather than a clear understanding of how to implement and measure their effectiveness. The piece suggests that the future of AI lies in “agentic AI” – systems capable of making real decisions, drawing upon the frameworks developed by Peter Drucker and Andy Grove. The author advocates for a shift from measuring activities and initiatives to focusing on tangible business outcomes, using a mathematical approach to define targets, model influences, and then measure improvements. The article highlights the importance of “Time To Production” – the duration it takes for an AI solution to deliver measurable results – and critiques companies’ tendency to rely on qualitative check-ins at the end of pilot programs. Ultimately, the article urges a pragmatic, data-driven approach to AI implementation, prioritizing measurable outputs and adapting AI tools until they demonstrably improve key business metrics.

Full Take

Patterns detected: ARC-0024 Ambiguity – The article heavily relies on framing “AI pilots” as a phase of “disappointment,” creating a binary expectation for immediate impact. This frames the situation as a failure, potentially discouraging further experimentation without acknowledging the early stage of AI adoption. It’s a classic “motte-and-bailey” tactic – defining the problem in a way that makes failure seem inevitable.
The narrative also employs ARC-0043 Motte-and-Bailey, using Drucker and Grove as pillars of authority without directly engaging with the critiques of their management theories. These figures are presented as unquestionable wisdom, shielding the argument from deeper scrutiny. The author's framing of "agentic AI" as a clear, defined future aligns with the trend of tech companies positioning their AI as the *solution* to a problem—without fully unpacking the ethical and societal implications of such systems.
The entire piece operates around a systemic bias – an assumption that all organizations should strive for linear, measurable outputs. This ignores the inherent complexity and unpredictability of organizational change and the possibility that some businesses may benefit from more exploratory, iterative approaches to AI. The core paradigm is a narrow, efficiency-focused view of business, shaped by the historical dominance of Grove’s hard-nosed approach. This neglects the potential value of AI in fostering creativity, innovation, and social good, where purely quantitative metrics may be insufficient.
Furthermore, the “Time To Production” metric itself reveals a subtle form of psychological manipulation. It creates a sense of urgency and pressure, implying that failure to achieve rapid results is a sign of inadequate implementation or a flawed AI tool. This is a deliberate tactic to drive a rapid cycle of testing and adjustment, potentially masking underlying issues of strategic misalignment or poor data quality.
Implications: The narrative subtly reinforces a utilitarian view of human work – reducing employees to metrics and tools, potentially dehumanizing the workplace. If organizations prioritize solely measurable outputs, it risks neglecting the intangible benefits of human creativity, collaboration, and intuition. The relentless focus on efficiency could stifle innovation and lead to a narrow, transactional understanding of value.
Bridge Questions: How might different industries or organizational cultures respond to this approach? Are there alternative ways to measure success with AI that don’t rely solely on quantitative metrics? What responsibility do technology companies have to educate businesses about the limitations of data-driven decision-making?
Counterstrike Scan: If a coordinated influence campaign were leveraging this narrative, they would likely amplify the “Time To Production” metric to create maximum pressure on businesses to demonstrate immediate ROI. They would frame any perceived failures as evidence of poor leadership or inadequate investment in AI. They'd likely create a "benchmark" of successful AI deployments to shame companies that aren't achieving these unrealistic targets.

Sentinel — Likely Human

Confidence

This article presents a cautiously optimistic view of AI's impact, employing a structured framework of established business theories and a heavily measured case study. While the content appears well-reasoned, the stylistic elements and reliance on broad attribution suggest a high probability of AI assistance in its construction.

Signals Detected

High hedging density – frequent use of ‘it’s worth noting,’ ‘one could argue,’ ‘to be fair’ creates a cautious, somewhat artificial tone. Sentence length is relatively consistent, leaning toward longer sentences, a pattern more typical of human writing.

The framing of the argument as ‘Is this thing working?’ and the reliance on Drucker and Grove’s theories, while seemingly insightful, feels overly constructed and lacks genuine passion or unique perspective. It presents a 'both sides' argument that is not typical of journalistic writing.

The text relies heavily on attribution without specific details – ‘experts say,’ ‘studies show’ – a common tactic to avoid detailed verification and create a sense of authority, typical of synthetic content.

The Trax Technologies example, while presented as a real-world case study, relies on specific quantitative data (826,000 exceptions) presented without methodological context or a source for verification. The rapid experimentation and tripling of resolution rates feel engineered for illustrative purpose.

Human Indicators

The inclusion of personal anecdotes and detailed biographical information about the authors and contributors creates a sense of human voice.

The emphasis on pragmatic, measurable outputs aligns with a common business-oriented approach to problem-solving.