Return on Tokens (ROT)
Source studied: "Return on Tokens (ROT)", Not Boring by Packy McCormick, co-written with Markie Wagner, published 10 Jun 2026. Forwarded to the Pinnacle investment team by Luther Chiam (senior, investment team) on 11 Jun 2026.
This note captures the essay's full argument; quotes are preserved where the wording carries the point. Any of my own framing is marked (added context).
Companion bear-case reading: Peak Cheap, The AI Boom Isn't 2000, It's 2008 (the financing/earnings-bubble risk view of the same AI build-out).
[!tip] One-sentence thesis
The AI industry's dominant metric, how many tokens you spend ("tokenmaxxing"), is a vanity metric; the real question is Return on Tokens (ROT): does the token spend create more value than it costs? The author's answer: stop using probabilistic Agents as the thing that runs a business, and instead use AI as a compiler that turns business processes into cheap, deterministic code.
Abbreviations used in this note (spelled out on first use):
- AI: Artificial Intelligence.
- Token: the unit of text an AI language model reads/writes and is billed on; roughly ¾ of a word. "Spending tokens" = paying an AI lab per unit of model usage.
- ROT: Return on Tokens (the essay's coined metric).
- TDR: Thinking-Doing Ratio (also coined here).
- CapEx / OpEx: Capital Expenditure / Operating Expenditure.
- FCF: Free Cash Flow.
- VC: Venture Capital.
The big picture: a business as evolving software#
The essay opens with a worldview that everything else hangs on.
The idea#
- The promise of AI is to turn a business into software so it can evolve through "millions of tiny iterations." The claim: "You cannot build perfection, only discover it." Beautiful, complex things emerge from trial and error over time, not from being designed top-down.
- Capitalism is framed as "organizational evolution." Millions of firms compete; some thrive, some die; each firm itself evolves (people, processes, products come and go). An experiment becomes a process; a process becomes "a web of tacit knowledge."
- The author's mission ("Good Quest"): accelerate this evolution so every business can reach its ideal form, an opportunity the essay claims is worth "trillions of dollars."
[!tip] Intuition
Hold this frame: the end-goal is not "AI that acts like a smart human employee." It is a business that behaves like a living piece of software, constantly editing and testing itself toward its ideal form. Every later argument serves that frame.
Why the current implementation is "the wrong way"#
The author argues the market's first attempt at deploying AI is "too wasteful, too forgetful, and too imprecise", and therefore not the endgame. The rest of the essay diagnoses why.
Tokenmaxxing: the cycle's dumb metric#
The idea#
- Tokenmaxxing = literally maximizing the number of tokens a person or organization spends, tracked on leaderboards and rewarded with prizes. The author calls it "a mass delusion, something like a commercial form of AI psychosis" and a "lab-grown supermeme."
- The mechanism (why everyone did it): the AI labs sell tokens, so the incentive to spend cascaded down the org chart: "The market incentivized companies to spend tokens, so boards incentivized leaders… so leaders incentivized managers… so managers incentivized employees… Nobody had an incentive to say that the tokens aren't doing useful stuff." This is a classic principal-agent / misaligned-incentive problem.
How the labs structured it (the worked example)#
The essay's concrete illustration of the incentive trap:
- Anthropic and OpenAI ship Agents (their coding/work products Claude Code / Cowork and Codex) into customer companies. (An Agent here = an AI system that autonomously takes actions/uses tools to do work, not just chat.)
- Think of these Agents as "digital [lab] employees with no-limit [customer] credit cards": the more they spend, the "better" they look under tokenmaxxing, but the spend flows back to the lab (the Agent's "real employer").
- A customer (the example uses KPMG, a Big Four professional-services firm) signs a token commit (a committed spend level in exchange for discounts), then encourages staff to use Agents for everything, with dashboards everywhere. Employees who burn the most tokens get crowned "AI Innovators."
The killer retort: "Skill Issue"#
- Skeptics who asked "are the Agents doing anything useful? … show me something useful they've built" were dismissed with "Skill Issue": i.e. if you can't get value, the fault is your lack of skill, not the tool.
- The author's analogy: claims of huge value were like "some people had a super hot girlfriend at summer camp but you've never met her": asserted, never demonstrated. Doubters were told they belonged to the "Permanent Underclass."
The accuracy tell (the number that matters)#
A recurring real conversation the author reports:
- AI-team lead: "We've made a ton of progress… We spent $50 million on tokens… Usage is up. We've built 3,000 Agents. We shipped 10 million lines of code."
- Author: "did you measure accuracy for the fraud Agent?" → "Yeah… it's about 50%."
[!warning] The core problem in one number
50% accuracy on a production task (fraud detection) when "People are at 99%!" Activity metrics (spend, Agents built, lines of code) were celebrated while the metric that determines whether the work is usable was ignored.
The trigger: subscription → consumption pricing#
- Crucially, "all of this happened… right as the labs switched from subscription-based to consumption-based revenue models." Under a flat subscription, heavy usage is subsidised; under consumption (pay-per-token) pricing, every token is billed, so "token usage, and therefore lab revenue, went parabolic."
- (Added context: this is the structural reason ROT suddenly matters. When usage was a fixed monthly fee, waste was invisible; once each token has a price, waste shows up directly in the profit-and-loss statement.)
The spell breaks (who called it out)#
The author lists the moments at which the consensus flipped, likened to the Mean Girls "raise your hand if you feel personally victimized" scene (one hand goes up, then all of them):
- Uber broke the spell: its Chief Technology Officer (CTO) said Uber had burned through its entire 2026 Claude Code token budget by April; its Chief Operating Officer (COO) said AI spend was getting "harder to justify" because the link between AI consumption and shipped features "is not there yet."
- A consultant: a client "accidentally burned half a billion dollars on Claude Code."
- Amazon shut down its AI leaderboard.
- Legora CTO Jacob Lauritzen: token leaderboards "lead to tokenmaxxing, which is people just burn tokens just to look good. That's a really stupid way to do anything."
- Ramp's Veeral Patel coined the "Token Casino": "useful software wrapped in mechanics that make spend feel like progress. It starts with the oldest trick in the book: abstract the money."
- Palantir CEO Alex Karp: tokenmaxxing is like "a porn addiction."
- OpenAI's Sam Altman (a "token vendor" himself) admitted on CNBC that companies are unsure "how long do I have to wait for it to really show up in revenue… and how long to really get the costs under control", calling it a "huge issue."
Every cycle has its dumb metric (the historical pattern)#
This is the essay's most useful mental model for investors: markets repeatedly fixate on an activity proxy, reward it, over-build it, yet the genuine winners are the ones who later convert that activity into returns/cash flow.
| Era | The "dumb metric" the market rewarded | Who actually generated returns from it |
|---|---|---|
| Mid-19th century | Miles of railroad track laid (proxy for future monopoly), built redundantly along the same routes | Vanderbilt's New York Central; the Pennsylvania Railroad |
| Turn of the 21st century (dot-com) | Eyeballs (web traffic / page views) | Google, Facebook: "converted eyeballs to cashflow better than anyone has ever converted anything to cashflow" |
| 2010s | Top-line gross revenue (growth at all costs) | Uber: turned top-line growth into market dominance and $10 billion of 2025 free cash flow; counter-example: WeWork delivered the metric but not the returns |
| This cycle (2025-26) | Tokenmaxxing (tokens spent) | TBD: the open question of the essay |
[!tip] Investor takeaway
The metric is not inherently worthless: track miles, eyeballs, revenue, and tokens can all become valuable. "The question is always: can the thing generate returns?" For tokens, that question is ROT. (Added context: this is a directly transferable lens for evaluating any AI-exposed holding; see The AI Value Chain and Holdings Master Table.)
The ROT metric itself#
Core definition & formula#
Tokens must be held to the same standard as any other business investment (a machine, a hire): they must return more value than they cost.
Return on Tokens (ROT) = (Value of Output − Cost of Tokens) / Cost of Tokens × 100
- Value of Output: the economic value the token spend actually produced.
- Cost of Tokens: what you paid the AI lab for that usage.
- Expressed as a percentage; a positive ROT means the AI work created net value, a negative ROT means it destroyed value (you spent more on tokens than the output was worth).
[!tip] Intuition: it is just Return on Investment for AI spend
ROT is Return on Investment (ROI) applied to AI tokens. (Added context: ROI = (gain − cost) / cost. The essay's contribution is insisting the AI line item be evaluated like any other capital deployment, not as a vanity score.)
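(Added context) A minimal sketch of the ROT formula in Python, using the definition given above. The function name and the dollar figures are my own hypothetical illustrations, not numbers from the essay; they are chosen only to show how the sign of ROT flips.

```python
def return_on_tokens(value_of_output: float, cost_of_tokens: float) -> float:
    """ROT as a percentage: (value of output - cost of tokens) / cost of tokens * 100."""
    if cost_of_tokens <= 0:
        raise ValueError("cost_of_tokens must be positive")
    return (value_of_output - cost_of_tokens) / cost_of_tokens * 100


# Hypothetical figures, chosen only to illustrate positive vs negative ROT.
positive = return_on_tokens(value_of_output=150_000, cost_of_tokens=100_000)
negative = return_on_tokens(value_of_output=40_000, cost_of_tokens=100_000)

print(f"Output worth more than the tokens: ROT = {positive:.0f}%")
print(f"Output worth less than the tokens: ROT = {negative:.0f}%")
```

The hard part in practice is not the arithmetic but estimating `value_of_output`, which is why the essay says companies attacked the cost side first.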
The two levers#
There are exactly two ways to raise ROT:
- Create more valuable output with the tokens (raise the numerator); or
- Spend fewer / cheaper tokens for the same output (cut the denominator).
- Ideally both: "spend less to create more value."
Because output value is harder to measure, companies attacked the easier lever first: spending less.
Lever 1, spend less: routing (a good start, not the endgame)#
The idea#
- Routing = sending each task to the cheapest model that can do it well: use Anthropic's / OpenAI's best (most expensive) frontier models only for "the really big brain stuff," and do most work with cheap Chinese open-source models.
- Coinbase CEO Brian Armstrong's framing (quoted): demand for intelligence is near infinite, but ~80% of workloads will run on ~99%-cheaper models within 12-18 months, while ~20% stays on latest-generation models where raw "IQ" matters (e.g. scientific breakthroughs).
- Evidence cited: the OpenRouter model-usage rankings show the shift toward Chinese models appearing in lockstep with the move to consumption-based pricing, i.e. once people paid per token, they immediately started economising. (OpenRouter = a service that routes requests across many models; its rankings reveal what the market actually uses.)
[!warning] But routing is only a partial fix
"Agents spending tokens, American or Chinese, to figure everything out from scratch is not endgame, either." Cheaper tokens still waste money if the architecture is wrong. The deeper fix is the next section.
Lever 1, deeper: the cheapest thing of all is code#
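(Added context) A minimal sketch of routing as described above: send each task to the cheapest model that clears the capability it needs. The model names, capability scores, and prices are invented placeholders, not real products or price lists. The second function does the simple arithmetic behind the 80% / 99%-cheaper framing, under my own simplifying assumption that every workload uses the same token volume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int        # hypothetical 1-10 score, higher = more capable
    cost_per_mtok: float   # hypothetical USD per million tokens


# Placeholder catalogue; names and prices are invented for illustration.
CATALOGUE = [
    Model("frontier-large", capability=10, cost_per_mtok=15.00),
    Model("mid-tier", capability=7, cost_per_mtok=1.50),
    Model("open-weights-small", capability=5, cost_per_mtok=0.15),
]


def route(required_capability: int) -> Model:
    """Pick the cheapest model whose capability meets the task's requirement."""
    eligible = [m for m in CATALOGUE if m.capability >= required_capability]
    if not eligible:
        raise ValueError("no model in the catalogue is capable enough")
    return min(eligible, key=lambda m: m.cost_per_mtok)


def blended_cost_ratio(cheap_share: float, cheap_discount: float) -> float:
    """Spend relative to running every workload on the expensive model.

    Assumes equal token volume per workload (a simplification).
    """
    return (1 - cheap_share) + cheap_share * (1 - cheap_discount)


print(route(4).name)   # a routine task goes to the cheapest model
print(route(9).name)   # a "big brain" task still needs the frontier model
print(f"Blended spend vs all-frontier: {blended_cost_ratio(0.80, 0.99):.1%}")
```

Under those assumptions, moving 80% of workloads to models that are 99% cheaper leaves spend at roughly a fifth of the all-frontier level, which is the scale of saving that makes routing an obvious first step even if it is not the endgame.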
The idea#
- "Because you know what's cheaper than Chinese models? Code." Deterministic code is both cheaper and "a better fit for most economically valuable work."
- Historical lesson, "When Computers Were Human": before "computer" meant a machine, it meant a person who did repetitive calculations (e.g. missile trajectories, business profits; the essay cites NASA's human computers). ~50 years ago we handed those repetitive tasks to software, which "ran more reliably than even the most reliable human computer. It made no mistakes. It answered instantly."
- Determinism defined: "Enter the same numbers and same formulae in the same cells in an Excel spreadsheet anywhere in the world, at any time, and it spit out the same number." Deterministic = same input → same output, every time. This is exactly what probabilistic AI Agents do not guarantee.
- The forgotten lesson: with Agents we "forgot" this and started "throwing these pseudo-humans at everything because everyone else was." Agents suit some tasks but are "not the right shape for a lot of others." Hence negative ROT: "All the dashboards have been dashboarded, and now they're sending Agents to do software's job."
Lever 1, the structural diagnosis: why Agents have negative ROT#
The essay gives three structural reasons Agents are the wrong architecture for most work (note: most, not all).
1. Agents can't hit the "nines" of quality long-running work needs#
- Agents improvise and are "spawned fresh onto repetitive tasks like every day is their first day on the job", i.e. no persistent memory of the task, which hurts consistent accuracy.
- "For new features, prototypes, or dashboards, 80% accuracy is fine. For the real repetitive work on which the economy runs, like fraud detection or underwriting decisions, 80% accuracy is 0% usable."
- "Nines of quality" = how many 9s in your accuracy/uptime (99%, 99.9%, 99.99%…). (Added context: each extra 9 is an order-of-magnitude fewer errors; mission-critical back-office processes demand several 9s, which improvising Agents don't reliably hit.)
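(Added context) A small sketch of why the "nines" matter. The first function converts an accuracy level into errors per million cases. The second shows how per-step accuracy compounds across a multi-step process; the 50-step process and the assumption that steps fail independently are mine, for illustration only.

```python
def errors_per_million(accuracy: float) -> float:
    """Expected number of wrong outcomes per one million cases."""
    return (1 - accuracy) * 1_000_000


def end_to_end_accuracy(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step is right, assuming steps fail independently."""
    return per_step_accuracy ** steps


for acc in (0.80, 0.99, 0.999, 0.9999):
    print(f"{acc:.2%} accurate -> {errors_per_million(acc):,.0f} errors per million cases")

# Hypothetical 50-step process: a small per-step error rate compounds quickly.
for acc in (0.99, 0.999, 0.9999):
    print(f"{acc:.2%} per step over 50 steps -> {end_to_end_accuracy(acc, 50):.1%} end to end")
```

Each additional nine cuts the error count tenfold, and long processes need more nines per step than the headline accuracy suggests.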
2. Engineers don't know what to build because they don't do the work#
- Process-driven work = "a combination of written rules, which Agents can ingest, and then like 3,000 tacit rules and sub-rules that live in people's heads," in offices far from "the engineers' San Francisco desks."
- Tacit knowledge = undocumented know-how in workers' heads, never written down.
- "AI can only evolve what it can touch, which is why it's been great at coding but has largely failed to do useful things in the enterprise." (Coding is fully visible to the AI; a Nebraska claims-processor's tacit rules are not.)
3. The original sin: there are no goals#
- "If people have no goals then the Agent has no goals, and then the thing achieves no end." Without a goal to hill-climb against, "code… decays into slop in the limit because there's no purifying force to evaluate what's good and bad."
- Hill-climbing = an optimisation metaphor: to improve you must be able to measure whether each change moved you "uphill" (better) or "downhill" (worse). No goal = no measure = no improvement, just drift.
- The laziness trap: tokenmaxxing rewards setting Agents loose on vague instructions; they "spin on a vague instruction… bring something back that's decent but not perfect, and then go out and spin some more". The result is "more token spend without delivering any value… a fast track to negative ROT."
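(Added context) A minimal hill-climbing sketch to make the metaphor concrete. A candidate change is kept only if a goal function says it scored better; without such a function there is nothing to accept or reject against. The objective, step size, and starting point are hypothetical choices of mine.

```python
import random
from typing import Callable


def hill_climb(score: Callable[[float], float], start: float,
               steps: int = 200, step_size: float = 0.5, seed: int = 0) -> float:
    """Propose small random changes and keep one only if the score improves."""
    rng = random.Random(seed)
    current = start
    for _ in range(steps):
        candidate = current + rng.uniform(-step_size, step_size)
        if score(candidate) > score(current):
            current = candidate  # moved "uphill": keep the change
        # otherwise discard the change and try again
    return current


def goal(x: float) -> float:
    """Hypothetical objective whose best value is at x = 3."""
    return -(x - 3.0) ** 2


result = hill_climb(goal, start=-10.0)
print(f"start score: {goal(-10.0):.2f}, final score: {goal(result):.2f}")
```

The `score` argument is the "purifying force" the essay refers to: remove it and the loop has no basis for choosing between candidates, so it can only drift.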
The central reframe: AI is a Compiler, not a Runtime#
This is the essay's thesis sentence and the most important idea to retain.
The idea#
Software has two phases:
- Thinking (compile): take the goals and requirements for what software should do and turn them into code a computer can run.
- Doing (run / runtime): every time the thing is needed, the code runs cheaply and deterministically, over and over.
- A compiler (computer-science term) converts human-written instructions into machine-runnable code. The essay generalises it: "you can also think of a software company or a software engineer as a compiler"; they convert goals/requirements into code, which customers then run repeatedly.
- This is the magic of zero-marginal-cost software: code that cost millions to build is sold for $20/month at huge margins, because building (thinking) happens once and running (doing) is nearly free.
The mistake vs. the correct model#
| | Wrong way (today's default) | Right way (the essay's claim) |
|---|---|---|
| What the Agent replaces | Both the software company and the software (the Agent thinks and does, forever) | Only the software company: the Agent takes goals/requirements in English and compiles them into deterministic code |
| Who does the "doing" | The Agent, improvising every run (expensive, ~80%) | Code, running the same steps every time (cheap, deterministic, many 9s) |
| When AI is invoked | Constantly | Only when the rules change (then it re-compiles) |
| Cost character | OpEx: recurring per-run token spend | CapEx: a one-time "build" cost, then near-zero to run |
[!tip] The one line to memorise
"Thinking is expensive but happens rarely. Doing is cheap and happens forever. Agents should do the thinking, code should do the doing." Practical rule: "use humans to figure out the rules, use AI to turn the rules into code, and then run that code forever at near-zero token cost, only bringing the AI back in when the rules change." Or bluntly: "Why would you use a prompt to add two numbers? Just write a line of Python, dog."
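(Added context) A back-of-envelope sketch of the OpEx-versus-CapEx contrast in the table above: an Agent that improvises on every run versus code that is compiled once, re-compiled when the rules change, and otherwise run at near-zero cost. All cost figures are hypothetical placeholders of mine, not numbers from the essay.

```python
def agent_runtime_cost(runs: int, token_cost_per_run: float) -> float:
    """Agent-as-runtime: token spend recurs on every run (OpEx-like)."""
    return runs * token_cost_per_run


def compiled_cost(runs: int, build_cost: float, code_cost_per_run: float,
                  recompiles: int = 0, recompile_cost: float = 0.0) -> float:
    """AI-as-compiler: pay to build once, pay again only when rules change."""
    return build_cost + recompiles * recompile_cost + runs * code_cost_per_run


def breakeven_runs(build_cost: float, token_cost_per_run: float,
                   code_cost_per_run: float) -> float:
    """Number of runs after which the one-time build has paid for itself."""
    saving_per_run = token_cost_per_run - code_cost_per_run
    if saving_per_run <= 0:
        raise ValueError("compiled code must be cheaper per run than the Agent")
    return build_cost / saving_per_run


# Hypothetical inputs for illustration only.
BUILD_COST = 50_000.0        # one-time token spend to compile the process
TOKEN_COST_PER_RUN = 0.50    # Agent improvising each case
CODE_COST_PER_RUN = 0.001    # deterministic code handling each case
RUNS = 1_000_000

agent_total = agent_runtime_cost(RUNS, TOKEN_COST_PER_RUN)
compiled_total = compiled_cost(RUNS, BUILD_COST, CODE_COST_PER_RUN,
                               recompiles=4, recompile_cost=5_000.0)

print(f"Agent as runtime:  ${agent_total:,.0f}")
print(f"AI as compiler:    ${compiled_total:,.0f}")
print(f"Breakeven at about {breakeven_runs(BUILD_COST, TOKEN_COST_PER_RUN, CODE_COST_PER_RUN):,.0f} runs")
```

The design point is that the compiled path's cost is dominated by how often the rules change, not by how often the process runs, which is the reverse of the Agent-as-runtime path.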
Thinking-Doing Ratio (TDR)#
- Coined metric: TDR = ratio of "thinking" work to "doing" work in an AI deployment.
- Current AI implementations run at roughly 1000:1 thinking-to-doing: "San Francisco is a Thinking town" (Anthropic's hats literally say "thinking").
- The author's claim: "Silicon Valley built AI assuming work is mostly thinking, but work is mostly doing." So the ratio is backwards.
- The exception: chat / customer support, "where you genuinely don't know what comes next"; improvisation is appropriate there (and even those Agents escalate hard cases to humans). "Almost nothing else in a business looks like constant improvisation."
The memorable framing#
- "Agents are the BlackBerry of doing": a transitional technology, not where work lands long-term. In five years most work "will get done in the deterministic code that they [Agents] write," not in the Agents themselves.
- "Everyone thinks the thing that is going to change in the world is that AI is going to become a person, but the real change is that a business is going to become a piece of software."
The company behind the thesis: Poetic#
The essay is also a pitch for the author's company. Worth knowing as an investment-team data point (a private company attacking enterprise AI deployment).
What Poetic claims to be#
- "A new class of software: adaptive like AI, reliable like code"; the "antidote to tokenmaxxing: software that tokenminns itself" ("tokenminn" = minimise tokens, the opposite of tokenmaxx).
- Method = AI as compiler, in practice:
  - Learn the business: ingest all written processes, then go on-site ("Nebraska or Providence or wherever the work is done"), sit at workers' shoulders, and ask "What did you just do?" / "Why did you do that?" hundreds of times to extract the thousands of tacit rules.
  - Compile to code: turn that knowledge into deterministic code, which becomes the runtime. When the world is stable, the code runs identically every time.
  - Evolve on change: when the world changes, the system "learns, regenerates, tests itself against the objective, and then runs the new code until the world changes again."
- Claimed result: "100x less token usage and nines of accuracy on complex tasks", i.e. each token does ~100x more work, correctly. Tokens are spent only when the world changes → measurable positive ROT.
Backtesting / shadow runs (why this is powerful)#
- Running in production gives a record of every step for every case (dispute, underwriting case, insurance claim). So every process change becomes testable: "The answer to 'what if' is known after minutes of backtesting. Run both scenarios in shadow, compare outcomes, decide which is better."
- Backtesting = re-running a proposed rule against historical data to see what would have happened. Shadow run = running a new process alongside the live one without acting on it, to compare outcomes risk-free.
- Consequence: "When impact is entirely known, there is little risk… Change simply becomes a choice." Example: "What if we approved every dispute under $25?" is answerable in minutes. Humans (the "process lead") then just choose the best outcome and hill-climb toward the ideal process.
- Division of labour: "Humans exist to define what good looks like, not how to get there."
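(Added context) A toy backtest of the essay's "what if we approved every dispute under $25?" question. It replays a current policy and a candidate policy over the same historical cases and compares total cost. The record fields, the four sample disputes, and the cost model are hypothetical and mine; a real system would replay actual production records.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Dispute:
    amount: float        # disputed amount in USD
    was_valid: bool      # what manual review ultimately concluded
    review_cost: float   # hypothetical cost of one manual review


# Hypothetical historical record, for illustration only.
HISTORY = [
    Dispute(12.0, True, 8.0),
    Dispute(18.0, False, 8.0),
    Dispute(40.0, True, 8.0),
    Dispute(300.0, False, 8.0),
]

Policy = Callable[[Dispute], float]


def current_policy(d: Dispute) -> float:
    """Review every dispute; pay out only the valid ones."""
    return d.review_cost + (d.amount if d.was_valid else 0.0)


def auto_approve_under(threshold: float) -> Policy:
    """Candidate rule: pay small disputes without review, review the rest."""
    def policy(d: Dispute) -> float:
        if d.amount < threshold:
            return d.amount
        return current_policy(d)
    return policy


def backtest(policy: Policy, history: Iterable[Dispute]) -> float:
    """Total cost the policy would have incurred on the historical cases."""
    return sum(policy(d) for d in history)


baseline = backtest(current_policy, HISTORY)
candidate = backtest(auto_approve_under(25.0), HISTORY)
print(f"current policy cost:   ${baseline:.2f}")
print(f"approve-under-$25 cost: ${candidate:.2f}")
```

On this invented data the candidate rule comes out slightly more expensive, which is the point of the exercise: the comparison is known before anything changes in production, and the process lead simply chooses.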
Customers & validation (named in the essay)#
| Company | What it is | Validation quote |
|---|---|---|
| AIG (American International Group) | global insurance | CEO Peter Zaffino: Poetic "achieved 99%+ quality outcomes on multi-hour processes, delivering real enterprise value." |
| SoFi | digital personal-finance / fintech | named customer |
| Chime | US neobank | named customer |
- Team: mostly engineers, "a lot of them ex-Palantir," who spend weeks on-site. Branded "Social Engineers": "Engineers who understand people, business, and AI will rule the world." (Contrast with engineers "who spend all day at a desk prompting Claude.")
The closing worldview (evolution)#
- The endgame: "Every business is not just a piece of software; it's a piece of software constantly editing, testing, evaluating changes," evolving at the "highest frame-rate possible" toward its "most correct form." Tokens are spent to turn the business into code and evolve it, not to run it: "We tokenminn to ROTmaxx."
- The evolution analogy: life went from ocean slime → fish → lizards → monkeys → humans through trial and error over billions of years. "We don't want to wait billions of years for businesses to evolve… and Agents won't build them. You cannot build the butterfly. Beautiful, ideal, complex things can only emerge through evolution."
- (Added context: the rhetorical move is to recast AI's role from "improvising worker" to "evolutionary selection pressure": AI's job is to mutate, test, and select better versions of a business's code, while humans set the fitness function, i.e. "what good looks like.")
Common pitfalls / what to watch for (critical reading)#
- This is a vendor essay. The framework (ROT, compiler-not-runtime) is genuinely useful, but the conclusion conveniently sells Poetic. Treat the "Agents are the wrong architecture" claim as a strong thesis, not settled fact.
- "Most work, not all." The author repeatedly hedges: chat/support and genuinely novel/creative work still suit Agents. The claim is about repetitive, process-driven back-office work.
- Survivorship in the historical analogy. Railroads/eyeballs/revenue had big winners and big wipe-outs (WeWork). The metric becoming valuable is not guaranteed; it is the open question.
- Numbers are illustrative. The $50M, $500M, "1000:1 TDR," "100x," "50% accuracy" figures are rhetorical/anecdotal, not audited; treat them as directional, not precise.
Exam / desk focus: why this matters for PCA SOF#
[!tip] Connect this to the portfolio
The Strategic Opportunities Fund is heavily AI/tech-exposed (see The AI Value Chain). ROT is a lens for stress-testing those positions:
- The bull case for compute names (NVIDIA, Micron Technology, TSMC, Marvell Technology) rests on "demand for intelligence is near infinite." ROT/routing is the bear risk: if 80% of workloads move to 99%-cheaper models and to deterministic code, token (and therefore compute) demand growth could decelerate or shift away from frontier chips toward cheap inference.
- The "AI value capture" debate for software names (Microsoft, Salesforce, ServiceNow, Snowflake, Datadog) is exactly the ROT question: are customers getting positive Return on Tokens, or cutting AI budgets (the Uber signal)?
- Power/energy names (Constellation Energy, Vistra) are an indirect token bet (AI data-centre electricity demand); the same demand-deceleration risk applies.
- Palantir is referenced approvingly (Poetic's team is ex-Palantir; the "go on-site, learn the business, compile to code" model is Palantir-like Forward-Deployed Engineering). Relevant to the enterprise-AI-services thesis.
The transferable question for any AI-exposed holding: what is the customer's Return on Tokens, and is it rising or about to be cut?
Linked Notes#
- Related Readings: Peak Cheap, The AI Boom Isn't 2000, It's 2008 (the financing/earnings-bubble counterpart).
- Related Themes: The AI Value Chain · Enterprise Software & Agentic AI.
- Related Thesis / Risk: PCA SOF Investment Thesis · Risk Master Note.
- Related Holdings: NVIDIA · Microsoft · Salesforce · Datadog.