AI Gross Margins: Why Your Best Feature Loses Money

Ship a great AI feature and something strange happens to the P&L. Usage climbs, customers rave, the demo lands every time — and the better it performs, the more money it quietly burns. That's the part nobody warns you about. In the old SaaS world, your best feature was almost free to run. In the AI world, your best feature is your biggest variable cost. AI gross margins are the new product problem, and pretending they belong to finance is how teams sleepwalk into a business that grows revenue and loses money at the same time.

Here's the uncomfortable shape of it.

The token tax is real, and it's big

Classic software companies run gross margins in the 80–90% range because the marginal cost of one more user is rounding-error close to zero. AI breaks that. Every prompt, every agent step, every generation triggers real compute. The result is a structural haircut: AI product builders expect average gross margins of about 52% in 2026, per ICONIQ's State of AI survey, roughly 28 to 38 points below the SaaS baseline most boards still anchor on.

Call it the token tax. One teardown of the trend puts it bluntly: 20 to 30 margin points evaporate because of a single product decision. And this isn't a startup-only problem. Salesforce's Agentforce blew past $800 million in ARR while processing nearly 20 trillion tokens. At that volume, a few cents per thousand tokens becomes a line item with its own zip code.
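To see why volume turns a small unit price into a real line item, here is the arithmetic as a tiny Python sketch. The 1, 2, and 3 cents per thousand tokens are illustrative stand-ins for "a few cents". They are assumptions, not Salesforce's actual cost, which the figures above don't reveal.

```python
def token_bill(tokens: float, price_per_thousand: float) -> float:
    """Dollar cost of `tokens` at a flat price per 1,000 tokens."""
    return tokens / 1_000 * price_per_thousand


TOKENS = 20e12  # "nearly 20 trillion tokens"

# Assumed prices in cents per 1,000 tokens; illustrative only.
for cents in (1, 2, 3):
    bill = token_bill(TOKENS, cents / 100)
    print(f"{cents} cent(s) per 1K tokens -> ${bill / 1e6:,.0f} million")
```

It prints $200, $400, and $600 million. Even at a single cent, the bill lands in the hundreds of millions, which is why per-token price is a product input rather than a rounding error.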

The most cited cautionary tale is GitHub Copilot, which reportedly cost Microsoft up to $80 per heavy user per month against a $10 price — roughly a $20 average loss per seat. A product people genuinely love, sold at a loss, because the pricing model and the cost model were designed in different decades.
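The Copilot figures translate directly into seat-level gross margin, (price − cost) / price. This sketch uses the reported $10 price and $80 heavy-user cost. The roughly $30 average cost is not a reported number; it is simply what a $20 average loss on a $10 seat implies.

```python
def seat_economics(price: float, cost_to_serve: float) -> tuple[float, float]:
    """Return (monthly profit per seat, gross margin as a fraction of revenue)."""
    profit = price - cost_to_serve
    return profit, profit / price


PRICE = 10.0
# Derived, not reported: a $20 average loss on a $10 seat implies ~$30 cost.
avg_cost = PRICE + 20.0

for label, cost in [("average user", avg_cost), ("heavy user", 80.0)]:
    profit, margin = seat_economics(PRICE, cost)
    print(f"{label}: profit ${profit:.2f}/month, gross margin {margin:.0%}")
```

A −200% margin on the average seat and −700% on the heaviest is what "designed in different decades" looks like in numbers.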

Why this is a product problem, not a finance one

It's tempting to file this under "the CFO's job." Isolate AI COGS, tweak the model, move on. But every lever that actually moves AI gross margins lives inside product decisions:

Which model you call for which job.
How much context you stuff into every request.
Whether the feature is metered or all-you-can-eat.
Whether the output is good enough that users don't retry three times.

Finance can't fix any of those from a spreadsheet. They're roadmap choices, interface choices, and prompt-architecture choices. A PM who decides "let's just send the whole document to the biggest model on every keystroke" has made a margin decision whether they know it or not. It's the slower-burning cousin of vibe coding's bill coming due — different invoice, same lesson.

Where AI gross margins actually leak

In my experience the damage clusters in four places:

Over-modeling. Routing trivial calls to a flagship model because it's the default. Most steps in an agent don't need your most expensive brain.
Context bloat. Re-sending the same system prompt, schema, and history on every turn, at full price.
Free retries. When quality is shaky, users re-run the same request until it works — and you pay each time.
Flat pricing on variable cost. A per-seat plan where your heaviest 5% of users burn half the tokens.

Notice that three of the four are things product owns outright.
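Context bloat is the leak that grows fastest, because conversation history compounds. Here is a minimal Python model, assuming a hypothetical stateless chat loop that re-sends the system prompt and every prior turn on each request. The 3,000-token prompt and 500-token turns are made-up sizes. For N turns, system prompt size S, and turn size T, the carried tokens total N·S + T·N(N−1)/2, so the history term grows with the square of conversation length.

```python
def carried_tokens(turns: int, system_tokens: int, tokens_per_turn: int) -> int:
    """Input tokens spent re-carrying the system prompt and earlier turns.

    Hypothetical stateless loop: request k carries the system prompt plus the
    k - 1 earlier turns (user message + reply). The new message itself is not
    counted here.
    """
    total = 0
    for k in range(1, turns + 1):
        total += system_tokens + (k - 1) * tokens_per_turn
    return total


for n in (5, 20, 50):
    print(f"{n} turns -> {carried_tokens(n, 3_000, 500):,} carried input tokens")
```

Twenty turns carry 155,000 tokens of prompt and history; fifty carry 762,500, nearly five times as many for two and a half times the turns. Those repeated tokens are exactly what prompt caching discounts.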

Five moves to win the margin back

1. Meter the work, not the chair. Per-seat pricing assumes everyone costs the same to serve. Under AI, they emphatically don't. Companies on pure seat-based pricing run gross margins roughly 40% lower than those with usage or outcome components, and around 92% of AI companies now use some mixed model. This is exactly why per-seat pricing is dying: the cost of serving a user is no longer flat, so the price can't be either.
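Here is a small Python model of why a flat seat price breaks when cost per user is skewed. Every number is hypothetical: 100 users, the heaviest 5 burning half the tokens, a $30 seat, and a hybrid plan of a $15 platform fee plus usage billed at 1.5× serving cost. Billing usage as a markup on cost is a simplifying assumption, since real plans usually meter actions or credits, and the sketch does not reproduce the 40% statistic above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    users: int
    cost_per_user: float  # hypothetical monthly inference cost per user, in $


def margins(segments: list[Segment], seat_price: float,
            platform_fee: float, usage_markup: float) -> tuple[float, float]:
    """Return (seat-plan margin, hybrid-plan margin) as fractions of revenue."""
    cost = sum(s.users * s.cost_per_user for s in segments)
    seat_rev = sum(s.users * seat_price for s in segments)
    # Hybrid: flat platform fee per user plus usage billed at a markup on cost.
    hybrid_rev = sum(s.users * (platform_fee + usage_markup * s.cost_per_user)
                     for s in segments)
    return (seat_rev - cost) / seat_rev, (hybrid_rev - cost) / hybrid_rev


for growth in (1, 2, 4):  # heavy users' consumption grows over time
    users = [Segment(95, 8.0), Segment(5, 152.0 * growth)]
    seat, hybrid = margins(users, seat_price=30.0, platform_fee=15.0, usage_markup=1.5)
    print(f"heavy usage x{growth}: seat {seat:.0%}, hybrid {hybrid:.0%}")
```

As heavy users scale up, the seat plan slides from 49% to a negative margin, while the hybrid plan stays above the 1 − 1/1.5 ≈ 33% floor set by its usage markup.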

2. Cache like your margin depends on it, because it does. A stable system prompt and repeated context are free money left on the table. Anthropic's prompt caching bills cache reads at 10% of the base input price, a 90% discount on the tokens you send over and over. Writing a prefix to the cache costs 25% more than normal input (for the default five-minute cache lifetime), so the savings come from repeat hits. For a product with a fixed instruction set, that's often a one-week engineering job that pays for itself in days.
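A rough Python cost model shows both the discount and why hit rate matters. It assumes Anthropic's published multipliers for the default five-minute cache (writes at 1.25× the base input price, reads at 0.1×). The $3 per million input tokens, 4,000-token prefix, request count, and hit rates are hypothetical. In the API itself, the stable prefix is marked for caching with a `cache_control` breakpoint on the relevant content block.

```python
def prefix_costs(requests: int, prefix_tokens: int, hit_rate: float,
                 base_per_mtok: float, write_mult: float = 1.25,
                 read_mult: float = 0.10) -> tuple[float, float]:
    """Return (cost without caching, cost with caching) in $ for a shared prefix.

    Cache misses pay the write premium; hits pay the discounted read price.
    Only the repeated prefix is modeled; per-request unique tokens cost the
    same either way.
    """
    per_token = base_per_mtok / 1_000_000
    uncached = requests * prefix_tokens * per_token
    hits = requests * hit_rate
    misses = requests - hits
    cached = (misses * write_mult + hits * read_mult) * prefix_tokens * per_token
    return uncached, cached


for hit_rate in (0.0, 0.5, 0.95):
    plain, cached = prefix_costs(100_000, 4_000, hit_rate, base_per_mtok=3.0)
    print(f"hit rate {hit_rate:.0%}: ${plain:,.0f} uncached vs ${cached:,.0f} cached")
```

With no hits, caching costs more than not caching ($1,500 vs $1,200). At a 95% hit rate the prefix bill drops to about $189, roughly 84% less. Keeping the cached prefix identical from request to request is what keeps the hit rate high.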

3. Right-size every call. Build a routing layer: a cheap model for classification and extraction, the flagship only for the reasoning that genuinely needs it. The goal isn't the smartest possible answer on every call — it's the cheapest call that clears the quality bar.
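A minimal routing layer might look like this Python sketch. The model names, prices, and per-task eval scores are hypothetical placeholders; in practice the scores would come from your own evals on each task type. The rule follows the one stated above: among models that clear the quality bar for a task, pick the cheapest.

```python
from dataclasses import dataclass


@dataclass
class ModelTier:
    name: str                  # hypothetical model identifier
    price_per_mtok: float      # hypothetical blended $ per million tokens
    quality: dict[str, float]  # eval pass rate per task type, 0..1


def route(task_type: str, tiers: list[ModelTier], quality_bar: float) -> ModelTier:
    """Return the cheapest tier whose eval score for `task_type` clears the bar."""
    eligible = [t for t in tiers if t.quality.get(task_type, 0.0) >= quality_bar]
    if not eligible:
        # Nothing clears the bar: fall back to the best-scoring tier for this task.
        return max(tiers, key=lambda t: t.quality.get(task_type, 0.0))
    return min(eligible, key=lambda t: t.price_per_mtok)


TIERS = [
    ModelTier("small-model", 0.25, {"classify": 0.96, "extract": 0.93, "reason": 0.55}),
    ModelTier("mid-model", 3.00, {"classify": 0.98, "extract": 0.97, "reason": 0.82}),
    ModelTier("flagship-model", 15.00, {"classify": 0.98, "extract": 0.98, "reason": 0.93}),
]

for task, bar in [("classify", 0.9), ("extract", 0.9), ("reason", 0.9), ("reason", 0.8)]:
    print(task, bar, "->", route(task, TIERS, bar).name)
```

With a 0.9 bar, classification and extraction go to the small model and reasoning goes to the flagship. Lowering the reasoning bar to 0.8 moves that traffic to the mid tier at a fifth of the flagship price, which is why the bar belongs in the product spec rather than buried in code.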

4. Make quality a cost strategy. Every retry is a double charge: you paid for the bad answer and you'll pay for the redo. Better evals, tighter grounding, and stricter output formats don't just delight users — they cut your token bill. Reliability is a margin lever, not just a trust one.
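The double charge compounds. If each attempt independently succeeds with probability p and users retry until one works, the number of attempts is geometric with mean 1/p, so the expected cost per accepted answer is the cost per attempt divided by p. Here is a minimal Python sketch with a hypothetical $0.02 per attempt. Real users eventually give up, so treat this as an upper-bound model of retry spend.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per accepted answer when users retry until one works.

    Assumes independent attempts that each succeed with probability
    `success_rate`, so attempts are geometric with mean 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


COST_PER_ATTEMPT = 0.02  # hypothetical $ per call
for p in (0.5, 0.8, 0.95):
    cost = expected_cost_per_success(COST_PER_ATTEMPT, p)
    print(f"per-attempt success {p:.0%}: ${cost:.4f} per accepted answer")
```

Raising per-attempt success from 50% to 80% cuts the cost per accepted answer from $0.040 to $0.025, a 37.5% saving with no change in model price.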

5. Price to the outcome where you can. The cleanest fix is charging for the result (a resolved ticket, a booked meeting, a finished document) instead of access. The catch: by common estimates, AI agents complete complex tasks only 50–60% of the time, so 40–50% of your compute earns nothing under a pure outcome model. That's why hybrids win: a platform fee that covers baseline cost, plus usage or outcome pricing stacked on top.
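The outcome-pricing catch is easy to quantify. In this Python sketch, the 1,000 tasks, $0.80 of compute per task, $1.50 per resolved task, 55% success rate (the middle of the 50–60% range), and $500 platform fee are all hypothetical. Every attempt consumes compute, but only successes are billed.

```python
def outcome_margin(tasks: int, success_rate: float, cost_per_task: float,
                   price_per_outcome: float, platform_fee: float = 0.0) -> float:
    """Gross margin when only successful tasks are billed.

    Every attempted task consumes compute, successful or not; `platform_fee`
    is an optional flat monthly charge layered on top (the hybrid model).
    """
    revenue = platform_fee + tasks * success_rate * price_per_outcome
    cost = tasks * cost_per_task
    return (revenue - cost) / revenue


TASKS, SUCCESS, COST, PRICE = 1_000, 0.55, 0.80, 1.50
unbilled = TASKS * (1 - SUCCESS) * COST  # compute spent on failed tasks
print(f"compute on failed tasks: ${unbilled:,.0f}")
print(f"pure outcome margin: {outcome_margin(TASKS, SUCCESS, COST, PRICE):.1%}")
print(f"hybrid margin ($500 fee): {outcome_margin(TASKS, SUCCESS, COST, PRICE, 500.0):.1%}")
```

Under these assumptions, $360 of the $800 compute bill earns nothing and a pure outcome plan keeps about 3% margin. The platform fee lifts it to roughly 40%.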

The takeaway

AI gross margins are not a back-office detail you reconcile at quarter-end. They're designed — into your model routing, your context windows, your retry rates, and your pricing page — by the same people who design the product. The teams that win the next few years won't be the ones with the flashiest demo. They'll be the ones who can ship something users love and still keep most of the dollar it earns.

So ask it on your next feature spec, before the thing ships and not after: what does this cost to run at scale, and who decided?