ai gross margins are running about 52% in 2026, up from 41% in 2024 but still far from the 75-80% that mature saas earns. every $1m of ai revenue drags about $230k of inference compute with it. that line is variable. it scales with usage. it can't be cut without cutting the product.
inference is the new payroll. it's the largest line below revenue, it grows with the business, and it's the line that decides whether your beautiful arr chart is actually a beautiful business.
what the numbers actually look like
a16z's 2026 ai gross margin report puts the spread roughly here:
- mature saas (b2b, no ai): 75-80% gross margin. cogs is mostly support, hosting, payment processing.
- ai-powered saas (ai is a feature): 60-65% gross margin. some inference cost, but it's not the main motion.
- ai-first products (chat, agents, generation): 45-55% gross margin. inference is the largest line item below revenue.
- agent products with autonomous tool use: 30-45% gross margin. every customer interaction triggers cascading llm calls.
if you're building in the bottom two tiers, your gross margin is structurally lower than the saas comp set, and it isn't going up by itself. founders sometimes assume "model costs will drop, gross margin will catch up." model costs do drop. but you also use more of them as the product gets more capable. net effect: gross margin doesn't catch up. it stays in this band.
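a minimal sketch of why the price drop doesn't show up in margin. every number here is hypothetical, chosen only so the starting point matches the $230k-per-$1m figure above; `inference_share` is an illustrative helper, not a standard metric.

```python
def inference_share(revenue_usd, price_per_mtok_usd, mtok_used):
    """inference cost as a fraction of revenue."""
    return price_per_mtok_usd * mtok_used / revenue_usd

# hypothetical: $1m revenue, $10 per million tokens, 23,000 million tokens used
before = inference_share(1_000_000, 10.0, 23_000)

# hypothetical: the price halves, but a more capable product burns twice the tokens
after = inference_share(1_000_000, 5.0, 46_000)

print(f"before: {before:.0%} of revenue, after: {after:.0%} of revenue")
```

under these assumptions the two effects cancel exactly. in practice they won't cancel this neatly, but the direction is the point: a cheaper token only helps margin if token volume per dollar of revenue grows more slowly than the price falls.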
the uncomfortable consequence
your best customer is also your most expensive one.
in mature saas, the most engaged customer is the highest-margin one. they pay you monthly, use the product more, expand their seats. cost to serve barely moves.
in ai-first saas, the most engaged customer is the one running 10x the inference. they use the product more, which means more model calls, more tool invocations, more tokens. your cost-to-serve scales linearly with engagement.
a customer who pays you $500/month and runs 100 prompts a day costs you $30/month in inference. fine. that's a 94% margin on that customer, counting inference alone.
the same customer six months later, deeply engaged, running 2,000 prompts a day with agent loops, now costs you $400/month in inference. you're at 20% margin on the customer you love most.
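the arithmetic behind those two snapshots, as a small sketch. it counts inference as the only cost to serve, which is an assumption; support, hosting, and payment processing would pull both figures lower. `customer_margin` is a hypothetical helper name.

```python
def customer_margin(monthly_price_usd, monthly_inference_usd):
    """per-customer margin, counting inference as the only cost to serve."""
    return (monthly_price_usd - monthly_inference_usd) / monthly_price_usd

light = customer_margin(500, 30)    # 100 prompts a day
heavy = customer_margin(500, 400)   # 2,000 prompts a day with agent loops

print(f"light user: {light:.0%}, heavy user: {heavy:.0%}")
```

same price, same customer, and the margin moves by more than 70 points purely on usage.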
founders catch this late because the headline arr keeps growing. revenue is up, customers are happy, the demo crushes. nobody's looking at the inference line per customer because it's buried in the cloud bill.
what to model that you weren't modeling
three things, all of them painful to set up but useful to know.
cost per customer per month, broken out by inference.
not your blended cogs. the per-customer line. the top 10% of users by inference cost will look very different from the bottom 90%. you need that distribution to make pricing decisions.
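one way to get that distribution, sketched under the assumption that you can export usage records as `(customer_id, cost_usd)` pairs from your inference invoices. the record shape, the function names, and the sample data are all hypothetical.

```python
import math
from collections import defaultdict

def cost_per_customer(records):
    """sum inference cost per customer from (customer_id, cost_usd) records."""
    totals = defaultdict(float)
    for customer_id, cost_usd in records:
        totals[customer_id] += cost_usd
    return dict(totals)

def top_decile_share(totals):
    """fraction of total inference cost driven by the top 10% of customers."""
    costs = sorted(totals.values(), reverse=True)
    total = sum(costs)
    if total == 0:
        return 0.0
    n_top = max(1, math.ceil(len(costs) / 10))
    return sum(costs[:n_top]) / total

# hypothetical month: one heavy customer, nine light ones
records = [("c0", 250.0), ("c0", 150.0)] + [(f"c{i}", 30.0) for i in range(1, 10)]
totals = cost_per_customer(records)
share = top_decile_share(totals)
print(f"top 10% of customers drive {share:.0%} of inference cost")
```

the blended number hides this. ten customers averaging $67/month looks fine until you see that one of them is $400 of it.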
marginal cost of the most expensive user actions.
some product actions cost almost nothing, like showing a ui or fetching from cache. some cost a lot, like generating a 100k-token response with multiple tool calls. knowing which 5% of actions drive 50% of inference cost is what lets you optimize.
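a rough sketch of that ranking, assuming you already tag inference spend by product action. the action names and dollar amounts are made up for illustration.

```python
def actions_driving(cost_by_action, threshold=0.5):
    """smallest set of actions, most expensive first, covering `threshold` of total cost."""
    total = sum(cost_by_action.values())
    ranked = sorted(cost_by_action.items(), key=lambda kv: kv[1], reverse=True)
    drivers, running = [], 0.0
    for action, cost in ranked:
        if running >= threshold * total:
            break
        drivers.append(action)
        running += cost
    return drivers

# hypothetical monthly inference spend by action, in dollars
cost_by_action = {
    "render_ui": 5.0,
    "cache_fetch": 10.0,
    "short_chat": 150.0,
    "agent_loop": 235.0,
    "long_generation_with_tools": 600.0,
}
print(actions_driving(cost_by_action))
```

whatever comes back from this list is where caching, smaller models, or tighter context windows pay off first.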
inflection point on usage-based pricing.
flat-rate subscriptions work great until your best customer's inference cost exceeds their subscription. at $50/month flat, a power user costing $80/month in inference is a net loss of $30 every month. the answer is usage-based pricing, tiered pricing, or a hard usage cap. nobody likes any of those, but the alternative is losing money on your biggest fans.
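to find that inflection point for your own plan, a sketch like this is enough. the $0.02 blended cost per prompt is an assumption picked so the power user lands on the $80/month from the example; plug in your own figure.

```python
def break_even_prompts(monthly_price_usd, cost_per_prompt_usd):
    """prompts per month at which inference cost equals the subscription."""
    return monthly_price_usd / cost_per_prompt_usd

def monthly_net(monthly_price_usd, cost_per_prompt_usd, prompts):
    """subscription revenue minus inference cost for one customer."""
    return monthly_price_usd - cost_per_prompt_usd * prompts

COST_PER_PROMPT = 0.02  # hypothetical blended inference cost per prompt, in dollars

limit = break_even_prompts(50, COST_PER_PROMPT)
power_user = monthly_net(50, COST_PER_PROMPT, prompts=4_000)

print(f"break-even at about {limit:.0f} prompts/month")
print(f"power user net: ${power_user:.2f}/month")
```

any cap or usage tier you set should sit below the break-even number, not at it, since inference is not the only cost to serve.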
the pricing models that actually work
three approaches for ai-first pricing in 2026.
1. flat-rate with tiered usage caps. $20 / $50 / $200 per month, each tier with a usage allotment. simple. familiar. easy to land. risk: power users who hit caps churn.
2. flat platform fee + usage-based. $99/month base plus $0.05 per unit. accounts for the bimodal distribution. most customers pay close to the platform fee; power users pay more. risk: complicates billing.
3. outcome-based. $x per successful agent task, $y per qualified lead. aligned with value. very high gross margin when it works. risk: outcome attribution is hard.
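a sketch of how the first two models bill the same usage. the prices come from the list above; the per-tier allotments (500 / 2,000 / 10,000 units) are hypothetical, since the right caps depend on your cost per unit. outcome-based pricing is left out because it needs attribution data this sketch can't fake.

```python
# (monthly price in dollars, usage allotment in units); allotments are hypothetical
TIERS = [(20, 500), (50, 2_000), (200, 10_000)]

def tiered_bill(units):
    """cheapest tier that covers the usage; returns (price, units actually served)."""
    for price, cap in TIERS:
        if units <= cap:
            return price, units
    top_price, top_cap = TIERS[-1]
    return top_price, top_cap  # usage beyond the top tier is cut off at the cap

def platform_plus_usage_bill(units, base_usd=99.0, per_unit_usd=0.05):
    """flat platform fee plus a per-unit charge."""
    return base_usd + per_unit_usd * units

for units in (300, 30_000):
    price, served = tiered_bill(units)
    print(f"{units} units -> tiered: ${price} for {served} units, "
          f"platform+usage: ${platform_plus_usage_bill(units):.2f}")
```

the light user is cheaper to land on tiers; the power user either hits the cap (and may churn) or pays in proportion to what they cost you. that is the trade the two models make.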
anthropic, openai, and most enterprise ai vendors landed on a mix of (1) and (2) by mid-2026. consumer ai companies moved to (1). the verticals are experimenting with (3).
what gross margin tells you that arr doesn't
a $5m arr ai company with 50% gross margin generates $2.5m of gross profit. those are the dollars that pay for everything else: payroll, marketing, ops, the office, your salary. it's also the line investors compare to your burn.
burn multiple ignores gross margin and looks at net cash burned per dollar of new arr. but a healthy burn multiple at a low gross margin is a different conversation than a healthy burn multiple at high gross margin. with 80% gross margin, the new arr you add fuels a high-margin engine. with 50% gross margin, you have to add a lot more arr to fund the same level of operations.
rough rule: an ai company at 50% gross margin needs roughly 50-60% more top-line than the equivalent saas company at 75-80% to generate the same gross profit dollars, which is what funds operations. that's the whole story.
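the rule is just a ratio of gross margins, and it can be checked in a few lines. the $5m arr and the margin figures come from this post; the function names are illustrative.

```python
def gross_profit(arr_usd, gross_margin):
    """gross profit dollars produced by a given arr at a given gross margin."""
    return arr_usd * gross_margin

def arr_to_match(target_gross_profit_usd, gross_margin):
    """arr needed at `gross_margin` to produce the target gross profit."""
    return target_gross_profit_usd / gross_margin

ARR = 5_000_000

ai_gp = gross_profit(ARR, 0.50)           # the ai company today
saas_gp = gross_profit(ARR, 0.80)         # same arr at mature-saas margin
ai_arr_needed = arr_to_match(saas_gp, 0.50)

print(f"ai gross profit: ${ai_gp:,.0f}, saas gross profit: ${saas_gp:,.0f}")
print(f"ai arr needed to match: ${ai_arr_needed:,.0f} ({ai_arr_needed / ARR:.1f}x)")
print(f"at 78% vs 52%: {0.78 / 0.52:.1f}x")
```

so the gap is 1.5x to 1.6x depending on which end of the ranges you pick, which is where the "about 50% more" shorthand comes from.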
mature saas earns 78 cents on the dollar, ai-first earns 52, and the discipline that closes the gap is knowing the inference line per customer before the burn multiple tells you it's too late.
how zift handles this
zift reconciles your bank, stripe, and ad spend in real time. for ai-first companies, that includes pulling the major cloud and inference invoices (aws, gcp, anthropic, openai) so cogs is visible weekly, not quarterly. gross margin per period, in the briefing, with the line item that moved it.
if you're a finance lead who needs this with cost-allocation by customer or product line, zift handles that too.
inference is the new payroll. you can't run a startup without knowing payroll. you can't run an ai startup without knowing inference. the math has changed. the discipline has to.
