the founder pulls up the dashboard. blended gross margin is 70%. arr is up 22% quarter-over-quarter. the board deck writes itself. then someone asks for cost-per-customer broken out by decile. the meeting changes.

ai pricing in 2026 has a familiar shape — a flat $20 seat, sometimes $50, sometimes $200, with "unlimited" or "fair use" buried in the fine print. it's the saas pricing playbook that works for everything except ai. saas seats have flat cost. ai seats have power-law cost. one of the two ends in tears.

most ai companies built their pricing page in 2023 when the median user was the only kind of user. the heavy tail showed up in 2024 and broke the math. the founders who noticed early changed pricing in 2025. the ones who didn't are explaining to investors why margin keeps drifting down.

blended looks fine. the decile tells the truth.

the headline 70% gross margin is real arithmetic. total revenue minus total cogs divided by total revenue. it's also a blended average across a usage distribution that isn't anywhere near normal.

ai usage follows a power law, and inference is the new payroll line on an ai startup's p&l. in most flat-priced ai tools, the top 5% of users by inference consume 60-80% of total inference cost. the median user uses the product like a search engine — five prompts a day, maybe twenty. the top decile runs agent loops, long-context chats, and recursive tool calls all day. their cost-to-serve is 20-40x the median.

at a $20 flat seat, the median user costs you $3/month and you keep $17 in margin. the top decile user costs you $80/month and you lose $60. blend them together and the spread looks healthy because most of the customer count is in the cheap bucket. the dollars are in the expensive one.

gross margin · blended vs decile$20 flat ai seat

blended gm

70^%

the slide

median user gm

85^%

d1-d5

d9 gm

40^%

profitable, barely

d10 gm

-300^%

$60 loss / mo

the d10 line is the company. four percent of customers driving negative ninety percent of inference profit. the dashboard you check every monday doesn't see them because they're averaged in with the median.

the four ways out, and what each one costs

four pricing patterns are doing the work in 2026. each fixes the power-law problem differently. each has a tax.

usage caps inside the flat tier. the Cursor approach. "pro is $20/mo and includes 500 fast requests. anything past that is metered." the pricing page stays familiar, the heavy user self-selects up, revenue per customer goes up. churn ticks up slightly among the customers who were unprofitable anyway. default move for a reason.

credit packs. the ChatGPT-style approach, also Replit's agent model. flat tier gives you a credit balance. heavy actions burn credits faster. cleaner to communicate than caps because the balance is visible in real time. friction is that customers don't know what an action costs until they do it.

bring-your-own-key. the Aider and dev-tools approach. customer connects their own Anthropic or OpenAI account. you charge a flat platform fee. inference cost disappears from your p&l. gross margin jumps 30 points overnight. only works for technical buyers, and you lose volume-discount leverage with the model providers.

per-task pricing. the Devin approach. "$X per completed task." aligned with value. high gross margin when it works. brutal when the customer disputes whether a task was completed. forecasting becomes hard because revenue is per-event, not per-seat. most defensible long-term and the hardest to bootstrap.

most founders pick one. the better ones run two — flat with caps for the median, BYOK or per-task for the heavy tail. whichever you choose, the choice is the margin, because inference margin is a pricing decision, not an engineering one.

why most founders default to caps

three reasons.

one. caps preserve the familiar pricing page. the customer sees $20/mo and a feature list. the cap is one line. funnel conversion stays close to what saas funnels are tuned for. credit balances and per-task convert worse on the first purchase.

two. caps are reversible. you can soften a cap with a free overage, lift it for an enterprise customer, or remove it if the math improves. credit packs and per-task are harder to unwind because customers anchor to them quickly.

three. caps preserve the median-user economics that made the funnel work. the median customer was profitable at $20 flat. the cap doesn't touch them. only the heavy tail feels it, and the heavy tail was the math problem.

the flat $20 ai seat looks like saas pricing — it isn't, because saas pricing was designed for a cost line that doesn't move with usage, and the entire premise breaks the moment usage decides cost.

the founder excuse that keeps the trap closed

"if we cap usage, our power users will churn to a competitor with no cap." sometimes. the competitor with no cap is also losing money on those users. one of you fixes pricing first. the other goes out of business.

the better framing — at a healthy gross margin, you can spend more on r&d, ship faster, raise more cheaply. the company with broken pricing can't do any of that. the power user who churned to a competitor with worse pricing usually comes back six months later when the competitor either fixes their pricing or runs out of cash.

how zift handles this

zift pulls your stripe revenue, your Anthropic and OpenAI invoices, and your cloud bill into one view — gross margin blended, plus the same number broken out by customer decile. every monday you see whether the heavy tail is moving against you and which customers are on the wrong side of the cost line. before you debate the next pricing page revision, you see the decile.

if you're a finance lead at an ai-first company with multiple products or pricing tiers, zift handles that too.

flat pricing on variable cost is a strategy. it's a strategy for going bankrupt while reporting great unit economics.

The Token Pricing Trap: Fixed Price on Variable Cost

blended looks fine. the decile tells the truth.

the four ways out, and what each one costs

why most founders default to caps

the founder excuse that keeps the trap closed

how zift handles this

Finance reports to you, not the other way.

The Token Pricing Trap: Fixed Price on Variable Cost

blended looks fine. the decile tells the truth.

the four ways out, and what each one costs

why most founders default to caps

the founder excuse that keeps the trap closed

how zift handles this

More on this topic.

Marketplace Take Rate Is the Only Number That Matters. GMV Is Vanity.

The Investor Reference Call That Passes on You

Eval Cost Is the AI Line Item Nobody Budgets For

Finance reports to you, not the other way.