Dollars per Task: The New Metric That Makes the On-Premises Case
On June 16, 2026, Artificial Analysis added Cost per Task to its model intelligence index: a single number that folds token price and tokens used into the real bill you pay per unit of work. It exposes a gap the per-token price lists hide. The top-scoring open model costs $0.04 per task, the best proprietary model most organizations can still buy costs $1.78, and on hardware you own the figure is zero. That gap is the on-premises case, stated in the only unit a buyer actually cares about.
The Metric That Prices What You Actually Buy
Per-token pricing is a unit cost, not a bill. Two models listed at the same price per million tokens can cost wildly different amounts to finish the same job, because the smarter one burns fewer tokens to get there and the chattier one burns more. For two years buyers have been comparing frontier models on a metric that does not map to what they pay. The capability leaderboard told you which model was smartest. It did not tell you which model was cheapest to put to work.
Cost per Task fixes that. It multiplies a model's per-token price by the tokens that model actually consumes to complete a benchmark task, and reports the result in dollars. One number. The real cost of one unit of work. Artificial Analysis rolled it into Intelligence Index v4.1 on June 16, 2026, alongside Time per Task and Tokens per Task. The index itself weights nine evaluations toward agentic work (Agents 34%, Coding 24%, Science 24%, General 18%) and reports a 95% confidence interval of ±1 index point. Add the cost dimension and the leaderboard stops being a capability chart and becomes a buying chart, which is what buyers needed all along.
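The definition reduces to one multiplication. Below is a minimal Python sketch, assuming a single blended price per million tokens; real APIs usually bill input and output tokens at different rates, and this is not Artificial Analysis's published code. The $10 list price and the token counts are hypothetical, chosen only to show how two models on the same price list can produce different bills.

```python
def cost_per_task(price_per_million_usd: float, tokens_per_task: int) -> float:
    """Dollars per task = price per token x tokens consumed per task.

    Assumes one blended price per million tokens; real APIs often bill
    input and output tokens at different rates.
    """
    return price_per_million_usd / 1_000_000 * tokens_per_task


# Hypothetical models with the same list price but different token appetites.
LIST_PRICE = 10.00  # USD per million tokens (illustrative, not a real price)
concise = cost_per_task(LIST_PRICE, 40_000)   # finishes the task in 40k tokens
chatty = cost_per_task(LIST_PRICE, 160_000)   # needs 160k tokens for the same task

print(f"concise model: ${concise:.2f} per task")
print(f"chatty model:  ${chatty:.2f} per task")
print(f"chatty / concise: {chatty / concise:.1f}x")
```

Under these made-up inputs the chatty model costs four times as much per task despite an identical list price, which is exactly the gap a per-token comparison hides.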
That reframing matters because the open-weights frontier has caught up on capability. Once you measure both axes at once — how smart, and how much per task — the case for running these models yourself stops being ideological and becomes arithmetic.
What the Leaderboard Now Shows
The v4.1 table lists models roughly in descending index order, with the new cost and time columns alongside; a dash marks a value not listed here. The proprietary models sit above the open ones on raw intelligence, and the open models undercut them on price by more than an order of magnitude:
| Model | Intelligence Index | Cost / Task | Time / Task | Type |
|---|---|---|---|---|
| Claude Fable 5 | 60 | $3.25 | — | Proprietary (restricted for foreign nationals) |
| Claude Opus 4.8 (max) | 56 | $1.78 | 6.4 min | Proprietary |
| GPT-5.5 (xhigh) | 55 | $0.99 | 3.7 min | Proprietary |
| Gemini 3.1 Pro Preview | 46 | — | 1.6 min | Proprietary |
| Grok 4.3 (high) | — | — | 1.5 min | Proprietary |
| Claude Sonnet 4.6 (max) | — | — | 13.5 min | Proprietary |
| DeepSeek V4 Pro (max) | 44 | $0.04 | — | Open weights |
| MiniMax-M3 | 44 | — | — | Open weights |
| Kimi K2.6 | 43 | — | — | Open weights |
Read it carefully. Fable 5 tops the index at 60 — and the US Commerce Department forced Anthropic to disable it for all foreign nationals on June 12, 2026, four days before this index shipped. Strip away the model most organizations can no longer use, and the best model you can actually buy is Claude Opus 4.8 at index 56, costing $1.78 per task. GPT-5.5 (xhigh) sits one point back at 55 for $0.99. Then the open models: DeepSeek V4 Pro, MiniMax-M3, and Kimi K2.6 cluster at 43 to 44 — within roughly twelve index points of the available flagship — and DeepSeek V4 Pro does it for $0.04 per task.
The honest read is that the open-weights frontier is no longer a capability compromise. It sits within a dozen points of the best model you can buy, at one twenty-fifth to one forty-fifth of the cost per task. The capability gap that once justified proprietary pricing has collapsed. On the one metric that maps to a bill, the open models win, and that is before you run them on hardware you own, where the cost-per-task column does not shrink but disappears.
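The gaps in that paragraph can be checked directly against the table. This sketch hard-codes the v4.1 figures quoted above; the dictionary layout and the `compare` helper are illustrative, not an Artificial Analysis API.

```python
# Index scores and cost per task copied from the v4.1 table above.
LEADERBOARD = {
    "Claude Opus 4.8 (max)": {"index": 56, "cost_per_task": 1.78},
    "GPT-5.5 (xhigh)": {"index": 55, "cost_per_task": 0.99},
    "DeepSeek V4 Pro (max)": {"index": 44, "cost_per_task": 0.04},
}


def compare(flagship: str, challenger: str) -> tuple[int, float]:
    """Return (index-point gap, cost-per-task multiple) of flagship over challenger."""
    a, b = LEADERBOARD[flagship], LEADERBOARD[challenger]
    return a["index"] - b["index"], a["cost_per_task"] / b["cost_per_task"]


for name in ("Claude Opus 4.8 (max)", "GPT-5.5 (xhigh)"):
    gap, multiple = compare(name, "DeepSeek V4 Pro (max)")
    print(f"{name}: +{gap} index points, {multiple:.2f}x the cost per task")
```

Opus 4.8 comes out 12 points ahead at 44.5 times the cost; GPT-5.5 comes out 11 points ahead at 24.75 times, which is where the 25x to 45x range in the text comes from.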
GLM-5.2 and Kimi K2.7 Code shipped in June 2026, after this index was published, so they do not yet carry an Artificial Analysis score. Both publish vendor benchmarks at the open-weight coding frontier (GLM-5.2: SWE-bench Pro 62.1%, FrontierSWE 74.4%; Kimi K2.7 Code: reports beating Opus 4.8 on MCP Mark Verified). If those vendor numbers hold up, expect both to land at or above the current open cluster once Artificial Analysis scores them, which would make the case here stronger, not weaker. See the side-by-side in Open Source AI Models Compared [2].
The Catch With Even the Cheapest Cloud Model
DeepSeek V4 Pro at $0.04 per task is the best value in the table by a wide margin, and it is still a cloud API. That matters for three reasons the per-task price alone does not capture, and they are why the cost-per-task number is not the whole story.
A cloud API is a meter. Every task is $0.04 out of pocket, forever, scaling linearly with usage. At a thousand tasks a day that is $40 a day; at ten thousand it is $400. The rate is set by a vendor that can change it, and at the frontier the rate has tended to go up, not down, as each newest tier arrives [5].
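A metered API scales linearly with volume. This sketch holds the rate at today's $0.04 per task, which the vendor is free to change; the `api_spend` helper and the 365-day year are illustrative assumptions.

```python
RATE_PER_TASK_USD = 0.04  # DeepSeek V4 Pro cost per task, AA v4.1


def api_spend(tasks_per_day: int, days: int, rate: float = RATE_PER_TASK_USD) -> float:
    """Metered spend grows linearly: rate x tasks per day x days."""
    return rate * tasks_per_day * days


for volume in (1_000, 10_000):
    print(f"{volume:>6,} tasks/day: ${api_spend(volume, 1):,.0f} per day, "
          f"${api_spend(volume, 365):,.0f} per year")
```

At 1,000 tasks a day the meter reaches $14,600 a year, and at 10,000 it reaches $146,000, before any rate change.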
The data leaves your network. DeepSeek's API runs on infrastructure subject to China's National Intelligence Law, which can compel cooperation with state intelligence work — the same sovereignty problem that hangs over Z.ai's GLM API. For a business handling customer data, proprietary code, or anything compliance-sensitive, that architecture is not acceptable no matter how cheap the model is.
Access is a lease, not ownership. A vendor can pull an endpoint for regulation, geopolitics, or business reasons — as Anthropic did with Fable 5 on June 12, 2026. When the endpoint goes, your workflow breaks overnight, and the $0.04-per-task efficiency you built a process around is gone.
The open weights are the escape hatch. DeepSeek V4 Pro, GLM-5.2, MiniMax-M3, and Kimi K2.7 Code all ship downloadable weights under permissive licenses. The cheapest way to use them is also the way that solves all three problems at once: run them yourself.
On-Premises: Where Cost per Task Goes to Zero
On hardware you own, the cost-per-task column does not shrink — it disappears. There is no per-token price to multiply by tokens used. The marginal cost of a task is the electricity to run hardware you already paid for. The capability is the model's; the economics are yours.
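To put a rough number on that electricity: energy per task is average power draw times task duration, priced at your tariff. Every input below (a 600 W cluster draw, five minutes per task, $0.15 per kWh) is a hypothetical placeholder, not a measurement of any Faraday hardware; substitute your own metered figures.

```python
def electricity_cost_per_task(avg_watts: float, minutes_per_task: float,
                              usd_per_kwh: float) -> float:
    """Energy per task in kWh, multiplied by the electricity tariff."""
    kwh = avg_watts * (minutes_per_task / 60) / 1000
    return kwh * usd_per_kwh


# Hypothetical inputs, not measurements: cluster draw, task duration, tariff.
cost = electricity_cost_per_task(avg_watts=600, minutes_per_task=5, usd_per_kwh=0.15)
print(f"${cost:.4f} per task")
```

Under these made-up inputs the result is under one cent per task; the point is the shape of the formula, not the specific figure.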
Zero Per-Token Cost
Run a thousand tasks or a million; the per-token charge is the same: zero. No usage tiers, no surprise invoices. Cost per task falls to $0, a figure no row in the Artificial Analysis table can match.
Immune to Price Hikes
When the next flagship launches at double the price, cloud users pay. When the $200 tier becomes $400, subscribers pay. On-premises users download the next open model and keep running. The hardware is already paid for; the model is free.
Your Data Stays Yours
No prompts, documents, or code leave your network. No National Intelligence Law, no CLOUD Act exposure, no vendor that can be compelled to produce your data. The cost savings are real; the sovereignty is irreplaceable.
The Model Is Yours to Keep
The weights are on your hardware. No government can revoke them and no vendor can switch them off, the way Anthropic just switched off Fable 5 for foreign nationals. You are not renting capability from a geopolitically exposed provider. You own it.
A Faraday Scale cluster (three Mac Studios with M4 Max chips and 128GB of unified memory each; $29,999 US with a full year of support) runs GLM-5.2 at INT4, DeepSeek V4 Pro and Kimi K2.7 Code at 2-bit, and MiniMax-M3 with room to spare, all at zero per-token cost. Amortized over three years that is about $833 a month, fixed, regardless of how many tasks you run, and the hardware typically stays in service for four to five years. That is the cluster that turns a $0 cost per task from a slogan into your operating model.
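The $833 figure is straight-line amortization of the cluster price over 36 months. The sketch below also computes, as a derived illustration, how many metered tasks per month at each cloud rate from the v4.1 table would cost the same. It ignores electricity, support after the first year, and resale value, so treat it as an order-of-magnitude check rather than a quote.

```python
CLUSTER_PRICE_USD = 29_999   # Faraday Scale cluster price quoted above
AMORTIZATION_MONTHS = 36     # three-year straight-line amortization

monthly_fixed = CLUSTER_PRICE_USD / AMORTIZATION_MONTHS

# Cloud cost per task from the Artificial Analysis v4.1 table.
CLOUD_COST_PER_TASK = {
    "Claude Opus 4.8 (max)": 1.78,
    "GPT-5.5 (xhigh)": 0.99,
    "DeepSeek V4 Pro (max)": 0.04,
}

print(f"fixed cost: ${monthly_fixed:,.2f} per month")
for model, rate in CLOUD_COST_PER_TASK.items():
    # Tasks per month at which metered spend equals the fixed monthly cost.
    breakeven_tasks = monthly_fixed / rate
    print(f"{model}: break-even at about {breakeven_tasks:,.0f} tasks per month")
```

The fixed cost works out to $833.31 a month. Against Opus 4.8 it breaks even at roughly 470 tasks a month; against DeepSeek V4 Pro's $0.04 API it takes about 20,800.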
The Math for a Real Team
Put a 10-person engineering team behind the cloud numbers. Claude Code Max at $200 a seat is $2,000 a month. GPT-5.5 Codex for the complex tasks the seats can't handle adds roughly $2,500. And the per-task meter that Artificial Analysis v4.1 just made explicit — call it about 800 tasks a month on Claude Opus 4.8 at $1.78 each — adds another $1,400. Cloud stack: roughly $5,900 a month, every month, climbing with usage and with whatever the vendors charge next.
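The cloud tally is a sum of three line items. This sketch reproduces it with the figures from the paragraph above; the 800-task volume is the text's own estimate, and the variable names are illustrative. The exact sum is $5,924, which the text rounds to roughly $5,900.

```python
SEATS = 10
SEAT_PRICE_USD = 200          # Claude Code Max, per seat per month
CODEX_OVERFLOW_USD = 2_500    # GPT-5.5 Codex for complex tasks, rough monthly figure
OPUS_TASKS_PER_MONTH = 800    # the text's own volume estimate
OPUS_COST_PER_TASK = 1.78     # Claude Opus 4.8 (max), AA v4.1

line_items = {
    "seats": SEATS * SEAT_PRICE_USD,
    "codex": CODEX_OVERFLOW_USD,
    "metered tasks": OPUS_TASKS_PER_MONTH * OPUS_COST_PER_TASK,
}
cloud_monthly = sum(line_items.values())

for item, usd in line_items.items():
    print(f"{item:>13}: ${usd:,.0f}")
print(f"{'total':>13}: ${cloud_monthly:,.0f} per month")
```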
The same team on a Faraday Scale cluster: $833 a month amortized. Same frontier-class open models. Zero per-token meter. Data stays in the building.
That is about $60,800 a year you stop paying cloud vendors — for infrastructure that runs the cost-per-task winners at $0 per task and keeps your data under your control. The metric Artificial Analysis just published is the same math a buyer does in their head; on-premises is the answer that has been hiding in plain sight behind the per-token price list.
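The annual figure follows from the two monthly totals. This sketch uses the rounded $5,900 cloud stack and the exact amortized cluster cost; using an even $833 instead gives $60,804, so either way the result rounds to about $60,800.

```python
CLOUD_MONTHLY_USD = 5_900            # rounded cloud stack from the team example
ON_PREM_MONTHLY_USD = 29_999 / 36    # cluster price amortized over three years

annual_savings = (CLOUD_MONTHLY_USD - ON_PREM_MONTHLY_USD) * 12
print(f"${annual_savings:,.0f} saved per year")
```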
References
[1] Artificial Analysis. (2026). "Intelligence Index v4.1: Cost per Task, Time per Task, Tokens per Task." Released June 16, 2026. Index methodology: 9 evaluations (Agents 34% / Coding 24% / Science 24% / General 18%), 95% CI ±1.
[2] Faraday Machines. (2026). "Open Source AI Models Compared: Kimi K2.7, GLM-5.2, MiniMax-M3 & DeepSeek." June 17, 2026. Available at: faradaymachines.com/open-source-model-comparison
[3] Faraday Machines. (2026). "GLM-5.2 Reached the Frontier — and It Can Cut Your Token Bill to Zero." June 17, 2026. Available at: faradaymachines.com/glm-5-2-open-source-sota-cuts-token-costs
[4] Faraday Machines. (2026). "DeepSeek V4: The Most Versatile Model You Can Run." Available at: faradaymachines.com/deepseek-v4-flash-pro-on-premises
[5] Faraday Machines. (2026). "Cloud AI Prices Are Only Going One Direction." Available at: faradaymachines.com/cloud-ai-prices-rising
[6] Anthropic. (2026). Fable 5 availability restricted for foreign nationals by US Commerce Department, June 12, 2026.
Run the Cost-per-Task Winners on Hardware You Own
Artificial Analysis v4.1 made the meter explicit — and the open-weights frontier wins it. GLM-5.2, DeepSeek V4 Pro, MiniMax-M3, and Kimi K2.7 Code run on a Faraday Machines cluster at $0 per task, with no vendor lock-in and data that never leaves your building.