Open Source AI Models Compared: Kimi K2.7, GLM-5.2, MiniMax-M3 & DeepSeek
Which open-weight model is right for your use case? We compare coding performance, agentic workflows, compliance, the new Artificial Analysis cost-per-task metric, and on-premises deployment across the leading models of 2026.
Quick Comparison Table
| Model | Type | Parameters | Self-host footprint | Context | Input / 1M | Output / 1M | SWE-bench Pro | Best For |
|---|---|---|---|---|---|---|---|---|
| GLM-5.2 | Open | 753B total 40B active |
~380 GB (INT4) |
1M | $1.40 | $4.40 | 62.1% | Enterprise, long-context coding |
| MiniMax-M3 | Open | 428B total 23B active |
~214 GB (INT4) |
1M | $0.60 | $2.40 | 59.0% | Multimodal, 1M context |
| Kimi K2.7 Code | Open | 1.1T total 32B active |
~577 GB (INT4) |
256K | $0.95 | $4.00 | — | Agentic coding, long-horizon dev |
| DeepSeek V4 Pro | Open | 1.6T total 49B active |
~800 GB (INT4) |
1M | $0.44 | $0.87 | — | Cost-per-task leader, reasoning |
| Qwen 3.7 | Proprietary | Undisclosed closed weights |
N/A (proprietary) |
1M | $2.50 | $7.50 | — | Long-horizon agents (cloud only) |
| GPT-5.5 Codex | Proprietary | 1T+ Undisclosed |
N/A (proprietary) |
1M+ | $2.50 | $15.00 | ~60%+ | Complex coding, dev tools |
Pricing reflects API rates (OpenRouter and official vendor APIs) as of June 2026. "Self-host footprint" is the memory required to run each model quantized to INT4 on a Mac Studio cluster — the practical on-premises metric, since MoE models only load their active parameters per forward pass. Open-weight models can also be self-hosted on Faraday Machines hardware with zero per-token fees. SWE-bench Pro measures real-world software engineering capability across 1,865 tasks. GLM-5.2, MiniMax-M3, Kimi K2.7 Code, and DeepSeek V4 Pro are the current open-weights frontier; Qwen 3.7 dropped its open weights and is now API-only.
Cost per Task: The New Metric That Makes the On-Premises Case
On June 16, 2026, Artificial Analysis released Intelligence Index v4.1, adding three buyer-facing metrics — Cost per Task, Time per Task, and Tokens per Task — to its agentic-weighted index (9 evals; Agents 34% / Coding 24% / Science 24% / General 18%; 95% CI ±1). Cost per Task folds token price and tokens used into a single number: the real bill you pay per unit of work. The gap it exposes is the gap on-premises closes.
| Model | Intelligence Index | Cost / Task | Time / Task | Type |
|---|---|---|---|---|
| Claude Fable 5 | 60 | $3.25 | — | Proprietary (unavailable in US) |
| Claude Opus 4.8 (max) | 56 | $1.78 | 6.4 min | Proprietary |
| GPT-5.5 (xhigh) | 55 | $0.99 | 3.7 min | Proprietary |
| Gemini 3.1 Pro Preview | 46 | — | 1.6 min | Proprietary |
| Grok 4.3 (high) | — | — | 1.5 min | Proprietary |
| Claude Sonnet 4.6 (max) | — | — | 13.5 min | Proprietary |
| DeepSeek V4 Pro (max) | 44 | $0.04 | — | Open weights |
| MiniMax-M3 | 44 | — | — | Open weights |
| Kimi K2.6 | 43 | — | — | Open weights |
GLM-5.2 and Kimi K2.7 Code shipped in June 2026 — after this index was published — so they don't yet carry an Artificial Analysis score. Both publish vendor benchmarks that place them at the open-weight coding frontier (GLM-5.2: SWE-bench Pro 62.1%, FrontierSWE 74.4%; Kimi K2.7 Code: beats Opus 4.8 on MCP Mark Verified). See the full breakdown: Dollars per Task: the new metric that makes the on-premises case.
Choose by Use Case
I need a coding assistant
Recommended: GLM-5.2
GLM-5.2 is the current open-weight coding frontier, posting 62.1% on SWE-bench Pro and 74.4% on FrontierSWE. Its 1M-token context window lets you feed entire codebases for analysis. Z.ai ships it under a clean MIT license — the most permissive of any frontier model — which makes it ideal for healthcare, finance, and government deployments where license compatibility matters.
I need agentic workflows
Recommended: Kimi K2.7 Code
Kimi K2.7 Code is trained from the ground up for multi-step tool use and autonomous workflows. Its native agentic design handles web browsing, document synthesis, and complex task chains without external orchestration frameworks. For legal research, competitive intelligence, or any workflow requiring 10+ step reasoning, Kimi K2.7 Code delivers the most reliable open-weight performance.
I need the lowest cost per task
Recommended: DeepSeek V4 Pro
DeepSeek V4 Pro is the cost-per-task leader on Artificial Analysis v4.1: $0.04 per task, roughly 45× cheaper than Claude Opus 4.8 ($1.78) while sitting within ~12 index points of the available flagship. For high-volume reasoning and coding workloads, it's the model that makes the on-premises math impossible to ignore.
I need SOTA coding performance
Recommended: GPT-5.5 Codex
While proprietary, GPT-5.5 Codex remains the leader on complex software engineering tasks, scoring ~60%+ on SWE-bench Pro. OpenAI's Codex specialization shows in real-world development workflows. If you need the absolute best code generation and have budget for API costs, Codex is the choice — though GLM-5.2 now closes most of that gap on open hardware.
On-Premises Deployment Guide
Hardware Requirements
GLM-5.2
40B active parameters per forward pass. At INT4 the ~753B weights need ~380 GB — a tight fit on the 3-unit Scale (384 GB), with the 2-unit Growth (256 GB) comfortable at 2-bit. MIT-licensed, so it's the model most teams standardize on.
Recommended: Faraday Scale (3 × Mac Studio, 384 GB) at INT4, or Growth at 2-bit
MiniMax-M3
23B active parameters and ~214 GB at INT4 fit comfortably in the 2-unit Growth (256 GB), with room for KV cache and a 1M-token context. The most memory-efficient frontier open model in the lineup.
Recommended: Faraday Growth (2 × Mac Studio, 256 GB) at INT4
Kimi K2.7 Code
32B active parameters, but ~1.1T total means ~577 GB at INT4 — too large for a 2-unit cluster. At 2-bit (~340 GB) it fits the 3-unit Scale (384 GB); at INT4 you'll want 4+ units. Best for agentic workloads where decode time matters more than peak density.
Recommended: Faraday Scale (384 GB) at 2-bit, or 4+ units at INT4
DeepSeek V4 Pro
49B active parameters with ~1.6T total — ~800 GB at INT4, ~400 GB at 2-bit. The cost-per-task leader wants a 4-unit cluster at INT4 or the 3-unit Scale at 2-bit. For single-cluster deployments, DeepSeek V4 Flash Pro is the lighter sibling.
Recommended: 4+ units at INT4, or Faraday Scale at 2-bit
Qwen 3.7 & GPT-5.5 Codex
Both are proprietary and API-only — no weights to self-host. Qwen 3.7 dropped the open weights its predecessors shipped. For sovereign deployments, run GLM-5.2, Kimi K2.7 Code, MiniMax-M3, or DeepSeek V4 Pro on hardware you own.
Scaling Strategy
Add Mac Studio units linearly to increase capacity while maintaining the same $0 per-token cost:
Each additional unit doubles inference capacity while maintaining the same per-token cost ($0).
Quantization Options
Modern quantization techniques have dramatically improved INT4 quality. All the open-weight models above work well at INT4, especially with recent advances in quantization algorithms.
Model-Specific Recommendations
- GLM-5.2: Excellent INT4 support — ~380 GB on a 3-unit Scale. MIT license means no restrictions on commercial INT4 deployments.
- MiniMax-M3: ~214 GB at INT4 fits a single 2-unit Growth cluster. Community license permits commercial use with authorization.
- Kimi K2.7 Code: 2-bit (~340 GB) on a 3-unit Scale is the sweet spot; INT4 (~577 GB) needs 4+ units. Modified-MIT open weights.
- DeepSeek V4 Pro: Largest footprint (~800 GB INT4). 2-bit on the Scale cluster, or standardize on DeepSeek V4 Flash Pro for single-cluster deployments.
Integration Patterns
Standard API Interfaces
All the open-weight models above support standard API-compatible interfaces for easy integration:
- OpenAI-compatible: Drop-in replacement for OpenAI SDK with custom endpoint. Works with existing tools like LangChain, LlamaIndex.
- Claude SDK: Compatible with Anthropic's Python/Node.js SDKs. Easy migration from Claude models.
- Ollama API: Native support for Ollama-based tooling and ecosystem.
- Direct inference: PyTorch/TensorFlow serving for custom applications with maximum control.
Pre-configured Integrations
We provide ready-to-use integrations for your development workflow:
- VS Code extensions
- JetBrains IDE plugins
- Terminal tools (CLI)
- HTTP/gRPC endpoints
Cost Comparison: API vs On-Premises
10-Person Engineering Team (Monthly)
Cloud API stack
Claude Code Max (10 seats @ $200): $2,000
GPT-5.5 Codex API (complex tasks): $2,500
Per-task meter, Opus 4.8 (~800 tasks @ $1.78): $1,400
Total: ~$5,900/month
On-Premises Open Models
Faraday Scale ($29,999 US, 3×128 GB): $833/month
Hardware includes:
- 3 x Mac Studio M4 Max (384 GB unified)
- Runs GLM-5.2, MiniMax-M3, Kimi K2.7 & DeepSeek V4 Pro
- 12 months of support
- Zero per-token fees
Total: $833/month
That's ~$5,067/month you stop paying cloud vendors — for infrastructure that keeps your data private, lets you run the cost-per-task winners (GLM-5.2, DeepSeek V4 Pro) at ~$0/task, and gives you complete control over which models you run.
These Organizations Run Open-Weight Models in Production
The models Faraday Machines ships — Kimi, DeepSeek, Qwen, GLM — power real workloads at real organizations. From Fortune 500 development teams to sovereign AI infrastructure.
Choose Your Open Source Model Today
Run GLM-5.2, Kimi K2.7, MiniMax-M3, and DeepSeek V4 Pro on hardware you own. Unlimited inference, zero per-token costs, complete data sovereignty.
Schedule a Model Selection Consultation