Cohere Command A+ vs Faraday Machines
Cohere Command A+ scores 37 on the Artificial Analysis Intelligence Index. Every model Faraday Machines ships — Kimi K2.6 (54), DeepSeek V4 Flash (47), Qwen 3.6 Plus (50), GLM-5.1 (51) — outperforms it by 10 to 17 points. Here are the benchmarks.
At a Glance
| Model | Type | Intelligence Index | Context | Params (Active) | License | Best For |
|---|---|---|---|---|---|---|
| Cohere Command A+ | Open | 37 | 128K in / 64K out | 218B (25B active) | Apache 2.0 | Enterprise RAG, citations |
| DeepSeek V4 Flash | Open | 47 | 1M | 284B (13B active) | MIT | Cost-efficient inference, 1M context |
| Qwen 3.6 Plus | Open | 50 | 1M | Hybrid MoE | Apache 2.0 | Tool calling, MCP integration |
| GLM-5.1 | Open | 51 | 200K | 744B (~40B active) | MIT | Regulated industries, multilingual |
| Kimi K2.6 | Open | 54 | 256K | 1T (~32B active) | Modified MIT | Agentic workflows, complex coding |
Intelligence Index scores from Artificial Analysis v4.0 (May 2026). Cohere Command A+ is the first row; the other four models are available on Faraday Machines.
The Intelligence Gap
Cohere Command A+ is a capable model, especially for its 25B active-parameter footprint. Its native citation grounding, 48-language support, and efficient W4A4 quantization are genuine engineering achievements. But capability is relative, and on the Artificial Analysis Intelligence Index the gap is stark: 37 for Cohere Command A+ against 47 to 54 for the models on Faraday Machines.
Seventeen points on the Intelligence Index isn't a marginal difference. It's the gap between a model that can handle routine tasks and models that can reason through complex, multi-step problems — the kind your business actually needs AI for.
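The point gaps quoted here follow directly from the At a Glance scores. A minimal Python sketch of that arithmetic, with the scores copied from the table; the names `SCORES`, `BASELINE`, and `gaps_vs_baseline` are illustrative only:

```python
# Intelligence Index scores copied from the At a Glance table.
SCORES = {
    "Cohere Command A+": 37,
    "DeepSeek V4 Flash": 47,
    "Qwen 3.6 Plus": 50,
    "GLM-5.1": 51,
    "Kimi K2.6": 54,
}

BASELINE = "Cohere Command A+"


def gaps_vs_baseline(scores, baseline):
    """Return each other model's lead over the baseline, in index points."""
    base = scores[baseline]
    return {name: s - base for name, s in scores.items() if name != baseline}


gaps = gaps_vs_baseline(SCORES, BASELINE)

# Print the leads from smallest to largest.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{name}: +{gap} points")
```

The smallest lead is 10 points (DeepSeek V4 Flash) and the largest is 17 (Kimi K2.6), which matches the "10 to 17 points" range used throughout.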
Software Engineering: The Benchmark That Matters Most
If you're evaluating AI for coding, developer tooling, or technical automation, SWE-bench Verified is the gold standard. It measures whether an AI agent can resolve real GitHub issues end-to-end. Cohere Command A+ has no published SWE-bench score. The models on Faraday Machines all do, with SWE-bench Verified scores of 77–80%.
Cohere Command A+'s 25% on Terminal-Bench Hard (agentic coding) is far behind Kimi K2.6's 66.7% on the same benchmark, and DeepSeek V4 Pro reaches 93.5% on LiveCodeBench, where Cohere Command A+ has no published score. If your team uses AI for development work, expect Cohere Command A+ to produce meaningfully worse results.
Hallucination and Accuracy
For business applications, a model that confidently gives wrong answers is worse than one that says "I don't know." Cohere Command A+ achieves 86% on the AA-Omniscience Non-Hallucination benchmark — decent, but behind the Faraday Machines models:
DeepSeek V4 Pro: 94% Non-Hallucination
8 points ahead of Cohere Command A+. When accuracy matters for legal, financial, or medical use cases, this gap compounds with every query.
DeepSeek V4 Flash: 96% Non-Hallucination
10 points ahead. The budget-oriented Flash model is actually more reliable than Cohere Command A+ on factual accuracy.
Cohere Command A+: 86% Non-Hallucination
By that measure, up to 14% of factual claims may be incorrect. For RAG use cases, Cohere's stated strength, that is roughly one in seven cited facts.
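The "one in seven" figure is simply the complement of the 86% score. The sketch below reproduces that arithmetic for the three scores listed. It assumes, as the text does, that 100 minus the non-hallucination score can be read as the share of claims at risk; how the benchmark itself defines that share is not restated here, and the helper name `error_share` is illustrative.

```python
from fractions import Fraction


def error_share(non_hallucination_pct):
    """Complement of a non-hallucination score, as an exact fraction.

    Assumption: 100 - score is read as the share of claims at risk.
    """
    return Fraction(100 - non_hallucination_pct, 100)


# Scores as quoted in this section.
QUOTED = [
    ("Cohere Command A+", 86),
    ("DeepSeek V4 Pro", 94),
    ("DeepSeek V4 Flash", 96),
]

for model, pct in QUOTED:
    share = error_share(pct)
    print(f"{model}: {float(share):.0%}, about 1 in {float(1 / share):.1f}")
```

For 86% the share is 7/50, or about one in 7.1; for 96% it is exactly one in 25.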
Missing SWE-bench Scores
Cohere Command A+ has no published SWE-bench Verified or LiveCodeBench scores. When a model doesn't publish on the industry-standard coding benchmark, it's reasonable to assume the results don't flatter it.
Architecture Comparison
| | Cohere Command A+ | DeepSeek V4 Flash | Qwen 3.6 Plus | GLM-5.1 | Kimi K2.6 |
|---|---|---|---|---|---|
| Intelligence Index | 37 | 47 | 50 | 51 | 54 |
| Total Parameters | 218B | 284B | Hybrid MoE | 744B | ~1T |
| Active Parameters | 25B | 13B | Hybrid MoE | ~40B | ~32B |
| Context Window | 128K / 64K | 1M | 1M | 200K | 256K |
| License | Apache 2.0 | MIT | Apache 2.0 | MIT | Modified MIT |
| Min Hardware (quantized) | 2× H100 | 2× H100 | 2× H100 | 8× H100 | 4× H100 |
| On Faraday Machines | — | ✓ | ✓ | ✓ | ✓ |
What Cohere Command A+ Gets Right
Fair comparison means acknowledging strengths. Cohere Command A+ has genuine advantages:
Native Citation Grounding
Every factual claim links to an explicit source span — not RAG-on-top, but trained into the model. This is a real differentiator for enterprise RAG applications and regulated document workflows.
48-Language Support
Up from 23 in Command A, this is the broadest multilingual coverage among the models compared here. If you need Romanian, Vietnamese, or Bengali, Cohere Command A+ has you covered.
Efficient Inference
375 tokens/second at W4A4, 113ms time-to-first-token. With only 25B active parameters, it's fast and lean. For high-throughput, low-latency tasks where reasoning depth matters less, this is an advantage.
But these strengths are narrow. Citation grounding matters for RAG, but Faraday's models outperform on the accuracy benchmark that underpins citation quality. 48-language support is impressive, but GLM-5.1 and Qwen 3.6 both offer strong multilingual coverage with higher reasoning scores. Fast inference is valuable, but DeepSeek V4 Flash matches that efficiency with a 10-point Intelligence Index lead.
The On-Premises Reality: Choice Matters
Both Cohere Command A+ and the Faraday Machines models can run on-premises. Cohere Command A+'s Apache 2.0 license is fully permissive — no revenue caps, no use-case restrictions. That's genuinely good for the ecosystem. But on-premises is only half the equation. The other half is which model you run:
Cohere Command A+ On-Premises
- Intelligence Index 37 — tied with Claude 4.5 Haiku
- No SWE-bench or LiveCodeBench scores published
- 25% on Terminal-Bench Hard (agentic coding)
- 86% non-hallucination — behind V4 Flash (96%)
- 128K input, 64K output context
- Locked to a single model from a single company
- Cohere's roadmap determines your capability
- Canadian company, but US-hosted by default
Faraday Machines On-Premises
- Intelligence Index 47–54 — 10 to 17 points ahead
- SWE-bench Verified: 77–80% across all models
- Terminal-Bench Hard: 55–67% (Kimi K2.6 leads)
- Non-hallucination up to 96% (V4 Flash)
- Up to 1M context (DeepSeek V4, Qwen 3.6)
- Choose the best model for each task — swap freely
- Your roadmap, your capability ceiling
- Canadian data stays in Canada, on your hardware
Running a model on-premises means controlling where your data goes. But it doesn't control what your AI can do. That's determined by which model you choose. Faraday Machines gives you four frontier models — and the freedom to swap to whatever comes next.
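One way to picture swapping models per task is a small routing table. The sketch below is purely illustrative: the task labels, the `pick_model` and `swap_model` helpers, and the default choice are assumptions derived from the "Best For" column of the At a Glance table, not a Faraday Machines API.

```python
# Hypothetical routing table based on the "Best For" column.
ROUTES = {
    "agentic_coding": "Kimi K2.6",
    "long_context": "DeepSeek V4 Flash",
    "tool_calling": "Qwen 3.6 Plus",
    "multilingual": "GLM-5.1",
}

# Assumption: unknown tasks fall back to the highest Intelligence Index score.
DEFAULT_MODEL = "Kimi K2.6"


def pick_model(task, routes=ROUTES, default=DEFAULT_MODEL):
    """Return the model assigned to a task, or the default if none is set."""
    return routes.get(task, default)


def swap_model(routes, task, new_model):
    """Return a new routing table with one task reassigned.

    The input table is copied, so the original is left unchanged.
    """
    updated = dict(routes)
    updated[task] = new_model
    return updated


print(pick_model("tool_calling"))
print(pick_model("summarization"))  # not in the table, so the default is used
```

Keeping the mapping in one place means a model change is a one-line edit to the table rather than a change to every caller.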
What You're Really Giving Up
Choosing Cohere Command A+ because it's from a Canadian company is understandable. But consider what that choice costs you in capability:
17 Points of Reasoning
The gap between Cohere Command A+ (37) and Kimi K2.6 (54) on the Intelligence Index represents the difference between a model that can handle basic tasks and one that can reason through complex, multi-step business problems. That's not a marginal gap — it's a capability ceiling.
No Coding Benchmarks
Cohere Command A+ has no published SWE-bench Verified or LiveCodeBench scores. If you're evaluating AI for software development, code review, or technical automation, this absence is a red flag. The models on Faraday Machines all publish and perform well.
Vendor Lock-In
Cohere Command A+ is Cohere's only competitive open-weight model. If you build your infrastructure around it, your capability ceiling is tied to Cohere's roadmap. Faraday Machines lets you swap between Kimi K2.6, DeepSeek V4, Qwen 3.6, and GLM-5.1 as your needs evolve — or as better models become available.
Context Window Limitations
Cohere Command A+'s 128K input / 64K output context is adequate for most tasks, but both DeepSeek V4 Flash and Qwen 3.6 Plus offer 1M context windows. For document analysis, codebase reasoning, and long-form generation, the difference between processing 128K and 1M tokens is transformational.
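To make the context difference concrete, the sketch below checks which models can take a prompt in a single pass. It assumes "128K" means 128,000 tokens and "1M" means 1,000,000 (vendors sometimes use powers of two instead), and it ignores output budget and any overlap between chunks. The helper names are illustrative.

```python
# Input context limits in tokens, taken from the comparison tables.
# Assumption: K = 1,000 and M = 1,000,000.
CONTEXT_LIMITS = {
    "Cohere Command A+": 128_000,
    "GLM-5.1": 200_000,
    "Kimi K2.6": 256_000,
    "DeepSeek V4 Flash": 1_000_000,
    "Qwen 3.6 Plus": 1_000_000,
}


def models_that_fit(prompt_tokens, limits=CONTEXT_LIMITS):
    """Return, in alphabetical order, the models whose limit covers the prompt."""
    return sorted(name for name, limit in limits.items() if prompt_tokens <= limit)


def chunks_needed(prompt_tokens, limit):
    """Number of non-overlapping chunks needed to cover the prompt (ceiling division)."""
    return -(-prompt_tokens // limit)


codebase_tokens = 600_000  # hypothetical prompt size
print(models_that_fit(codebase_tokens))
print(chunks_needed(codebase_tokens, CONTEXT_LIMITS["Cohere Command A+"]))
```

Under these assumed limits, a 600,000-token prompt fits only the two 1M-context models and would need five passes at a 128,000-token limit.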
References
[1] Artificial Analysis. (2026). "Intelligence Index Leaderboard." Available at: artificialanalysis.ai
[2] Artificial Analysis. (2026). "Cohere launches open weights model Command A+." Available at: artificialanalysis.ai
[3] VentureBeat. (2026). "Cohere cracks lossless quantization and native citations with Command A+." Available at: venturebeat.com
[4] Artificial Analysis. (2026). "DeepSeek is back among the leading open weights models with V4 Pro and V4 Flash." Available at: artificialanalysis.ai
[5] BenchLM.ai. (2026). "Command A+ Benchmarks." Available at: benchlm.ai
[6] Particula Tech. (2026). "DeepSeek V4 vs Kimi K2.6 vs GLM-5.1: Open-Weight Coding Tested." Available at: particula.tech
[7] Lushbinary. (2026). "Best Open-Source LLMs for AI Agents: DeepSeek V4 vs Kimi K2.6 vs Qwen 3.6 vs GLM 5.1." Available at: lushbinary.com
Run the Models That Outperform Cohere Command A+
Kimi K2.6 (54), DeepSeek V4 Flash (47), Qwen 3.6 Plus (50), GLM-5.1 (51) — all on Faraday Machines hardware, on-premises, with no data leaks and no vendor lock-in.