Cohere Command A+ vs Faraday Machines

Cohere Command A+ scores 37 on the Artificial Analysis Intelligence Index. Every model Faraday Machines ships — Kimi K2.6 (54), DeepSeek V4 Flash (47), Qwen 3.6 Plus (50), GLM-5.1 (51) — outperforms it by 10 to 17 points. Here are the benchmarks.

May 29, 2026

At a Glance

Model	Type	Intelligence Index	Context	Params (Active)	License	Best For
Cohere Command A+	Open	37	128K in / 64K out	218B (25B active)	Apache 2.0	Enterprise RAG, citations
DeepSeek V4 Flash	Open	47	1M	284B (13B active)	MIT	Cost-efficient inference, 1M context
Qwen 3.6 Plus	Open	50	1M	Hybrid MoE	Apache 2.0	Tool calling, MCP integration
GLM-5.1	Open	51	200K	744B (~40B active)	MIT	Regulated industries, multilingual
Kimi K2.6	Open	54	256K	1T (~32B active)	Modified MIT	Agentic workflows, complex coding

Intelligence Index scores from Artificial Analysis v4.0 (May 2026). Cohere Command A+ is listed first; the other four models are available on Faraday Machines.

The Intelligence Gap

Cohere Command A+ is a capable model — especially for its 25B active-parameter footprint. Its native citation grounding, 48-language support, and efficient W4A4 quantization are genuine engineering achievements. But capability is relative, and on the Artificial Analysis Intelligence Index, the gap is stark:

+17

Kimi K2.6 — Intelligence Index 54 vs Cohere Command A+'s 37. The #1 open-weights reasoning model, 17 points ahead.

+14

GLM-5.1 — Intelligence Index 51 vs Cohere Command A+'s 37. Best cost-efficiency on the benchmark, MIT license.

+13

Qwen 3.6 Plus — Intelligence Index 50 vs Cohere Command A+'s 37. Strong tool calling, 1M context window.

+10

DeepSeek V4 Flash — Intelligence Index 47 vs Cohere Command A+'s 37. At $0.14 input / $0.28 output per 1M tokens, one of the strongest price-to-performance ratios in this comparison.

Seventeen points on the Intelligence Index isn't a marginal difference. It's the gap between a model that can handle routine tasks and models that can reason through complex, multi-step problems — the kind your business actually needs AI for.
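The point gaps quoted above follow directly from the At a Glance table. A minimal sketch of the arithmetic, using only the scores listed there (the function name `gaps_vs` is illustrative, not part of any tool):

```python
# Intelligence Index scores as listed in the At a Glance table.
SCORES = {
    "Cohere Command A+": 37,
    "DeepSeek V4 Flash": 47,
    "Qwen 3.6 Plus": 50,
    "GLM-5.1": 51,
    "Kimi K2.6": 54,
}


def gaps_vs(baseline: str, scores: dict[str, int]) -> dict[str, int]:
    """Return each other model's point lead over the baseline model."""
    base = scores[baseline]
    return {name: s - base for name, s in scores.items() if name != baseline}


if __name__ == "__main__":
    gaps = gaps_vs("Cohere Command A+", SCORES)
    # Print the largest lead first, matching the order of the callouts above.
    for name, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
        print(f"{name}: +{gap}")
```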

Software Engineering: The Benchmark That Matters Most

If you're evaluating AI for coding, developer tooling, or technical automation, SWE-bench Verified is the gold standard. It measures whether an AI agent can resolve real GitHub issues end-to-end. Cohere Command A+ has no published SWE-bench score. The models on Faraday Machines all do:

SWE-bench Verified

77–80%

Kimi K2.6, DeepSeek V4 Flash, Qwen 3.6 Plus, GLM-5.1

Terminal-Bench Hard

25%

Cohere Command A+ — vs Kimi K2.6 at 66.7% and DeepSeek V4 Pro at 67.9%

LiveCodeBench v6

69–72%

GLM-5.1 at 72.4%, Kimi K2.6 at 69.8%, Qwen 3.6 Plus at 71.3%

Cohere Command A+'s 25% on Terminal-Bench Hard (agentic coding) isn't in the same league as Kimi K2.6's 66.7% or DeepSeek V4 Pro's 67.9% on the same benchmark. On LiveCodeBench, where Command A+ has no published score, DeepSeek V4 Pro reaches 93.5%. If your team uses AI for development work, Cohere Command A+ is likely to produce meaningfully worse results.

Hallucination and Accuracy

For business applications, a model that confidently gives wrong answers is worse than one that says "I don't know." Cohere Command A+ achieves 86% on the AA-Omniscience Non-Hallucination benchmark — decent, but behind the Faraday Machines models:

DeepSeek V4 Pro: 94% Non-Hallucination

8 points ahead of Cohere Command A+. When accuracy matters for legal, financial, or medical use cases, this gap compounds with every query.

DeepSeek V4 Flash: 96% Non-Hallucination

10 points ahead. The budget-oriented Flash model is actually more reliable than Cohere Command A+ on factual accuracy.

Cohere Command A+: 86% Non-Hallucination

14% of factual claims may be incorrect. For RAG use cases — Cohere's stated strength — this means one in seven cited facts could be wrong.
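To see how a per-claim gap adds up over many queries, here is a rough back-of-the-envelope sketch. It assumes (and this is an assumption, not something the benchmark guarantees) that the non-hallucination rate can be read as an independent per-claim accuracy. Real workloads will differ, so treat the output as illustrative only.

```python
# Non-hallucination rates quoted in this section (AA-Omniscience).
NON_HALLUCINATION = {
    "Cohere Command A+": 0.86,
    "DeepSeek V4 Pro": 0.94,
    "DeepSeek V4 Flash": 0.96,
}


def expected_errors(rate: float, n_claims: int) -> float:
    """Expected number of wrong claims, ASSUMING each claim fails independently."""
    return (1.0 - rate) * n_claims


def prob_all_correct(rate: float, n_claims: int) -> float:
    """Chance that every one of n_claims is right, under the same assumption."""
    return rate ** n_claims


if __name__ == "__main__":
    for model, rate in NON_HALLUCINATION.items():
        errs = expected_errors(rate, 1000)
        clean = prob_all_correct(rate, 10)
        print(f"{model}: ~{errs:.0f} errors per 1,000 claims, "
              f"{clean:.0%} chance a 10-claim answer is fully correct")
```

Under these assumptions, a small difference in per-claim accuracy turns into a large difference in how often a multi-claim answer is entirely correct.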

Missing SWE-bench Scores

Cohere Command A+ has no published SWE-bench Verified or LiveCodeBench scores. When a model doesn't publish on the industry-standard coding benchmark, it's reasonable to assume the results don't flatter it.

Architecture Comparison

	Cohere Command A+	DeepSeek V4 Flash	Qwen 3.6 Plus	GLM-5.1	Kimi K2.6
Intelligence Index	37	47	50	51	54
Total Parameters	218B	284B	Hybrid MoE	744B	~1T
Active Parameters	25B	13B	Hybrid MoE	~40B	~32B
Context Window	128K in / 64K out	1M	1M	200K	256K
License	Apache 2.0	MIT	Apache 2.0	MIT	Modified MIT
Min Hardware (quantized)	2× H100	2× H100	2× H100	8× H100	4× H100
On Faraday Machines	—	✓	✓	✓	✓
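One way to read the parameter rows is as the share of each model's weights that is active per token. A small sketch using the table's figures follows. Values marked "~" in the table are treated as exact here, which is a simplification, and Qwen 3.6 Plus is left out because the table lists no parameter counts for it.

```python
# (total, active) parameters in billions, taken from the table above.
# "~" values are approximations in the source; 1T is written as 1000B.
PARAMS_B = {
    "Cohere Command A+": (218, 25),
    "DeepSeek V4 Flash": (284, 13),
    "GLM-5.1": (744, 40),
    "Kimi K2.6": (1000, 32),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of total parameters that are active for a given token."""
    return active_b / total_b


if __name__ == "__main__":
    for model, (total, active) in PARAMS_B.items():
        share = active_fraction(total, active)
        print(f"{model}: {active}B of {total}B active ({share:.1%})")
```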

What Cohere Command A+ Gets Right

Fair comparison means acknowledging strengths. Cohere Command A+ has genuine advantages:

Native Citation Grounding

Every factual claim links to an explicit source span — not RAG-on-top, but trained into the model. This is a real differentiator for enterprise RAG applications and regulated document workflows.

48-Language Support

Up from 23 in Command A, this is the broadest multilingual coverage among the models compared here. If you need Romanian, Vietnamese, or Bengali, Cohere Command A+ has you covered.

Efficient Inference

375 tokens/second at W4A4 quantization, with a 113 ms time-to-first-token. With only 25B active parameters, it's fast and lean. For high-throughput, low-latency tasks where reasoning depth matters less, this is an advantage.

But these strengths are narrow. Citation grounding matters for RAG, but the models on Faraday Machines score higher on the AA-Omniscience Non-Hallucination benchmark (up to 96% vs 86%), the accuracy measure that underpins citation quality. 48-language support is impressive, but GLM-5.1 and Qwen 3.6 Plus both offer strong multilingual coverage with higher reasoning scores. Fast inference is valuable, but DeepSeek V4 Flash runs with fewer active parameters (13B vs 25B) while holding a 10-point Intelligence Index lead.

The On-Premises Reality: Choice Matters

Both Cohere Command A+ and the Faraday Machines models can run on-premises. Cohere Command A+'s Apache 2.0 license is fully permissive — no revenue caps, no use-case restrictions. That's genuinely good for the ecosystem. But on-premises is only half the equation. The other half is which model you run:

Cohere Command A+ On-Premises

Intelligence Index 37 — tied with Claude 4.5 Haiku
No SWE-bench or LiveCodeBench scores published
25% on Terminal-Bench Hard (agentic coding)
86% non-hallucination — behind V4 Flash (96%)
128K input, 64K output context
Locked to a single model from a single company
Cohere's roadmap determines your capability
Canadian company, but US-hosted by default

Faraday Machines On-Premises

Intelligence Index 47–54 — 10 to 17 points ahead
SWE-bench Verified: 77–80% across all models
Terminal-Bench Hard: 55–67% (Kimi K2.6 leads)
Non-hallucination up to 96% (V4 Flash)
Up to 1M context (DeepSeek V4, Qwen 3.6)
Choose the best model for each task — swap freely
Your roadmap, your capability ceiling
Canadian data stays in Canada, on your hardware

Running a model on-premises means controlling where your data goes. But it doesn't control what your AI can do. That's determined by which model you choose. Faraday Machines gives you four frontier models — and the freedom to swap to whatever comes next.

What You're Really Giving Up

Choosing Cohere Command A+ because it's from a Canadian company is understandable. But consider what that choice costs you in capability:

17 Points of Reasoning

The gap between Cohere Command A+ (37) and Kimi K2.6 (54) on the Intelligence Index represents the difference between a model that can handle basic tasks and one that can reason through complex, multi-step business problems. That's not a marginal gap — it's a capability ceiling.

No Coding Benchmarks

Cohere Command A+ has no published SWE-bench Verified or LiveCodeBench scores. If you're evaluating AI for software development, code review, or technical automation, this absence is a red flag. The models on Faraday Machines all publish and perform well.

Vendor Lock-In

Cohere Command A+ is Cohere's only competitive open-weight model. If you build your infrastructure around it, your capability ceiling is tied to Cohere's roadmap. Faraday Machines lets you swap between Kimi K2.6, DeepSeek V4, Qwen 3.6, and GLM-5.1 as your needs evolve — or as better models become available.
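As an illustration of what swapping models per task could look like in practice, here is a minimal routing sketch. The task labels, the `ROUTES` table, and both functions are hypothetical and are not a Faraday Machines API; the model-to-task pairing simply follows the "Best For" column in the At a Glance table.

```python
# Hypothetical task labels mapped to models, following the "Best For" column.
ROUTES = {
    "agentic_coding": "Kimi K2.6",
    "long_context": "DeepSeek V4 Flash",
    "tool_calling": "Qwen 3.6 Plus",
    "multilingual": "GLM-5.1",
}

# Assumed fallback for unlabelled tasks; any of the four models could be used.
DEFAULT_MODEL = "DeepSeek V4 Flash"


def pick_model(task: str, routes: dict[str, str] = ROUTES,
               default: str = DEFAULT_MODEL) -> str:
    """Return the model assigned to a task, or the default if none is set."""
    return routes.get(task, default)


def swap_model(routes: dict[str, str], task: str, new_model: str) -> dict[str, str]:
    """Return a copy of the routing table with one task reassigned."""
    updated = dict(routes)
    updated[task] = new_model
    return updated


if __name__ == "__main__":
    print(pick_model("agentic_coding"))
    # Reassign one task without touching the others or the original table.
    new_routes = swap_model(ROUTES, "agentic_coding", "GLM-5.1")
    print(pick_model("agentic_coding", new_routes))
```

Keeping the mapping in one table means a newly released model can replace an existing one with a single-line change.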

Context Window Limitations

Cohere Command A+'s 128K input / 64K output context is adequate for most tasks, but both DeepSeek V4 Flash and Qwen 3.6 Plus offer 1M context windows. For document analysis, codebase reasoning, and long-form generation, the difference between processing 128K and 1M tokens is transformational.
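A quick way to check which models can take a given prompt in one pass is sketched below. It assumes "K" means 1,000 tokens and "1M" means 1,000,000, and it compares input size only. Tokenizers differ between models, so real token counts for the same document will vary.

```python
# Input context limits in tokens, from the tables above.
# ASSUMPTION: "K" = 1,000 tokens and "1M" = 1,000,000 tokens.
CONTEXT_TOKENS = {
    "Cohere Command A+": 128_000,
    "GLM-5.1": 200_000,
    "Kimi K2.6": 256_000,
    "DeepSeek V4 Flash": 1_000_000,
    "Qwen 3.6 Plus": 1_000_000,
}


def models_that_fit(prompt_tokens: int,
                    limits: dict[str, int] = CONTEXT_TOKENS) -> list[str]:
    """List the models whose input window can hold the whole prompt."""
    return [name for name, limit in limits.items() if prompt_tokens <= limit]


if __name__ == "__main__":
    for size in (100_000, 150_000, 500_000):
        print(f"{size:,} tokens -> {models_that_fit(size)}")
```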

Run the Models That Outperform Cohere Command A+

Kimi K2.6 (54), DeepSeek V4 Flash (47), Qwen 3.6 Plus (50), GLM-5.1 (51) — all on Faraday Machines hardware, on-premises, with no data leaks and no vendor lock-in.
