Cohere Command A+ vs Faraday Machines

Cohere Command A+ scores 37 on the Artificial Analysis Intelligence Index. Every model Faraday Machines ships — Kimi K2.6 (54), DeepSeek V4 Flash (47), Qwen 3.6 Plus (50), GLM-5.1 (51) — outperforms it by 10 to 17 points. Here are the benchmarks.

At a Glance

Model | Type | Intelligence Index | Context | Params (Active) | License | Best For
Cohere Command A+ | Open | 37 | 128K in / 64K out | 218B (25B active) | Apache 2.0 | Enterprise RAG, citations
DeepSeek V4 Flash | Open | 47 | 1M | 284B (13B active) | MIT | Cost-efficient inference, 1M context
Qwen 3.6 Plus | Open | 50 | 1M | Hybrid MoE | Apache 2.0 | Tool calling, MCP integration
GLM-5.1 | Open | 51 | 200K | 744B (~40B active) | MIT | Regulated industries, multilingual
Kimi K2.6 | Open | 54 | 256K | 1T (~32B active) | Modified MIT | Agentic workflows, complex coding

Intelligence Index scores from Artificial Analysis v4.0 (May 2026).

The Intelligence Gap

Cohere Command A+ is a capable model — especially for its 25B active-parameter footprint. Its native citation grounding, 48-language support, and efficient W4A4 quantization are genuine engineering achievements. But capability is relative, and on the Artificial Analysis Intelligence Index, the gap is stark:

+17
Kimi K2.6 — Intelligence Index 54 vs Cohere Command A+'s 37. The #1 open-weights reasoning model, 17 points ahead.
+14
GLM-5.1 — Intelligence Index 51 vs Cohere Command A+'s 37. Best cost-efficiency on the benchmark, MIT license.
+13
Qwen 3.6 Plus — Intelligence Index 50 vs Cohere Command A+'s 37. Strong tool calling, 1M context window.
+10
DeepSeek V4 Flash — Intelligence Index 47 vs Cohere Command A+'s 37. At $0.14/$0.28 per 1M tokens, the best price-to-performance ratio available.

Seventeen points on the Intelligence Index isn't a marginal difference. It's the gap between a model that can handle routine tasks and models that can reason through complex, multi-step problems — the kind your business actually needs AI for.
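The point gaps quoted above follow directly from the Intelligence Index scores in the table. A minimal Python sketch that recomputes them is shown here; the dictionary and function names are illustrative assumptions, not part of any Artificial Analysis or Faraday Machines API.

```python
# Intelligence Index scores as quoted in this article (Artificial Analysis v4.0).
SCORES = {
    "Cohere Command A+": 37,
    "DeepSeek V4 Flash": 47,
    "Qwen 3.6 Plus": 50,
    "GLM-5.1": 51,
    "Kimi K2.6": 54,
}

BASELINE = "Cohere Command A+"


def gaps_vs_baseline(scores, baseline):
    """Return each model's point lead over the baseline, largest lead first."""
    base = scores[baseline]
    gaps = {name: s - base for name, s in scores.items() if name != baseline}
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))


gaps = gaps_vs_baseline(SCORES, BASELINE)
for model, gap in gaps.items():
    print(f"{model}: +{gap}")
```

Changing `BASELINE` to any other model in the dictionary gives the same comparison from that model's point of view.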

Software Engineering: The Benchmark That Matters Most

If you're evaluating AI for coding, developer tooling, or technical automation, SWE-bench Verified is the gold standard. It measures whether an AI agent can resolve real GitHub issues end-to-end. Cohere Command A+ has no published SWE-bench score. The models on Faraday Machines all do:

SWE-bench Verified
77–80%
Kimi K2.6, DeepSeek V4 Flash, Qwen 3.6, GLM-5.1
Terminal-Bench Hard
25%
Cohere Command A+ — vs Kimi K2.6 at 66.7% and DeepSeek V4 Pro at 67.9%
LiveCodeBench v6
69–72%
GLM-5.1 at 72.4%, Kimi K2.6 at 69.8%, Qwen 3.6 Plus at 71.3%

Cohere Command A+'s 25% on Terminal-Bench Hard (agentic coding) isn't in the same league as Kimi K2.6's 66.7% or DeepSeek V4 Pro's 67.9% on the same benchmark. DeepSeek V4 Pro also reports 93.5% on LiveCodeBench, where Cohere Command A+ has no published score to compare. If your team uses AI for development work, expect Cohere Command A+ to produce meaningfully worse results.

Hallucination and Accuracy

For business applications, a model that confidently gives wrong answers is worse than one that says "I don't know." Cohere Command A+ achieves 86% on the AA-Omniscience Non-Hallucination benchmark — decent, but behind the Faraday Machines models:

DeepSeek V4 Pro: 94% Non-Hallucination

8 points ahead of Cohere Command A+. When accuracy matters for legal, financial, or medical use cases, this gap compounds with every query.

DeepSeek V4 Flash: 96% Non-Hallucination

10 points ahead. The budget-oriented Flash model is actually more reliable than Cohere Command A+ on factual accuracy.

Cohere Command A+: 86% Non-Hallucination

That leaves a 14% hallucination rate on this benchmark. If that rate carried over to RAG use cases (Cohere's stated strength), roughly one in seven cited facts could be wrong.
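The "one in seven" figure and the claim that the gap compounds with every query can be checked with a little arithmetic. The sketch below rests on two loud assumptions that the benchmark itself does not guarantee: that the benchmark's non-hallucination percentage applies to each claim in your workload, and that claims are independent of one another. The function names are hypothetical.

```python
import math


def hallucination_rate(non_hallucination_pct):
    """Convert a non-hallucination percentage (e.g. 86) to an error rate (0.14)."""
    return 1.0 - non_hallucination_pct / 100.0


def claims_per_error(non_hallucination_pct):
    """Average number of claims per hallucinated claim ("one in N")."""
    rate = hallucination_rate(non_hallucination_pct)
    return math.inf if rate <= 0 else 1.0 / rate


def p_any_error(non_hallucination_pct, n_claims):
    """Probability of at least one hallucination in n_claims claims.

    ASSUMPTION: claims are independent and share the same per-claim rate.
    """
    return 1.0 - (non_hallucination_pct / 100.0) ** n_claims


for name, pct in [("Cohere Command A+", 86),
                  ("DeepSeek V4 Pro", 94),
                  ("DeepSeek V4 Flash", 96)]:
    print(f"{name}: about 1 in {claims_per_error(pct):.1f} claims wrong, "
          f"P(any error in 10 claims) = {p_any_error(pct, 10):.2f}")
```

Under these assumptions the per-claim gap widens quickly as the number of claims in a response or session grows, which is the sense in which the difference compounds.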

Missing SWE-bench Scores

Cohere Command A+ has no published SWE-bench Verified or LiveCodeBench scores. When a model doesn't publish on the industry-standard coding benchmark, it's reasonable to assume the results don't flatter it.

Architecture Comparison

Metric | Cohere Command A+ | DeepSeek V4 Flash | Qwen 3.6 Plus | GLM-5.1 | Kimi K2.6
Intelligence Index | 37 | 47 | 50 | 51 | 54
Total Parameters | 218B | 284B | Hybrid MoE | 744B | ~1T
Active Parameters | 25B | 13B | Hybrid MoE | ~40B | ~32B
Context Window | 128K / 64K | 1M | 1M | 200K | 256K
License | Apache 2.0 | MIT | Apache 2.0 | MIT | Modified MIT
Min Hardware (quantized) | 2× H100 | 2× H100 | 2× H100 | 8× H100 | 4× H100
On Faraday Machines | No | Yes | Yes | Yes | Yes
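To make the total versus active parameter rows easier to compare, the following Python sketch computes the share of each model's parameters that is active per token. It uses only the figures in the comparison above, treats the approximate values (~40B, ~32B, ~1T) as exact for illustration, and omits Qwen 3.6 Plus because the comparison lists it only as "Hybrid MoE" with no counts.

```python
# (total parameters, active parameters) in billions, as listed in this article.
# ASSUMPTION: values marked "~" in the comparison are treated as exact here.
PARAMS_B = {
    "Cohere Command A+": (218, 25),
    "DeepSeek V4 Flash": (284, 13),
    "GLM-5.1": (744, 40),
    "Kimi K2.6": (1000, 32),
}


def active_fraction(total_b, active_b):
    """Fraction of a mixture-of-experts model's parameters used per token."""
    return active_b / total_b


fractions = {name: active_fraction(t, a) for name, (t, a) in PARAMS_B.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

A lower fraction means less compute per token relative to model size, although memory requirements still scale with the total parameter count.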

What Cohere Command A+ Gets Right

Fair comparison means acknowledging strengths. Cohere Command A+ has genuine advantages:

Native Citation Grounding

Every factual claim links to an explicit source span — not RAG-on-top, but trained into the model. This is a real differentiator for enterprise RAG applications and regulated document workflows.

48-Language Support

Up from 23 in Command A, this is the broadest multilingual coverage among the models compared here. If you need Romanian, Vietnamese, or Bengali, Cohere Command A+ has you covered.

Efficient Inference

375 tokens/second at W4A4, 113ms time-to-first-token. With only 25B active parameters, it's fast and lean. For high-throughput, low-latency tasks where reasoning depth matters less, this is an advantage.

But these strengths are narrow. Citation grounding matters for RAG, but Faraday's models outperform on the accuracy benchmark that underpins citation quality. 48-language support is impressive, but GLM-5.1 and Qwen 3.6 both offer strong multilingual coverage with higher reasoning scores. Fast inference is valuable, but DeepSeek V4 Flash activates even fewer parameters per token (13B vs 25B) and holds a 10-point Intelligence Index lead.

The On-Premises Reality: Choice Matters

Both Cohere Command A+ and the Faraday Machines models can run on-premises. Cohere Command A+'s Apache 2.0 license is fully permissive — no revenue caps, no use-case restrictions. That's genuinely good for the ecosystem. But on-premises is only half the equation. The other half is which model you run:

Cohere Command A+ On-Premises

  • Intelligence Index 37 — tied with Claude 4.5 Haiku
  • No SWE-bench or LiveCodeBench scores published
  • 25% on Terminal-Bench Hard (agentic coding)
  • 86% non-hallucination — behind V4 Flash (96%)
  • 128K input, 64K output context
  • Locked to a single model from a single company
  • Cohere's roadmap determines your capability
  • Canadian company, but US-hosted by default

Faraday Machines On-Premises

  • Intelligence Index 47–54 — 10 to 17 points ahead
  • SWE-bench Verified: 77–80% across all models
  • Terminal-Bench Hard: 55–67% (Kimi K2.6 leads)
  • Non-hallucination up to 96% (V4 Flash)
  • Up to 1M context (DeepSeek V4, Qwen 3.6)
  • Choose the best model for each task — swap freely
  • Your roadmap, your capability ceiling
  • Canadian data stays in Canada, on your hardware

Running a model on-premises means controlling where your data goes. But it doesn't control what your AI can do. That's determined by which model you choose. Faraday Machines gives you four frontier models — and the freedom to swap to whatever comes next.

What You're Really Giving Up

Choosing Cohere Command A+ because it's from a Canadian company is understandable. But consider what that choice costs you in capability:

17 Points of Reasoning

The gap between Cohere Command A+ (37) and Kimi K2.6 (54) on the Intelligence Index represents the difference between a model that can handle basic tasks and one that can reason through complex, multi-step business problems. That's not a marginal gap — it's a capability ceiling.

No Coding Benchmarks

Cohere Command A+ has no published SWE-bench Verified or LiveCodeBench scores. If you're evaluating AI for software development, code review, or technical automation, this absence is a red flag. The models on Faraday Machines all publish and perform well.

Vendor Lock-In

Cohere Command A+ is Cohere's only competitive open-weight model. If you build your infrastructure around it, your capability ceiling is tied to Cohere's roadmap. Faraday Machines lets you swap between Kimi K2.6, DeepSeek V4, Qwen 3.6, and GLM-5.1 as your needs evolve — or as better models become available.
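One way to picture swapping models as needs evolve is a small routing table that maps task types to models. The sketch below is purely illustrative: the task labels, the `Route` type, and `pick_model` are hypothetical names invented for this example, not a Faraday Machines API, and the pairings simply mirror the "Best For" column in the At a Glance table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


# Hypothetical task labels mapped to models using the article's "Best For" column.
ROUTES = {
    "agentic_coding": Route("Kimi K2.6", "Agentic workflows, complex coding"),
    "long_context": Route("DeepSeek V4 Flash", "Cost-efficient inference, 1M context"),
    "tool_calling": Route("Qwen 3.6 Plus", "Tool calling, MCP integration"),
    "multilingual": Route("GLM-5.1", "Regulated industries, multilingual"),
}

# ASSUMPTION: unknown tasks fall back to the cheapest general-purpose option.
DEFAULT_TASK = "long_context"


def pick_model(task):
    """Return the model name configured for a task, or the default model."""
    return ROUTES.get(task, ROUTES[DEFAULT_TASK]).model


print(pick_model("agentic_coding"))
print(pick_model("summarize_email"))
```

With this layout, adopting a newer model for one workload means editing a single entry in `ROUTES` rather than rebuilding the surrounding infrastructure.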

Context Window Limitations

Cohere Command A+'s 128K input / 64K output context is adequate for most tasks, but both DeepSeek V4 Flash and Qwen 3.6 Plus offer 1M context windows, roughly eight times the input. For document analysis, codebase reasoning, and long-form generation, that can mean processing an entire corpus in one pass instead of splitting it into chunks.
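To put the context-window difference in concrete terms, this Python sketch estimates how many separate passes a document needs for a given input window. It assumes 128K means 128,000 tokens and 1M means 1,000,000 tokens, and the 900,000-token document and the optional prompt reserve are hypothetical values chosen for illustration.

```python
import math


def passes_needed(doc_tokens, window_tokens, reserve_tokens=0):
    """Number of chunks needed to feed doc_tokens through an input window.

    reserve_tokens is space held back in each pass for instructions or
    other prompt overhead (hypothetical; defaults to none).
    """
    usable = window_tokens - reserve_tokens
    if usable <= 0:
        raise ValueError("reserve_tokens must be smaller than window_tokens")
    return math.ceil(doc_tokens / usable)


DOC_TOKENS = 900_000  # hypothetical large codebase or document set

small = passes_needed(DOC_TOKENS, 128_000)
large = passes_needed(DOC_TOKENS, 1_000_000)
print(f"128K window: {small} passes; 1M window: {large} pass")
```

Each extra pass means the model sees only part of the material at once, so cross-references between chunks have to be stitched together outside the model.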

References

[1] Artificial Analysis. (2026). "Intelligence Index Leaderboard." Available at: artificialanalysis.ai

[2] Artificial Analysis. (2026). "Cohere launches open weights model Command A+." Available at: artificialanalysis.ai

[3] VentureBeat. (2026). "Cohere cracks lossless quantization and native citations with Command A+." Available at: venturebeat.com

[4] Artificial Analysis. (2026). "DeepSeek is back among the leading open weights models with V4 Pro and V4 Flash." Available at: artificialanalysis.ai

[5] BenchLM.ai. (2026). "Command A+ Benchmarks." Available at: benchlm.ai

[6] Particula Tech. (2026). "DeepSeek V4 vs Kimi K2.6 vs GLM-5.1: Open-Weight Coding Tested." Available at: particula.tech

[7] Lushbinary. (2026). "Best Open-Source LLMs for AI Agents: DeepSeek V4 vs Kimi K2.6 vs Qwen 3.6 vs GLM 5.1." Available at: lushbinary.com

Run the Models That Outperform Cohere Command A+

Kimi K2.6 (54), DeepSeek V4 Flash (47), Qwen 3.6 Plus (50), GLM-5.1 (51) — all on Faraday Machines hardware, on-premises, with no data leaks and no vendor lock-in.
