MiniMax M3 Research Brief

2026-06-03

Bottom line

MiniMax M3, released June 1, 2026, is the most ambitious launch yet from the Shanghai-based lab. A 229.9B-parameter sparse Mixture-of-Experts model claiming to be the first open-weight system to combine frontier coding performance, a 1-million-token context window, and native multimodality in one architecture. Its headline technical innovation is MiniMax Sparse Attention (MSA), which the company says cuts per-token compute at 1M context to roughly 1/20 of its predecessor while delivering 9.7× faster prefill and 15.6× faster decode.

Vendor-reported benchmarks place it ahead of GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro (59.0%) and above Claude Opus 4.7 on BrowseComp (83.5). But nearly all benchmark figures are company-run with agent scaffolding, independent verification is pending, and promised open weights had not shipped as of launch day. The pricing is aggressively low-roughly $0.30 per million input tokens during a launch promotion-making it 15× cheaper than Claude Opus 4.7 on input. Data-sovereignty concerns under China’s National Intelligence Law and unresolved licensing terms add material caveats for production use.

Key findings

Finding: M3 is architecturally distinct from its M2-series predecessors. It reintroduces sparse attention after MiniMax abandoned the approach for the M2 generation, citing production-readiness concerns. MSA uses Grouped Query Attention (GQA) with dynamic block-level selection over uncompressed key-value caches-a design MiniMax claims is ~4× faster than open-source sparse-attention alternatives. Source. RITS Shanghai analysis, Lushbinary developer guide, MiniMax official blog..
Finding: Vendor-reported benchmark scores cluster around coding and agentic tasks rather than general reasoning sweep. M3 scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas, and 83.5 on BrowseComp. Notably, it trails the more recently released Claude Opus 4.8 (69.2% on SWE-Bench Pro, 74.6% on Terminal-Bench 2.1) because MiniMax’s comparisons use the earlier Opus 4.7 baseline. Source. TechTimes critique, The Decoder, MiniMax official release..
Finding: MiniMax accompanied the model launch with dramatic long-horizon autonomy demonstrations. A 12-hour autonomous reproduction of an ICLR 2025 paper (18 commits, 23 figures) and a 24-hour CUDA kernel optimization session that raised NVIDIA Hopper FP8 utilization from 7.6% to 71.3% across 147 benchmark submissions. These are vendor-reported demonstrations, not controlled independent evaluations. Source: RITS Shanghai analysis, The Decoder, Knightli.
Finding: One independent hands-on review tested M3 on five real app builds via OpenCode. Results were 3/5 to 5/5 depending on task boundedness: a dungeon-crawler game scored perfectly, while a complex multi-layer SaaS authentication service scored 3/5 due to missing data connections. The reviewer concluded M3 excels at scoped, well-specified builds but needs iteration on complex multi-component architectures. Source: Promptslove review.
Finding: Benchable.Ai’s independent baseline evaluation shows M3 has perfect reliability (100% success rate) and strong coding/math accuracy (95%). Ranks in only the 24th percentile for speed and 36th percentile for pricing cost across models, indicating it isn't the fastest or cheapest in absolute terms. The evaluation also lists the context window as 524K tokens, contradicting the official 1M claim. Source: benchable.Ai.

Background

MiniMax is a Shanghai-based AI lab founded in 2021 and listed on the Hong Kong Stock Exchange on January 9, 2026. It's best known internationally for the Hailuo video-generation family and the M-series language models (M1 in mid-2025, then M2, M2.1, M2.5, and M2.7). The M3 release on June 1, 2026, was teased in late May alongside an M2-series technical report published on May 27, 2026. Skyler Miao, MiniMax’s Head of Engineering, publicly previewed the architecture before launch. The release was covered by VentureBeat and ChinaPulse, but several independent analysts noted that Western press largely missed the architecture story during the May 26–28 window.

Current state

M3 is live via the MiniMax API (OpenAI-compatible endpoint), OpenRouter, and MiniMax Code (the company’s agent product). It supports standard and priority service tiers, with a thinking mode toggle. Subscription token plans run $20/month (Plus, ~1.7B tokens), $50/month (Max, ~5.1B tokens), and $120/month (Ultra, ~9.8B tokens). Open weights and a full technical report were promised on Hugging Face and GitHub within about ten days of launch-targeting around June 11, 2026-but had not been published as of the June 1 launch date. The license terms were also unpublished; prior models used a modified-MIT license with commercial restrictions that required written authorization for derivative use.

Technical or implementation details

Architecture: Sparse Mixture-of-Experts with 229.9 billion total parameters, 9.8 billion active parameters per token, and 256 fine-grained experts.
Attention: MiniMax Sparse Attention (MSA) replaces full attention with a two-stage process. A lightweight index branch selects relevant KV-cache blocks, and the main attention layer computes only over those selected blocks. MSA operates on uncompressed key-values (unlike DeepSeek’s Multi-head Latent Attention) and reorganizes the GPU computation pattern into a “KV outer, gather Q” pass that MiniMax says achieves contiguous memory access and is ~4× faster than open-source sparse-attention implementations.
Context: Up to 1 million tokens (guaranteed minimum 512K); max output 512K tokens.
Modalities: Native multimodal training “from step zero” on interleaved text, image, and video data at 100-trillion-token scale; supports desktop computer use.
Speed claims: At 1M context, ~9.7× faster prefill, ~15.6× faster decode, and ~1/20 the per-token compute versus the prior generation. Output speed is about 100 tokens/sec.
Long-horizon training: Built using an interactive user-simulator framework that mimics real developer collaboration patterns (clarifying requirements, task-switching, feedback loops) rather than single-turn instruction following.

Evidence, comparisons, and related context

Predecessor (M2.7): Released March 18, 2026, M2.7 was a text-only model with 200K context, full attention, and SWE-Bench Pro 56.2%. M3 expands context 5×, adds multimodality, and lifts SWE-Bench Pro to 59.0% while keeping promotional pricing identical ($0.30/$1.20 per million tokens). Source: Lushbinary M3 vs M2.7 guide.
Competitor landscape: M3’s closest competitors are Chinese open-weight labs-DeepSeek (V4 series) and Alibaba’s Qwen (3.6/3.7). DeepSeek V4-Pro is priced even lower ($0.87/M output with promo) and scores strongly on reasoning benchmarks. Qwen has dominated global open-source downloads (>1 billion cumulative by March 2026). M3’s claimed differentiator is the unique combination of frontier coding + 1M context + native multimodality in one open-weight model. Source: Forbes analysis, FelloAI.
Sparse attention prior work: MiniMax explicitly abandoned sparse attention for the entire M2 generation (M2 through M2.7), publishing an engineering blog stating that “efficient attention still has some way to go before it can definitively beat full attention.” M3 represents a public self-correction, betting that the infrastructure has matured. Competing sparse approaches include DeepSeek’s NSA and general block-sparse methods like MoBA, though direct comparisons aren't yet available. Source: FelloAI, Humphrey Theodore analysis.

Limitations and critiques

Unverified benchmarks: Every major benchmark figure was produced by MiniMax on its own infrastructure, often using agent scaffolding (Claude Code, Mini-SWE-Agent, Terminus). Independent scores from Artificial Analysis and LMArena were pending at launch. Source: TechTimes, The Decoder.
Comparison framing: MiniMax compared M3 against Claude Opus 4.7, but Anthropic released Opus 4.8 on May 25, 2026-one week before M3. On directly comparable evaluations, Opus 4.8 leads M3 by meaningful margins (SWE-Bench Pro 69.2% vs. 59.0%; Terminal-Bench 2.1 74.6% vs. 66.0%; OSWorld-Verified 83.4% vs. 70.0%). Source: TechTimes.
Open-weight promise unfulfilled at launch: Weights and technical report weren't available on release day. If they follow the M2.7 precedent, commercial use may require written authorization, limiting true open-source utility. Source: TechTimes, FelloAI.
Data sovereignty and legal risk: As a Shanghai-headquartered company, MiniMax is subject to China’s 2017 National Intelligence Law, which requires cooperation with state intelligence requests. The U.S. House Committees on Homeland Security and China announced a joint investigation on April 29, 2026, into risks from Chinese AI models naming MiniMax alongside DeepSeek and Moonshot AI. In February 2026, Anthropic publicly alleged that MiniMax conducted industrial-scale distillation against Claude. Separately, MiniMax faces an active copyright lawsuit from Disney, Universal, and Warner Bros over Hailuo AI training data. Source: TechTimes.
Real-world speed and context discrepancies: Benchable.Ai lists the context window as 524K tokens (not 1M) and ranks M3 in the 24th percentile for speed, suggesting practical performance may differ from marketing claims. Source: benchable.Ai.

Open questions

Will the promised open weights and technical report ship by mid-June 2026, and under what license terms?
Will independent benchmark services (Artificial Analysis, LMArena, DeepSWE) confirm MiniMax’s vendor-reported scores?
How does MSA perform in production at full 1M-token context compared to DeepSeek’s NSA and other sparse-attention methods?
What is the real-world latency and throughput for sustained agentic workloads, given benchable.Ai’s low speed percentile?
How will U.S. And EU regulatory scrutiny affect enterprise adoption of MiniMax’s API and open weights?

Practical takeaways

Test before committing: M3 is worth piloting for long-context coding agents, autonomous browsing, and multimodal document analysis-especially given the low API cost-but wait for independent benchmark verification and your own evals before betting production workloads on it.
Budget against standard pricing: The launch promo ($0.30/$1.20) matches M2.7’s old pricing, but the standard rate ($0.60/$2.40) is 2× higher. Model costs against the non-promotional rate for long-term planning.
Data-sovereignty check: don't route proprietary source code, customer data, or regulated information through the MiniMax API without evaluating jurisdiction and compliance risk. If the open weights ship under usable terms, self-hosting may mitigate this concern.
Prompt specificity matters: Hands-on testing shows M3 succeeds on bounded, interaction-heavy tasks (games, focused tools) but struggles on first-pass complex multi-layer SaaS builds. Write detailed prompts specifying data flow and component wiring.

Sources used

Models - MiniMax API Docs - https://platform.minimax.io/docs/release-notes/models
MiniMax M3: Frontier Coding, 1M Context, and Sparse Attention (RITS Shanghai) - https://rits.shanghai.nyu.edu/ai/minimax-m3-frontier-coding-1m-context-and-sparse-attention/
MiniMax M3 Specs, Benchmarks, and Pricing (FelloAI) - https://felloai.com/minimax-m3/
MiniMax M3 Developer Guide: Benchmarks & Pricing (Lushbinary) - https://lushbinary.com/blog/minimax-m3-developer-guide-benchmarks-pricing-msa-architecture/
MiniMax M3 Previews a New Sparse-Attention Architecture (Humphrey Theodore) - https://www.humphreytheodore.com/writing/minimax-m3-sparse-attention-china-frontier-architecture
MiniMax: MiniMax M3 - AI Model Details & Benchmarks (benchable.Ai) - https://benchable.ai/models/minimax/minimax-m3-20260531
GitHub - 47thtechcorner/RayCodes_Minimax-M3 - https://github.com/47thtechcorner/RayCodes_Minimax-M3
MiniMax M3 Claims 15x Faster Decoding at 1M Tokens (AI Weekly) - https://aiweekly.co/alerts/minimax-m3-claims-15x-faster-decoding-at-1m-tokens
MiniMax M3 Open-Weight Coding Model: Frontier Claims, Unverified Benchmarks (TechTimes) - https://www.techtimes.com/articles/317532/20260601/minimax-m3-open-weight-coding-model-frontier-claims-unverified-benchmarks.htm
MiniMax M3: Open-weight model with a million-token context challenges proprietary leaders (The Decoder) - https://the-decoder.com/minimax-m3-open-weight-model-with-a-million-token-context-challenges-proprietary-leaders/
MiniMax M3 Review: I Built 5 Apps in One Go and Here Is What Happened (Promptslove) - https://promptslove.com/blog/minimax-m3-review/
MiniMax M3 Released: Coding Agents, 1M Context, and Native Multimodality (Knightli) - https://knightli.com/en/2026/06/01/minimax-m3-coding-agent-1m-context/
MiniMax M3 vs M2.7: What Changed & Upgrade Guide (Lushbinary) - https://lushbinary.com/blog/minimax-m3-vs-m2-7-whats-new-upgrade-guide/
China's DeepSeek V4 And Qwen Reshape The Open-Source AI Race (Forbes) - https://www.forbes.com/sites/jonmarkman/2026/04/28/chinas-deepseek-v4-and-qwen-reshape-the-open-source-ai-race/