xAI Grok Build 0.1: Research Brief on Release, Benchmarks, and User Sentiment
Bottom line
XAI released Grok Build 0.1 - a coding-specific model powering the Grok Build CLI - on May 20, 2026, with public API access opening May 29, 2026. It's positioned as a fast, agentic coding model with native MCP support, up to 8 parallel subagents, and a local-first CLI design.
At $1 per million input tokens and $2 per million output tokens, its API pricing undercuts Claude Sonnet 4.7 by roughly 7.5× on output costs. But the most commonly cited benchmark - 70.8% on SWE-Bench Verified - belongs to the deprecated predecessor model grok-code-fast-1, not grok-build-0.1. XAI has not published an official SWE-Bench score for the new model. Independent analysts view Grok Build as an architecturally interesting but early-beta entrant with a real accuracy gap to established leaders, sparse documentation. No cheap individual tier for CLI access ($99–299/month). Partner integrations (Kilo Code, OpenRouter, Vercel) report strong hands-on results, but broader community sentiment is still forming.
Key findings
-
Finding: The 70.8% SWE-Bench Verified score universally cited in launch coverage belongs to the deprecated
grok-code-fast-1model, notgrok-build-0.1. XAI has not published a SWE-Bench score for the new model. Source: Codersera, Lab51, Medium critique Why it matters: Every comparison showing a "~17 point gap" behind Claude Opus 4.7 and GPT-5.5 is comparing the old xAI coder against current frontier models. The true standing of Grok Build 0.1 is unverified. -
Finding: Grok Build 0.1 costs $1/m input and $2/m output via API - far cheaper than Claude Sonnet 4.7 ($3/$15) and GPT-4.1 ($2/$8) - making it economically viable for high-volume agentic loops. Source: xAI official docs, ChatForest, DevOps.Com Why it matters: For teams running automated code-review pipelines or multi-step agents, the output-token cost advantage compounds quickly. A Kilo Code test built a working webhook service for $1.65 total.
-
Finding: The model features built-in, always-on reasoning that can't be disabled, a 256K-token context window, text+image input, and native MCP integration via
"type": "mcp"in the tools array. Source: xAI official announcement, ChatForest technical guide, DevOps.Com Why it matters: The non-configurable reasoning adds latency and cost on simple calls but may improve quality on complex tasks. Native MCP lowers switching costs for teams already using Claude Code tooling. -
Finding: Independent hands-on testing by Kilo Code found zero tool-calling failures across a long agentic run, ~120 tokens/second throughput. Sensible architectural choices (Standard Webhooks headers, AES-GCM encryption, SSRF guard). Review flags included a security issue (GET endpoint returning encrypted secrets), non-constant-time signature comparison, and thin integration-test coverage. Source: Kilo Code blog Why it matters: The model handles long agentic loops reliably in controlled tests, but the code still needs human review before production.
-
Finding: The CLI is gated behind SuperGrok Heavy ($99/month introductory, then ~$299/month), roughly 15× the cost of Claude Code Pro or Codex CLI via ChatGPT Plus. API access requires no subscription. Source: Medium critique, Codersera, Lab51 Why it matters: xAI is targeting funded teams and enterprises, not solo developers, with the bundled agent experience. The cheap API is the real on-ramp for individuals.
Background
XAI, founded by Elon Musk in March 2023 and now a SpaceX subsidiary following a February 2026 all-stock merger, has shipped Grok models at an unusually fast cadence - six flagship generations in roughly 29 months. Grok Build 0.1 is the company's first coding-specific model and CLI agent, entering a market defined by Anthropic's Claude Code (launched 2024), OpenAI's Codex CLI, and Anysphere's Cursor.
Grok Build 0.1 succeeds the deprecated grok-code-fast-1 model (retired May 15, 2026). Where the predecessor was optimized for low-latency autocomplete, Build 0.1 is designed for long-horizon agentic workflows. Planning, multi-file editing, tool use, and iterative debugging. It's available both as the engine inside xAI's Grok Build CLI and as a standalone API model.
Current state
As of early June 2026:
- Model availability: Public beta API since May 29, 2026. Model ID
grok-build-0.1with aliasesgrok-code-fast-1andgrok-code-fast(requests to the old ID are routed to the new model). - CLI availability: Early beta launched May 14, 2026. Requires SuperGrok Heavy subscription ($99 intro / $299 list). No cheap individual tier.
- Ecosystem integrations: Kilo Code, Cursor, OpenClaw, Hermes Agent, OpenCode, OpenRouter, Vercel AI Gateway. Also integrated into Notion AI as of June 2, 2026.
- Regional API availability: us-east-1 and eu-west-1.
- Documentation: Sparse. XAI's official docs list only context window (256K) and pricing; no model card, architecture whitepaper, or benchmark page.
Technical or implementation details
| Attribute | Value |
|---|---|
| Model ID | grok-build-0.1 |
| Context window | 256,000 tokens |
| Input modalities | Text, image |
| Output | Text only |
| API pricing (input) | $1.00 / 1M tokens |
| API pricing (output) | $2.00 / 1M tokens |
| Cached input | $0.20 / 1M tokens |
| Throughput | 100+ tokens/second (xAI claim); ~120 t/s observed in Kilo Code tests |
| Reasoning | Built-in, always active, non-configurable |
| Function calling | OpenAI-compatible format |
| MCP support | Native ("type": "mcp"); xAI servers connect on your behalf |
| Structured outputs | Yes (response_format) |
| Parallel agents | Up to 8 subagents in isolated Git worktrees (CLI feature) |
Architecture claims: One technical guide (ChatForest) describes Grok Build 0.1 as a 314B parameter Mixture-of-Experts model. This figure has not been confirmed by xAI official documentation and may be conflated with the original open-weight Grok-1 release, which was also 314B MoE. No primary-source model card exists as of this writing.
Rate limits (Tier 0 default): 1,800 requests/minute, 10 million tokens/minute.
Evidence, comparisons, and related context
Benchmark landscape:
| Model / Agent | SWE-Bench Verified | Context | Entry CLI Cost |
|---|---|---|---|
| Codex CLI (GPT-5.5) | 88.7% (vendor-reported) | ~1M | ~$20/mo (ChatGPT Plus) |
| Claude Code (Opus 4.7) | 87.6% (vendor-reported) | 1M | $20/mo (Pro) |
| Grok Build (grok-code-fast-1, deprecated) | 70.8% (vendor-reported) | 256K | $99–299/mo (SuperGrok Heavy) |
| Grok Build 0.1 (current) | Not published by xAI | 256K | API: pay-per-use; CLI: $99–299/mo |
Independent benchmark results (benchable.Ai baseline suite):
- Coding: 95.0% (90th percentile)
- Reasoning: 96.0%
- General Knowledge: 99.5%
- Instruction Following: 60.0% (53rd percentile) - a notable weakness
- Reliability: 100% success rate
- Speed: 33rd percentile (moderate)
Real-world agent benchmarks (Kilo Code / PinchBench):
- PinchBench v2: 88.9% average score (#7 of 50 official models)
- Terminal Bench 2.0: 50.6% completion at $30.70 cost per attempt
- Top tasks: 100% on log analysis, calendar creation, commit messages, Dockerfile optimization, earnings analysis
Market context: The agentic coding CLI category converged rapidly in 2025–2026. Claude Code, Codex CLI, Cursor, and Google's Antigravity all arrived at a similar blueprint: terminal surface, plan-before-execute, approval gates, MCP tool access, and delegated agents. The model itself is becoming interchangeable; the competitive difference is shifting to harness design, workflow fit, and cost structure. MCP, now stewarded by the Linux Foundation's Agentic AI Foundation, is the de facto tool-integration standard.
Limitations and critiques
- Benchmark opacity: xAI has not published a SWE-Bench Verified score for
grok-build-0.1. The 70.8% figure repeated across dozens of articles refers to the deprecated predecessor. Without an independent or official evaluation, claims about Grok Build's accuracy are unverified. Source gap: no independent LiveBench, Aider, or SWE-bench evaluation ofgrok-build-0.1was found. - Early-beta stability: Reviewers report sparse documentation, rate limits, transparency gaps around exact context handling, and hallucinated edits under heavy load (e.G., corrupted Dockerfiles after ambiguous prompts). Source: Lab51, aicerts.Ai via Lab51 citation
- High CLI barrier: No $20 individual tier. The bundled CLI agent is priced for enterprise teams, not solo developers. Source: Medium critique, Codersera
- Security/compliance gaps: "Local-first" design keeps source code on the developer's machine, but xAI's compliance paperwork for Grok Build specifically is thinner than marketing suggests. SOC 2 Type 2 and Zero Data Retention exist on the Enterprise API, but the CLI beta lacks proven controls. Source: Lab51
- Instruction-following weakness: benchable.Ai measured 60% accuracy on complex multi-layered instructions, placing it in the 53rd percentile.
- Always-on reasoning overhead: The non-configurable reasoning adds token cost and latency even on simple calls where thinking adds no value. Source: ChatForest
- Partner bias: The most detailed positive hands-on reviews come from Kilo Code, an integration partner that benefits from Grok Build adoption.
Open questions
- What is Grok Build 0.1's actual, independently verified SWE-Bench Verified score?
- What is the confirmed architecture (total parameters, active parameters per token, training corpus, fine-tuning methodology)? XAI has published no model card.
- How does Grok Build 0.1 perform on third-party leaderboards such as LiveBench, Aider, or BigCodeBench?
- What is actual user retention after the first week? Early beta enthusiasm often decays as edge cases appear.
- Will Arena Mode (automated output ranking) ship broadly and close the practical accuracy gap?
- How will xAI's pricing evolve after the six-month $99 introductory period, and will a cheaper individual tier appear?
Practical takeaways
- For API builders: Grok Build 0.1 is worth a 10-minute experiment. Swap your OpenAI SDK base URL to
https://api.x.ai/v1, change the model togrok-build-0.1, and compare quality and cost on a real task. The pricing is genuinely aggressive for output-heavy agentic loops. - For CLI users: Treat Grok Build as a pilot, not a production default. Run it alongside Claude Code or Codex CLI for two weeks on your actual codebase. Enable branch protections and sandbox experiments given the early-beta status.
- For budget-conscious teams: The output-token cost advantage is real. A Kilo Code test built a full webhook service for $1.65 - cheaper than a single attempt on many frontier models. If your workflow is high-volume and scoped, the API economics are compelling.
- For regulated/code-sensitive teams: The local-first architecture is a genuine differentiator, but verify compliance documentation before letting it touch production repositories.
Sources used
- Grok Build 0.1 on API | xAI - https://x.ai/news/grok-build-0-1
- Models | xAI Docs - https://docs.x.ai/developers/models
- May 15, 2026 Model Retirement | xAI Docs - https://docs.x.ai/developers/migration/may-15-retirement
- XAI Opens Grok Build 0.1 to Developers via API - DevOps.Com - https://devops.com/xai-opens-grok-build-0-1-to-developers-via-api/
- Grok Build 0.1 API: MCP-Native Agentic Coding Without the X Subscription - ChatForest - https://chatforest.com/builders-log/xai-grok-build-0-1-public-api-mcp-native-reasoning-builder-guide/
- XAI: Grok Build 0.1 - Benchmarks, Pricing & Performance - kilo.Ai - https://kilo.ai/models/xai-grok-build-0-1
- XAI: Grok Build 0.1 - AI Model Details & Benchmarks - benchable.Ai - https://benchable.ai/models/x-ai/grok-build-0.1-20260520
- We Asked Grok Build 0.1 to Plan and Build a Webhook Service - blog.Kilo.Ai - https://blog.kilo.ai/p/we-asked-grok-build-01-to-plan-and
- Grok Build 0.1 Website Experiments: Round Two - blog.Kilo.Ai - https://blog.kilo.ai/p/grok-build-01-website-experiments
- The Quiet Arrival of Grok Build 0.1 in a Wild Week for the xAI Empire - blog.Kilo.Ai - https://blog.kilo.ai/p/the-quiet-arrival-of-grok-build-01
- Grok Build Review: What xAI's New Coding Agent Actually Does, and Where It Falls Short - Lab51 - https://lab51.io/grok-build-review-what-xais-new-coding-agent-actually-does-and-where-it-falls-short/
- Grok Build: What xAI Actually Shipped (And What’s Overstated) - Medium - https://medium.com/@candemir13/grok-build-what-xai-actually-shipped-and-whats-overstated-f98961b0d301
- Grok Build vs Claude Code vs Codex CLI: 2026 Benchmarks - Codersera - https://codersera.com/blog/grok-build-vs-claude-code-vs-codex-cli-2026/
- Claude Code vs. Cursor vs. Codex vs. Antigravity - six months in - The New Stack - https://thenewstack.io/claude-code-vs-cursor-vs-codex-vs-antigravity-2026/
- MCP in 2026: How Anthropic's Model Context Protocol Won the Agent-Tool Standard - BirJob - https://www.birjob.com/blog/mcp-protocol-2026
- Grok Versions - Mungomash - https://mungomash.com/ai/grok/versions/
Research completed on 2026-06-07. All claims are source-grounded; benchmark and pricing data may shift as xAI updates documentation and independent evaluations are published.