xAI Grok Build 0.1: Research Brief on Release, Benchmarks, and User Sentiment

2026-06-07

Bottom line

XAI released Grok Build 0.1 - a coding-specific model powering the Grok Build CLI - on May 20, 2026, with public API access opening May 29, 2026. It's positioned as a fast, agentic coding model with native MCP support, up to 8 parallel subagents, and a local-first CLI design.

At $1 per million input tokens and $2 per million output tokens, its API pricing undercuts Claude Sonnet 4.7 by roughly 7.5× on output costs. But the most commonly cited benchmark - 70.8% on SWE-Bench Verified - belongs to the deprecated predecessor model grok-code-fast-1, not grok-build-0.1. XAI has not published an official SWE-Bench score for the new model. Independent analysts view Grok Build as an architecturally interesting but early-beta entrant with a real accuracy gap to established leaders, sparse documentation. No cheap individual tier for CLI access ($99–299/month). Partner integrations (Kilo Code, OpenRouter, Vercel) report strong hands-on results, but broader community sentiment is still forming.

Key findings

Finding: The 70.8% SWE-Bench Verified score universally cited in launch coverage belongs to the deprecated grok-code-fast-1 model, not grok-build-0.1. XAI has not published a SWE-Bench score for the new model. Source: Codersera, Lab51, Medium critique Why it matters: Every comparison showing a "~17 point gap" behind Claude Opus 4.7 and GPT-5.5 is comparing the old xAI coder against current frontier models. The true standing of Grok Build 0.1 is unverified.
Finding: Grok Build 0.1 costs $1/m input and $2/m output via API - far cheaper than Claude Sonnet 4.7 ($3/$15) and GPT-4.1 ($2/$8) - making it economically viable for high-volume agentic loops. Source: xAI official docs, ChatForest, DevOps.Com Why it matters: For teams running automated code-review pipelines or multi-step agents, the output-token cost advantage compounds quickly. A Kilo Code test built a working webhook service for $1.65 total.
Finding: The model features built-in, always-on reasoning that can't be disabled, a 256K-token context window, text+image input, and native MCP integration via "type": "mcp" in the tools array. Source: xAI official announcement, ChatForest technical guide, DevOps.Com Why it matters: The non-configurable reasoning adds latency and cost on simple calls but may improve quality on complex tasks. Native MCP lowers switching costs for teams already using Claude Code tooling.
Finding: Independent hands-on testing by Kilo Code found zero tool-calling failures across a long agentic run, ~120 tokens/second throughput. Sensible architectural choices (Standard Webhooks headers, AES-GCM encryption, SSRF guard). Review flags included a security issue (GET endpoint returning encrypted secrets), non-constant-time signature comparison, and thin integration-test coverage. Source: Kilo Code blog Why it matters: The model handles long agentic loops reliably in controlled tests, but the code still needs human review before production.
Finding: The CLI is gated behind SuperGrok Heavy ($99/month introductory, then ~$299/month), roughly 15× the cost of Claude Code Pro or Codex CLI via ChatGPT Plus. API access requires no subscription. Source: Medium critique, Codersera, Lab51 Why it matters: xAI is targeting funded teams and enterprises, not solo developers, with the bundled agent experience. The cheap API is the real on-ramp for individuals.

Background

XAI, founded by Elon Musk in March 2023 and now a SpaceX subsidiary following a February 2026 all-stock merger, has shipped Grok models at an unusually fast cadence - six flagship generations in roughly 29 months. Grok Build 0.1 is the company's first coding-specific model and CLI agent, entering a market defined by Anthropic's Claude Code (launched 2024), OpenAI's Codex CLI, and Anysphere's Cursor.

Grok Build 0.1 succeeds the deprecated grok-code-fast-1 model (retired May 15, 2026). Where the predecessor was optimized for low-latency autocomplete, Build 0.1 is designed for long-horizon agentic workflows. Planning, multi-file editing, tool use, and iterative debugging. It's available both as the engine inside xAI's Grok Build CLI and as a standalone API model.

Current state

As of early June 2026:

Model availability: Public beta API since May 29, 2026. Model ID grok-build-0.1 with aliases grok-code-fast-1 and grok-code-fast (requests to the old ID are routed to the new model).
CLI availability: Early beta launched May 14, 2026. Requires SuperGrok Heavy subscription ($99 intro / $299 list). No cheap individual tier.
Ecosystem integrations: Kilo Code, Cursor, OpenClaw, Hermes Agent, OpenCode, OpenRouter, Vercel AI Gateway. Also integrated into Notion AI as of June 2, 2026.
Regional API availability: us-east-1 and eu-west-1.
Documentation: Sparse. XAI's official docs list only context window (256K) and pricing; no model card, architecture whitepaper, or benchmark page.

Technical or implementation details

Attribute	Value
Model ID	`grok-build-0.1`
Context window	256,000 tokens
Input modalities	Text, image
Output	Text only
API pricing (input)	$1.00 / 1M tokens
API pricing (output)	$2.00 / 1M tokens
Cached input	$0.20 / 1M tokens
Throughput	100+ tokens/second (xAI claim); ~120 t/s observed in Kilo Code tests
Reasoning	Built-in, always active, non-configurable
Function calling	OpenAI-compatible format
MCP support	Native (`"type": "mcp"`); xAI servers connect on your behalf
Structured outputs	Yes (`response_format`)
Parallel agents	Up to 8 subagents in isolated Git worktrees (CLI feature)

Architecture claims: One technical guide (ChatForest) describes Grok Build 0.1 as a 314B parameter Mixture-of-Experts model. This figure has not been confirmed by xAI official documentation and may be conflated with the original open-weight Grok-1 release, which was also 314B MoE. No primary-source model card exists as of this writing.

Rate limits (Tier 0 default): 1,800 requests/minute, 10 million tokens/minute.

Evidence, comparisons, and related context

Benchmark landscape:

Model / Agent	SWE-Bench Verified	Context	Entry CLI Cost
Codex CLI (GPT-5.5)	88.7% (vendor-reported)	~1M	~$20/mo (ChatGPT Plus)
Claude Code (Opus 4.7)	87.6% (vendor-reported)	1M	$20/mo (Pro)
Grok Build (grok-code-fast-1, deprecated)	70.8% (vendor-reported)	256K	$99–299/mo (SuperGrok Heavy)
Grok Build 0.1 (current)	Not published by xAI	256K	API: pay-per-use; CLI: $99–299/mo

Independent benchmark results (benchable.Ai baseline suite):

Coding: 95.0% (90th percentile)
Reasoning: 96.0%
General Knowledge: 99.5%
Instruction Following: 60.0% (53rd percentile) - a notable weakness
Reliability: 100% success rate
Speed: 33rd percentile (moderate)

Real-world agent benchmarks (Kilo Code / PinchBench):

PinchBench v2: 88.9% average score (#7 of 50 official models)
Terminal Bench 2.0: 50.6% completion at $30.70 cost per attempt
Top tasks: 100% on log analysis, calendar creation, commit messages, Dockerfile optimization, earnings analysis

Market context: The agentic coding CLI category converged rapidly in 2025–2026. Claude Code, Codex CLI, Cursor, and Google's Antigravity all arrived at a similar blueprint: terminal surface, plan-before-execute, approval gates, MCP tool access, and delegated agents. The model itself is becoming interchangeable; the competitive difference is shifting to harness design, workflow fit, and cost structure. MCP, now stewarded by the Linux Foundation's Agentic AI Foundation, is the de facto tool-integration standard.

Limitations and critiques

Benchmark opacity: xAI has not published a SWE-Bench Verified score for grok-build-0.1. The 70.8% figure repeated across dozens of articles refers to the deprecated predecessor. Without an independent or official evaluation, claims about Grok Build's accuracy are unverified. Source gap: no independent LiveBench, Aider, or SWE-bench evaluation of grok-build-0.1 was found.
Early-beta stability: Reviewers report sparse documentation, rate limits, transparency gaps around exact context handling, and hallucinated edits under heavy load (e.G., corrupted Dockerfiles after ambiguous prompts). Source: Lab51, aicerts.Ai via Lab51 citation
High CLI barrier: No $20 individual tier. The bundled CLI agent is priced for enterprise teams, not solo developers. Source: Medium critique, Codersera
Security/compliance gaps: "Local-first" design keeps source code on the developer's machine, but xAI's compliance paperwork for Grok Build specifically is thinner than marketing suggests. SOC 2 Type 2 and Zero Data Retention exist on the Enterprise API, but the CLI beta lacks proven controls. Source: Lab51
Instruction-following weakness: benchable.Ai measured 60% accuracy on complex multi-layered instructions, placing it in the 53rd percentile.
Always-on reasoning overhead: The non-configurable reasoning adds token cost and latency even on simple calls where thinking adds no value. Source: ChatForest
Partner bias: The most detailed positive hands-on reviews come from Kilo Code, an integration partner that benefits from Grok Build adoption.

Open questions

What is Grok Build 0.1's actual, independently verified SWE-Bench Verified score?
What is the confirmed architecture (total parameters, active parameters per token, training corpus, fine-tuning methodology)? XAI has published no model card.
How does Grok Build 0.1 perform on third-party leaderboards such as LiveBench, Aider, or BigCodeBench?
What is actual user retention after the first week? Early beta enthusiasm often decays as edge cases appear.
Will Arena Mode (automated output ranking) ship broadly and close the practical accuracy gap?
How will xAI's pricing evolve after the six-month $99 introductory period, and will a cheaper individual tier appear?

Practical takeaways

For API builders: Grok Build 0.1 is worth a 10-minute experiment. Swap your OpenAI SDK base URL to https://api.x.ai/v1, change the model to grok-build-0.1, and compare quality and cost on a real task. The pricing is genuinely aggressive for output-heavy agentic loops.
For CLI users: Treat Grok Build as a pilot, not a production default. Run it alongside Claude Code or Codex CLI for two weeks on your actual codebase. Enable branch protections and sandbox experiments given the early-beta status.
For budget-conscious teams: The output-token cost advantage is real. A Kilo Code test built a full webhook service for $1.65 - cheaper than a single attempt on many frontier models. If your workflow is high-volume and scoped, the API economics are compelling.
For regulated/code-sensitive teams: The local-first architecture is a genuine differentiator, but verify compliance documentation before letting it touch production repositories.

Sources used

Grok Build 0.1 on API | xAI - https://x.ai/news/grok-build-0-1
Models | xAI Docs - https://docs.x.ai/developers/models
May 15, 2026 Model Retirement | xAI Docs - https://docs.x.ai/developers/migration/may-15-retirement
XAI Opens Grok Build 0.1 to Developers via API - DevOps.Com - https://devops.com/xai-opens-grok-build-0-1-to-developers-via-api/
Grok Build 0.1 API: MCP-Native Agentic Coding Without the X Subscription - ChatForest - https://chatforest.com/builders-log/xai-grok-build-0-1-public-api-mcp-native-reasoning-builder-guide/
XAI: Grok Build 0.1 - Benchmarks, Pricing & Performance - kilo.Ai - https://kilo.ai/models/xai-grok-build-0-1
XAI: Grok Build 0.1 - AI Model Details & Benchmarks - benchable.Ai - https://benchable.ai/models/x-ai/grok-build-0.1-20260520
We Asked Grok Build 0.1 to Plan and Build a Webhook Service - blog.Kilo.Ai - https://blog.kilo.ai/p/we-asked-grok-build-01-to-plan-and
Grok Build 0.1 Website Experiments: Round Two - blog.Kilo.Ai - https://blog.kilo.ai/p/grok-build-01-website-experiments
The Quiet Arrival of Grok Build 0.1 in a Wild Week for the xAI Empire - blog.Kilo.Ai - https://blog.kilo.ai/p/the-quiet-arrival-of-grok-build-01
Grok Build Review: What xAI's New Coding Agent Actually Does, and Where It Falls Short - Lab51 - https://lab51.io/grok-build-review-what-xais-new-coding-agent-actually-does-and-where-it-falls-short/
Grok Build: What xAI Actually Shipped (And What’s Overstated) - Medium - https://medium.com/@candemir13/grok-build-what-xai-actually-shipped-and-whats-overstated-f98961b0d301
Grok Build vs Claude Code vs Codex CLI: 2026 Benchmarks - Codersera - https://codersera.com/blog/grok-build-vs-claude-code-vs-codex-cli-2026/
Claude Code vs. Cursor vs. Codex vs. Antigravity - six months in - The New Stack - https://thenewstack.io/claude-code-vs-cursor-vs-codex-vs-antigravity-2026/
MCP in 2026: How Anthropic's Model Context Protocol Won the Agent-Tool Standard - BirJob - https://www.birjob.com/blog/mcp-protocol-2026
Grok Versions - Mungomash - https://mungomash.com/ai/grok/versions/

Research completed on 2026-06-07. All claims are source-grounded; benchmark and pricing data may shift as xAI updates documentation and independent evaluations are published.