xAI Grok Build 0.1: Research Brief on Release, Benchmarks, and User Sentiment

Bottom line

XAI released Grok Build 0.1 - a coding-specific model powering the Grok Build CLI - on May 20, 2026, with public API access opening May 29, 2026. It's positioned as a fast, agentic coding model with native MCP support, up to 8 parallel subagents, and a local-first CLI design.

At $1 per million input tokens and $2 per million output tokens, its API pricing undercuts Claude Sonnet 4.7 by roughly 7.5× on output costs. But the most commonly cited benchmark - 70.8% on SWE-Bench Verified - belongs to the deprecated predecessor model grok-code-fast-1, not grok-build-0.1. XAI has not published an official SWE-Bench score for the new model. Independent analysts view Grok Build as an architecturally interesting but early-beta entrant with a real accuracy gap to established leaders, sparse documentation. No cheap individual tier for CLI access ($99–299/month). Partner integrations (Kilo Code, OpenRouter, Vercel) report strong hands-on results, but broader community sentiment is still forming.

Key findings

  • Finding: The 70.8% SWE-Bench Verified score universally cited in launch coverage belongs to the deprecated grok-code-fast-1 model, not grok-build-0.1. XAI has not published a SWE-Bench score for the new model. Source: Codersera, Lab51, Medium critique Why it matters: Every comparison showing a "~17 point gap" behind Claude Opus 4.7 and GPT-5.5 is comparing the old xAI coder against current frontier models. The true standing of Grok Build 0.1 is unverified.

  • Finding: Grok Build 0.1 costs $1/m input and $2/m output via API - far cheaper than Claude Sonnet 4.7 ($3/$15) and GPT-4.1 ($2/$8) - making it economically viable for high-volume agentic loops. Source: xAI official docs, ChatForest, DevOps.Com Why it matters: For teams running automated code-review pipelines or multi-step agents, the output-token cost advantage compounds quickly. A Kilo Code test built a working webhook service for $1.65 total.

  • Finding: The model features built-in, always-on reasoning that can't be disabled, a 256K-token context window, text+image input, and native MCP integration via "type": "mcp" in the tools array. Source: xAI official announcement, ChatForest technical guide, DevOps.Com Why it matters: The non-configurable reasoning adds latency and cost on simple calls but may improve quality on complex tasks. Native MCP lowers switching costs for teams already using Claude Code tooling.

  • Finding: Independent hands-on testing by Kilo Code found zero tool-calling failures across a long agentic run, ~120 tokens/second throughput. Sensible architectural choices (Standard Webhooks headers, AES-GCM encryption, SSRF guard). Review flags included a security issue (GET endpoint returning encrypted secrets), non-constant-time signature comparison, and thin integration-test coverage. Source: Kilo Code blog Why it matters: The model handles long agentic loops reliably in controlled tests, but the code still needs human review before production.

  • Finding: The CLI is gated behind SuperGrok Heavy ($99/month introductory, then ~$299/month), roughly 15× the cost of Claude Code Pro or Codex CLI via ChatGPT Plus. API access requires no subscription. Source: Medium critique, Codersera, Lab51 Why it matters: xAI is targeting funded teams and enterprises, not solo developers, with the bundled agent experience. The cheap API is the real on-ramp for individuals.

Background

XAI, founded by Elon Musk in March 2023 and now a SpaceX subsidiary following a February 2026 all-stock merger, has shipped Grok models at an unusually fast cadence - six flagship generations in roughly 29 months. Grok Build 0.1 is the company's first coding-specific model and CLI agent, entering a market defined by Anthropic's Claude Code (launched 2024), OpenAI's Codex CLI, and Anysphere's Cursor.

Grok Build 0.1 succeeds the deprecated grok-code-fast-1 model (retired May 15, 2026). Where the predecessor was optimized for low-latency autocomplete, Build 0.1 is designed for long-horizon agentic workflows. Planning, multi-file editing, tool use, and iterative debugging. It's available both as the engine inside xAI's Grok Build CLI and as a standalone API model.

Current state

As of early June 2026:

  • Model availability: Public beta API since May 29, 2026. Model ID grok-build-0.1 with aliases grok-code-fast-1 and grok-code-fast (requests to the old ID are routed to the new model).
  • CLI availability: Early beta launched May 14, 2026. Requires SuperGrok Heavy subscription ($99 intro / $299 list). No cheap individual tier.
  • Ecosystem integrations: Kilo Code, Cursor, OpenClaw, Hermes Agent, OpenCode, OpenRouter, Vercel AI Gateway. Also integrated into Notion AI as of June 2, 2026.
  • Regional API availability: us-east-1 and eu-west-1.
  • Documentation: Sparse. XAI's official docs list only context window (256K) and pricing; no model card, architecture whitepaper, or benchmark page.

Technical or implementation details

Attribute Value
Model ID grok-build-0.1
Context window 256,000 tokens
Input modalities Text, image
Output Text only
API pricing (input) $1.00 / 1M tokens
API pricing (output) $2.00 / 1M tokens
Cached input $0.20 / 1M tokens
Throughput 100+ tokens/second (xAI claim); ~120 t/s observed in Kilo Code tests
Reasoning Built-in, always active, non-configurable
Function calling OpenAI-compatible format
MCP support Native ("type": "mcp"); xAI servers connect on your behalf
Structured outputs Yes (response_format)
Parallel agents Up to 8 subagents in isolated Git worktrees (CLI feature)

Architecture claims: One technical guide (ChatForest) describes Grok Build 0.1 as a 314B parameter Mixture-of-Experts model. This figure has not been confirmed by xAI official documentation and may be conflated with the original open-weight Grok-1 release, which was also 314B MoE. No primary-source model card exists as of this writing.

Rate limits (Tier 0 default): 1,800 requests/minute, 10 million tokens/minute.

Evidence, comparisons, and related context

Benchmark landscape:

Model / Agent SWE-Bench Verified Context Entry CLI Cost
Codex CLI (GPT-5.5) 88.7% (vendor-reported) ~1M ~$20/mo (ChatGPT Plus)
Claude Code (Opus 4.7) 87.6% (vendor-reported) 1M $20/mo (Pro)
Grok Build (grok-code-fast-1, deprecated) 70.8% (vendor-reported) 256K $99–299/mo (SuperGrok Heavy)
Grok Build 0.1 (current) Not published by xAI 256K API: pay-per-use; CLI: $99–299/mo

Independent benchmark results (benchable.Ai baseline suite):

  • Coding: 95.0% (90th percentile)
  • Reasoning: 96.0%
  • General Knowledge: 99.5%
  • Instruction Following: 60.0% (53rd percentile) - a notable weakness
  • Reliability: 100% success rate
  • Speed: 33rd percentile (moderate)

Real-world agent benchmarks (Kilo Code / PinchBench):

  • PinchBench v2: 88.9% average score (#7 of 50 official models)
  • Terminal Bench 2.0: 50.6% completion at $30.70 cost per attempt
  • Top tasks: 100% on log analysis, calendar creation, commit messages, Dockerfile optimization, earnings analysis

Market context: The agentic coding CLI category converged rapidly in 2025–2026. Claude Code, Codex CLI, Cursor, and Google's Antigravity all arrived at a similar blueprint: terminal surface, plan-before-execute, approval gates, MCP tool access, and delegated agents. The model itself is becoming interchangeable; the competitive difference is shifting to harness design, workflow fit, and cost structure. MCP, now stewarded by the Linux Foundation's Agentic AI Foundation, is the de facto tool-integration standard.

Limitations and critiques

  • Benchmark opacity: xAI has not published a SWE-Bench Verified score for grok-build-0.1. The 70.8% figure repeated across dozens of articles refers to the deprecated predecessor. Without an independent or official evaluation, claims about Grok Build's accuracy are unverified. Source gap: no independent LiveBench, Aider, or SWE-bench evaluation of grok-build-0.1 was found.
  • Early-beta stability: Reviewers report sparse documentation, rate limits, transparency gaps around exact context handling, and hallucinated edits under heavy load (e.G., corrupted Dockerfiles after ambiguous prompts). Source: Lab51, aicerts.Ai via Lab51 citation
  • High CLI barrier: No $20 individual tier. The bundled CLI agent is priced for enterprise teams, not solo developers. Source: Medium critique, Codersera
  • Security/compliance gaps: "Local-first" design keeps source code on the developer's machine, but xAI's compliance paperwork for Grok Build specifically is thinner than marketing suggests. SOC 2 Type 2 and Zero Data Retention exist on the Enterprise API, but the CLI beta lacks proven controls. Source: Lab51
  • Instruction-following weakness: benchable.Ai measured 60% accuracy on complex multi-layered instructions, placing it in the 53rd percentile.
  • Always-on reasoning overhead: The non-configurable reasoning adds token cost and latency even on simple calls where thinking adds no value. Source: ChatForest
  • Partner bias: The most detailed positive hands-on reviews come from Kilo Code, an integration partner that benefits from Grok Build adoption.

Open questions

  • What is Grok Build 0.1's actual, independently verified SWE-Bench Verified score?
  • What is the confirmed architecture (total parameters, active parameters per token, training corpus, fine-tuning methodology)? XAI has published no model card.
  • How does Grok Build 0.1 perform on third-party leaderboards such as LiveBench, Aider, or BigCodeBench?
  • What is actual user retention after the first week? Early beta enthusiasm often decays as edge cases appear.
  • Will Arena Mode (automated output ranking) ship broadly and close the practical accuracy gap?
  • How will xAI's pricing evolve after the six-month $99 introductory period, and will a cheaper individual tier appear?

Practical takeaways

  • For API builders: Grok Build 0.1 is worth a 10-minute experiment. Swap your OpenAI SDK base URL to https://api.x.ai/v1, change the model to grok-build-0.1, and compare quality and cost on a real task. The pricing is genuinely aggressive for output-heavy agentic loops.
  • For CLI users: Treat Grok Build as a pilot, not a production default. Run it alongside Claude Code or Codex CLI for two weeks on your actual codebase. Enable branch protections and sandbox experiments given the early-beta status.
  • For budget-conscious teams: The output-token cost advantage is real. A Kilo Code test built a full webhook service for $1.65 - cheaper than a single attempt on many frontier models. If your workflow is high-volume and scoped, the API economics are compelling.
  • For regulated/code-sensitive teams: The local-first architecture is a genuine differentiator, but verify compliance documentation before letting it touch production repositories.

Sources used


Research completed on 2026-06-07. All claims are source-grounded; benchmark and pricing data may shift as xAI updates documentation and independent evaluations are published.