OpenAI’s Answer to Gemini 3
GPT-5.2 arrives, touting variable reasoning and coding performance

A table compares GPT-5.2's benchmark scores to Claude Opus 4.5 and Gemini 3 Pro in various reasoning tasks.

OpenAI launched GPT-5.2 only weeks after its CEO Sam Altman reportedly issued a “code red” alarm in response to Google's Gemini 3.

What’s new: OpenAI added a suite of GPT-5.2 models to ChatGPT and its API: GPT-5.2 Pro for high accuracy (name in the API: gpt-5.2-pro), GPT-5.2 Thinking for multi-step tasks like coding and planning (gpt-5.2), and GPT-5.2 Instant for less-involved tasks (gpt-5.2-chat-latest). The company touts the new models as time savers in professional tasks like producing spreadsheets, presentations, or code.

  • Input/output: Text and images in (up to 400,000 tokens), text out (up to 128,000 tokens)
  • Knowledge cutoff: August 31, 2025
  • Performance: State-of-the-art results on some reasoning benchmarks; strong results across coding, math, and reasoning benchmarks
  • Features: Adjustable reasoning levels including a new x-high (extra high) level, reasoning summaries, distillation allowed, tool use via the Responses API, and context summarization via the API to extend the available context
  • Availability/price: Via ChatGPT subscription (Plus, Pro, Go, Business, Enterprise) and API. GPT-5.2 Thinking and Instant: $1.75/$0.175/$14 per million input/cached/output tokens. GPT-5.2 Pro: $21/$168 per million input/output tokens.
  • Undisclosed: Parameter counts, architectures, training data and methods
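The listed prices translate into a per-request estimate with simple arithmetic. The sketch below is a minimal illustration under stated assumptions: it treats cached tokens as a subset of the input tokens, and because no cached-input price is listed for GPT-5.2 Pro, it bills Pro's cached tokens at the full input rate. `PRICES_PER_MILLION` and `estimate_cost` are hypothetical names, not part of any OpenAI SDK.

```python
# USD per million tokens, copied from the price list above. Dictionary keys are
# the API model names; the data structure itself is hypothetical.
PRICES_PER_MILLION = {
    "gpt-5.2": {"input": 1.75, "cached_input": 0.175, "output": 14.00},
    "gpt-5.2-chat-latest": {"input": 1.75, "cached_input": 0.175, "output": 14.00},
    # No cached-input price is listed for Pro; assume cached tokens cost the full input rate.
    "gpt-5.2-pro": {"input": 21.00, "cached_input": 21.00, "output": 168.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate one request's cost in USD. Assumes cached_tokens is a subset of input_tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES_PER_MILLION[model]
    uncached = input_tokens - cached_tokens
    total = (
        uncached * price["input"]
        + cached_tokens * price["cached_input"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 100,000 input tokens and 10,000 output tokens, no cache hits.
    print(f"gpt-5.2:     ${estimate_cost('gpt-5.2', 100_000, 10_000):.3f}")      # $0.315
    print(f"gpt-5.2-pro: ${estimate_cost('gpt-5.2-pro', 100_000, 10_000):.3f}")  # $3.780
```

In this example, 10,000 output tokens cost nearly as much as 100,000 input tokens, and reasoning tokens, which OpenAI bills as output, add to that side of the ledger at higher reasoning levels.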

How it works: OpenAI revealed few details about GPT-5.2’s architecture and training but said it made “improvements across the board, including in pretraining.”

  • API users can adjust GPT-5.2’s reasoning across 5 levels: none, low, medium, high, and x-high.
  • For tasks that exceed the input context limit, GPT-5.2 Pro and GPT-5.2 Thinking offer a Responses/compact API endpoint that compresses lengthy conversations rather than truncating them.
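A minimal sketch of how a caller might choose among the five reasoning levels when building a request body for OpenAI's Responses API (`POST https://api.openai.com/v1/responses`). The `model`, `input`, and `reasoning.effort` fields follow the Responses API's request shape, but the exact string for the x-high level (`"xhigh"` here) is an assumption to verify against OpenAI's reference, and `build_request` is a hypothetical helper. The sketch only builds and prints the JSON body; it does not send it or touch the compaction endpoint.

```python
import json

# The five reasoning levels described above. "xhigh" as the API string for
# x-high is an assumption; confirm it against OpenAI's API reference.
REASONING_LEVELS = ("none", "low", "medium", "high", "xhigh")


def build_request(prompt: str, effort: str, model: str = "gpt-5.2") -> dict:
    """Hypothetical helper: return a Responses API request body with the given reasoning effort."""
    if effort not in REASONING_LEVELS:
        raise ValueError(f"unknown reasoning effort {effort!r}; expected one of {REASONING_LEVELS}")
    return {"model": model, "input": prompt, "reasoning": {"effort": effort}}


if __name__ == "__main__":
    # Cheapest setting for a quick lookup, maximum setting for a hard multi-step task.
    print(json.dumps(build_request("Summarize this changelog.", effort="none")))
    print(json.dumps(build_request("Plan a database migration.", effort="xhigh"), indent=2))
```

Validating the level locally catches typos before a request is sent; whether a particular model accepts every level is left for the server to decide.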

Performance: According to the ARC leaderboards, GPT-5.2 Pro set new states of the art on ARC-AGI-1 and ARC-AGI-2 (abstract visual puzzles). On other independent tests, GPT-5.2 remains neck-and-neck with competitors.

  • On ARC-AGI-2 (abstract visual puzzles designed to resist memorization), GPT-5.2 Pro set to high reasoning (54.2 percent pass@2, $15.72 per task) outperformed GPT-5.2 Thinking set to x-high (52.9 percent pass@2, $1.90 per task). Both achieved roughly three times the accuracy of GPT-5.1 Thinking set to high (17.6 percent pass@2, $17.6 per task) at a lower cost per task.
  • On the simpler ARC-AGI-1, GPT-5.2 Pro set to x-high set a new state of the art (90.5 percent pass@2, $11.65 per task), becoming the first model to exceed 90 percent. It finished ahead of Gemini 3 Deep Think Preview (87.5 percent pass@2, estimated $44.26 per task) and Claude Opus 4.5 set to thinking with a 64,000-token thinking budget (80 percent pass@2, $1.47 per task).
  • On Artificial Analysis’ Intelligence Index, a weighted average of 10 benchmarks, GPT-5.2 set to x-high scored 73, tying Gemini 3 Pro Preview and beating Claude Opus 4.5 (70) and GPT-5.1 set to high reasoning (70). Running the full index cost $1,294 with GPT-5.2 set to x-high, less than Claude Opus 4.5 ($1,498) but more than Gemini 3 Pro Preview set to high reasoning ($1,201). GPT-5.2 also tied Gemini 3 Pro Preview set to high (62) on Artificial Analysis’ Coding Index (an average of LiveCodeBench, SciCode, and Terminal-Bench Hard), ahead of Claude Opus 4.5 (60).
  • On AIME 2025 (competitive math), GPT-5.2 set to x-high led with 99 percent, ahead of GPT-5.1 Codex and Gemini 3 Pro Preview set to high (both 96 percent).
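The ARC-AGI-2 comparison reduces to two ratios against the GPT-5.1 Thinking baseline: accuracy and cost per task. This sketch recomputes them from the figures reported above, taking the $17.6 per-task cost listed for GPT-5.1 Thinking at face value; `ArcResult` and `compare` are hypothetical names, not part of any ARC Prize tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArcResult:
    model: str
    pass_at_2: float      # percent of tasks solved within two attempts
    cost_per_task: float  # USD


# ARC-AGI-2 figures as reported above.
GPT52_PRO_HIGH = ArcResult("GPT-5.2 Pro (high)", 54.2, 15.72)
GPT52_THINKING_XHIGH = ArcResult("GPT-5.2 Thinking (x-high)", 52.9, 1.90)
GPT51_THINKING_HIGH = ArcResult("GPT-5.1 Thinking (high)", 17.6, 17.6)


def compare(new: ArcResult, baseline: ArcResult) -> tuple[float, float]:
    """Return (accuracy ratio, cost ratio) of `new` relative to `baseline`."""
    return new.pass_at_2 / baseline.pass_at_2, new.cost_per_task / baseline.cost_per_task


if __name__ == "__main__":
    for result in (GPT52_PRO_HIGH, GPT52_THINKING_XHIGH):
        acc, cost = compare(result, GPT51_THINKING_HIGH)
        print(f"{result.model}: {acc:.2f}x the accuracy at {cost:.2f}x the cost per task")
```

Both GPT-5.2 configurations land near 3x the baseline's accuracy (3.08x and 3.01x), at 0.89x and 0.11x its per-task cost respectively.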

Behind the news: GPT-5.2 arrived as OpenAI faces heightened competitive pressure. On December 1, soon after Google launched Gemini 3, CEO Sam Altman declared a “code red” emergency, a level of alarm that hospitals typically reserve for smoke and fire. He instructed employees to delay plans to add advertisements to ChatGPT and focus instead on improving the models. OpenAI executives deny that GPT-5.2 was rushed.

Why it matters: GPT-5.2’s gains in efficiency are stark. One year ago, achieving 88 percent on ARC-AGI-1 cost roughly $4,500 per task. GPT-5.2 Pro achieves 90.5 percent at around $12 per task, roughly 390 times cheaper. Extended reasoning is becoming dramatically more accessible.
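The “roughly 390 times” figure follows from dividing the two per-task costs. Using the unrounded $11.65 reported for GPT-5.2 Pro on ARC-AGI-1 gives about 386; the rounded $12 gives 375. A minimal check, where `reduction_factor` is a hypothetical helper:

```python
def reduction_factor(old_cost_per_task: float, new_cost_per_task: float) -> float:
    """How many times cheaper the new per-task cost is than the old one."""
    if new_cost_per_task <= 0:
        raise ValueError("new cost must be positive")
    return old_cost_per_task / new_cost_per_task


# ARC-AGI-1 figures from the text: about $4,500 per task a year ago (88 percent)
# versus $11.65 per task for GPT-5.2 Pro set to x-high (90.5 percent).
exact = reduction_factor(4_500, 11.65)
rounded = reduction_factor(4_500, 12)
print(f"{exact:.0f}x using $11.65, {rounded:.0f}x using the rounded $12")
```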

We’re thinking: Technical approaches that aren’t economically feasible today, say running hundreds of reasoning attempts per problem or deploying thousands of reasoning-heavy agents, are on track to become surprisingly affordable within a few years.

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox