When Size Isn’t Everything: Why Sapient’s 27M-Parameter HRM Matters for Small Models & AGI
What is HRM (and why we should care)
Singapore’s Sapient Intelligence introduced the Hierarchical Reasoning Model (HRM): a 27M-parameter, brain-inspired, multi-timescale recurrent architecture trained on just 1,000 examples with no pre-training. According to the authors (arxiv.org), HRM outperforms OpenAI’s o3-mini and Anthropic’s Claude on the ARC-AGI benchmark, a test designed to measure genuine inductive reasoning rather than pattern replication.
The design mirrors cognitive neuroscience: the brain separates slow, global planning from fast, fine-grained execution. HRM encodes these separate timescales directly into its architecture.

Empirical Results
Sapient reports:
- ARC-AGI: HRM surpasses o3-mini-high, Claude 3.7 (8K), and DeepSeek R1 on Sapient’s internal ARC-AGI evaluations (coverage).
- Structured reasoning tasks: Near-perfect results on Sudoku-Extreme and 30×30 Maze-Hard, where chain-of-thought-dependent LLMs typically break down.
- Efficiency profile:
  - ~1,000 labeled examples
  - Zero pre-training
  - No chain-of-thought supervision
  - Single-pass inference
  - Over 90% reduction in compute relative to typical LLM reasoning pipelines (ACN Newswire)
Taken together, these results suggest that architectural inductive bias can outperform sheer parameter scale.
How HRM Works
Hierarchical, Multi-Timescale Architecture
HRM is composed of two interconnected recurrent modules:

- High-Level Planner: slow, abstract reasoning, responsible for decomposing tasks and constructing coarse strategies.
- Low-Level Executor: fast, detail-oriented operations, executing the logical steps required to satisfy the planner’s subgoals.
This coupled dynamic creates effective depth similar to a deep transformer stack, but with orders of magnitude fewer parameters.
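
To make the coupling concrete, here is a minimal toy sketch in PyTorch of a two-timescale recurrence: a slow planner state updated once per outer cycle, and a fast executor state updated several times within each cycle. This illustrates the general idea only; it is not Sapient’s implementation, and the module names, sizes, and step counts are all assumptions.

```python
import torch
import torch.nn as nn

class TwoTimescaleReasoner(nn.Module):
    """Toy two-timescale recurrence in the spirit of HRM (not Sapient's code)."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int,
                 n_cycles: int = 4, n_inner: int = 8):
        super().__init__()
        self.n_cycles, self.n_inner = n_cycles, n_inner
        self.encode = nn.Linear(d_in, d_hidden)
        # Fast executor: consumes the input encoding plus the current plan.
        self.low = nn.GRUCell(d_hidden * 2, d_hidden)
        # Slow planner: updates once per outer cycle from the executor's state.
        self.high = nn.GRUCell(d_hidden, d_hidden)
        self.decode = nn.Linear(d_hidden, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encode(x)
        h_high = z.new_zeros(x.size(0), self.high.hidden_size)
        h_low = z.new_zeros(x.size(0), self.low.hidden_size)
        for _ in range(self.n_cycles):
            # Fast timescale: several executor steps under a fixed plan.
            for _ in range(self.n_inner):
                h_low = self.low(torch.cat([z, h_high], dim=-1), h_low)
            # Slow timescale: the planner revises its strategy once per cycle.
            h_high = self.high(h_low, h_high)
        # Single pass out: only the final answer, no emitted reasoning trace.
        return self.decode(h_high)
```

Note how the effective depth is n_cycles × n_inner recurrent steps computed with one small set of weights, rather than a stack of distinct layers.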
No Chain-of-Thought, No Pre-Training
HRM does not rely on chain-of-thought prompting, large-scale text corpora, or instruction tuning. It performs all reasoning internally and outputs only the final answer (see the usage snippet after the list below).
Benefits include:
- Lower latency
- Lower memory footprint
- Fewer multi-step hallucination/error cascades
- Simplified deployment on constrained hardware
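
Continuing the toy sketch above, inference is a single forward call that returns only the final answer. The Sudoku-style encoding below (81 cells, 9 digit classes) is a hypothetical illustration, not Sapient’s actual input format.

```python
# Continuing the toy sketch above: one forward call, one answer, no trace.
model = TwoTimescaleReasoner(d_in=81, d_hidden=128, d_out=81 * 9)
puzzle = torch.rand(1, 81)                     # hypothetical flattened 9x9 grid
logits = model(puzzle)                         # single pass, no intermediate tokens
answer = logits.view(1, 81, 9).argmax(dim=-1)  # one digit class per cell
```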
Data & Compute Efficiency
By learning directly from end-to-end examples (see the training sketch after this list), HRM avoids:
- Billion-token pre-training cycles
- Long-context inference costs
- Model distillation or supervised CoT pipelines
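
As a rough illustration of what “end-to-end from ~1,000 examples” means in practice, here is a minimal supervised loop over the toy model above. The data shapes and hyperparameters are placeholders; the point is the absence of any pre-training or CoT-supervision stage.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# ~1,000 hypothetical (input, label) pairs; no corpus, no CoT annotations.
xs = torch.rand(1000, 81)
ys = torch.randint(0, 9, (1000, 81))
loader = DataLoader(TensorDataset(xs, ys), batch_size=32, shuffle=True)

model = TwoTimescaleReasoner(d_in=81, d_hidden=128, d_out=81 * 9)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(50):
    for x, y in loader:
        logits = model(x).view(-1, 9)    # per-cell class scores
        loss = loss_fn(logits, y.view(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
```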
Limitations and Open Questions
Although compelling, HRM is not a general-purpose language model:
- Domain narrowness: Designed for puzzles and abstract reasoning, not open-domain tasks.
- Reproducibility: Independent results are still sparse.
- Generalization: Unclear whether this architecture scales to real-world, noisy environments.
- Opacity: No emitted reasoning trace makes interpretation and debugging harder.
A recent study argues that a 7M-parameter Tiny Recursive Model (TRM) can surpass HRM on ARC-AGI (arxiv.org), hinting that recursion, not hierarchy, may be the core enabler.
Implications: Are Small Models Catching Up?
HRM signals a possible shift in how we think about reasoning systems:
- Compute accessibility: A 27M-parameter reasoning engine can run on laptops, edge devices, and mid-tier servers.
- Sparse-data advantages: Scientific reasoning, robotics, and rare-event domains may benefit from models that do not require massive corpora.
- Hybrid architectures: A small HRM-like module for symbolic reasoning paired with a large LLM for language grounding and world knowledge (a minimal routing sketch follows below).
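
One way such a pairing could be wired, purely hypothetically: the LLM extracts a structured problem spec from natural language, a small solver handles the combinatorial core, and the LLM verbalizes the result. The function names and prompts below are invented for illustration.

```python
from typing import Callable

def solve_with_hybrid(question: str,
                      llm: Callable[[str], str],
                      solver: Callable[[str], str]) -> str:
    # 1. LLM: turn free-form language into a machine-readable puzzle spec.
    spec = llm(f"Extract the puzzle from this question as JSON:\n{question}")
    # 2. Small HRM-like module: solve the structured core in a single pass.
    solution = solver(spec)
    # 3. LLM: ground the raw solution back in natural language.
    return llm(f"Explain this solution in plain language:\n{solution}")
```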

Architectural bias may matter as much as — or more than — raw parameter count.
What to Watch
- Independent replications of ARC-AGI results
- Extensions to messy real-world domains
- LLMs adopting multi-timescale planning modules
- New interpretability techniques for non-CoT models
HRM doesn’t replace large LLMs, but it questions the assumption that scale alone produces genuine reasoning. The next advances may come from architectural choices rather than parameter inflation.