Blogs


DeepSeek mHC: Fixing the Hidden Chaos in Giant AIs

If you’re anything like me, you’ve probably spent the last few years swept up in the whirlwind of AI advancements. From ChatGPT blowing our minds to models getting bigger and smarter by the day, it’s been a wild ride. But every now and then, something comes along that feels like a real paradigm shift – not just more parameters or fancier training data, but a fundamental rethink of how these beasts work under the hood. That’s exactly what DeepSeek’s latest innovation, mHC (short for Manifold-Constrained Hyper-Connections), feels like to me. I stumbled upon their paper right at the start of 2026, and man, it got me excited. It’s not just another incremental tweak; it’s a clever fix to a problem that’s been lurking in neural networks for over a decade.

What the Heck is DeepSeek and Why Should You Care About mHC?

First off, a quick intro to DeepSeek for those who might not be as deep in the AI weeds. DeepSeek is a Chinese AI lab that’s been punching way above its weight class. They’re the folks behind models like DeepSeek-V2 and DeepSeek-Coder, which have consistently outperformed models from bigger names like OpenAI and Google on certain benchmarks, often at a fraction of the cost. They’re all about efficiency and open-source vibes, which is refreshing in an industry that’s sometimes too secretive.

Now, mHC? It’s their fresh-out-of-the-oven framework, detailed in a paper released on December 31, 2025. The full name is Manifold-Constrained Hyper-Connections, and it’s basically a smarter way to handle the “connections” inside neural networks. If you’ve ever wondered why training massive models can be so unstable – why gradients explode or vanish and crash the whole run – mHC tackles that head-on. It builds on something called Hyper-Connections (HC), a cool idea from ByteDance that widens the single residual stream into several parallel streams and lets the network learn how to mix them between layers. The catch: unconstrained mixing can amplify or shrink activations a little at every layer, and over dozens of layers those small distortions compound into instability. DeepSeek’s fix is to add mathematical constraints that keep the mixing well-behaved without sacrificing the flexibility that made HC attractive in the first place.
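
To make that concrete, here’s a toy sketch of the kind of constraint we’re talking about. HC mixes its parallel residual streams with a learned matrix; one way to tame that – and, as far as I can tell, close in spirit to what the paper does – is to nudge the mixing matrix toward the manifold of doubly stochastic matrices (rows and columns summing to 1) with Sinkhorn-style normalization, so mixing becomes a weighted average that can neither blow signals up nor squash them to zero. To be clear, everything below (the `sinkhorn` helper, `n_streams`, the shapes) is my own NumPy illustration, not DeepSeek’s code.

```python
import numpy as np

def sinkhorn(m, n_iters=20):
    """Nudge a positive matrix toward doubly stochastic by
    alternately normalizing its rows and columns."""
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m

rng = np.random.default_rng(0)
n_streams = 4  # HC-style expansion: 4 parallel residual streams

h_raw = np.exp(rng.normal(size=(n_streams, n_streams)))  # unconstrained mixing
h = sinkhorn(h_raw)

x = rng.normal(size=(n_streams, 8))  # 4 streams, hidden width 8
print(h.sum(axis=1))                               # ~[1. 1. 1. 1.]
print(np.abs(x).mean(), np.abs(h_raw @ x).mean())  # raw mixing rescales the signal...
print(np.abs(x).mean(), np.abs(h @ x).mean())      # ...constrained mixing roughly preserves it
```

Stack sixty-plus layers and that difference is the whole ballgame: small per-layer rescalings compound exponentially, which is exactly the instability mHC goes after.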

Read on →

When AI Quit Whispering and Started Running the Show

It’s the last day of 2025, and I’m hunched over my laptop in a Bengaluru apartment. The ceiling fans hum low, pushing back the winter chill. The air outside has that fake December bite—cool enough for a light sweater, but not cold like up north. I’ve been deep in this AI mess all year: jumping on calls that drag like bad dates, reading endless Slack chats from tired coders, and sorting through pitch decks that could stack to the moon. Lately, I’ve been tinkering with an agent app right here on my machine—a simple tool to help solo devs juggle tasks, mixing R1 bits with Copilot tricks while the city winds down for the holidays. Remember those wild January chats? Folks swearing AGI would fix everything from sick beds to sock drawers by summer. Nah. It was more like giving a kid the wheel—fun rides, close calls, and a lot of yelling.

But here’s the thing: 2025 didn’t bring the end-of-days robot takeover we joked about. No AIs kicking bosses out of offices. It was the year the nuts and bolts got honest. We cut the fat on power-hungry training. Agents crawled out of chat boxes and into real jobs, handling the boring stuff we used to fake. And the gear? Man, the gear. A trillion bucks thrown at it, leaving data halls wheezing and my power bill 40% higher. As I sip filter coffee—spilling drops on the keys—I feel we’ve tipped over an edge. Not some shiny paradise. Something rawer, more like us with our screw-ups. Let’s sift through the mess and squint at what’s next.

The Wake-Up Call: DeepSeek R1 Kills the “Spend Big” Lie

Think back: January starts with the same old hype. OpenAI rolls out a beefed-up version. Anthropic tweaks Claude to fix bugs and toss in lame jokes. Tech hotshots burn money like candy—$200 million for one “super model,” $500 million for “fast thinking.” Looks cool at first. Then you step back. Training bills had jumped to nuts levels: GPT-4o hit about $100 million. Claude 3.5 Opus close behind. Each one just dumping raw power into the mix. NVIDIA’s shares? They jittered like a guy on too much coffee, touching $150 by March on talk of endless growth. But growing what? More brain cells? Bigger piles of web junk? Felt like stacking cards into a tower—pretty, but one breeze away from flat.

Read on →

The Code That Bit Back: Surviving AI’s Jagged Frontier in Code Reviews

I remember the day our shiny new AI code reviewer went live like it was yesterday. It was a Tuesday in early 2025, and our team at EchoSoft—a mid-sized dev shop cranking out enterprise apps—had just pushed the button on integrating GPT-4o into our GitHub Actions pipeline. We’d spent weeks fine-tuning prompts, benchmarking against human reviewers, and celebrating how it slashed review times from hours to minutes. “This is it,” I told the devs over Slack. “No more blocking PRs on nitpicks.” We high-fived virtually, popped a bottle of virtual champagne, and watched the first few PRs sail through with glowing approvals.

Then came PR #478 from junior dev Alex. A simple refactor of our auth module—nothing fancy, just swapping out a deprecated hash function for Argon2. The AI scanned it in seconds: “LGTM! Solid upgrade, no security flags.” Alex merged it. By Friday, our staging server was compromised. Attackers exploited a buffer overflow the AI had glossed over because, in its infinite wisdom, it hallucinated that our input sanitization was “enterprise-grade” based on a snippet from some outdated Stack Overflow thread it pulled from thin air. We lost a weekend scrubbing logs, notifying users, and patching the hole. The client? They bailed, citing “unreliable tooling.” That stung. We’d bet the farm on AI being our force multiplier, but it turned out to be a loaded gun.

Why did this happen? Not because we picked a bad model—GPT-4o was crushing benchmarks left and right. No, it was the jaggedness. That term had been buzzing in AI circles for months, ever since Ethan Mollick’s piece laid it out clear as day: AI doesn’t progress smoothly like a rising tide; it advances in fits and starts, acing PhD-level theorem proving one minute and fumbling basic if-else logic the next. Our code reviewer was a poster child for it—flawless on boilerplate CRUD ops, but a disaster on edge-case vulns that humans spot with a coffee-fueled squint. We’d ignored the warning signs during our proof-of-concept phase, too dazzled by the 95% accuracy on synthetic datasets. In production, though? The cracks showed fast.

Read on →

When Size Isn’t Everything: Why Sapient’s 27M-Parameter HRM Matters for Small Models & AGI

What is HRM (and why we should care)

Singapore’s Sapient Intelligence introduced the Hierarchical Reasoning Model (HRM) — a 27M-parameter, brain-inspired, multi-timescale recurrent architecture trained with just 1,000 examples and no pre-training. According to the authors (arxiv.org), HRM outperforms o3-mini and Claude on the ARC-AGI benchmark, a test designed to measure genuine inductive reasoning rather than pattern replication.

The design mirrors cognitive neuroscience: the brain separates slow, global planning from fast, fine-grained execution. HRM encodes these separate timescales directly into its architecture.
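
Here’s a minimal PyTorch sketch of that two-timescale idea: a slow “planner” cell that updates only once every T steps, and a fast “worker” cell that updates every step while conditioning on the current plan. To be clear, this is my own toy illustration of the timescale separation – the class name, layer sizes, and period `T` are invented for the example, and Sapient’s actual HRM has considerably more machinery.

```python
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    """Toy slow/fast recurrence: hi-level plans slowly, lo-level executes fast."""
    def __init__(self, d_in, d_lo, d_hi, T=8):
        super().__init__()
        self.T = T
        self.lo = nn.GRUCell(d_in + d_hi, d_lo)  # fast, fine-grained execution
        self.hi = nn.GRUCell(d_lo, d_hi)         # slow, global planning

    def forward(self, x_seq):                    # x_seq: (steps, batch, d_in)
        batch = x_seq.shape[1]
        h_lo = x_seq.new_zeros(batch, self.lo.hidden_size)
        h_hi = x_seq.new_zeros(batch, self.hi.hidden_size)
        for t, x in enumerate(x_seq):
            # worker updates every step, reading the current plan
            h_lo = self.lo(torch.cat([x, h_hi], dim=-1), h_lo)
            if (t + 1) % self.T == 0:
                # planner ticks once per T worker steps
                h_hi = self.hi(h_lo, h_hi)
        return h_lo, h_hi

core = TwoTimescaleCore(d_in=16, d_lo=64, d_hi=64, T=8)
out_lo, out_hi = core(torch.randn(32, 4, 16))    # 32 steps, batch of 4
```

The point is structural: the slow state integrates and redirects the fast state instead of everything churning at one clock rate, which is the separation the HRM paper encodes directly in the architecture.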

Empirical Results

Sapient reports:

  • ARC-AGI: HRM surpasses o3-mini-high, Claude 3.7 (8K), and DeepSeek R1 on Sapient’s internal ARC-AGI evaluations (coverage).
  • Structured reasoning tasks: Near-perfect results on Sudoku-Extreme and 30×30 Maze-Hard, where chain-of-thought-dependent LLMs typically break down.
  • Efficiency profile:
    • ~1,000 labeled examples
    • Zero pre-training
    • No chain-of-thought supervision
    • Single-pass inference
    • Over 90% reduction in compute relative to typical LLM reasoning pipelines (ACN Newswire)

The data suggests that architectural inductive bias can outperform sheer parameter scale.

Read on →

The $1.5 Trillion Question: Is AI Investment a Bubble or the Future?

The world is witnessing an investment phenomenon unlike anything since the dot-com boom. In 2024 alone, artificial intelligence companies attracted over $100 billion in venture capital funding, while semiconductor manufacturing has seen commitments exceeding $630 billion. Tech giants are pouring unprecedented sums into AI infrastructure, with some analysts now questioning whether this represents visionary transformation or dangerous overinvestment. The answer may determine the trajectory of the global economy for the next decade.

The Numbers Don’t Lie: A Historic Investment Surge

AI Funding Reaches Stratospheric Heights

The scale of AI investment in 2024-2025 defies historical precedent:

  • Global AI VC funding in 2024: $110 billion (nearly double 2023’s $55.6 billion)
  • Generative AI funding alone: $45 billion (nearly double 2023’s $24 billion)
  • 2025 trajectory: Through August, AI startups raised $118 billion, on pace to exceed 2024’s record
  • Market concentration: AI captured 33% of all global venture funding in 2024

To put this in perspective, 2024 was the sector’s highest funding year of the past decade, surpassing even the peak global funding levels of 2021. The late-stage deals tell an even more dramatic story: average deal size for generative AI companies jumped from $48 million in 2023 to $327 million in 2024.

Read on →