Jonathan Haas

When Claude Hits Its Limits: Building an AI-to-AI Escalation System

June 25, 2025 · 3 min read

Different LLMs have different strengths. Routing tasks to the right model -- like heterogeneous compute -- turns out to be more valuable than using one model for everything.

#ai #mcp #claude #gemini #debugging

I hit a wall debugging a distributed system race condition. Claude Code had analyzed 30 files, but the bug spanned microservices with gigabytes of traces. Claude is excellent at surgical code edits, but correlating thousands of trace spans across services requires a model that can hold all that context at once.

So I built an escalation system. Claude calls Gemini when it needs the 1M token context window. Gemini returns structured analysis. Claude uses it to make the fix.

The Architecture

The Model Context Protocol (MCP) made this straightforward. An MCP server sits between Claude Code and Gemini. When Claude recognizes it needs help -- too many files, too much trace data, cross-service correlation -- it calls the escalation tool. The server packages relevant context and routes it to Gemini. Gemini returns structured findings. Claude acts on them.
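The flow can be sketched in a few lines. This is a minimal illustration, not the actual server: the request/result shapes and the injected `call_gemini` callable are my assumptions, standing in for the real MCP tool plumbing and Gemini API.

```python
from dataclasses import dataclass


@dataclass
class EscalationRequest:
    """Context Claude hands to the MCP server when it needs help."""
    files: dict[str, str]   # path -> contents
    traces: list[str]       # raw trace spans
    question: str           # what Claude wants answered


@dataclass
class EscalationResult:
    """Structured findings Gemini returns for Claude to act on."""
    summary: str
    suspect_files: list[str]


def handle_escalation(req: EscalationRequest, call_gemini) -> EscalationResult:
    """Package the relevant context, route it to Gemini, return structured findings.

    `call_gemini` is injected so the routing logic stays testable; in a real
    server it would wrap the Gemini API and its large context window.
    """
    prompt = "\n".join([
        req.question,
        *(f"--- {path} ---\n{body}" for path, body in req.files.items()),
        *req.traces,
    ])
    raw = call_gemini(prompt)
    return EscalationResult(summary=raw["summary"], suspect_files=raw["suspects"])
```

The point of the shape is the boundary: Claude never sees Gemini's raw output, only the structured result it can act on.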

The interesting addition is conversational tools. Instead of one-shot analysis, Claude and Gemini engage in multi-turn dialogues. Claude asks follow-ups based on Gemini's responses. The models build on each other's insights in ways single-shot analysis can't achieve.
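The dialogue loop itself is simple. A hedged sketch with both models as injected callables (question in, answer out) rather than real API calls; the turn budget and stop condition are my assumptions.

```python
def converse(ask_claude, ask_gemini, opening: str, max_turns: int = 4) -> list[str]:
    """Multi-turn dialogue: Gemini answers, Claude asks a follow-up, repeat.

    `ask_claude` and `ask_gemini` are stand-ins for the real model calls.
    Returns the transcript of Gemini's answers; stops when Claude has no
    follow-up (returns None) or the turn budget runs out.
    """
    transcript = []
    question = opening
    for _ in range(max_turns):
        answer = ask_gemini(question)
        transcript.append(answer)
        question = ask_claude(answer)  # follow-up built on Gemini's response
        if question is None:           # Claude is satisfied
            break
    return transcript
```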

What I Learned

Context preparation is 80% of the work. The hardest part isn't the API calls. It's deciding which files, traces, and logs are actually relevant. Too little context and the analysis fails. Too much and you waste tokens and confuse the model. Most development time went into intelligent context selection.
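The shape of that selection problem can be shown with a deliberately crude sketch: score files by keyword relevance, then greedily fill a budget. The real selection logic is surely smarter; the keyword scoring and character budget here are my simplifications.

```python
def select_context(files: dict[str, str], keywords: list[str], budget: int) -> list[str]:
    """Greedy context selection: score each file by keyword hits, then take
    the highest-scoring files until the character budget is spent.

    Captures the core trade-off: too little context and the analysis
    fails, too much and you waste tokens and confuse the model.
    """
    def score(body: str) -> int:
        return sum(body.count(k) for k in keywords)

    ranked = sorted(files, key=lambda path: score(files[path]), reverse=True)
    chosen, spent = [], 0
    for path in ranked:
        cost = len(files[path])
        if spent + cost > budget:
            continue  # skip files that would blow the budget
        chosen.append(path)
        spent += cost
    return chosen
```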

The escalation heuristic matters. Not every problem needs Gemini. The MCP server uses thresholds: file count over 50, trace size over 100MB, more than 3 services involved, more than 5 failed approaches. Getting this wrong in either direction is expensive -- unnecessary escalations waste money, missed escalations waste developer hours.
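Those thresholds reduce to a tiny predicate. The numbers below are the ones from the post; the function name and signature are mine.

```python
def should_escalate(file_count: int, trace_mb: float,
                    services: int, failed_approaches: int) -> bool:
    """Escalate to Gemini when any one signal crosses its threshold:
    >50 files, >100MB of traces, >3 services, or >5 failed approaches."""
    return (file_count > 50
            or trace_mb > 100
            or services > 3
            or failed_approaches > 5)
```

An OR of cheap signals rather than a weighted score: any single symptom of a cross-cutting bug is enough to justify the escalation cost.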

Multi-turn beats single-shot. Single-shot analysis misses nuances. The conversational approach lets models build iteratively, leading to discoveries neither would make alone. This was the biggest surprise -- multi-turn capability was more valuable than raw context window size.

It's not magic. Sometimes the analysis is wrong. Sometimes the context preparation misses the relevant data. Sometimes the bug is in the one file you didn't include. This is a tool that improves your odds, not a guarantee.

The Bigger Point

Treat LLMs like heterogeneous compute resources. Route tasks to the model best equipped to handle them, the same way you'd pick the right database for the right query pattern. Claude for code edits and reasoning. Gemini for massive context analysis. Smaller models for simple classification tasks.

In two years, developers will orchestrate multiple specialized models routinely. The question is whether you're building the orchestration layer now or waiting for someone else to build it for you.

The deep-code-reasoning-mcp server is open source.


