Jonathan Haas

Orchestrating AI Coding Agents: What I Learned Running Three Autonomous Sessions at Once

March 26, 2026 · 5 min read

I spent a session orchestrating three concurrent AI coding agents across four repos. They shipped 20+ PRs, wrote 100+ blog posts, removed an entire database dependency, and resolved merge conflicts. Here is what actually works.

#ai #ai-agents #developer-tools #engineering-management #automation #claude #codex

Last night I ran three AI coding agents simultaneously -- two OpenAI Codex sessions and one Claude Code session -- across four repositories. They shipped 20+ pull requests, addressed 10+ GitHub issues, wrote two 400-line technical design documents, and handled their own merge conflicts, CI failures, and code review feedback.

This is not a demo. This is what my actual Tuesday looked like.

The Setup

Three tmux sessions on a Linux dev-desktop, accessed over SSH from my Mac. Each agent working a different repo:

  • Cerebro (Codex): A Go graph database engine. Removing a Snowflake dependency, implementing deployment profiles, wiring NATS change capture.
  • Platform (Codex): A Python/FastAPI evaluation platform. Fixing test suites, resolving merge conflicts, addressing RBAC security findings.
  • Maestro (Claude Code): A TypeScript agent framework. Implementing unified thinking-level abstractions, extension system groundwork.

A fourth session -- Hopper, our Next.js marketing site -- ran earlier and churned out 19 PRs of blog posts and documentation pages before I shut it down.

What the Orchestrator Actually Does

I wrote zero lines of code. My job was:

Directing via issues. Every task started as a GitHub issue with an implementation comment. Not "fix the tests" -- a specific breakdown: which files, which patterns to follow, which branch name. The agents read issue comments with gh issue view and work from there.
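The pattern can be sketched with two gh CLI calls -- one for the orchestrator, one for the agent. The issue number and plan file here are illustrative, not from the post:

```shell
# Orchestrator: post the implementation breakdown as an issue comment
# before the agent starts. Issue number and file name are illustrative.
post_guidance() {
  gh issue comment 7521 --body-file plan.md
}

# Agent: read the issue and its comments to pick up the guidance.
read_guidance() {
  gh issue view 7521 --comments
}
```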

Checking in. Every few minutes: tmux capture-pane to read what each agent is doing. Are they stuck? Making progress? Burning context on a dead end? This is the core loop. Check, direct, queue, check.
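A minimal sketch of that check-in step, assuming tmux session names matching the repos (the names here are illustrative):

```shell
# Snapshot the tail of each agent's tmux pane to see what it is doing.
# -p prints the pane contents to stdout instead of a tmux buffer.
check_sessions() {
  for s in cerebro platform maestro; do
    echo "=== $s ==="
    tmux capture-pane -pt "$s" | tail -n 15
  done
}
```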

Queue management. Codex supports Tab to queue follow-up prompts. After "fix the CI failures," I Tab-queue "then merge the dependabot PRs" and "then pick up issue #7521." This creates a pipeline of work that flows without me touching it.

Unblocking. When an agent hits a sandbox permission wall, a merge conflict it can't resolve, or a disk-full error -- I intervene. Fix the infrastructure problem, then let the agent continue.

Merging. I watch CI, merge green PRs with gh pr merge --squash --admin, and rebase branches that fall behind main. The agents create PRs; I decide when they ship.
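The merge half of that loop can be scripted. This is a sketch that leans on `gh pr checks` exiting non-zero while any check is failing or still pending:

```shell
# Merge every open PR whose CI checks are green.
merge_green_prs() {
  for pr in $(gh pr list --state open --json number --jq '.[].number'); do
    # gh pr checks exits non-zero if checks are failing or pending
    if gh pr checks "$pr" >/dev/null 2>&1; then
      gh pr merge "$pr" --squash --admin
    fi
  done
}
```

In practice I still eyeball the diff before merging; the script only filters out red PRs.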

What Actually Works

Issue comments as coordination layer

The single most effective pattern: post detailed implementation guidance as a GitHub issue comment before the agent starts. This persists beyond the session. When an agent dies at 12% context and I restart it fresh, I just say "read issue #711" and it picks up exactly where the guidance says to start. No prompt reconstruction. No lost context.

Repo-specific agents

Giving each agent a single repo and language worked far better than asking one agent to context-switch between Go, Python, and TypeScript. The agents internalize project conventions -- import patterns, test structures, commit message formats -- and stay consistent. Context-switching between codebases burns tokens on re-learning.

Tab-queued pipelines

Codex's Tab queue is underrated. A well-loaded queue means the agent transitions smoothly from "fix CI" to "merge PRs" to "start feature work" without me sending a new prompt each time. I front-loaded 3-4 queued tasks per session and checked in less frequently.

Kill without sentiment

A session at 14% context with three queued tasks needs a restart, not encouragement. Kill it, restart with --dangerously-bypass-approvals-and-sandbox, give it a one-paragraph summary of where things stand, and Tab-queue the remaining work. The new session at 100% context will outperform the struggling one in minutes.
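The restart itself is a few tmux commands. A sketch, with the session name and repo path as illustrative assumptions:

```shell
# Kill a low-context session and start a fresh one in the same repo.
# Session name and path are illustrative; the handoff summary and
# Tab-queued tasks are typed in after the new session is up.
restart_agent() {
  tmux kill-session -t cerebro
  tmux new-session -d -s cerebro -c ~/src/cerebro
  tmux send-keys -t cerebro \
    'codex --dangerously-bypass-approvals-and-sandbox' Enter
}
```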

What Does Not Work

Agents do not proactively check for review feedback

This was the biggest operational gap. Cursor Bugbot posted high-severity findings on PRs -- RBAC permission downgrades, data race conditions, SQL injection patterns -- and the agents shipped and moved on. They never circled back. I had to manually audit every open PR for unresolved comments and then interrupt agents to address them.

Agents will take the path of least resistance on deployment

Without explicit GitOps instructions, agents will kubectl apply directly, push to main without a branch, or run terraform apply from their shell. You must state the deployment model in the first prompt and again as an issue comment. "ArgoCD watches k8s/, all changes via git" needs to be said every time.

Long reasoning phases look like hangs

Codex enters 5-minute "Working..." phases where it is planning but producing no visible output. The first few times I interrupted these. Wrong move -- the agent was actually making good decisions about how to structure a complex change. The tell: if the context percentage is dropping, it is working. If it is frozen, it is stuck.

Pre-commit hooks are the biggest time sink

More agent time was lost to pre-commit hook failures than to actual logic bugs. Git hooks that run linters, type checkers, OpenAPI validators, and UUID audits add 2-5 minutes per commit attempt. When the hook fails on something unrelated to the agent's changes, the agent enters a fix-hook, retry, fail, fix-hook loop. Bypassing with --no-verify and validating manually was always faster.
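A sketch of the bypass-then-validate step, assuming the hooks are managed by the pre-commit tool (if they are plain git hooks, substitute the linters directly):

```shell
# Commit without running git hooks, then run the checks once, manually,
# instead of paying the 2-5 minute hook cost on every retry.
commit_bypassing_hooks() {
  git commit --no-verify -m "$1"
  pre-commit run --all-files
}
```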

The Economics

In one session:

  • Cerebro: 8 PRs (7 merged), 8 issues addressed, 1 Snowflake dependency fully removed
  • Platform: 5 PRs merged, test suite fixed, 2 design documents posted (750+ lines total), merge conflicts resolved on a 45-file PR
  • Hopper: 19 PRs merged, 107 blog posts, 29 documentation pages
  • Maestro: 2 PRs, 3 issues worked, unified thinking-level abstraction shipped

The constraint was not agent capability. It was my orchestration bandwidth. Three agents was the sweet spot -- I could maintain the check-direct-queue loop with enough frequency that no agent sat idle or went off-track for long. With four, I started dropping context on what each was doing.

The Meta-Lesson

The value of AI coding agents is not in any individual output. A single PR from Codex is fine. It is roughly what a junior engineer would produce with clear direction.

The value is in parallelism with direction. Three agents working simultaneously, each with a clear issue, a queued pipeline, and periodic course correction, produce more in a few hours than a solo developer produces in a week. Not because the agents are better -- because they never context-switch to Slack, never take a coffee break, never lose motivation on the third test fix in a row.

The orchestrator role -- the person managing the agents -- is the new bottleneck. The skill is not prompt engineering. It is the same skill that makes a good engineering manager: clear task definition, fast feedback loops, knowing when to intervene and when to let things run, and never doing the work that can be delegated.

The agents are the easy part. The system around them is the hard part. And we are just getting started.

