Last night I ran three AI coding agents simultaneously -- two OpenAI Codex sessions and one Claude Code session -- across four repositories. They shipped 20+ pull requests, addressed 10+ GitHub issues, wrote two 400-line technical design documents, and handled their own merge conflicts, CI failures, and code review feedback.
This is not a demo. This is what my actual Tuesday looked like.
The Setup
Three tmux sessions on a Linux dev desktop, accessed over SSH from my Mac, each agent working a different repo:
- Cerebro (Codex): A Go graph database engine. Removing a Snowflake dependency, implementing deployment profiles, wiring NATS change capture.
- Platform (Codex): A Python/FastAPI evaluation platform. Fixing test suites, resolving merge conflicts, addressing RBAC security findings.
- Maestro (Claude Code): A TypeScript agent framework. Implementing unified thinking-level abstractions, extension system groundwork.
A fourth session -- Hopper, our Next.js marketing site -- ran earlier and churned out 19 PRs of blog posts and documentation pages before I shut it down.
What the Orchestrator Actually Does
I wrote zero lines of code. My job was:
Directing via issues. Every task started as a GitHub issue with an implementation comment. Not "fix the tests" -- a specific breakdown: which files, which patterns to follow, which branch name. The agents read issue comments with gh issue view and work from there.
Checking in. Every few minutes: tmux capture-pane to read what each agent is doing. Are they stuck? Making progress? Burning context on a dead end? This is the core loop. Check, direct, queue, check.
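The check half of that loop is just tmux plumbing. A minimal sketch, with session names of my own invention:

```shell
# Sketch of the check-in pass over each agent's tmux session.
# Session names here are illustrative.
SESSIONS="cerebro platform maestro"

check_sessions() {
  local s
  for s in $SESSIONS; do
    echo "=== $s ==="
    # -p prints the pane contents to stdout; -t targets the session.
    # The last ~15 lines are enough to see whether it is stuck or moving.
    tmux capture-pane -pt "$s" | tail -n 15
  done
}
```

Run it on a timer or bind it to a key; the point is that a status sweep across all agents costs seconds.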
Queue management. Codex supports Tab to queue follow-up prompts. After "fix the CI failures," I Tab-queue "then merge the dependabot PRs" and "then pick up issue #7521." This creates a pipeline of work that flows without me touching it.
Unblocking. When an agent hits a sandbox permission wall, a merge conflict it can't resolve, or a disk-full error -- I intervene. Fix the infrastructure problem, then let the agent continue.
Merging. I watch CI, merge green PRs with gh pr merge --squash --admin, and rebase branches that fall behind main. The agents create PRs; I decide when they ship.
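The merge pass is scriptable against the gh CLI. A sketch, assuming every open PR in the repo is agent-authored and squash-merging is the house rule:

```shell
# Merge whatever is green. 'gh pr checks' exits non-zero while any
# check is failing or still pending, so it doubles as the gate.
merge_green() {
  local n
  for n in $(gh pr list --state open --json number --jq '.[].number'); do
    if gh pr checks "$n" > /dev/null 2>&1; then
      gh pr merge "$n" --squash --admin
    fi
  done
}
```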
What Actually Works
Issue comments as coordination layer
The single most effective pattern: post detailed implementation guidance as a GitHub issue comment before the agent starts. This persists beyond the session. When an agent dies at 12% context and I restart it fresh, I just say "read issue #711" and it picks up exactly where the guidance says to start. No prompt reconstruction. No lost context.
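The mechanics are one gh command on each side of the session boundary. A sketch (function and file names are mine):

```shell
# Persist implementation guidance on the issue itself, so a fresh
# session can pick it up with nothing but the issue number.
seed_issue() {
  local num=$1 guidance_file=$2
  gh issue comment "$num" --body-file "$guidance_file"
}

# On restart, the only prompt the new session needs is along the lines of:
#   "Read issue #711: gh issue view 711 --comments"
```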
Repo-specific agents
Giving each agent a single repo and language worked far better than asking one agent to context-switch between Go, Python, and TypeScript. The agents internalize project conventions -- import patterns, test structures, commit message formats -- and stay consistent. Context-switching between codebases burns tokens on re-learning.
Tab-queued pipelines
Codex's Tab queue is underrated. A well-loaded queue means the agent transitions smoothly from "fix CI" to "merge PRs" to "start feature work" without me sending a new prompt each time. I front-loaded 3-4 queued tasks per session and checked in less frequently.
Kill without sentiment
A session at 14% context with three queued tasks needs a restart, not encouragement. Kill it, restart with --dangerously-bypass-approvals-and-sandbox, give it a one-paragraph summary of where things stand, and Tab-queue the remaining work. The new session at 100% context will outperform the struggling one in minutes.
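The restart itself is two tmux keystrokes. A sketch; the codex binary name and session layout are assumptions, the flag is the one above:

```shell
# Kill the stalled agent and start fresh in the same pane.
restart_session() {
  local s=$1
  tmux send-keys -t "$s" C-c                  # interrupt the old session
  tmux send-keys -t "$s" \
    "codex --dangerously-bypass-approvals-and-sandbox" Enter
  # Next step by hand: paste the one-paragraph hand-off summary,
  # then Tab-queue the remaining tasks.
}
```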
What Does Not Work
Agents do not proactively check for review feedback
This was the biggest operational gap. Cursor Bugbot posted high-severity findings on PRs -- RBAC permission downgrades, data race conditions, SQL injection patterns -- and the agents shipped and moved on. They never circled back. I had to manually audit every open PR for unresolved comments and then interrupt agents to address them.
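The audit I ended up doing by hand is scriptable. A sketch using gh's API passthrough; the jq filter keeps only top-level review comments, since replies carry an in_reply_to_id:

```shell
# Surface review comments across all open PRs so nothing ships
# with an unaddressed finding.
audit_reviews() {
  local n
  for n in $(gh pr list --state open --json number --jq '.[].number'); do
    echo "--- PR #$n ---"
    # Top-level review comments only; replies have in_reply_to_id set.
    gh api "repos/{owner}/{repo}/pulls/$n/comments" \
      --jq '.[] | select(.in_reply_to_id == null) | .body'
  done
}
```

This surfaces the comments; deciding which are resolved still takes a human read, but it beats opening every PR in a browser.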
Agents will take the path of least resistance on deployment
Without explicit GitOps instructions, agents will kubectl apply directly, push to main without a branch, or run terraform apply from their shell. You must state the deployment model in the first prompt and again as an issue comment. "ArgoCD watches k8s/, all changes via git" needs to be said every time.
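Since the rule has to be restated every time, it is worth making it a one-liner. A sketch; the wording is mine, modeled on the rule quoted above:

```shell
# Standing GitOps reminder, re-posted on every new task issue.
DEPLOY_RULES="Deployment model: ArgoCD watches k8s/. All changes go through git.
Never run kubectl apply or terraform apply directly."

post_deploy_rules() {
  gh issue comment "$1" --body "$DEPLOY_RULES"
}
```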
Long reasoning phases look like hangs
Codex enters 5-minute "Working..." phases where it is planning but producing no visible output. The first few times I interrupted these. Wrong move -- the agent was actually making good decisions about how to structure a complex change. The tell: if the context percentage is dropping, it is working. If it is frozen, it is stuck.
Pre-commit hooks are the biggest time sink
More agent time was lost to pre-commit hook failures than to actual logic bugs. Git hooks that run linters, type checkers, OpenAPI validators, and UUID audits add 2-5 minutes per commit attempt. When the hook fails on something unrelated to the agent's changes, the agent enters a fix-hook, retry, fail, fix-hook loop. Bypassing with --no-verify and validating manually was always faster.
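When the hook fails on something the agent did not touch, I have it bypass the hook and I validate out of band. A sketch, assuming the repo uses the pre-commit framework; the commit message is illustrative:

```shell
# Skip the hook for this commit, then run the same checks once, manually,
# outside the agent's commit-retry loop.
commit_past_hook() {
  git commit --no-verify -m "$1"
  pre-commit run --all-files
}
```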
The Economics
In one session:
- Cerebro: 8 PRs (7 merged), 8 issues addressed, 1 Snowflake dependency fully removed
- Platform: 5 PRs merged, test suite fixed, 2 design documents posted (750+ lines total), merge conflicts resolved on a 45-file PR
- Hopper: 19 PRs merged, 107 blog posts, 29 documentation pages
- Maestro: 2 PRs, 3 issues worked, unified thinking-level abstraction shipped
The constraint was not agent capability. It was my orchestration bandwidth. Three agents was the sweet spot -- I could maintain the check-direct-queue loop with enough frequency that no agent sat idle or went off-track for long. With four, I started dropping context on what each was doing.
The Meta-Lesson
The value of AI coding agents is not in any individual output. A single PR from Codex is fine. It is roughly what a junior engineer would produce with clear direction.
The value is in parallelism with direction. Three agents working simultaneously, each with a clear issue, a queued pipeline, and periodic course correction, produce more in a few hours than a solo developer produces in a week. Not because the agents are better -- because they never context-switch to Slack, never take a coffee break, never lose motivation on the third test fix in a row.
The orchestrator role -- the person managing the agents -- is the new bottleneck. The skill is not prompt engineering. It is the same skill that makes a good engineering manager: clear task definition, fast feedback loops, knowing when to intervene and when to let things run, and never doing the work that can be delegated.
The agents are the easy part. The system around them is the hard part. And we are just getting started.