Bad Agent Architecture

This weekend turned into a pain-in-the-butt lesson in bad agent architecture.

I migrated OpenClaw from Claude to Codex. Anthropic cutting off third-party harnesses makes total sense. That wasn’t the painful part.

The painful part was realizing how much of my automation stack was really built around one model, Opus 4.6.

When personal automation lives behind WhatsApp, Telegram, Discord, or iMessage, it feels clean. You chat with it and stuff happens. What you don’t see are the prompts, markdown files, skills, brittle scripts, weird edge cases, and spaghetti glue underneath. You also don’t see how many failed tool calls might have happened before you got one clean result, or that sometimes the result only happened because one model was more persistent than another.

The lesson is that a lot of “agentic” systems are still just vibe-coded systems. They look robust right up until the model changes. In my stack, at least, Opus 4.6 turned out to be that much better than even GPT 5.4 at agentic use cases.

So now I’m rebuilding more of my personal automation stack the boring way. More deterministic code. More scripts. More tests. More monitoring. More alerting.
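
Here is a rough sketch of what "the boring way" can look like: a step runner with a fixed retry budget that appends every attempt, failures included, to a JSONL file. The names (run_step, the log path, the record fields) are made up for illustration and are not anything OpenClaw provides.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable


def run_step(name: str, fn: Callable[[], Any], log_path: Path,
             max_attempts: int = 3, delay_s: float = 0.0) -> Any:
    """Run fn with a fixed retry budget, logging each attempt as one JSON line."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # Hypothetical record shape; pick whatever fields you will actually query.
        record = {"ts": time.time(), "step": name, "attempt": attempt}
        try:
            result = fn()
            record["ok"] = True
            return result
        except Exception as exc:
            record["ok"] = False
            record["error"] = f"{type(exc).__name__}: {exc}"
            last_error = exc
            if delay_s and attempt < max_attempts:
                time.sleep(delay_s)
        finally:
            # Runs on success and on failure, so no attempt goes unlogged.
            with log_path.open("a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")
    raise RuntimeError(
        f"step {name!r} failed after {max_attempts} attempts"
    ) from last_error
```

The point is that the retry count and the log format live in code you can read and test, not in how stubborn a particular model happens to be.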

A few things I now think should be standard if you’re building serious personal automation:

keep the critical path deterministic whenever you can
use agents for judgment and messy exploration, not for steps that should behave the same way every time
read the prompts, markdown files, skills, TOOLS.md, and AGENTS.md instead of treating the chat surface like magic
actually understand how the architecture works, especially where state, retries, and fallbacks live
when something gets weird, ask the agent to inspect the session jsonl logs and trace what actually happened
log intermediate steps and failed tool calls, not just the final answer
have a cron job periodically review runs and tell you the recurring failure patterns
treat model swaps like dependency upgrades, and assume they will break more than you think
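
The logging and review items above can be sketched as a small script that a cron job could run. It assumes each line of a session log is a JSON object with hypothetical `tool`, `ok`, and `error` fields. The real session format may look different, so treat the field names and the directory layout as placeholders.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable


def failure_patterns(lines: Iterable[str]) -> Counter:
    """Count failed tool calls, grouped by (tool, error message)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # A corrupt log line is itself worth knowing about.
            counts[("<unparseable>", "invalid JSON")] += 1
            continue
        if not isinstance(event, dict):
            continue
        # Assumed schema: failed calls carry "ok": false.
        if event.get("ok") is False:
            key = (event.get("tool", "<unknown>"), event.get("error", "<no error>"))
            counts[key] += 1
    return counts


def report(log_dir: Path, top: int = 5) -> str:
    """Summarize the most common failures across all *.jsonl files in log_dir."""
    counts: Counter = Counter()
    for path in sorted(log_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            counts.update(failure_patterns(f))
    if not counts:
        return "no failed tool calls found"
    return "\n".join(
        f"{n}x {tool}: {error}" for (tool, error), n in counts.most_common(top)
    )
```

Running something like this before and after a model swap gives a crude diff of which tool calls started failing, which is about as close as this gets to a changelog for a dependency upgrade.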

Turns out the best practices for building software transfer pretty well to building personal automation too.

duh