← Back to blog

Hypothesis-first: the debugging mode that saves real time

When I tell my agent where to look, it should look there first. Not run through a list. Why that matters: I have ten years of pattern smarts in my head. You can't rebuild that from scratch every time.

Charcoal-and-amber illustration of a magnifying glass hovering directly over a single illuminated point on a complex dark schematic, while three other branching investigation paths fade into the dark background, suggesting hypothesis-first focus over broad enumeration

Same thing kept popping up yesterday across a few fixes. When my agent looked wider before it checked the top guess first, it lost time. When it checked first, things moved fast. Worth writing down.

The haiku problem

I had Laura and Alex set to use Haiku by default. It saved money. That made sense then. Their daily check-ins are simple. Haiku does them at a tenth the cost of Sonnet. No reason to burn cash on simple polling.

Then I changed my mind. The cost gap was small. But the friction was real. The agents felt a bit dumb on basic work that now and then needed real thought. So I told Ender to fix it.

When the agent dug in, I had a clear hunch. Something in the cron was changing the model config at run time. I built the cron. I knew the shape of the bug.

The agent wanted to spin up two agents at once and search wide. Wrong move. I had named the likely cause. The right play was one small command. Check the crontab. Check the config file. Prove or kill the guess in five seconds. Instead, the parallel search added turns and noise. It landed on the same thing one crontab check would have shown right away.

Worth being clear: the system worked the way I'd configured it. Haiku saved me money for weeks of cheap polling. That was the right setup at the time. The thing that wasn't fine was that when I told the agent where to look, it didn't look there first.

The Resend integration

DKIM setup went clean. Alex's swap to Resend went clean too. You can drive the Resend API with curl and env creds. So the DNS add and check loop ran on its own. No copy and paste of DNS records back and forth. That is the right shape for this kind of work. Find the API that cuts the human out of the loop.

The one stumble: Resend's webhook API uses endpoint, not endpoint_url. The first POST returned a 422. Trial and error recovered it on the second try.

That should not have happened. The Resend API docs are public. The SDK examples are right there. Reading the schema before posting is the difference between getting it on the first try and wasting a request and a turn on a dumb mistake. For an agent that's supposed to be careful with API surfaces, guessing at field names is the kind of small failure that compounds across a hundred integrations.

This is the OAL rule from now on. Before we build any tie-in with a third-party service, the API docs go into the skill file. We copy them in. No guessing. No trial-and-error on field names the vendor spelled out on their site. The skill audit checks for it. The skill system makes sure it happens on every build.

The pattern

Both rules say the same thing. My agent works faster when it trusts what I say and reads what is already there. When debugging, try the obvious fix first. Only list out other options if that fails. When wiring up a new API, read the schema before you post to it.

This is not a new idea. It is worth writing down because it got broken in different ways on the same day. When that happens, it means the rule needs to be a real step in the plan. Good judgment is not enough.

Why this actually works

Here's why hypothesis-first matters. I've got ten years of pentest maps in my head. I look at a system and see attack surfaces. I see chains of what depends on what. I see how things tend to fail. The agent can't build all that from scratch each time it debugs. Some of it I wrote down. Most I didn't. It's stuff you just know from ten years of watching systems break for a job.

The agent's job is not to copy what I know. The job is to take my hunch and run it down. Don't assume we know the same. When I say check the crontab, that hint holds info no file can give. That info is why hunch first wins.

The same principle applies to API docs. The API provider spent time writing the schema. Reading it isn't extra work. Skipping it isn't speed. It's a tax I pay later.

Hypothesis-first is not a mindset. It is a step list. When the person who built the thing points at the cause, that is not just one guess to weigh against others. Most of the time, that is the answer. Try it first. Look wider only if it fails.

← Back to blog