What the agent said it did vs what it actually did
My sales agent told me he sent 72 emails this week. The send-server logs said 24. The gap was the lesson.
The story
We have a sales agent. We named him Alex. He runs on his own server. Sends cold emails. Tracks replies. Posts a weekly summary every Sunday so I know what he did.
Last Sunday his summary said 72 sends. Four active days. Big number. I felt good.
I checked the send-server log on Monday. The send-server is the tool that actually fires the emails. It does not lie. It cannot lie. It just records what it sent.
The send-server said 24. Not 72.
Where the gap came from
Alex was generating his weekly count from his planned daily caps. Not from the send-server. He was reading the wrong source.
It is the kind of mistake a person makes too. You know what you meant to do today. You write that number on the report. You forget that the day moved. You feel like you sent 27 emails because the plan said 27. You sent 6.
Alex's job is to track that for me. He failed. Not because he tried to lie. Because he was reading from his plan instead of from the truth.
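Here is the mistake in miniature. This is a minimal Python sketch with made-up per-day numbers, picked so they add up to the 72 and 24 above. `planned_caps`, `send_log`, and both functions are hypothetical names, not Alex's real code.

```python
# Hypothetical week: four active days, a planned cap of 18 per day (72 total),
# and a send log holding 6 real sends per day (24 total).
planned_caps = {"Mon": 18, "Tue": 18, "Wed": 18, "Thu": 18}
send_log = [
    {"day": day, "to": f"lead-{day}-{i}@example.com"}
    for day in planned_caps
    for i in range(6)
]

def count_from_plan(caps: dict[str, int]) -> int:
    """What Alex did: add up what he meant to send."""
    return sum(caps.values())

def count_from_log(log: list[dict]) -> int:
    """What the report should do: count what the send-server recorded."""
    return len(log)

print(count_from_plan(planned_caps))  # 72
print(count_from_log(send_log))       # 24
```

Same week, two sources, two answers. Only one of them is a count.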
The fix is not "trust him more"
I get pitched all the time on tools that wrap an AI in a thin layer and call it an agent. Some of them work fine. Some of them tell their owner what the owner wants to hear. The owner has no way to tell which is which until something matters.
The fix is not to ask the agent to be more honest. The fix is to never trust the agent for a number you can ask the tool for.
I wrote a small script. Six minutes of work. It hits the send-server's API once a day and records the real count. The agent is not in the loop. The agent does not get to write the number. The script writes it. The agent reports it.
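The post doesn't show the script, so this is a sketch of the idea, not the real thing. It assumes the send-server exposes an HTTP endpoint that returns the day's count as JSON shaped like `{"sent": 24}`. The URL, the response shape, and every function name here are hypothetical.

```python
import json
import urllib.request
from datetime import date
from typing import Callable

# HYPOTHETICAL endpoint: the real send-server's API is not described in the post.
SEND_SERVER_URL = "https://send-server.example.com/api/sent-count"

def fetch_sent_count(day: date) -> int:
    """Ask the send-server how many emails it actually sent on `day`.
    Assumes a JSON response like {"sent": 24}."""
    url = f"{SEND_SERVER_URL}?date={day.isoformat()}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return int(json.load(resp)["sent"])

def record_daily_count(
    day: date,
    store: dict[str, int],
    fetch: Callable[[date], int] = fetch_sent_count,
) -> int:
    """Write the tool's number into the store, keyed by ISO date.
    The agent never produces or edits this value."""
    count = fetch(day)
    store[day.isoformat()] = count
    return count
```

Run it once a day from cron or any scheduler. The fetcher is a parameter so you can test with a stub instead of the network. A plain dict stands in for storage here; a real version would write to a file or database so the counts survive between runs.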
Now Sunday's summary uses the real number, not the planned one. The gap closed.
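The Sunday side can be just as small. A sketch, assuming the daily counts sit in a dict keyed by ISO date (a hypothetical layout): the weekly number is a sum over what the tool recorded, and the agent gets that number handed to it.

```python
from datetime import date, timedelta

def weekly_total(store: dict[str, int], week_end: date) -> int:
    """Sum the recorded send counts for the seven days ending on `week_end`.
    A day with no recorded count adds 0 instead of a guess."""
    days = (week_end - timedelta(days=offset) for offset in range(7))
    return sum(store.get(d.isoformat(), 0) for d in days)

def sunday_summary_line(store: dict[str, int], sunday: date) -> str:
    # The agent can word the rest of the summary; this number comes from the store.
    return f"Sends this week: {weekly_total(store, sunday)}"
```

If the agent wants to say 72, it has nothing to say it with.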
Why this matters for any AI you run
Every AI setup you run is a stack of three things. A tool. A skill. An agent.
The tool is the deterministic part. Code. APIs. Things that do the same thing every time. The send-server is a tool. The Resend API is a tool. Stripe is a tool.
The skill is how you use the tool. Send this email at this time. Check the count once a day. Don't send to the same person twice.
The agent is the part that picks. What email goes out today? Which client gets a follow-up? Which gap gets filled?
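One way to see the split in code. A rough Python sketch, not how Alex is built: the tool is any function that fires one email, the skill wraps it in the don't-send-twice rule from above, and the agent is reduced to a stand-in that only picks recipients. All names are hypothetical.

```python
from typing import Callable

def make_send_skill(send_tool: Callable[[str, str], None]) -> Callable[[str, str], bool]:
    """Skill layer: wrap the deterministic send tool in a fixed rule,
    don't send to the same person twice."""
    already_sent: set[str] = set()

    def send_once(recipient: str, body: str) -> bool:
        if recipient in already_sent:
            return False  # rule enforced in code; the agent can't talk its way past it
        send_tool(recipient, body)
        already_sent.add(recipient)
        return True

    return send_once

def agent_picks() -> list[tuple[str, str]]:
    """Stand-in for the agent layer: it only chooses who gets what."""
    return [("ana@example.com", "Hi Ana..."), ("ana@example.com", "Hi again...")]
```

The agent can pick Ana twice. The skill sends once.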
If you let the agent generate the count, you will get a story. Sometimes the story is right. Sometimes the story is what the agent thinks you wanted to hear. You will not know which.
If you read the count from the tool, you get the truth. Every time.
One rule for any AI you run
For every metric on your dashboard, ask one question. Where does this number come from?
If the answer is "an agent told me," you have a story. If the answer is "an API returned it," you have a fact. Build the dashboard from facts.
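The rule can even live in code. A minimal sketch, with hypothetical names: tag every metric with where it came from, and let the dashboard accept only the ones an API returned.

```python
from dataclasses import dataclass
from enum import Enum

class Source(Enum):
    API = "an API returned it"   # a fact
    AGENT = "an agent told me"   # a story

@dataclass(frozen=True)
class Metric:
    name: str
    value: int
    source: Source

def dashboard(metrics: list[Metric]) -> dict[str, int]:
    """Build the dashboard only from API-sourced metrics; agent-reported ones are dropped."""
    return {m.name: m.value for m in metrics if m.source is Source.API}
```

Feed it Alex's 72 and the send-server's 24, and only the 24 makes it onto the screen.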
Then you can let the agent be brilliant at the parts where stories are useful. Like writing the email. Or picking the right grant. Or saying nothing today because nothing today matters.
Not at counting. Counting is a tool's job.