Building Intelligent Automation Systems

May 15, 2025
Automation
AI Agents
Production ML

There's a moment in *Seven Samurai* where the leader of the samurai is recruiting his fighters and explains that they need exactly seven, no more and no less, because the village can be defended by seven well-placed men but a hundred well-meaning ones will trip over each other. That's the closest analogy I have for designing a multi-agent system. The temptation is always to add another agent. The discipline is figuring out the smallest number that can actually hold the line.

The phrase "intelligent automation" has been ruined by people who sell it. I'm going to use it anyway because the underlying thing is real, and because nobody has come up with a better name for the actual work.

Here's what the work is. You have a process that involves human judgment at several steps. Maybe a venue booker checking three calendars, quoting a price, and going back and forth on dates. Maybe a loan officer scanning a document for one number that decides everything. The traditional automation answer is to write rules. If this, then that. The problem is that the real process has about 200 ifs, half of them only show up in production, and the people who do the work for a living can't list them all because they don't think about them. They just know.

What I've been building at Ensemble is the version of automation that doesn't pretend the 200 ifs don't exist. The orchestration layer is n8n, because it's honest about what it is, which is glue. The interesting part sits on top: small agents that handle one slice of the process each, talking to each other in a way that lets the messy stuff stay messy without breaking everything downstream. One agent owns calendar reconciliation. Another owns pricing. Another handles the actual conversation with the venue. They don't know about each other's internals. They just pass clean handoffs.
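To make "clean handoffs" concrete, here is a minimal sketch of what one could look like in Python. It is an illustration, not the actual Ensemble code: `CalendarHandoff`, `PriceQuote`, `quote_price`, the base rate, and the weekend multiplier are all made-up names and numbers. The point is only that the pricing agent sees a small immutable record and nothing about how the calendar agent produced it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class CalendarHandoff:
    """What a calendar agent might pass downstream (hypothetical schema)."""
    venue_id: str
    available_dates: Tuple[date, ...]
    notes: str = ""  # free-text ambiguity lives here, not in the structured fields


@dataclass(frozen=True)
class PriceQuote:
    """What a pricing agent might pass on to the conversation agent."""
    venue_id: str
    event_date: date
    amount_cents: int


def quote_price(handoff: CalendarHandoff,
                base_rate_cents: int = 150_000) -> Optional[PriceQuote]:
    """Stand-in for a pricing agent: it only reads the handoff record."""
    if not handoff.available_dates:
        return None  # nothing to quote; the caller decides whether to escalate
    first = min(handoff.available_dates)
    # Illustrative rule only: Saturday and Sunday cost 25% more.
    multiplier = 1.25 if first.weekday() >= 5 else 1.0
    return PriceQuote(handoff.venue_id, first, int(base_rate_cents * multiplier))


if __name__ == "__main__":
    handoff = CalendarHandoff(
        venue_id="venue-123",  # placeholder id
        available_dates=(date(2025, 5, 19), date(2025, 5, 17)),
        notes="owner said 'maybe the 24th too', unconfirmed",
    )
    print(quote_price(handoff))
```

Because the records are frozen, a downstream agent can't quietly mutate what an upstream agent decided, and the unconfirmed "maybe" stays in `notes` instead of leaking into `available_dates`.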

This is not a new architecture idea. People have been writing about multi-agent systems since at least the 90s. What's new is that the agents are now smart enough to handle the ambiguity that used to require a human, and cheap enough to run that you can actually deploy more than one of them.

The thing nobody tells you about building these systems is that 80% of the work is not the agents. It's the evaluation. How do you know your booking agent is actually doing a good job? Not in the demo. In the wild, with the venue owner who answers in two-word texts at midnight, with the customer who changes their mind three times, with the calendar that has a recurring event from 2019 that nobody remembered to delete. You can't write tests for that ahead of time. You can only build the observability to catch it after it happens, and then close the loop.

I learned this the slow way. The first version of Ensemble's booking agent looked great in controlled tests. I put it in front of real venues and it did fine for about a week, then started making confident, completely wrong decisions in a long tail of edge cases I hadn't imagined. The fix wasn't a smarter model. The fix was instrumenting every single decision the agent made, watching the logs every morning for a month, and adding guardrails one failure mode at a time. The model got cheaper. The system got better.
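For a rough picture of what "instrumenting every decision" and "one guardrail per failure mode" might look like, here is a small Python sketch using only the standard library. Everything in it is assumed for illustration: the `record_decision` helper, the record fields, and the `no_past_dates` guardrail are hypothetical, not the logging Ensemble actually runs.

```python
import json
import logging
import time
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger("agent.decisions")

# A guardrail inspects (inputs, decision) and returns a message if it is violated.
Guardrail = Callable[[Dict[str, Any], Dict[str, Any]], Optional[str]]


def record_decision(agent: str, inputs: Dict[str, Any], decision: Dict[str, Any],
                    guardrails: List[Guardrail]) -> Dict[str, Any]:
    """Log one structured record per decision and flag guardrail violations."""
    violations = []
    for check in guardrails:
        message = check(inputs, decision)
        if message is not None:
            violations.append(message)
    record = {
        "ts": time.time(),
        "agent": agent,
        "inputs": inputs,
        "decision": decision,
        "violations": violations,
        # Any violation routes the decision to a human instead of acting on it.
        "action": "escalate" if violations else "proceed",
    }
    logger.info(json.dumps(record, default=str))
    return record


def no_past_dates(inputs: Dict[str, Any], decision: Dict[str, Any]) -> Optional[str]:
    """Example guardrail: never book a date earlier than today (ISO date strings)."""
    if decision["event_date"] < inputs["today"]:
        return "event_date is in the past"
    return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    result = record_decision(
        agent="booking",
        inputs={"today": "2025-05-15"},
        decision={"event_date": "2019-03-01"},  # e.g. a stale recurring event
        guardrails=[no_past_dates],
    )
    print(result["action"])
```

Each new failure you find in the morning log review becomes one more small function in the `guardrails` list, and the JSON lines give you something to grep the next time it happens.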

A few things I'd tell you if you're trying to build something like this:

Start with the worst version of the process. Don't start by mapping the ideal workflow. Start by sitting with the person who does the job today, and asking them to walk you through the last three times it went wrong. The pattern of failure tells you more about what to automate than the pattern of success.

Pick the boring orchestration layer. n8n, Airflow, whatever. The temptation is to build your own because it'll be cleaner. It won't be. You'll spend six weeks reinventing retries.

Don't trust your own demos. The demo is the version where you knew what was going to happen. Production is the version where you don't.

Treat hallucination as a measurable defect rate, not as a property of the model. If your agent hallucinates 4% of the time and your tolerable rate is 1%, that's a number you can drive down. If you treat it as "AI is just like that," you'll never improve it.
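On that last point, the arithmetic is worth seeing once. The sketch below assumes you have hand-reviewed a sample of agent decisions and counted the hallucinated ones; it puts a Wilson score interval around the observed rate so that a small sample doesn't fool you. The function names and the sample sizes are illustrative assumptions, not figures from any real system.

```python
import math
from typing import Tuple


def wilson_interval(defects: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a defect rate (z=1.96 is roughly 95% confidence)."""
    if n <= 0 or not 0 <= defects <= n:
        raise ValueError("need n > 0 and 0 <= defects <= n")
    p = defects / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def meets_target(defects: int, n: int, target_rate: float, z: float = 1.96) -> bool:
    """Only claim success when even the upper bound is at or below the target."""
    _, upper = wilson_interval(defects, n, z)
    return upper <= target_rate


if __name__ == "__main__":
    # Hypothetical review: 8 hallucinated decisions out of 200 sampled (4%).
    low, high = wilson_interval(8, 200)
    print(f"observed 4.0%, plausible range {low:.1%} to {high:.1%}")
    print("meets 1% target:", meets_target(8, 200, 0.01))
```

With 8 defects in 200 reviewed decisions, the plausible range runs from roughly 2% to 8%, so the agent is clearly above a 1% target and you know how far you have to go. Checking the upper bound rather than the raw rate also stops you from declaring victory on a lucky week.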

The honest summary is that intelligent automation is mostly unintelligent infrastructure with a few smart components in the middle. The smart components get the headlines. The infrastructure is what determines whether it actually works.

If you're building something in this space and want to compare notes, my email's on the contact page. I'm always interested in how other people are solving the boring part.