Three months ago, I watched a sales team spend forty-five minutes manually copying lead data from inbound emails into their CRM, then sending a templated response. That was their morning routine, every single day. When I showed them how to automate it, their first reaction was skepticism. Their second reaction was why did we wait so long for this.
This is the story of how to build AI pipelines that actually work in production, not just in demos.
The Problem with Point Solutions
Everyone talks about AI as if a single model call solves problems. You ask a question, you get an answer. But the real world doesn't work that way. Real business processes span multiple systems, require judgment calls, and involve decisions that depend on context gathered from several places at once.
When I talk about pipelines, I mean chains of AI capabilities where the output of one step becomes the input for the next. This sounds simple but gets complicated fast. You have to think about format consistency between steps, what happens when a step fails, and how to handle situations where the AI doesn't have enough information to proceed.
Let me walk you through how I approach this.
Step One: Map the Actual Process
Before writing any code, I watch the process happen. Not what people say they do, but what they actually do. Usually there's a gap between the documented workflow and the real workflow, and that gap is where automation either shines or fails.
For the sales team example, the documented process was simple: copy email data to CRM, send response. But in practice, they were making judgment calls on every lead. Should this go to the enterprise queue or the SMB queue? Is this person actually a decision maker or just an influencer? Should we send the standard pitch or lead with something specific to their industry?
These judgment calls looked automatic to the team because they had years of practice, but they were actually mini-decisions happening dozens of times per day. That's cognitive load that adds up fast.
Step Two: Identify the Choke Points
Once you understand the actual process, you look for choke points. Where does time get wasted? Where do errors creep in? Where do people get stuck waiting for someone else?
For this team, the choke points were clear. First, the data copying was pure busywork that took real time and introduced typos. Second, the routing decision was consistent enough that a simple classification model could handle it, but variable enough that rules alone couldn't solve it. Third, the response selection was where human judgment actually mattered and should stay.
The pipeline I built handled the first two steps and left the third to the team.
Step Three: Build the Router First
Every pipeline starts with understanding what kind of task it's handling. I call this the router, and it's usually a simple classification step. Before your AI does any real work, it needs to understand what it's working with.
For the lead intake pipeline, the router looked at the incoming email and decided: is this a demo request, a pricing question, a support issue, or something else? Each category went to a different flow.
Building a router doesn't require a powerful model. In fact, using a smaller model for classification saves money and reduces latency. The router's only job is to sort inputs into buckets. Once you know which bucket an input belongs to, you can pass it to the appropriate specialist step.
Here's the pattern I use for basic routing:
incoming message
-> classify intent
-> if demo request: go to demo pipeline
-> if pricing: go to pricing pipeline
-> if support: go to support pipeline
-> if unclear: flag for human review
The last step is important. Your pipeline will encounter inputs you didn't anticipate. Rather than forcing a bad fit, route uncertain cases to a human. This is better than watching your pipeline hallucinate a response that kind of fits but misses the point.
Step Four: Chain the Steps Intentionally
Each step in your chain needs three things: input validation, fallback handling, and checkpoint capability.
Input validation means checking that the previous step's output matches what this step expects. If step two expects a structured object with name and email fields, but step one returns something unexpected, you need to handle that gracefully.
Fallback handling means having a backup plan when a step fails. Maybe you retry the model call. Maybe you skip to a simpler approach. Maybe you escalate to a human. The key is deciding this before the failure happens, not during it.
Checkpoint capability means your pipeline can resume from a known good state. If step three of five fails, you shouldn't need to re-run steps one and two. Store intermediate results so you can pick up where you left off.
Step Five: Test the Failure Modes
This is where most teams cut corners, and it's where most pipelines fail in production.
Test what happens when the model returns nothing useful. Test what happens when the input is malformed. Test what happens when the API call times out. Test what happens when you get three hundred requests at once and your rate limiting kicks in.
Every failure mode you don't test will happen on a Friday at 5pm with a critical customer waiting.
A Real Example from My Work
One pipeline I built handles incoming support tickets for a software company. The old process was: tickets came in, someone read them, someone decided which queue, someone assigned it, someone sent an acknowledgment. Total handling time: fifteen to twenty minutes before the customer heard anything.
The new pipeline works like this. Ticket arrives, gets classified by topic and urgency. If it's a billing question, it goes straight to billing with a pre-written acknowledgment. If it's a technical issue, it gets enriched with the customer's account details and sent to the appropriate queue with context already attached. The acknowledgment goes out automatically within two minutes of the ticket arriving.
The first version of this pipeline missed about fifteen percent of tickets because they were ambiguous. Instead of forcing them into a category, I added a rule: if confidence is below seventy percent, route to a human queue for triage. The pipeline went from handling eighty-five percent of tickets to handling ninety-five percent, and the fifteen percent it couldn't handle went to a small queue that got checked a few times per day.
The team was initially worried about errors. What if the pipeline routes something wrong? We added a simple review step for the first month. Errors were rare enough that the team quickly forgot to check the review queue. That's when I knew the pipeline was ready to run unsupervised.
What You Need to Get Started
You don't need a big team or a massive budget. You need a clear understanding of the process you want to automate, a way to move data between steps, and enough monitoring to know when something goes wrong.
My recommendation: start with a two-step pipeline. One classification step and one action step. Get that working reliably before adding complexity. A pipeline that handles eighty percent of cases perfectly is more valuable than one that attempts a hundred percent but fails on twenty.
The biggest mistake I see is teams trying to automate everything at once. They build elaborate pipelines that try to handle every edge case, burn months on development, and then discover that the edge cases they planned for aren't the ones that actually happen.
Start small. Ship fast. Learn from what breaks. That's how you build automation that actually lasts.
