Build vs. Buy for Agent Harnesses: The Real Question
A few weeks ago a thread appeared on the CTO Lunches mailing list. One of the members opened it:
"Like many of you, I see trends converging and am struggling to navigate the build vs buy decisions and identify where or when it makes sense to build a bespoke approach (with the attendant maintenance and support) and when to bet on third-party / managed 'service'-ish approaches."
The organizer of the group responded:
"Your question is the question everyone is asking. And I'm not seeing folks deciding yet."
Another participant, after sharing his approach in detail, concluded:
"I think you can get a good chunk of the benefit building incrementally and owning your context and it substantially reduces the 'pick the right one' problem which to me feels unsolvable right now."
Fourteen messages. No consensus.
I've been in that conversation myself, not in the thread but in practice, building a product where the build vs. buy question came up repeatedly. Here's what I learned.
The Pain We Actually Had
We were building LLM-powered dialogue systems. Customer-facing, multi-turn, high stakes. The kind where a bad response has real consequences.
The standard approach: write a system prompt, append conversation history, send to the model. Works in demos. In production, with long conversations, the model drifts. Instructions it followed in turn three get ignored by turn fifteen. Rules compete with history for the token budget. Same question, different answers depending on conversation length.
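The standard approach can be sketched in a few lines. This is a generic illustration, not any particular vendor's API; `call_model` stands in for whatever chat-completion endpoint you use, and the prompt text is invented.

```python
# The naive pattern: one fixed system prompt, full history appended every turn.
SYSTEM_PROMPT = "You are a support agent. Never promise refunds. Stay on topic."

def respond(history, user_message, call_model):
    """Append the user turn, send everything, append the reply."""
    history.append({"role": "user", "content": user_message})
    messages = [{"role": "system", "content": SYSTEM_PROMPT}] + history
    reply = call_model(messages)  # the model sees the entire transcript, every turn
    history.append({"role": "assistant", "content": reply})
    return reply
```

Nothing here is wrong per se. The failure is structural: `messages` grows linearly with the conversation, so by turn fifteen the rules in `SYSTEM_PROMPT` are a sliver of the context, competing with hundreds of history tokens for the model's attention.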
We looked at what existed. RAG was stateless: it knew nothing about where in the conversation you were. Agent frameworks (LangChain, CrewAI) gave us tool orchestration, not dialogue state management. Visual bot builders handled intent matching, not context control.
None of them addressed the specific thing that was broken: at each turn, the model sees too much of the wrong context and not enough of the right one.
We spent time trying to make existing tools work. We added layers. We wrote workarounds. Each workaround introduced new drift. The more we patched, the more fragile it got.
Eventually we stopped patching and started building.
What We Built
ExoChat models conversations as finite state machines. Each state has its own prompt, assembles context from structured facts rather than raw history, and has explicit exit conditions. The model only sees what's relevant at the current step.
Core principle: minimum viable context per state. Not everything the model might need. Only what it needs now.
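The idea can be sketched roughly like this. To be clear, this is a hypothetical illustration of the state-machine pattern, not ExoChat's actual API; the state names, fields, and facts are all invented.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class State:
    name: str
    prompt: str                              # instructions for this state only
    needed_facts: list                       # which structured facts this state may see
    exit_to: Callable[[dict], Optional[str]]  # next state name, or None to stay

def build_context(state: State, facts: dict) -> str:
    """Minimum viable context: the state's prompt plus only the facts it needs."""
    lines = [state.prompt]
    for key in state.needed_facts:
        if key in facts:
            lines.append(f"{key}: {facts[key]}")
    return "\n".join(lines)

# An identity-verification state never sees billing details, and vice versa.
verify = State(
    name="verify_identity",
    prompt="Confirm the customer's identity. Ask for their order number.",
    needed_facts=["customer_name"],
    exit_to=lambda facts: "handle_billing" if "order_number" in facts else None,
)
```

The point of the structure is that drift has nowhere to accumulate: each turn's context is rebuilt from scratch out of the current state's prompt and a handful of named facts, rather than inherited from an ever-growing transcript.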
The full breakdown: Context Engineering with ExoChat: Parsimony in Action.
The Question Before the Question
Build vs. buy is the wrong question to start with.
The right question is: what exactly is broken? Not in general. Specifically. Which turn in the conversation drifts? Which instruction gets ignored? Under what conditions?
Most teams reach for custom solutions before they can answer that question. They build because existing tools feel inadequate, not because they've diagnosed what's inadequate about them. The result is custom code that solves a vague problem. Vague problems don't stay solved.
We made that mistake first. We tried to build a general solution to "LLM dialogues are unreliable." It didn't work. What worked was narrowing down to a specific failure: context accumulation causing instruction drift in long conversations. Once the problem was that specific, the solution became obvious. The question of build vs. buy answered itself. Nothing on the market addressed that exact thing.
The understanding of your specific pain rarely comes from analysis. It comes from running into the same wall enough times to know exactly which part of the wall is solid.
Build or buy. But diagnose first.
