Context Engineering with ExoChat: Parsimony in Action
I wrote about the principle of parsimony in context engineering a week ago. That article defined parsimony for developer workflows: specs, cursor rules, agent instructions. The core idea: a context is parsimonious when nothing in it can be removed without introducing ambiguity or degrading the result.
That definition holds. But it was scoped to one audience: developers building with AI tools.
The same problem (what enters the context window and at what detail level) exists in every LLM-based product that talks to users. Customer support bots. Financial advisors. Mental health assistants. Pre-sales qualification flows. Any system where an LLM conducts a multi-turn conversation with a person.
Context engineering in dialogue systems is the same discipline as in development: you manage what the model sees at each step. The principle of parsimony tells you how: only what's needed, nothing else. The missing piece is a tool that lets you practice context engineering at scale, without writing code for every change.
ExoChat is that tool.
The Problem: Context Bloat in Dialogue Products
The typical approach to building an LLM dialogue product looks like this:
- Write a system prompt. Put everything in it: persona, rules, examples, edge cases, compliance disclosures, escalation instructions.
- Append conversation history.
- Send to the model. Hope it follows the rules.
This works for demos. It fails in production.
As the conversation grows, instructions compete with history for the token budget. The model's attention is finite. Instructions that were at the top of the prompt get pushed further from the model's effective focus. The result:
- Rule drift. The model starts ignoring instructions it followed perfectly three turns ago.
- Hallucinations. Lost context triggers confabulation to fill the gaps.
- Inconsistent behavior. Same question, different answers depending on conversation length.
- Escalating costs. Longer contexts burn more tokens, and the output quality doesn't improve.
The longer the conversation, the worse it gets. This isn't a model quality issue. It's a failure to engineer the context: to control what the model sees at each turn of the dialogue.
How ExoChat Does Context Engineering
ExoChat is a context engineering tool for dialogue systems. Its core design principle follows parsimony directly: at each state of the conversation, assemble only the context relevant to that state.
Not a monolithic system prompt. Not "everything we might need." The minimum viable context for the current conversational step. Assembled automatically, controlled visually.
Six mechanisms make this work:
State Graph (FSM)
The conversation is modeled as a finite state machine. Each state has an explicit goal, its own rules, and defined transitions to other states. The state graph is the primary context engineering structure. It determines what the model needs to know right now, not what it might need later.
An ExoChat FSM for a financial advisor might have states like: greeting → risk_profiling → product_recommendation → disclosure → confirmation. At the risk_profiling state, the model doesn't need disclosure text. At disclosure, it doesn't need the profiling questionnaire. Each state scopes its own context.
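The state graph above can be pictured as plain data. This is a minimal sketch, not ExoChat's actual schema; the class names and fields are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class State:
    name: str
    goal: str                # what this state is trying to accomplish
    instructions: str        # state-specific rules for the model
    transitions: list[str] = field(default_factory=list)  # allowed next states

# Hypothetical financial-advisor flow; state names mirror the example above.
FLOW = {
    "greeting": State("greeting", "welcome the user",
                      "Be brief.", ["risk_profiling"]),
    "risk_profiling": State("risk_profiling", "assess risk tolerance",
                            "Ask one question at a time.", ["product_recommendation"]),
    "product_recommendation": State("product_recommendation", "suggest products",
                                    "Match products to the stored risk profile.", ["disclosure"]),
    "disclosure": State("disclosure", "present required disclosures",
                        "Read the disclosure policy verbatim.", ["confirmation"]),
    "confirmation": State("confirmation", "confirm the user's choice",
                          "Summarize and ask for explicit consent.", []),
}

def allowed_next(current: str) -> list[str]:
    """The graph, not the model, decides where the conversation may go."""
    return FLOW[current].transitions
```

Because each state carries its own instructions, scoping the context is a dictionary lookup, not a prompt-surgery exercise.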
Managed Prompts per State
Each state assembles its prompt independently:
- State-specific instructions (what to do, what to avoid)
- Relevant facts collected so far (not raw conversation history)
- Minimal history (summarized or filtered, not the full transcript)
The prompt controller builds context from these components. The model never sees the full system prompt. Only what applies to the current state.
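Per-state assembly can be sketched in a few lines. The function below is an illustration of the idea, not ExoChat's prompt controller; the component names and the three-turn history cutoff are assumptions:

```python
def assemble_prompt(state_instructions: str,
                    facts: dict[str, str],
                    recent_turns: list[str],
                    max_turns: int = 3) -> str:
    """Build the prompt for the current state only: its own instructions,
    structured facts, and a filtered slice of history (not the transcript)."""
    fact_lines = "\n".join(f"- {k}: {v}" for k, v in facts.items())
    history = "\n".join(recent_turns[-max_turns:])  # minimal history
    return (
        f"Instructions:\n{state_instructions}\n\n"
        f"Known facts:\n{fact_lines}\n\n"
        f"Recent turns:\n{history}"
    )

prompt = assemble_prompt(
    "Recommend a product matching the user's risk profile.",
    {"risk_tolerance": "moderate"},
    ["user: I'd like something stable", "bot: Noted.", "user: What do you suggest?"],
)
```

The full system prompt never appears anywhere in this function, which is the point: there is nothing to bloat.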
Fact Storage
When the user provides validated information (name, consent, risk tolerance, symptoms), ExoChat extracts and stores these as structured facts. Facts are not kept in raw conversation history where they'd consume tokens every turn. They're stored separately and injected only when a state needs them.
This is parsimony at the data level: the model gets risk_tolerance: moderate instead of re-reading the five-turn exchange where the user explained their preferences.
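A fact store with selective injection might look like this sketch. The interface is hypothetical, but it shows the data-level parsimony described above:

```python
class FactStore:
    """Validated facts live outside the raw transcript and cost tokens
    only when a state explicitly requests them."""

    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._facts[key] = value

    def inject(self, keys: list[str]) -> str:
        """Return only the facts the current state declares it needs."""
        return "\n".join(f"{k}: {self._facts[k]}" for k in keys if k in self._facts)

store = FactStore()
store.set("risk_tolerance", "moderate")
store.set("consent", "true")

# The disclosure state asks only for consent; the risk data stays out of its context.
snippet = store.inject(["consent"])
```
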
Model Routing
Not every step needs the same model. Classification ("did the user agree?") is a cheap operation; a small, fast model handles it. Complex generation ("explain this investment product in the user's terms") needs a capable model with richer context.
ExoChat routes requests by task type: classification, generation, verification, summarization. Each route gets a context budget appropriate to the task. A yes/no classifier doesn't need the full conversation history.
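A routing table pairing each task type with a model and a context budget could be as simple as this. The model names and budget figures are invented placeholders, not ExoChat configuration:

```python
from typing import NamedTuple

class Route(NamedTuple):
    model: str           # hypothetical model identifier
    context_budget: int  # max context tokens for this task type

# Illustrative table: cheap model and tiny budget for classification,
# capable model and larger budget for generation.
ROUTES = {
    "classification": Route("small-fast-model", 512),
    "generation":     Route("capable-model", 8192),
    "verification":   Route("small-fast-model", 1024),
    "summarization":  Route("small-fast-model", 2048),
}

def route(task_type: str) -> Route:
    return ROUTES[task_type]
```
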
Context Assembly from Policies
Compliance rules, disclosure requirements, escalation triggers. These are domain policies. ExoChat injects them only in states where they apply. A disclosure policy enters the context at the disclosure state, not at greeting. An escalation trigger for high-risk topics is active in relevant states, not everywhere.
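Scoping policies to states is a filter over a policy table. A sketch with invented policy entries:

```python
# Hypothetical policy table: each policy lists the states where it applies.
POLICIES = {
    "disclosure_text": {
        "states": {"disclosure"},
        "text": "Investments may lose value.",
    },
    "escalation_high_risk": {
        "states": {"risk_profiling", "product_recommendation"},
        "text": "Escalate to a human on mentions of financial distress.",
    },
}

def active_policies(state: str) -> list[str]:
    """Return only the policy texts that apply in the given state."""
    return [p["text"] for p in POLICIES.values() if state in p["states"]]
```

A greeting state gets an empty list; the disclosure text enters the context exactly once, where it matters.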
Transition Validators
Before the conversation moves from one state to another, validators check exit conditions. Did the user provide the required information? Did they confirm consent? Validators prevent premature transitions that would lose context or skip required steps.
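An exit-condition check can be expressed as required facts per state. The required-fact lists below are illustrative, not ExoChat's validator API:

```python
def validate_transition(state: str, facts: dict[str, str]) -> bool:
    """Allow leaving a state only once its exit conditions are met.
    Hypothetical example: each state names the facts it must have collected."""
    required = {
        "risk_profiling": ["risk_tolerance"],
        "disclosure": ["consent"],
    }
    return all(k in facts for k in required.get(state, []))

# The conversation may leave risk_profiling only once a tolerance is recorded.
ok = validate_transition("risk_profiling", {"risk_tolerance": "moderate"})
blocked = validate_transition("disclosure", {})  # consent missing: stay put
```
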
Comparison: Context Engineering Approaches for Dialogue Systems
Every LLM dialogue system deals with context engineering, consciously or not. The approaches differ in how much control they offer over what enters the context, who controls it, and how well they follow the principle of parsimony.
| Approach | Context Engineering Method | Who Controls | Parsimony | Iteration Speed |
|---|---|---|---|---|
| Raw Prompt Engineering | Monolithic system prompt | Developer | Low: everything upfront | Slow: code deploy per change |
| RAG | Chunk retrieval by similarity | Developer + embeddings | Medium: relevant chunks, but no conversation state awareness | Medium: update knowledge base |
| Agent Frameworks (LangChain, CrewAI) | Code-defined tool chains | Developer | Medium: tools scope context, but flow is code | Slow: code changes for flow |
| Visual Bot Builders (Voiceflow, Botpress) | Decision trees, intents | Designer | Medium: structured, but intent-based | Fast: visual editor |
| ExoChat (FSM + M2P) | State graph with per-state context assembly | Operator (no-code) | High: minimum viable context per state | Fast: visual editor, no deploy |
The key differences:
RAG answers "what knowledge to include" but not "what instructions and rules apply right now." Retrieval is stateless. It doesn't know where in the conversation you are. Context engineering without conversation state is incomplete.
Agent frameworks are developer tools. Every flow change (a new state, a different transition condition, a modified prompt) requires code. Context engineering is possible, but gated by developer availability. Parsimony degrades because nobody tunes it.
Visual bot builders (Voiceflow, Botpress) share the no-code philosophy. Operators can edit flows visually. They also allow different prompts per node. But the context engineering model is different in three ways:
- Routing: bot builders route by user intent (what did the user say?). ExoChat routes by state graph with validated transitions (what should the system do next?). Intent-based routing is reactive: it responds to user input. State-driven routing is proactive: the system leads the conversation toward a goal.
- Context assembly: bot builders typically pass the full conversation history plus the node's prompt to the model. ExoChat assembles context per state from structured facts, domain policies, and filtered history, not the raw transcript. This is where parsimony is enforced: each state gets only what it needs.
- Data model: bot builders work with conversation memory (full or summarized). ExoChat extracts validated facts (risk_tolerance: moderate, consent: true) and injects them selectively. Facts don't consume tokens sitting in raw history. They're available when a state requests them.
ExoChat is a context engineering tool that combines these three mechanisms with no-code editing and M2P conversation control. State-driven routing, per-state context assembly, structured fact storage. The operator manages what context enters each state. Parsimony is the default, not an afterthought.
Operator-First: Context Engineering Without Developers
Context engineering following the principle of parsimony requires constant tuning. A context that's parsimonious today becomes bloated tomorrow when you add a new product line, a compliance rule, or an edge case handler.
If every tuning cycle requires a developer to change code, review, test, and deploy, context quality degrades. The cost of maintaining parsimony exceeds the perceived benefit. The system prompt grows. Context bloat returns.
ExoChat is designed so that the operator (product manager, analyst, domain expert) does context engineering directly:
- Visual state graph editor. See the full conversation flow, edit per-state prompts, adjust transition conditions.
- Version control. Ship scenario versions with A/B testing and feature flags.
- ExoChat Quality Lab. Run synthetic users through the scenario, evaluate quality per persona, catch regressions before production.
- No deploy cycle. Changes go live when the operator publishes the version.
The operator does context engineering directly. They add context to a state where the model hallucinates. They remove context from a state where it's not needed. They test with the Quality Lab. They ship. No developer in the loop. Parsimony is maintained because the person closest to the domain manages the context.
ExoChat Quality Lab deserves its own article, and it will get one. In short: LLM dialogues operate in fuzzy logic territory where you can't write a unit test that says "response must equal X." Quality Lab solves this by simulating dozens of conversations in parallel, each with a different user persona and task, then automatically scoring every dialogue against a set of criteria. The result is a quality map of your ExoChat scenario: which personas get great service, which hit dead ends, and where the context engineering needs attention. More on this soon.
This is the difference between context engineering as an abstract discipline and context engineering as daily practice.
Context Engineering Across the Stack
The prompt engineering vs context engineering article described three levels of AI system complexity. Context engineering with parsimony applies at every level, but the tools and the people change.
| Layer | Context Engineering Tool | Who Does It |
|---|---|---|
| Developer workflow | CLAUDE.md, cursor rules, specs | Developer |
| Agent orchestration | Inter-agent context references, tool-first approach | Developer / Architect |
| User-facing dialogue | ExoChat FSM, per-state context assembly | Operator |
The principle of parsimony is the same everywhere: minimum viable context per task. The difference is who practices context engineering and with what tools.
At the developer layer, context engineering means writing compressed specs and directive rules. At the agent layer, it means passing references instead of full context between agents. At the dialogue layer, it means assembling per-state prompts from facts, policies, and scoped instructions. ExoChat is the tool that makes this possible without code.
From Principle to Product
Context engineering is the discipline. Parsimony is the principle that guides it: remove everything from context that doesn't contribute to the result. This applies to developer workflows, agent orchestration, and user-facing dialogue systems alike.
ExoChat is what happens when you build a context engineering tool around the principle of parsimony. State graphs scope context per conversation step. Managed prompts assemble minimum viable context. Fact storage replaces raw history. Model routing matches context budgets to task complexity. The operator, not the developer, practices context engineering daily. The system stays parsimonious because the person tuning it understands the domain.
The context window is finite. What you put in it determines what comes out. ExoChat is the tool that manages what gets in, following the principle of parsimony, in the hands of the people who know the domain best.
