Specification-Driven Development: The Four Pillars
This article serves two purposes.
First, it is the reference material for all ForEach Partners internship test assignments. If you are applying for an internship at jl.foreachpartners.com, read this before starting your test task. Your submission will be evaluated against these principles.
Second, it is a standalone document. If you are building your own development workflow and want a systematic approach to specifications, traceability, and AI-assisted enforcement, this framework applies regardless of stack, team size, or project type.
What Is Specification-Driven Development
Specification-Driven Development (SDD) is an approach where the specification precedes implementation. You formulate a requirement first. Then you derive a contract from it: an API definition, a schema, an interface. Only then do you write code.
AI tools (Claude, Cursor, Windsurf) accelerate every stage but do not replace any of them. An AI can generate code faster than you type it, but if that code has no traceable requirement, no single source of truth, and no automated validation, you have speed without control. SDD provides the control. If you are wondering where prompt engineering ends and context engineering begins, I covered that distinction in When Prompt Engineering Stops Being Enough. SDD operates at the context engineering level.
The entire methodology rests on four non-negotiable principles. We call them pillars.
In practice, our workflows are more complex than what this article describes, and the details vary from project to project: different tools, different specification formats, different levels of ceremony. But the four pillars remain constant everywhere. What matters is understanding their purpose and being able to apply them in practice, not memorizing the specific examples from this document.
Pillar 1: Traceability
Every behavioral change must trace to a requirement.
Traceability is a bidirectional link between a requirement and its implementation:
- Top-down: requirement → specification → code → test
- Bottom-up: test → code → specification → requirement (no orphans)
How It Works
Every requirement gets a unique identifier. Code, tests, and commits reference that identifier through annotations. If code changes system behavior and contains no reference to a requirement, that is a violation.
A concrete example. Suppose a requirement says: "Access tokens expire after a configurable interval." That requirement gets an ID, say FR-AUTH-002. The API contract references it. The handler implementation references it. The test scenario references it. The commit message references it. If you search the codebase for FR-AUTH-002, you find every artifact that implements or validates that requirement. If you search for code that changes token behavior and find no requirement reference, you have found a traceability gap.
The same works in reverse. If FR-AUTH-002 exists but no code, no test, and no commit references it, the requirement is unimplemented. You know exactly what is missing.
What This Gets You
- Any reviewer sees WHY this code exists, not just what it does
- Tests are tied to requirements, not to implementation details
- Changing a requirement reveals its blast radius: every artifact that references the ID
- No dead code without a reason. No requirements without implementation
Violation Test
If a behavioral change has no reference to a requirement, traceability is broken.
Pillar 2: DRY (Don't Repeat Yourself)
Every fact has exactly one authoritative source. Everything else references it.
DRY is not a new concept. But working with AI tools makes it critical in a way it has never been before.
Why DRY Is Critical with AI
AI tools generate code fast. LLMs lowered the barriers for tools that used to require deep expertise: Git, SQL, infrastructure scripting. More people write more code faster. Duplicated sources of truth diverge faster than anyone notices. AI agents are especially prone to copying context between files "for convenience." Each such copy becomes a future inconsistency.
Consider a scenario. Your API contract lives in an OpenAPI YAML file. An AI agent, trying to be helpful, copies the endpoint descriptions into a README. Now you have two places describing the same endpoints. You update the YAML. The README is now wrong. Three weeks later, a new team member reads the README, builds a client against it, and files a bug because the actual API behaves differently.
This happens constantly. The faster the code generation, the faster the divergence.
How It Works
For every fact, you define a single source of truth:
- API contract lives in the specification (protobuf, OpenAPI, JSON Schema). Not duplicated in documentation
- Configuration semantics described in one place. Not in the README, not in code comments, not in a wiki page
- Business requirement captured once. Code and tests reference it, not quote it
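The "not duplicated in documentation" rule means documentation is derived from the contract, never hand-copied. A minimal sketch (the spec fragment is hypothetical, and JSON is used here only to keep the sketch dependency-free; a real project would read its OpenAPI YAML file):

```python
import json

# Hypothetical contract fragment. In practice this is the project's
# OpenAPI file -- the single source of truth for endpoints.
SPEC_TEXT = """
{
  "openapi": "3.0.3",
  "paths": {
    "/login":  {"post": {"summary": "Issue an access token"}},
    "/logout": {"post": {"summary": "Revoke the current token"}}
  }
}
"""


def endpoint_summary(spec_text: str) -> list[str]:
    """Derive a docs-ready endpoint list from the contract at build time."""
    spec = json.loads(spec_text)
    return [
        f"{method.upper()} {path} - {op.get('summary', '')}"
        for path, ops in spec["paths"].items()
        for method, op in ops.items()
    ]


print("\n".join(endpoint_summary(SPEC_TEXT)))
```

If the contract changes, the derived summary changes with it; there is no second copy to fall out of date.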
Violation Test
If the same fact exists in two places, one of them will be wrong. It is only a matter of time.
Pillar 3: Deterministic Enforcement
Every check that can be a script must be a script. AI models fill the gaps where judgment is required.
The Boundary Between Tools and AI
The line is clear. If a check produces the same result every time given the same input, it belongs to a deterministic tool. If it requires interpretation, context, or judgment, it belongs to AI.
Deterministic tools (zero ambiguity, repeatable results):
- Compilation, linting, formatting
- Schema validation (OpenAPI validator, proto lint, JSON Schema)
- Spec-to-code alignment checks
- Test execution
- Audit scripts (annotation coverage, naming conventions)
Hybrid approach (script narrows the scope, AI makes the decision):
Not every check is fully formalizable, and not every check requires full AI interpretation. There is a productive middle ground: a script finds suspicious spots through pattern matching, then AI analyzes the filtered results and makes a judgment call.
Example: grep finds all functions without @req annotations. Most of them are utility functions that do not need annotations. An AI reviews the filtered list and identifies which ones genuinely violate traceability and which are legitimate utilities.
Example: a script detects duplicate constants across files. AI evaluates whether each case is a real DRY violation or an acceptable coincidence.
The hybrid approach applies when a fully deterministic check is impossible but manual review of the entire codebase is impractical. The script reduces the volume for analysis by orders of magnitude. AI provides the precision of the final decision.
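The deterministic half of the first example might look like this sketch (the @req convention and the small look-ahead window are assumptions, not a prescribed implementation):

```python
import re


def functions_missing_req(source: str) -> list[str]:
    """Find function definitions with no @req tag nearby.

    This is only the filter stage: the returned list is what gets
    handed to an AI reviewer for the actual judgment call.
    """
    names = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        m = re.match(r"\s*def\s+(\w+)", line)
        if not m:
            continue
        # Look at the def line plus the next few lines (the docstring).
        window = "\n".join(lines[i:i + 4])
        if "@req" not in window:
            names.append(m.group(1))
    return names


sample = '''
def login(user):
    """@req FR-AUTH-001: issue token on valid credentials."""

def slugify(text):
    return text.lower().replace(" ", "-")
'''

print(functions_missing_req(sample))  # ['slugify']
```

The script cannot tell whether slugify is a legitimate utility or an untraceable behavioral change; that is exactly the judgment the AI stage provides.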
AI models (require interpretation, context, judgment):
- Code generation from requirements
- Test scenario selection and edge case identification
- Architecture decisions and trade-off analysis
- Semantic code review (logic errors beyond linter scope)
- Writing specifications from business needs
The Verification Pyramid
Three levels, in order of preference:
- Deterministic tool if the check is fully formalizable
- Script + AI if a script narrows the scope and AI makes the judgment
- Pure AI only if the check requires full context and interpretation
Tool-first, LLM-for-gaps. Building a deterministic tool or even a regex-based filter is often a better investment than repeatedly invoking an AI agent to scan an entire codebase.
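A level-1 check, fully formalizable and CI-ready, can be sketched in a few lines (the FR-XXX-NNN ID scheme follows the examples above; everything else is illustrative):

```python
import re

# Requirement IDs follow the FR-XXX-NNN convention used in this article.
REQ_ID = re.compile(r"FR-[A-Z]+-\d{3}")


def unimplemented_requirements(spec_text: str, test_text: str) -> set[str]:
    """IDs present in the spec but referenced by no test: zero ambiguity,
    same input always yields the same output, so this belongs in CI."""
    return set(REQ_ID.findall(spec_text)) - set(REQ_ID.findall(test_text))


spec = "FR-AUTH-001 login; FR-AUTH-002 token expiry; FR-USER-004 profile"
tests = "# @req FR-AUTH-001\n# @req FR-AUTH-002"

print(unimplemented_requirements(spec, tests))  # {'FR-USER-004'}
```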
Violation Test
If a check is fully formalizable, it must be a deterministic tool. If a deterministic tool can substantially narrow the scope of analysis, it must be created, even if the final decision is left to AI.
Pillar 4: Parsimony
Minimum representation that preserves full semantics and enforceability.
I wrote a dedicated article on this topic: Principle of Parsimony in Context Engineering. Here is the essence as it applies to SDD.
Three Requirements
- Minimality. Exclude everything that does not affect behavior: redundant explanations, repeated instructions, irrelevant fragments, history not needed for the current step
- Sufficiency. Compression must not introduce ambiguity. Context remains complete enough for unambiguous reconstruction of the goal, constraints, and result format
- Budget prioritization. Tokens saved on operational description are reallocated to substantive artifacts (code, specifications, examples, data) which have the strongest empirical impact on response quality
Parsimony Is Not Unreadability
Text written with parsimony (imperative, no filler, compressed) reads easily for a competent specialist who knows the domain terminology and context. Parsimony removes noise, not meaning.
Compare:
When you are writing configuration files, it is important to remember that you should always validate the required fields at startup, because this helps catch errors early in the development process.
versus:
Configuration must validate required fields at startup (fail-fast).
Both say the same thing. The second version is about a third the length. A developer reading it knows exactly what to do. No meaning is lost.
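The compressed rule also translates directly into code. A minimal fail-fast sketch (the field names are illustrative):

```python
import os

# Required configuration keys -- illustrative names.
REQUIRED = ("DATABASE_URL", "TOKEN_TTL_SECONDS")


def load_config(env: dict) -> dict:
    """Configuration must validate required fields at startup (fail-fast)."""
    missing = [key for key in REQUIRED if key not in env]
    if missing:
        raise RuntimeError("missing required config: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED}
```

Called as load_config(os.environ) during startup, this refuses to run with an incomplete configuration instead of failing later at first use.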
Operational Practices
- Directive vocabulary: MUST / SHOULD / MAY / DO NOT. Not "you might want to consider"
- Commit messages: feat(auth): implement login [FR-AUTH-001]. Not a paragraph of prose
- Specifications: compressed YAML with requirements and prohibitions. Not a narrative
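A commit-message format like this can be enforced deterministically, in the spirit of Pillar 3. A hypothetical hook sketch (the type list and ID scheme are assumptions, not a prescribed policy):

```python
import re

# type(scope): summary [FR-XXX-NNN] -- the compressed format above.
COMMIT_RE = re.compile(
    r"^(feat|fix|refactor|test|docs|chore)"  # change type
    r"\([a-z0-9-]+\): "                      # (scope):
    r".+ "                                   # summary
    r"\[FR-[A-Z]+-\d{3}\]$"                  # [requirement ID]
)


def valid_commit(message: str) -> bool:
    """Check the first line of a commit message against the format."""
    return bool(COMMIT_RE.match(message.splitlines()[0]))


print(valid_commit("feat(auth): implement login [FR-AUTH-001]"))  # True
print(valid_commit("fixed some stuff"))                           # False
```

Wired into a commit-msg hook or a CI job, this makes the traceability convention a machine-checked rule rather than a reviewer's chore.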
Violation Test
A context is parsimonious when removing any remaining part would introduce ambiguity, and the total volume contains no redundant or irrelevant elements. If either condition fails, parsimony is violated.
How the Pillars Work Together
Every project decision derives from the four pillars:
| Decision | Derives from |
|---|---|
| Specification before code (API-first) | DRY (spec is the source of truth) + Deterministic Enforcement (compiler/validator enforces it) |
| Annotations in code (@req) | Traceability (code-to-requirement link) |
| Automated checks in CI | Deterministic Enforcement (everything verifiable is automated) |
| Compressed rules for AI agents | Parsimony (minimum tokens, maximum signal) |
| Single contract format | DRY (no markdown copies of APIs) + Traceability (contract traces to requirement) |
| Tests linked to requirements | Traceability + Deterministic Enforcement |
The pillars reinforce each other. Traceability without deterministic enforcement is manual bookkeeping that will drift. DRY without parsimony produces a single source of truth that is bloated and ignored. Parsimony without traceability produces compact artifacts that no one can connect to business needs. Deterministic tools without DRY validate against duplicated specs that may already be inconsistent.
All four together create a system where specifications are living artifacts, not documentation theater.
Applying the Pillars in Practice
If you are working on a project or a test assignment, here is what applying each pillar looks like concretely.
Traceability. Your code and tests must be traceable to the task requirements. Use annotations, structured commit messages, and clear references. If someone reads your commit history, they should be able to reconstruct which requirement each change addresses.
DRY. Every fact is described once. Configuration values, type definitions, constants, API contracts: each has a single source. If you define a type in a spec file, your code derives from it or references it. It does not redefine it.
Deterministic Enforcement. Validation is automated. Linting, tests, schema validation, CI checks: if a machine can verify it, a machine should verify it. Where full automation is not possible, write a script that narrows the scope for AI review.
Parsimony. Your code, documentation, and prompts are concise. No redundant comments restating what the code already says. No empty abstractions. No premature generalization. Every line earns its place.
For LLM Users: How to Work with This Material
This article is written in Markdown and available as a raw Markdown version for LLM consumption: https://blog.rezvov.com/specification-driven-development-four-pillars.md. That format is more convenient for feeding directly into Claude, Cursor, or any other AI tool.
But the point is not to feed it to Claude or Cursor so it "just does what's written here." The point is to use it as a basis for a conversation with your AI tool:
- Walk through the four pillars with the AI. Ask it to explain each one in the context of your specific task
- Challenge the examples. Ask "what would a traceability violation look like in my project?"
- Work through scenarios: "If I change this requirement, what else needs to change?"
- Ask the AI to review your code against these principles and explain what it finds
The value is not in the AI blindly applying rules. It is in you developing an understanding of WHY each pillar exists and HOW to apply it, using the AI as a thinking partner.
For ForEach Partners Internship Candidates
If you are completing a test assignment for the ForEach Partners Junior Lab internship program, here is what we evaluate:
Not just the code. We look at the result (code, tests, specs), but also at the process. Your prompts during development, your iteration with AI tools, your decisions about what to automate and what to review manually: these are part of the evaluation.
Show your work. A screencast of your development session is the strongest signal. A chat log export is acceptable. We want to see how you decomposed the task, how you guided the AI, how you caught and corrected its mistakes.
SDD as a conversation topic. During the interview, we will discuss how you applied SDD principles in your test assignment. Not whether you memorized the definitions, but whether you understood the reasoning and made deliberate choices.
The specific requirements for each role are described in the corresponding vacancy posting.
