AI coding agents ignore the spec because to the agent it is just text passed in alongside everything else, with nothing holding the build to it. The spec says what is wanted, not what already exists, and nothing checks the result against it. So the agent follows the path of least resistance, and the spec quietly becomes a suggestion.
An AI coding agent will read a spec, appear to agree with it, and then build something else. This is not a malfunction. The spec arrives as a block of text, one input sitting next to the prompt, the open files, the error in the terminal, and whatever the agent can infer from the immediate task. Nothing in the process holds the finished code against what the spec asked for. The moment the easiest route diverges from the written one, the agent takes the easier route, and no one is told it happened.
Does the agent actually read the spec?
Usually, yes. The problem is what reading means to an agent. A spec is treated as context, not as a constraint. It informs the next token the way a related file or a comment does, and it competes for attention with everything else in the window. The agent is optimising for a plausible completion of the task in front of it, not for fidelity to a document it skimmed several steps ago.
So the spec influences the build without governing it. On a small, self-contained task the influence is enough and the output looks faithful. On anything longer, the immediate signals win. The agent follows the structure of the code it is editing right now, the pattern it used two files ago, the shortest path to something that runs. The spec fades into the background exactly as the work gets complicated enough to need it.
Why doesn't following the spec stick?
Because nothing makes it stick. A spec written as a static document has no hold over what gets built from it. There is no point where the output is measured against the intent and the difference is flagged. The spec is read once, at the start, and then the build runs free of it.
Three things pull the agent off course, and a fixed document answers none of them:
- The spec describes the goal, not the ground. It says what to build and nothing about what already sits in the codebase, so the agent cannot tell that half of what it is writing already exists.
- The work moves and the document does not. A decision made mid-build, a constraint discovered in the data, a fix applied two prompts ago: none of it travels back into the spec, so the agent and the document are working from different versions of reality within the hour.
- Nothing checks the result. Done is whatever the agent produced. There is no gate that holds the code up against the spec and says this part does not match.
A faster, more capable agent makes each of these worse, not better, because it covers more ground between the points where a human might have noticed the drift.
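To make the missing gate concrete, here is a minimal sketch of what holding the code up against the spec could look like. It assumes each spec requirement can be expressed as a small executable check. The names `Requirement`, `run_gate`, and `slugify`, and the `SLUG-1` / `SLUG-2` ids, are hypothetical and exist only for illustration. They do not describe any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Requirement:
    """One line of the spec, paired with a check that can be run against the build."""
    id: str
    description: str
    check: Callable[[], bool]


def run_gate(requirements: List[Requirement]) -> List[str]:
    """Return the ids of requirements the current build does not satisfy."""
    failures = []
    for req in requirements:
        try:
            ok = req.check()
        except Exception:
            # A check that crashes counts as a mismatch, not a pass.
            ok = False
        if not ok:
            failures.append(req.id)
    return failures


# Hypothetical agent output: the spec asked for lowercase, hyphen-separated
# slugs, but the build only lowercases the title.
def slugify(title: str) -> str:
    return title.lower()


requirements = [
    Requirement("SLUG-1", "slugs are lowercase",
                lambda: slugify("Hello World") == slugify("Hello World").lower()),
    Requirement("SLUG-2", "spaces become hyphens",
                lambda: " " not in slugify("Hello World")),
]

failures = run_gate(requirements)
print(failures)
```

The code runs and looks plausible, which is the point: only the gate reports that `SLUG-2` does not match. A sketch like this covers only requirements that can be reduced to a check, and it does nothing to keep the requirements themselves current as the work moves.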
Don't AGENTS.md and Cursor rules fix this?
They help, and they are worth using, but they fix a different problem. An AGENTS.md file, a CLAUDE.md, a set of Cursor rules: these carry standing instructions, the conventions and preferences the agent should apply every time. They make the agent behave more consistently. They do not connect the spec to the build.
They are still static text, read in one direction, with nothing checking the output against them. They tell the agent how to write code in general. They cannot tell it that the function it is about to create already exists three folders away, or that the design changed in a call yesterday, or whether what it just shipped actually matches what was asked. The instruction file and the spec share the same flaw: both describe, neither governs, and neither knows what the build currently is.
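The "already exists three folders away" case is the easiest of these to illustrate. The sketch below assumes a Python codebase and matches on exact function names only, so it would miss the same capability built under a different name. `index_functions`, `already_defined`, and the `billing/utils/money.py` layout are hypothetical names chosen for the example.

```python
import ast
import tempfile
from pathlib import Path
from typing import Dict, List


def index_functions(root: Path) -> Dict[str, List[str]]:
    """Map each function name to the files (relative to root) that define it."""
    index: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                rel = path.relative_to(root).as_posix()
                index.setdefault(node.name, []).append(rel)
    return index


def already_defined(index: Dict[str, List[str]], name: str) -> List[str]:
    """Return the files that already define a function called `name`."""
    return index.get(name, [])


# Demo on a throwaway directory standing in for a real codebase.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "billing" / "utils").mkdir(parents=True)
    (root / "billing" / "utils" / "money.py").write_text(
        "def format_currency(amount):\n    return f'{amount:.2f}'\n",
        encoding="utf-8",
    )
    index = index_functions(root)
    hits = already_defined(index, "format_currency")
    misses = already_defined(index, "parse_invoice")

print(hits, misses)
```

A lookup like this only helps if something runs it before the agent writes the new function. An instruction file can ask for that, but it cannot confirm that it happened.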
What does it cost when the agent ignores the spec?
It rarely shows up as an obvious failure. The code runs. The cost is quieter and it compounds. The same capability gets built twice under two names, so the codebase grows faster than the product does. The plan on file and the thing being built drift apart, so anyone reading the spec to understand the system is reading fiction. And because nothing flagged the divergence, it is found late, usually by the next person in, who cannot tell which parts of the spec are still true and which the agent overrode weeks ago.
This is the wall the builders we work with keep hitting. The strongest of them write genuinely good specs and still watch the agent wander off them, because a document, however good, was never built to hold a moving build to account. The spec is right and it is inert, and inert is the one thing it cannot afford to be once an agent is doing the work.
| Once the agent is building | A spec as a static document | A Canery spec |
|---|---|---|
| Knows what already exists in the codebase? | No | Yes |
| Stays current as decisions move? | It falls behind | Yes |
| Checks the build against the intent? | Nothing does | Yes |
| Tells you when the agent went off-spec? | You find out later | It surfaces |
A spec only stops being ignored when it is connected to the work the agent is doing. Canery is built for that. It holds the spec against the codebase it describes, the work as it changes, and the team building from it, so the same thing is not built twice, the plan and the product stay in step, and a divergence is something that surfaces rather than something discovered weeks later. The spec stops being a description the agent can wander away from, and becomes the thing the build is actually held to.
If this is how you want to work
If you want a spec the agent is actually held to, one that knows your codebase, stays current as the build moves, and surfaces the moment the work drifts from the plan, Canery is being built for exactly that. Join the waitlist at canery.ai and be among the first to use it.
