AI Coding

My Agent Can Say No: Spec-Driven Agent Code Review with OpenSpec

How OpenSpec CLI enables spec-driven agent code review: your agent can decline suggestions that conflict with documented architecture conventions.

Arnaud Pouppeville

25 Apr 2026 • 6 min read

CodeRabbit left a comment on my PR. My agent read it and declined.

Not because I told it to. The spec had our deployment convention in it. That's what CodeRabbit was missing.

The spec is what gives the agent standing to say no.

The Upstream Prompt Problem

Before I wired OpenSpec into my workflow, every Claude Code session started the same way: I'd describe the context from scratch. Here's what we're building, here's the tech stack, here's what we decided last week. The specs either lived in Azure DevOps, formatted for humans and invisible to agents, or I was writing them manually in Markdown, one-off, no standard structure. Every feature had a different format.

The problem wasn't hallucination. It was underspecification: I was feeding the agent inconsistent input and expecting consistent output.

The model was the same each time. The framing was different, and so were the results. That wasn't the model being unreliable. That was me.

The tool I landed on is OpenSpec CLI. Install it with npm install -g @fission-ai/openspec@latest, wire it into Claude Code with openspec init.

It gives Claude Code three slash commands: /opsx:propose, /opsx:apply, and /opsx:archive. Every feature starts with /opsx:propose: the agent writes a structured change proposal before any code exists. That proposal lands in openspec/changes/[feature-name]/ as three files:

openspec/changes/add-coordinator-api/
├── proposal.md     # Why: problem, scope, constraints
├── tasks.md        # How: implementation steps, parallelization
└── spec.md         # What: exact parameters, boundaries, edge cases

The output is now a spec, not a chat message. It's always the same shape. I review it, the agent works from it, and we both agree on the contract before the first line of code gets written.

The Spec Says No

Here's the CodeRabbit comment that made this concrete.

On March 10, I was reviewing a PR for a background job service. CodeRabbit flagged this in appsettings.json:

{
  "Worker": {
    "CoordinatorBaseUrl": "http://localhost:5001"
  }
}

CodeRabbit — ⚠️ Major: Production override for Worker:CoordinatorBaseUrl is missing. Base appsettings.json hardcodes localhost:5001. No appsettings.Production.json found.

For most codebases, this is a legitimate catch. A hardcoded localhost URL with no visible production override is a real deployment risk. CodeRabbit was doing what it's supposed to do.

But the spec for this service encodes the deployment convention: production uses Worker--CoordinatorBaseUrl stored in Azure Key Vault. The Key Vault configuration provider loads the secret at runtime and maps the double-dash separator to the colon hierarchy (Worker:CoordinatorBaseUrl), overriding appsettings.json. We don't commit an appsettings.Production.json because it'd be redundant, and it'd commit the production URL to source control, which we explicitly decided against.

CodeRabbit can't know this from a diff. It saw a localhost URL and no production override file, and it applied the general rule correctly. The spec encoded our specific convention.

The fix-pr loop is a Claude Code skill that instructs the agent to work through PR comments one by one, addressing each, but declining any suggestion that conflicts with an ADR, a spec, or official documentation. When the agent ran it here, it checked the review suggestion against spec.md, found the deployment pattern documented under configuration management, and declined:

Reviewing CodeRabbit suggestion: add appsettings.Production.json to override
Worker:CoordinatorBaseUrl.

Per spec.md § Configuration Management: production configuration uses Azure Key Vault
(Worker--CoordinatorBaseUrl). The Key Vault provider maps this to Worker:CoordinatorBaseUrl
at runtime, overriding appsettings.json. This is the established deployment pattern for
this service; committing appsettings.Production.json would be redundant and would expose
the production URL in source control.

Decision: declining change. No code modification needed. Suggestion noted as by-design.

CodeRabbit was right in the general case. It was wrong for this codebase. The spec was how I could show that.

Without that spec, the default agent behavior is to comply. "CodeRabbit flagged it, so I should fix it." The reviewer's authority fills the vacuum.

With the spec in scope, every review suggestion gets weighed against the spec first. The spec isn't treated as infallible: if you want to override it, you have to update it first. That means every disagreement gets recorded as a spec revision instead of evaporating into a comment thread. The spec's authority comes from that paper trail.

The Spec Belongs in the Repo

The spec lives at openspec/ in the project root. It's checked in. It has Git history. It goes through the same PR lifecycle as the code it governs.

That sounds obvious until you realize where most teams keep their specs: Confluence, a wiki, a shared doc, a chat thread. All of those have the same problem: agents can't read them consistently, and they don't stay in sync with the code. Move the spec to Confluence and it'll eventually drift.

The openspec/archive/ folder is the answer to "why was this built this way?" six months later. The decision history sits right next to the code it shaped.

If the spec isn't in the repo, the agent simply doesn't have it.

One Worktree, One Spec

One spec per unit of work means there's never a question about which spec governs the review. I run each feature in its own Git worktree, and the branch slug matches the OpenSpec change folder name directly:

Branch:      feature/add-coordinator-api
Spec folder: openspec/changes/add-coordinator-api/

Because the names match, there's no ambiguity about which spec applies to which branch: one feature, one spec, one PR.

The proposal gate is what makes this tight. The spec gets reviewed and approved before the worktree is created. That means the spec review is the design review, not a separate ceremony, not a comment thread in Confluence.

If there's an architectural question about the feature, we answer it in proposal.md before the branch exists. Work can't start until intent is agreed on.

When I run /opsx:apply, I don't apply the full task list from tasks.md in one shot. I select a subset of tasks and switch Claude Code to Plan mode. The agent proposes exactly what it'll do for that task group; I check it before execution starts.

The problem is that a task in tasks.md is only as useful as how the agent reads it, and that interpretation isn't always what you intended. Plan mode intercepts it before it becomes code. The discipline pays back: I review a plan for two minutes instead of debugging an incorrect implementation for twenty.

Spec Drift

OpenSpec's own documentation acknowledges it: specs don't self-update when Claude Code changes approach mid-task.

This is more dangerous than hallucination and harder to catch. The code compiles, the tests pass, and CI is green.

But the spec no longer describes what was built. It's architectural drift with a green badge on it.

The /opsx:archive step forces reconciliation: it audits what was built against what the spec said, merges decisions into openspec/specs/, and closes the change. But archive runs at the end of the task. If the drift happened halfway through, it's already in the branch by the time the archive step runs.

My mitigation is task-group checkpoints. Before marking a task group complete in tasks.md, Claude Code does a spec delta review: does the current state of the code still match spec.md? If it doesn't, update the spec or flag the divergence explicitly.

Move that checkpoint earlier. Don't wait for the PR to surface it.

This isn't a perfect answer. It's a discipline, and it only works if something in the process is actually checking for it. But naming the failure mode matters.

Spec drift is real, and "CI is green" doesn't mean the spec and the code agree. If you're building with specs and not watching for this, you'll eventually merge a feature that's correct by its own internal logic and wrong relative to everything around it.

The Spec Has Standing

Spec drift is the ongoing cost of this governance model. The spec earns its authority by being maintained. A stale spec is just documentation: it records what was supposed to happen but governs nothing.

The spec doesn't describe decisions already made. It's what future decisions get checked against.

When the spec is in the repo, the agent gains real standing to say no. It isn't stubborn. It has something to cite.

When the agent pushes back, the question shifts from whether the agent is right to whether the spec is right. That's a different discussion: now you're looking at the contract, not the agent's judgment.

In the PR, authority shifts. It no longer belongs to whoever left the most confident-sounding comment; it belongs to the spec that was reviewed before the branch existed.

If you're building with Claude Code or any agentic coding tool and you're not running a spec-first workflow, that standing is missing. The agent will comply with whatever the last reviewer said, because it has nothing else to cite.

The previous articles in this series cover Git Worktrees and ADRs for governing parallel agents and the blast radius problem in agentic coding.