Your agent works like a senior developer.
Zero-ceiling architecture, uncapped reasoning.
Exhaustive elicitation, lossless intent.
Bounded autonomy, zero-drift execution.
Install — Claude Code
/plugin marketplace add fn-opt/dryforge
/plugin install dryforge
Codex
codex plugin marketplace add fn-opt/dryforge
codex plugin add dryforge@dryforge
Requires git and Claude Code or Codex.
Update — Codex picks up new releases automatically at startup. Claude Code: enable
auto-update for the dryforge marketplace in /plugin → Marketplaces (off by default for
community marketplaces), or update manually:
# Claude Code
/plugin marketplace update dryforge
/plugin update dryforge@dryforge
# Codex (immediately, without restarting)
codex plugin marketplace upgrade dryforge
"Build a booking system" reads like a requirement. It's a goal statement — the requirement-level decisions are all still open: booking-to-service cardinality, the lifecycle of bookings whose service is retired, whether cancellation releases the slot or holds it.
An agent does not stop at those gaps. It resolves each one inline with a plausible default and ships code that compiles, runs, and demos cleanly. One booking = one service is now load-bearing schema with no decision record behind it; the cost surfaces months later as a migration, the day packages become a feature.
Prompts carry intent. Implementations are fixed by decisions. An unsupervised agent makes every decision you never saw — and reports none of them.
The second failure compounds the first: nothing persists. Whatever was decided lives in the session transcript, and the transcript dies with the session. The next session re-derives project state from code alone — and code encodes outcomes, not rationale. The result is structural drift: settled questions re-litigated, invariants re-implemented incompatibly, conventions diverging module by module.
dryforge intervenes at both points: the open decisions are enumerated before code exists, and every resolution is recorded with its rationale at the path every future session reads first.
/ready <INPUT> ──▶ /go ──▶ working software + the project harness
# <INPUT> — an idea, a spec, scattered notes: anything, or nothing
# short for /dryforge:ready · /dryforge:go · /dryforge:migration
Already have running code?
/migration grafts the harness onto it (one-time; afterwards /ready → /go)
| Command | Consumes | Produces |
|---|---|---|
/dryforge:ready |
anything — one line to full documents | a reviewed, executable design contract |
/dryforge:go |
the approved contract | verified code + the project harness |
/dryforge:migration |
an existing codebase | the project harness (one-time) |
/ready accepts arbitrary input — a one-line idea, a requirements
document generated by another tool, scattered notes, or nothing but a
hunch. All of it enters with the same status: material under
challenge, not ground truth. A document's existence is no evidence of
the design conversation behind it. Conflicts inside the input become
questions, not silent picks; embedded code fragments are reduced to the
behavioral contract they encode — inputs, outputs, invariants — rather
than carried forward as implementation.
From that material, /ready enumerates the decisions the design is
obligated to answer — stated or not — and drives each to one of two
terminal states: derived from your expressed intent, or asked.
Silent defaulting is not a terminal state.
Question volume is bounded by construction:
- Derivable from previous answers → never asked.
- A tuning value inside a confirmed mechanism → defaulted, recorded as tunable.
- Every question leads with a recommendation — accepting it is one keypress.
- Domain questions carry an open "none of these" — your domain knowledge outranks the option list.
On a project's first cycle, /ready additionally runs the conversation
that precedes any serious build: what the project is and is not, the
domain model — entities, lifecycles, rules — and the stack, recommended
with explicit trade-offs when you bring none. No technical direction is
fixed by silence.
Before the result reaches you, it passes independent review by an agent that did not author it — checking that nothing you said was dropped or distorted, and that the artifact is executable as written. Your approval is the only event that makes it final.
/go consumes the approved contract and owns all git state from that
point.
Scheduling. The plan carries an explicit dependency graph. /go
validates it — cycles, dangling references, coverage gaps — before any
git mutation; a malformed graph fails fast as a producer defect, not
something to patch at execution time. Independent tasks execute
concurrently, each implementer in an isolated git worktree, and re-enter
through a merge gate that verifies the branch actually advanced and the
diff touches its declared surface. Implementer self-reports carry no
weight.
Verification is risk-proportional. A mechanical rename and a payment-path change do not get the same ceremony. High-risk tasks receive independent review against the spec slice and the raw diff — never the implementer's summary, which is how reviewers get anchored.
Gates, end to end.
- Each parallel wave ends with the project's verification suite running against the merged base — the first point at which cross-task interactions exist.
- Completion re-runs full verification and adds runtime smoke: a spec-declared service must boot and answer a live request. Compiles and works are different claims.
- A verification that cannot be evaluated — the command died before asserting anything — is a failure, not an inferred pass.
Escalation is synchronous. A blocked task halts and waits for your answer. It does not assume one and build on top of it.
Git stays yours. Existing projects execute on a feature branch; main is never written directly, and final integration — merge, PR, manual — is never autonomous. A dirty working tree or unpushed commits abort the run before it starts.
When everything passes, /go writes or updates the project harness,
runs one final independent review across the full change, and archives
the design contract.
A one-time conversion, not a task runner. /migration scans the
codebase, then elicits precisely what code cannot attest: a code path
shows what an auth check does, not whether that is the entire
policy. The elicitation is risk-weighted — what is inferable and cheap
to get wrong is inferred; what is inferable but expensive to get wrong
is confirmed with you: domain invariants, security boundaries, the
business model behind the checks.
Questions arrive in plain language — "if this changes, must that change
with it?" — and answers are compiled back into precise rules. It
generates the full harness, leaves the commit to you, and exits. From
then on, the project runs on /ready → /go.
/ready → /go is one pipeline with exactly two approval points — both
yours. Everything between them runs autonomously.
/ready decompose input → resolve every open decision → write the contract
→ independent review → ▶ your approval
/go validate the graph → execute in parallel waves → integration gates
→ runtime smoke → write/update the harness → final review
→ ▶ your approval → archive the contract
What /ready leaves on disk is a three-document design contract in
.dryforge/:
- spec — the authority on what: behavior rules, invariants, API surface, every edge case with an explicit disposition, and the verifications the result must pass.
- plan — the blueprint for how: per-task behavior contracts and
the dependency graph
/goschedules from. - handoff — the governing document: how the three relate, and the hard gates no step may cross.
The contract is self-contained by construction — written so a future agent can act on it without the conversation that produced it. Decisions that cannot be re-derived from code carry their reasoning inline. The authority hierarchy is explicit: when spec and code disagree, spec wins. When the spec itself looks wrong, the agent does not patch it — it comes back to you.
After /go completes, the contract is archived under .dryforge/,
cycle by cycle — a durable record of what was decided, when, and why.
your-project/
├── CLAUDE.md # entry point for Claude Code — identity + work rules
├── AGENTS.md # entry point for Codex — identical content
├── docs/
│ ├── architecture.md # composition: components, flow, dependencies
│ ├── business-rules.md # domain logic: entities, invariants, edge cases
│ ├── security.md # policy: protected assets, access, audit
│ ├── standards.md # the rules: hard gates, conventions, boundaries
│ ├── engineering-notes.md # hard-won knowledge: traps, mechanisms, checklists
│ ├── operations.md # how to run it: setup, build, deploy
│ ├── contracts.md # external interface contracts
│ └── tracking/
│ ├── status.md # where the project stands vs. its full scope
│ ├── decisions/ # decision records — what was chosen, and why
│ └── findings.md # known unresolved problems
└── <module>/AGENTS.md # per-module scope, boundaries, invariants
- Session-independent context. A new conversation reads the harness and resumes with the architecture, the rules, and their reasons in scope. Re-explaining is not part of the workflow.
- Rationale is first-class. What was chosen and why — so intent is not quietly reversed, and settled debates stay settled.
- Maintained, not appended. Updates reconcile in both directions: work that invalidates an existing statement corrects it. The harness tracks reality or it gets fixed.
- Zero lock-in. Standard
CLAUDE.md/AGENTS.md. The generated docs contain no dryforge vocabulary and no proprietary format — any agent reads them. Delete dryforge tomorrow; the asset stays.
Plenty of tools touch planning or orchestration. The difference is in what each one trusts.
- vs. a bare agent — a strong model with no anchor re-derives everything each session, and the stronger the model, the wider it roams. dryforge supplies the same decisions and the same rationale, every session.
- vs. spec generators & plan modes — they organize what you said. dryforge enumerates what you didn't say — and challenges incoming documents instead of formatting them.
- vs. prescriptive workflows — "if X, do Y" rulebooks cost the same ceremony forever and cap quality at their author's foresight. dryforge pins contracts and floors, and leaves the ceiling open — model upgrades translate directly into output upgrades.
- vs. parallel orchestrators — parallelism without grounded intent ships the wrong thing faster. dryforge parallelizes only downstream of an approved spec, and merges only what survives the gates.
- Floor, not ceiling. dryforge fixes interface contracts and the procedural skeleton; conclusions stay with the model. Prescriptive rules cap a tool at their author's foresight — a floor means a better model yields a better result, automatically.
- Bounded autonomy. You approve the intent; inside that boundary the agent decides freely. Autonomy means executing approved intent — never setting intent on its own.
- Ask, don't guess. Anything not derivable from your intent comes back as a question — synchronously, before proceeding. One guess costs more than one pause.
- Explicit invocation only. The three commands never auto-trigger. Nothing runs unless you call it.
- One source, two platforms. Claude Code and Codex artifacts build from a single platform-neutral source; releases are verified for parity between the two.
- Requirements.
git, and Claude Code or Codex.
- A one-line fix doesn't need a design conversation. dryforge front-loads cost — elicitation, verification — to eliminate rework. That trade pays on features and projects, not on typo fixes.
- git is required. The execution discipline is built on branches and worktrees.
- The time spent answering questions is recovered in execution — and again in every later cycle that reads the harness instead of asking you.
MIT