Skip to content

fn-opt/dryforge

Repository files navigation

dryforge

dryforge

A plugin harness for Claude Code & Codex.

Your agent works like a senior developer.

Zero-ceiling architecture, uncapped reasoning.
Exhaustive elicitation, lossless intent.
Bounded autonomy, zero-drift execution.

Website

Install — Claude Code

/plugin marketplace add fn-opt/dryforge
/plugin install dryforge

Codex

codex plugin marketplace add fn-opt/dryforge
codex plugin add dryforge@dryforge

Requires git and Claude Code or Codex.

Update — Codex picks up new releases automatically at startup. Claude Code: enable auto-update for the dryforge marketplace in /plugin → Marketplaces (off by default for community marketplaces), or update manually:

# Claude Code
/plugin marketplace update dryforge
/plugin update dryforge@dryforge

# Codex (immediately, without restarting)
codex plugin marketplace upgrade dryforge

Every prompt is underspecified

"Build a booking system" reads like a requirement. It's a goal statement — the requirement-level decisions are all still open: booking-to-service cardinality, the lifecycle of bookings whose service is retired, whether cancellation releases the slot or holds it.

An agent does not stop at those gaps. It resolves each one inline with a plausible default and ships code that compiles, runs, and demos cleanly. One booking = one service is now load-bearing schema with no decision record behind it; the cost surfaces months later as a migration, the day packages become a feature.

Prompts carry intent. Implementations are fixed by decisions. An unsupervised agent makes every decision you never saw — and reports none of them.

The second failure compounds the first: nothing persists. Whatever was decided lives in the session transcript, and the transcript dies with the session. The next session re-derives project state from code alone — and code encodes outcomes, not rationale. The result is structural drift: settled questions re-litigated, invariants re-implemented incompatibly, conventions diverging module by module.

dryforge intervenes at both points: the open decisions are enumerated before code exists, and every resolution is recorded with its rationale at the path every future session reads first.


Three commands

  /ready <INPUT>  ──▶  /go      ──▶  working software + the project harness

  # <INPUT> — an idea, a spec, scattered notes: anything, or nothing
  # short for /dryforge:ready · /dryforge:go · /dryforge:migration


  Already have running code?
  /migration  grafts the harness onto it   (one-time; afterwards /ready → /go)
Command Consumes Produces
/dryforge:ready anything — one line to full documents a reviewed, executable design contract
/dryforge:go the approved contract verified code + the project harness
/dryforge:migration an existing codebase the project harness (one-time)

/ready — from intent to executable spec

/ready accepts arbitrary input — a one-line idea, a requirements document generated by another tool, scattered notes, or nothing but a hunch. All of it enters with the same status: material under challenge, not ground truth. A document's existence is no evidence of the design conversation behind it. Conflicts inside the input become questions, not silent picks; embedded code fragments are reduced to the behavioral contract they encode — inputs, outputs, invariants — rather than carried forward as implementation.

From that material, /ready enumerates the decisions the design is obligated to answer — stated or not — and drives each to one of two terminal states: derived from your expressed intent, or asked. Silent defaulting is not a terminal state.

Question volume is bounded by construction:

  • Derivable from previous answers → never asked.
  • A tuning value inside a confirmed mechanism → defaulted, recorded as tunable.
  • Every question leads with a recommendation — accepting it is one keypress.
  • Domain questions carry an open "none of these" — your domain knowledge outranks the option list.

On a project's first cycle, /ready additionally runs the conversation that precedes any serious build: what the project is and is not, the domain model — entities, lifecycles, rules — and the stack, recommended with explicit trade-offs when you bring none. No technical direction is fixed by silence.

Before the result reaches you, it passes independent review by an agent that did not author it — checking that nothing you said was dropped or distorted, and that the artifact is executable as written. Your approval is the only event that makes it final.


/go — execution that only passes on evidence

/go consumes the approved contract and owns all git state from that point.

Scheduling. The plan carries an explicit dependency graph. /go validates it — cycles, dangling references, coverage gaps — before any git mutation; a malformed graph fails fast as a producer defect, not something to patch at execution time. Independent tasks execute concurrently, each implementer in an isolated git worktree, and re-enter through a merge gate that verifies the branch actually advanced and the diff touches its declared surface. Implementer self-reports carry no weight.

Verification is risk-proportional. A mechanical rename and a payment-path change do not get the same ceremony. High-risk tasks receive independent review against the spec slice and the raw diff — never the implementer's summary, which is how reviewers get anchored.

Gates, end to end.

  • Each parallel wave ends with the project's verification suite running against the merged base — the first point at which cross-task interactions exist.
  • Completion re-runs full verification and adds runtime smoke: a spec-declared service must boot and answer a live request. Compiles and works are different claims.
  • A verification that cannot be evaluated — the command died before asserting anything — is a failure, not an inferred pass.

Escalation is synchronous. A blocked task halts and waits for your answer. It does not assume one and build on top of it.

Git stays yours. Existing projects execute on a feature branch; main is never written directly, and final integration — merge, PR, manual — is never autonomous. A dirty working tree or unpushed commits abort the run before it starts.

When everything passes, /go writes or updates the project harness, runs one final independent review across the full change, and archives the design contract.


/migration — onboarding an existing codebase

A one-time conversion, not a task runner. /migration scans the codebase, then elicits precisely what code cannot attest: a code path shows what an auth check does, not whether that is the entire policy. The elicitation is risk-weighted — what is inferable and cheap to get wrong is inferred; what is inferable but expensive to get wrong is confirmed with you: domain invariants, security boundaries, the business model behind the checks.

Questions arrive in plain language — "if this changes, must that change with it?" — and answers are compiled back into precise rules. It generates the full harness, leaves the commit to you, and exits. From then on, the project runs on /ready/go.


Anatomy of a cycle

/ready/go is one pipeline with exactly two approval points — both yours. Everything between them runs autonomously.

/ready    decompose input → resolve every open decision → write the contract
          → independent review → ▶ your approval

/go       validate the graph → execute in parallel waves → integration gates
          → runtime smoke → write/update the harness → final review
          → ▶ your approval → archive the contract

What /ready leaves on disk is a three-document design contract in .dryforge/:

  • spec — the authority on what: behavior rules, invariants, API surface, every edge case with an explicit disposition, and the verifications the result must pass.
  • plan — the blueprint for how: per-task behavior contracts and the dependency graph /go schedules from.
  • handoff — the governing document: how the three relate, and the hard gates no step may cross.

The contract is self-contained by construction — written so a future agent can act on it without the conversation that produced it. Decisions that cannot be re-derived from code carry their reasoning inline. The authority hierarchy is explicit: when spec and code disagree, spec wins. When the spec itself looks wrong, the agent does not patch it — it comes back to you.

After /go completes, the contract is archived under .dryforge/, cycle by cycle — a durable record of what was decided, when, and why.


What persists — the project harness

your-project/
├── CLAUDE.md                  # entry point for Claude Code — identity + work rules
├── AGENTS.md                  # entry point for Codex — identical content
├── docs/
│   ├── architecture.md        # composition: components, flow, dependencies
│   ├── business-rules.md      # domain logic: entities, invariants, edge cases
│   ├── security.md            # policy: protected assets, access, audit
│   ├── standards.md           # the rules: hard gates, conventions, boundaries
│   ├── engineering-notes.md   # hard-won knowledge: traps, mechanisms, checklists
│   ├── operations.md          # how to run it: setup, build, deploy
│   ├── contracts.md           # external interface contracts
│   └── tracking/
│       ├── status.md          # where the project stands vs. its full scope
│       ├── decisions/         # decision records — what was chosen, and why
│       └── findings.md        # known unresolved problems
└── <module>/AGENTS.md         # per-module scope, boundaries, invariants
  • Session-independent context. A new conversation reads the harness and resumes with the architecture, the rules, and their reasons in scope. Re-explaining is not part of the workflow.
  • Rationale is first-class. What was chosen and why — so intent is not quietly reversed, and settled debates stay settled.
  • Maintained, not appended. Updates reconcile in both directions: work that invalidates an existing statement corrects it. The harness tracks reality or it gets fixed.
  • Zero lock-in. Standard CLAUDE.md / AGENTS.md. The generated docs contain no dryforge vocabulary and no proprietary format — any agent reads them. Delete dryforge tomorrow; the asset stays.

Where it sits

Plenty of tools touch planning or orchestration. The difference is in what each one trusts.

  • vs. a bare agent — a strong model with no anchor re-derives everything each session, and the stronger the model, the wider it roams. dryforge supplies the same decisions and the same rationale, every session.
  • vs. spec generators & plan modes — they organize what you said. dryforge enumerates what you didn't say — and challenges incoming documents instead of formatting them.
  • vs. prescriptive workflows — "if X, do Y" rulebooks cost the same ceremony forever and cap quality at their author's foresight. dryforge pins contracts and floors, and leaves the ceiling open — model upgrades translate directly into output upgrades.
  • vs. parallel orchestrators — parallelism without grounded intent ships the wrong thing faster. dryforge parallelizes only downstream of an approved spec, and merges only what survives the gates.

Not a cage. A compass.

  • Floor, not ceiling. dryforge fixes interface contracts and the procedural skeleton; conclusions stay with the model. Prescriptive rules cap a tool at their author's foresight — a floor means a better model yields a better result, automatically.
  • Bounded autonomy. You approve the intent; inside that boundary the agent decides freely. Autonomy means executing approved intent — never setting intent on its own.
  • Ask, don't guess. Anything not derivable from your intent comes back as a question — synchronously, before proceeding. One guess costs more than one pause.

Platform notes

  • Explicit invocation only. The three commands never auto-trigger. Nothing runs unless you call it.
  • One source, two platforms. Claude Code and Codex artifacts build from a single platform-neutral source; releases are verified for parity between the two.
  • Requirements. git, and Claude Code or Codex.

When not to use it

  • A one-line fix doesn't need a design conversation. dryforge front-loads cost — elicitation, verification — to eliminate rework. That trade pays on features and projects, not on typo fixes.
  • git is required. The execution discipline is built on branches and worktrees.
  • The time spent answering questions is recovered in execution — and again in every later cycle that reads the harness instead of asking you.

License

MIT

↑ back to top · ready / go