Model Ledger

Model Ledger (ModL) is a tool for managing the identity of a domain data model over time — tracking what exists, what changed, and what that means for the systems that consume it.

Overview

As a domain data model evolves, producers and consumers need stable answers to questions like:

What concepts exist, and what do they mean?
What changed between releases, and when?
Has a change broken the data contract?
What is the stable runtime address for a given model element?

ModL answers these by maintaining four normalized, append-only CSV tables — the ledger — co-released alongside the data model in a version-controlled repository. Records in the ledger are never deleted, only superseded. The git history provides point-in-time reproducibility.

How It Works

ModL is language-agnostic. It does not parse model files directly. A language-specific adapter produces a diff report in modl's intermediate representation (IR), which is fed into modl along with the previous ledger state and a breaking change configuration.

flowchart LR
    A[Model snapshot] --> |Previous| DIFF[Model-specific diff]
    B[Model snapshot] --> |Current| DIFF
    DIFF --> |Diff report| ADAPTER[Intermediate Representation Adapter]
    ADAPTER --> |Diff in IR| SYNC
    LEDGER[Ledger] --> |Previous| SYNC
    F[Breaking change config] --> SYNC
    SYNC --> |Updated| LEDGER

Underlying pattern

The diff report describes changes using four element kinds:

Kind	Also known as	Receives bindings?
`ENTITY`	Container, branch, object type, class, feature of interest	No
`PROPERTY`	Field, attribute, signal, characteristic	Yes — one per instance, or one singleton
`ENUMERATION_SET`	Enum type, allowed values, expected values	No
`ENUM_VALUE`	Enum member, allowed value, listed option	No

ENTITY and PROPERTY cover the structural model. ENUMERATION_SET and ENUM_VALUE cover shared vocabulary — types, units, and code lists that properties reference. All four kinds receive concept URIs, revisions, and contracts in the ledger. Only PROPERTY concepts are runtime-addressable and therefore receive bindings.

graph TB
    E[ENTITY] --> |has property| P1[PROPERTY 1]
    E --> |has property| P2[PROPERTY 2]
    E --> |has property| PN[PROPERTY N]
    ES[ENUMERATION_SET] --> |has value| EV1[ENUM_VALUE 1]
    ES --> |has value| EV2[ENUM_VALUE 2]
    P1 -.->|references| ES

Example: model pattern

A Person that has a name and owns a Car

graph TB
    E[Person] --> |name| P1[String]
    E --> |ownsCar| P2[Car]

Notice that some properties can resolve to a primitive data type (e.g., name resolves to String), whereas others resolve to another entity (e.g., ownsCar resolves to Car). Hence, such a simple pattern resembles a graph when used systematically.

graph TB
    E[Person*] --> |name| P1[String]
    E --> |ownsCar| V[Vehicle*]
    V --> |speed| P3[Float]

* indicates that the element is an ENTITY.

Example: reported changes

A diff function (specific to the data modeling language and agnostic to ModL) must report changes to entities, properties, and vocabulary elements, indicating whether they were ADDED, REMOVED, or MODIFIED.

Change	Example
`ENTITY` added	New `Vehicle.Door` branch
`ENTITY` removed	`Vehicle.OldFeature` deleted
`ENTITY` modified	`Vehicle.Door` instances list changed
`PROPERTY` added	`Vehicle.Door.IsLocked` added
`PROPERTY` removed	`Vehicle.Door.IsOpen` removed
`PROPERTY` modified	`Vehicle.Speed` datatype changed from `Float` to `Int`
`ENUMERATION_SET` added	New `SpeedUnit` vocabulary type
`ENUM_VALUE` added	New `SpeedUnit.KMH` member
`ENUM_VALUE` modified	`SpeedUnit.KMH` symbol changed

Building Blocks

ModL tracks model identity across four dimensions: Concepts, Revisions, Contracts, and Bindings. The following example, written in vspec, is used throughout:

Vehicle:  # This is an ENTITY
    type: branch

Vehicle.Speed:  # This is a PROPERTY
    type: sensor
    datatype: Float

Vehicle.Door:  # This is an ENTITY
    type: branch
    instances: [Left, Right]

Vehicle.Door.IsOpen:  # This is a PROPERTY
    type: sensor
    datatype: Boolean

Concepts

A concept is the agreed meaning of a model element — what it is, independent of any implementation detail. Think of it as a dictionary entry.

Kind	Label	Meaning
ENTITY	`Vehicle`	A motorized thing used for transporting people or goods
ENTITY	`Door`	A hinged or sliding barrier at the entrance to a vehicle
PROPERTY	`Vehicle.Speed`	The rate at which a vehicle moves
PROPERTY	`Door.IsOpen`	Whether a door is open or closed
ENUMERATION_SET	`SpeedUnit`	A vocabulary type enumerating recognised speed units
ENUM_VALUE	`SpeedUnit.KMH`	The kilometre-per-hour speed unit

Concepts are identified once and never reassigned. If a concept is renamed, the old label is recorded as a previous label — the concept identity does not change.

Revisions

A revision is assigned to every detected change, regardless of whether it is breaking. It is the raw audit log of what happened.

Examples of changes that trigger a new revision:

A typo fix (Vehicl → Vehicle)
A unit change (km/h → mph)
A description update
A field being added or removed
An instance list changing

Contracts

A contract captures the specific data structure agreement between producers and consumers. It assigns identity to a concrete variant of the model that is relevant to downstream users. What counts as "relevant" is user-defined via a configuration file. Any change to an essential attribute triggers a new contract. In other words, if a breaking change is detected, a new contract identity is minted.

For example, if datatype is declared essential for Vehicle.Speed:

contract_uri	Snapshot	Status
`http://namespace.example/contracts/10`	`Vehicle.Speed { datatype: Int }`	SUPERSEDED
`http://namespace.example/contracts/14`	`Vehicle.Speed { datatype: Float }`	ACTIVE

These are two contracts for the same concept — each a distinct variant of the essential metadata. The meaning of "speed" has not changed, but the data contract has.

Contracts apply to both entities and fields. An entity's essential metadata (e.g., its type or instance list) defines its contract just as a field's datatype defines its own. Each element's contract is governed independently by its own essential attribute configuration.

Bindings

Bindings assign a stable identity to every runtime-addressable path a property can appear on.

For Vehicle.Door with instances: [Left, Right], the field Door.IsOpen expands into:

serial	binding_uri	Runtime path
24	`http://namespace.example/bindings/o`	`Vehicle.Door.Left.IsOpen`
25	`http://namespace.example/bindings/p`	`Vehicle.Door.Right.IsOpen`

A system can then write a compact payload like 24: true to mean "the left door is open", without encoding the full path.

For a property whose parent entity has no instances (e.g., Battery.StateOfCharge), one binding is still minted with no instance label:

serial	binding_uri	Runtime path
42	`http://namespace.example/bindings/16`	`Battery.StateOfCharge`
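
The expansion described above can be illustrated with a short Python sketch. It is not ModL's implementation: the function name `runtime_paths` and the way the path is assembled (entity label, optional instance label, property name, joined with dots) are assumptions inferred from the two tables in this section.

```python
def runtime_paths(entity_label, property_name, instances=None):
    """Expand one property into its runtime-addressable paths.

    Returns (instance_label, path) pairs. instance_label is None for a
    singleton, i.e. when the parent entity declares no instances.
    """
    if not instances:
        return [(None, f"{entity_label}.{property_name}")]
    return [
        (instance, f"{entity_label}.{instance}.{property_name}")
        for instance in instances
    ]


door = runtime_paths("Vehicle.Door", "IsOpen", ["Left", "Right"])
battery = runtime_paths("Battery", "StateOfCharge")
print(door)
print(battery)
```

Each returned pair would correspond to one row in bindings.csv: one row per instance, or a single row with a null instance label.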

Note: Bindings are assigned to PROPERTY concepts only. ENTITY concepts are not directly addressable at runtime and therefore never receive bindings. Vocabulary kinds (ENUMERATION_SET, ENUM_VALUE) never receive bindings either. The engine reads the kind column of the concept row to enforce all three rules.

Instance expansion behavior

When a new instance is added (e.g., Center), the behavior depends on the breaking change configuration:

Config	Entity revision	Entity contract	Field revision	Field contract	New binding
Breaking	yes	yes	yes	yes (new)	yes, anchored to new contract
Non-breaking	yes	no	yes	no (unchanged)	yes, appended to existing contract

In the non-breaking case, existing binding IDs remain stable and consumers are unaffected. Binding sets under a contract are append-only.

What each event writes to the ledger

The table below shows which rows modl sync creates or updates for each type of change event, given the breaking-change classification configured by the user.

Event	concepts	revisions	contracts	bindings
ENTITY `ADDED`	new row	new row	new row (initial contract)	—
ENTITY `MODIFIED`, non-breaking (no instance change)	update `current_label` if renamed	new row	— (unchanged)	—
ENTITY `MODIFIED`, non-breaking (instances added)	update `current_label` if renamed	new row; new child revisions	— (unchanged)	new bindings for added instances on child properties
ENTITY `MODIFIED`, breaking (non-instance)	update `current_label` if renamed	new row	new row	—
ENTITY `MODIFIED`, breaking (instances changed)	update `current_label` if renamed	new row; new child revisions	new row; new child contracts	new bindings for all child properties (old bindings superseded)
ENTITY `REMOVED`	status → REMOVED	new row	status → REMOVED	status → REMOVED for child property bindings (via child REMOVED events)
Property `ADDED`	new row	new row	new row (initial contract)	new binding per instance; one singleton if no instances
Property `MODIFIED`, non-breaking	update `current_label` if renamed	new row	— (unchanged)	—
Property `MODIFIED`, breaking	update `current_label` if renamed	new row	new row	new bindings anchored to new contract (old bindings superseded)
Property `REMOVED`	status → REMOVED	new row	status → REMOVED	status → REMOVED
`ENUMERATION_SET` `ADDED`	new row	new row	new row (initial contract)	—
`ENUMERATION_SET` `MODIFIED`	update `current_label` if renamed	new row	new row if breaking, unchanged if not	—
`ENUMERATION_SET` `REMOVED`	status → REMOVED	new row	status → REMOVED	—
`ENUM_VALUE` `ADDED`	new row	new row	new row (initial contract)	—
`ENUM_VALUE` `MODIFIED`	update `current_label` if renamed	new row	new row if breaking, unchanged if not	—
`ENUM_VALUE` `REMOVED`	status → REMOVED	new row	status → REMOVED	—

Key observations:

Every event produces a revision — the revision log is unconditional and unfiltered.
Beyond the initial contract minted on ADDED, a new contract is created (and the previous one superseded) only when a change is classified as breaking by the config. Non-breaking changes leave the active contract untouched, so any system holding a contract URI or binding URI is unaffected.
A rename never changes the concept URI. It updates current_label and appends the old label to previous_labels in the concept row.
ENUMERATION_SET and ENUM_VALUE events follow the same revision and contract rules as ENTITY and PROPERTY respectively, but never produce bindings regardless of configuration.
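
As an illustration of the rename rule in the observations above, the following Python sketch updates a concept row in memory. The function name `rename_concept` and the use of a Python list for previous_labels are assumptions; how the list is serialised in the CSV is not shown here.

```python
def rename_concept(row, new_label):
    """Apply a rename to a concept row without touching its URI."""
    if new_label != row["current_label"]:
        # Keep the old label for traceability, then switch to the new one.
        row["previous_labels"].append(row["current_label"])
        row["current_label"] = new_label
    return row


row = {
    "concept_uri": "http://namespace.example/concepts/1",
    "current_label": "Vehicle.Velocity",
    "previous_labels": [],
    "kind": "PROPERTY",
    "status": "ACTIVE",
}
rename_concept(row, "Vehicle.Speed")
print(row["current_label"], row["previous_labels"])
```

The concept URI is the same before and after the call, which is the point of the rule: consumers keyed on the URI do not notice the rename.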

The Ledger Tables

Serial and URI minting

Each record minted is assigned a serial number — a monotonically increasing non-negative integer, never reused. The serial is permanently baked into the record's Uniform Resource Identifier (URI):

uri = namespace + table_name + "/" + base36(serial)

Base36 uses alphabet 0-9a-z (lowercase ASCII). Values 0–9 encode as single decimal digits, values 10–35 as single letters (a–z); larger values use multiple characters (e.g., serial 40 → 14, serial 103 → 2v).
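
The minting formula can be reproduced in a few lines of Python. This is a minimal sketch, not ModL's own code: the helper names `base36`, `mint_uri`, and `serial_of` are hypothetical. Decoding relies on Python's built-in `int(s, 36)`.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 0-9a-z


def base36(serial: int) -> str:
    """Encode a non-negative integer using the lowercase base36 alphabet."""
    if serial < 0:
        raise ValueError("serial must be non-negative")
    if serial == 0:
        return "0"
    digits = []
    while serial:
        serial, remainder = divmod(serial, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def mint_uri(namespace: str, table_name: str, serial: int) -> str:
    """uri = namespace + table_name + "/" + base36(serial)"""
    return f"{namespace}{table_name}/{base36(serial)}"


def serial_of(uri: str) -> int:
    """Recover the serial from the last path segment of a ledger URI."""
    return int(uri.rsplit("/", 1)[1], 36)


print(mint_uri("http://namespace.example/", "contracts", 40))
print(serial_of("http://namespace.example/revisions/2v"))
```

With the values used in this document, serial 40 in the contracts table yields `http://namespace.example/contracts/14`, and the suffix `2v` decodes back to 103.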

Authorship rule: the ledger contains only records minted by the project owner. Every row has a serial number and a URI under the project namespace. Foreign Key (FK) columns (concept_uri, contract_uri, etc.) may reference URIs from other namespaces — those are foreign references, not rows authored here.

Cross-namespace imports (Work in progress): when a model references elements from an external project, the importing project ships its own ledger alongside a pruned copy of the external ledger containing only the referenced rows, annotated with provenance (source namespace, release URL, content hash).

`concepts.csv`

serial	concept_uri	current_label	previous_labels	kind	status
0	`http://namespace.example/concepts/0`	Vehicle	—	ENTITY	ACTIVE
1	`http://namespace.example/concepts/1`	Vehicle.Speed	Vehicle.Velocity	PROPERTY	ACTIVE
2	`http://namespace.example/concepts/2`	Vehicle.Door	—	ENTITY	ACTIVE
8	`http://namespace.example/concepts/8`	Vehicle.Door.IsOpen	—	PROPERTY	ACTIVE
5	`http://namespace.example/concepts/5`	SpeedUnit	—	ENUMERATION_SET	ACTIVE
6	`http://namespace.example/concepts/6`	SpeedUnit.KMH	—	ENUM_VALUE	ACTIVE

The kind column records the structural kind of the concept permanently. Only PROPERTY concepts receive bindings. ENTITY, ENUMERATION_SET, and ENUM_VALUE concepts never do.

`revisions.csv`

serial	concept_uri	revision_uri	previous_revision_uri	status
56	`http://namespace.example/concepts/0`	`http://namespace.example/revisions/1k`	—	ACTIVE
57	`http://namespace.example/concepts/8`	`http://namespace.example/revisions/1l`	—	SUPERSEDED
103	`http://namespace.example/concepts/8`	`http://namespace.example/revisions/2v`	`http://namespace.example/revisions/1l`	ACTIVE
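
The rows above form a chain: each new revision points at the one it replaces, and the replaced row is marked SUPERSEDED. A minimal Python sketch of that behaviour follows. The function name `append_revision` and the in-memory row dictionaries are assumptions made for illustration.

```python
def append_revision(rows, concept_uri, serial, revision_uri):
    """Append a revision row, superseding the concept's current ACTIVE one."""
    previous = None
    for row in rows:
        if row["concept_uri"] == concept_uri and row["status"] == "ACTIVE":
            row["status"] = "SUPERSEDED"
            previous = row["revision_uri"]
    rows.append(
        {
            "serial": serial,
            "concept_uri": concept_uri,
            "revision_uri": revision_uri,
            "previous_revision_uri": previous,  # None for the first revision
            "status": "ACTIVE",
        }
    )
    return rows


NS = "http://namespace.example/"
rows = []
append_revision(rows, NS + "concepts/8", 57, NS + "revisions/1l")
append_revision(rows, NS + "concepts/8", 103, NS + "revisions/2v")
for row in rows:
    print(row["serial"], row["status"], row["previous_revision_uri"])
```

No row is removed at any point, so the full history of a concept can be walked backwards through previous_revision_uri.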

`contracts.csv`

serial	concept_uri	contract_uri	revision_uri	status
40	`http://namespace.example/concepts/8`	`http://namespace.example/contracts/14`	`http://namespace.example/revisions/2v`	ACTIVE

`bindings.csv`

serial	contract_uri	binding_uri	instance_label	status
24	`http://namespace.example/contracts/14`	`http://namespace.example/bindings/o`	Left	ACTIVE
25	`http://namespace.example/contracts/14`	`http://namespace.example/bindings/p`	Right	ACTIVE
42	`http://namespace.example/contracts/1e`	`http://namespace.example/bindings/16`	(null)	ACTIVE

The third row is a singleton binding — Battery.StateOfCharge whose parent has no instances. instance_label is null; the binding still provides a stable, versioned identity for the runtime path.

Table relationships

erDiagram
    concepts ||--o{ revisions : "tracked by"
    concepts ||--o{ contracts : "realized as"
    revisions ||--o{ contracts : "triggers"
    contracts ||--o{ bindings : "expanded into"

    concepts {
        int serial PK
        string concept_uri UK
        string current_label
        string previous_labels
        string kind
        string status
    }
    revisions {
        int serial PK
        string concept_uri FK
        string revision_uri UK
        string previous_revision_uri FK
        string status
    }
    contracts {
        int serial PK
        string concept_uri FK
        string contract_uri UK
        string revision_uri FK
        string status
    }
    bindings {
        int serial PK
        string contract_uri FK
        string binding_uri UK
        string instance_label
        string status
    }
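
To show how the foreign keys in the diagram are followed in practice, the sketch below resolves a binding URI back to its concept label using only Python's standard `csv` module. The comma delimiter, the empty string for missing values, and the helper names `index` and `resolve_binding` are assumptions; the row values are taken from the sample tables above.

```python
import csv
import io

NS = "http://namespace.example/"

CONCEPTS = f"""serial,concept_uri,current_label,previous_labels,kind,status
8,{NS}concepts/8,Vehicle.Door.IsOpen,,PROPERTY,ACTIVE
"""
CONTRACTS = f"""serial,concept_uri,contract_uri,revision_uri,status
40,{NS}concepts/8,{NS}contracts/14,{NS}revisions/2v,ACTIVE
"""
BINDINGS = f"""serial,contract_uri,binding_uri,instance_label,status
24,{NS}contracts/14,{NS}bindings/o,Left,ACTIVE
25,{NS}contracts/14,{NS}bindings/p,Right,ACTIVE
"""


def index(text, key):
    """Parse CSV text and index its rows by the given unique column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def resolve_binding(binding_uri):
    """Follow binding -> contract -> concept and return (label, instance)."""
    binding = index(BINDINGS, "binding_uri")[binding_uri]
    contract = index(CONTRACTS, "contract_uri")[binding["contract_uri"]]
    concept = index(CONCEPTS, "concept_uri")[contract["concept_uri"]]
    return concept["current_label"], binding["instance_label"]


resolved = resolve_binding(NS + "bindings/o")
print(resolved)
```

In a real project the three strings would be replaced by the files in the ledger directory, opened with `open(path, newline="")`.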

Usage

The following commands assume an active environment; see CONTRIBUTING for instructions on how to set it up.

`modl sync`

Synchronises the ledger with a diff report. If no ledger exists yet, it is created. If no diff report is provided, an empty ledger is initialised.

modl sync --ledger-dir PATH --model-metadata PATH --breaking-aspects PATH [--diff-report PATH] [--dry-run] [--strict]

Option	Description
`-d`, `--diff-report`	Path to the diff report JSON file (optional). Omit to initialise an empty ledger.
`-o`, `--ledger-dir`	Directory where the four ledger CSV files are read from and written to.
`-m`, `--model-metadata`	Path to the model metadata YAML file (`name`, `id`, `preferred_prefix`).
`-b`, `--breaking-aspects`	Path to the breaking aspects config YAML file.
`-n`, `--dry-run`	Preview what would change without writing anything to disk. Exits with code `1` if changes would be made.
`-s`, `--strict`	Treat aspect keys in the diff report that are not declared in the config as errors instead of warnings.

Model metadata file format

The model metadata file follows the s2dm metadata.yaml convention:

name: MyModel                           # human-readable model name
id: "http://namespace.example/"         # must end with '/' or '#'; full URIs are stored in the ledger
preferred_prefix: "ns"                  # optional display alias; used by inspection commands to shorten output

name and id are required; preferred_prefix is optional. id is used as the namespace base URI for minting ledger record identifiers.

Breaking aspects file format

entity:
  instances: true   # breaking — triggers a new contract
  type: true        # breaking — triggers a new contract
  name: false       # renames are non-breaking; suppresses --strict warnings

property:
  output_type: true  # breaking — triggers a new contract
  unit: true         # breaking — triggers a new contract
  accuracy: true     # user-defined domain-specific attribute
  description: false # known, non-breaking; suppresses --strict warnings

The entity and property sections default to empty — all changes are treated as non-breaking if omitted.

Each key is in one of three distinct states: set to true, set to false, or absent:

Value	Meaning
`true`	Aspect is breaking — a change triggers a new contract.
`false`	Aspect is known but non-breaking — changes are silently accepted; no warning even with `--strict`.
(absent)	Aspect is unknown — treated as non-breaking but produces a warning (error with `--strict`).

The reserved key name governs rename events (renamed_from set on a diff event). It never appears in aspects — it is checked separately via renamed_from.
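
The three-state rule can be sketched in Python as follows. This is an illustration of the semantics in the table above, not the logic modl actually runs: the function name `classify`, the warning text, and the choice of `ValueError` for strict mode are all assumptions.

```python
def classify(section, changed_keys, strict=False):
    """Return (is_breaking, warnings) for the aspect keys changed in one event.

    section is the parsed `entity` or `property` mapping of the config.
    """
    breaking = False
    warnings = []
    for key in changed_keys:
        if key not in section:
            # Absent: unknown aspect, non-breaking, but reported.
            message = f"undeclared aspect key: {key}"
            if strict:
                raise ValueError(message)
            warnings.append(message)
        elif section[key]:
            # true: breaking, a new contract is needed.
            breaking = True
        # false: known and non-breaking, silently accepted.
    return breaking, warnings


property_config = {"output_type": True, "unit": True, "description": False}
print(classify(property_config, ["description"]))
print(classify(property_config, ["unit", "description"]))
print(classify(property_config, ["min"]))
```

A single breaking key is enough to make the whole event breaking, while unknown keys never escalate to breaking on their own.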

Diff report format

The diff report is a JSON file produced by a language-specific adapter (e.g. for vspec, GraphQL SDL). It describes what changed between two model snapshots using modl's intermediate representation.

Each change event covers a single element: an entity (container, object type, branch), a property (field, attribute, signal), or a vocabulary element (enumeration set or enum value). Key fields:

Field	Values
`kind`	`ENTITY`, `PROPERTY`, `ENUMERATION_SET`, or `ENUM_VALUE`
`change_type`	`ADDED`, `REMOVED`, or `MODIFIED`
`aspects`	On `ADDED`: full initial-state snapshot. On `MODIFIED`: delta of changed keys only. Absent on `REMOVED`.
`renamed_from`	Previous label when the element was renamed (`MODIFIED` only).
`parent_label`	Required for `PROPERTY` — the label of the owning entity.
`content`	`ENTITY` `MODIFIED` only — list of `{label, change_type}` for children that changed.

See diff_report_template.md for the full field reference, rename semantics, examples, and an adapter implementation checklist.
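
An adapter author may want to check events against the rules in the table above before handing them to modl. The Python sketch below does that for a single event. It is hypothetical: the function name `check_event` and the `label` key on the event are assumptions, and diff_report_template.md remains the authoritative field reference.

```python
KINDS = {"ENTITY", "PROPERTY", "ENUMERATION_SET", "ENUM_VALUE"}
CHANGE_TYPES = {"ADDED", "REMOVED", "MODIFIED"}


def check_event(event):
    """Return a list of rule violations for one change event (empty if none)."""
    problems = []
    kind = event.get("kind")
    change = event.get("change_type")
    if kind not in KINDS:
        problems.append(f"unknown kind: {kind!r}")
    if change not in CHANGE_TYPES:
        problems.append(f"unknown change_type: {change!r}")
    if kind == "PROPERTY" and not event.get("parent_label"):
        problems.append("PROPERTY events require parent_label")
    if change == "REMOVED" and "aspects" in event:
        problems.append("aspects must be absent on REMOVED")
    if "renamed_from" in event and change != "MODIFIED":
        problems.append("renamed_from is only valid on MODIFIED")
    if "content" in event and (kind, change) != ("ENTITY", "MODIFIED"):
        problems.append("content is only valid on ENTITY MODIFIED")
    return problems


good = {
    "kind": "PROPERTY",
    "change_type": "ADDED",
    "label": "Vehicle.Door.IsLocked",  # assumed field name
    "parent_label": "Vehicle.Door",
    "aspects": {"datatype": "Boolean"},
}
bad = {"kind": "PROPERTY", "change_type": "REMOVED", "label": "Vehicle.Door.IsOpen", "aspects": {}}
print(check_event(good))
print(check_event(bad))
```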

Adoption Guide

1. Define your model and take a snapshot

Represent your domain model in your chosen modeling language (vspec, GraphQL SDL, JSON Schema, etc.). A snapshot is the complete state of the model at a point in time — a version-controlled file, a release artifact, or a git tag. The diff is always computed between two such snapshots: the previous release and the current one.

2. Produce a diff between two snapshots

Compare two snapshots of your model to enumerate what was added, removed, or modified. The mechanism depends on your modeling language:

Text-based formats (YAML, JSON): diff the files and post-process the output.
Structured formats with tooling (vspec, Protobuf): use the language's own comparison tool if one exists, or write a script that loads both snapshots and walks the element tree.
Schema registries: use the registry's diff API if available.

For the first release there is no previous snapshot — treat every element as ADDED.

3. Write an adapter that translates the diff into the ModL IR format

The adapter is a script or tool — typically a short Python or shell program — that takes your language-specific diff output and writes a diff.json file in the ModL intermediate representation. At its simplest:

# pseudocode
previous = load("model-v1.yaml")
current  = load("model-v2.yaml")
changes  = compare(previous, current)   # language-specific logic
write_json("diff.json", to_modl_ir(changes))

The adapter is a one-time investment per modeling language. See diff_report_template.md for the full field reference, rename semantics, and an adapter implementation checklist.
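
As a more concrete version of the pseudocode above, here is a small Python sketch that compares two snapshots of PROPERTY elements and emits change events. Several things are assumptions made for illustration: snapshots are plain `{label: {aspect: value}}` dictionaries, the event carries a `label` key, the parent label is derived by dropping the last dotted segment, and the output is a flat list. The real top-level report shape is defined in diff_report_template.md.

```python
import json


def diff_properties(previous, current):
    """Compare two {label: {aspect: value}} snapshots of PROPERTY elements."""
    events = []
    for label in sorted(previous.keys() | current.keys()):
        event = {
            "kind": "PROPERTY",
            "label": label,  # assumed field name
            "parent_label": label.rsplit(".", 1)[0],
        }
        old, new = previous.get(label), current.get(label)
        if old is None:
            # ADDED carries the full initial-state snapshot.
            event.update(change_type="ADDED", aspects=new)
        elif new is None:
            # REMOVED carries no aspects.
            event["change_type"] = "REMOVED"
        elif old != new:
            # MODIFIED carries only keys whose value differs; keys dropped
            # from the new snapshot are not represented in this sketch.
            delta = {k: v for k, v in new.items() if old.get(k) != v}
            event.update(change_type="MODIFIED", aspects=delta)
        else:
            continue
        events.append(event)
    return events


previous = {
    "Vehicle.Speed": {"datatype": "Float", "unit": "km/h"},
    "Vehicle.Door.IsOpen": {"datatype": "Boolean"},
}
current = {
    "Vehicle.Speed": {"datatype": "Int", "unit": "km/h"},
    "Vehicle.Door.IsLocked": {"datatype": "Boolean"},
}
events = diff_properties(previous, current)
print(json.dumps(events, indent=2))
```

Rename detection is deliberately left out: without extra information, a rename looks like one REMOVED plus one ADDED event, which would mint a new concept instead of preserving the existing one.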

4. Author a model metadata file and a breaking-aspects config

Create a metadata.yaml that declares your project's namespace following the s2dm convention. Only name and id are required:

name: MyModel
id: "https://myproject.org/model/"
preferred_prefix: "mp"

Create a breaking-aspects config (named breaking.yaml in the commands below) that lists which aspect keys constitute a breaking change. Start minimal: an empty file (or {}) is valid and treats all changes as non-breaking:

entity:
  instances: true
property:
  output_type: true
  unit: true

Use true for breaking aspects, false to explicitly mark a key as known-but-non-breaking (silences --strict warnings).

5. Validate with a dry run

Before touching the ledger, pass the diff report through modl sync with --dry-run and --strict:

modl sync --ledger-dir ledger/ --model-metadata metadata.yaml --breaking-aspects breaking.yaml --diff-report diff.json --dry-run --strict

Review any warnings about undeclared aspect keys. For each unknown key, decide: is it breaking (true) or intentionally non-breaking (false)? Update the config and re-run until the dry run is clean.

6. Initialise the ledger

On the first run, the ledger does not exist yet. modl sync creates it. For a first release where you want to capture the initial model state, pass the diff report that treats every element as ADDED. To start with an empty ledger and add history in subsequent syncs, omit --diff-report.

modl sync --ledger-dir ledger/ --model-metadata metadata.yaml --breaking-aspects breaking.yaml --diff-report initial_diff.json

Persist (e.g., release) the four generated CSV files (concepts.csv, revisions.csv, contracts.csv, bindings.csv) alongside your model.

7. Sync on every subsequent release

For each new model release, produce a diff between the previous and current snapshots, run the adapter, and sync:

modl sync --ledger-dir ledger/ --model-metadata metadata.yaml --breaking-aspects breaking.yaml --diff-report diff.json

Persist (e.g., release) the updated ledger files with the latest composed model. The ledger is append-only: existing records are never deleted. New rows are added, and rows that are no longer current are marked SUPERSEDED or REMOVED.

modl sync is designed to be a CI/CD step that runs automatically on every release. A typical pipeline stage looks like:

1. validate model
2. run adapter → diff.json
3. modl sync --ledger-dir ledger/ --model-metadata metadata.yaml --breaking-aspects breaking.yaml --diff-report diff.json --strict
4. commit and tag updated ledger CSV files

8. Iterate on the config as the model evolves

When the adapter emits a new aspect key that is not yet in the breaking-aspects config, modl warns. Decide whether it is breaking or non-breaking and add it to the breaking-aspects config file. Run the dry run again to confirm the warning is resolved before syncing.

Contributing

See CONTRIBUTING.md if you would like to contribute.

Design Decisions and Discarded Alternatives

This section documents the rationale behind key design decisions and the alternatives that were considered and rejected. It serves as a reference when the design is challenged.

Why four tables?

One could argue that concepts and contracts are sufficient: concepts capture identity, contracts capture the data contract. This is true only if what constitutes a breaking change is known a priori and applies uniformly to all downstream consumers. In practice, different teams have different definitions of "breaking". The four-table split reflects this:

concepts — stable identity; what a thing is, regardless of how it changes
revisions — a complete, unfiltered audit log of every detected change; does not judge whether a change is breaking
contracts — derived from revisions using a user-configurable set of essential attributes; two rows share a contract only if nothing essential to that project's definition of "breaking" changed
bindings — some modeling languages define entity instances (e.g., Door: [Left, Right]), which expand fields into multiple individually addressable runtime paths; bindings assign a stable identity to each such path

Merging revisions and contracts would either force a single global breaking-change policy or lose the audit trail. Merging bindings into contracts would require contracts to know about instance expansion, coupling two independent concepts.

Why URIs as identifiers?

The alternative is opaque integers or short labels. URIs were chosen because:

They are globally unique without coordination — two independent projects can mint records and their identifiers will never collide
They are self-describing: a URI encodes the namespace (who minted it), the table (what kind of record it is), and the serial (which record)
They are dereferenceable in principle — a namespace owner can publish human-readable documentation at the URI
They compose naturally across namespaces: FK columns in one project's ledger can reference URIs minted by another project without any registry or mapping table

Plain integers require a global registry to avoid collisions across projects. Short labels (CURIEs) require a prefix resolution context that must travel with every document that uses them.

Why full URIs stored in the CSV tables, not CURIEs?

CURIEs such as ns:0 are shorter but require the prefix-to-namespace map to be present and unambiguous at read time. A CSV file is a standalone artifact — it may be opened months later, sent to another team, or imported by a tool that has no knowledge of the original prefix declarations. Full URIs make each CSV self-contained: the namespace authority, the table name, and the serial are all recoverable from the value itself without external context.

The model metadata file's preferred_prefix field is an optional display alias used by inspection commands to shorten output. It is never stored in the ledger.

Why base36 for the URI suffix, not decimal?

The serial is a decimal integer internally. Decimal would be the simplest choice, but base36 was chosen for URI compactness. Serials up to 46,655 (36^3 - 1) fit in three base36 characters, whereas decimal needs five digits from 10,000 onward. Compact URIs matter in serialisation-heavy use cases (payloads, QR codes, logs).

Hexadecimal (base16) was rejected because it is less compact than base36 and gains nothing beyond familiarity.

Why base36 and not base62 or base64?

Base62 (0-9A-Za-z) and base64 (0-9A-Za-z+/=) are more compact than base36 for the same integer range. They were rejected because:

Case ambiguity: base62 uses both uppercase and lowercase letters. URI paths are case-sensitive, but in practice URLs are routinely lowercased by proxies, logs, and developers. In base62, .../revisions/1K and .../revisions/1k decode to different serials, so an accidental lowercasing is a silent data corruption hazard.
No stdlib decode: Python has no built-in base62 decoder. int(s, 36) is part of the language; base62 requires a third-party library or hand-rolled code.
URL safety: base64 uses +, /, and =, which require percent-encoding in URIs. Base64url replaces + and / with - and _, but it is still case-sensitive and adds another alphabet for readers and tools to recognise.

Base36 uses only 0-9a-z — all characters that are unambiguous in URLs, universally lowercased, and directly supported by Python's int(s, 36).
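
Both properties claimed above, tolerance to case changes and compactness, can be checked directly in Python. The helper `suffix_length` is a hypothetical name introduced only for this comparison.

```python
# int() accepts either case for the letters in bases above 10, so a suffix
# that was upper-cased in transit still decodes to the same serial.
upper = int("1K", 36)
lower = int("1k", 36)


def suffix_length(serial: int, base: int) -> int:
    """Number of digits needed to write a non-negative serial in a base."""
    length = 1
    while serial >= base:
        serial //= base
        length += 1
    return length


largest_three_char = 36**3 - 1  # 46655, the largest serial that fits in "zzz"
lengths = {base: suffix_length(largest_three_char, base) for base in (10, 16, 36)}
print(upper, lower)
print(lengths)
```

For serial 46,655 the suffix takes five characters in decimal, four in hexadecimal, and three in base36.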

Why a language-agnostic intermediate representation?

ModL does not parse model files directly. A language-specific adapter produces a diff report in a simple JSON format, which ModL then processes. This separation exists because:

The identity ledger is valuable across modeling languages (vspec, GraphQL SDL, JSON Schema, etc.). The four-table structure and URI semantics are language-agnostic; only the diff production is language-specific.
Migrations and imports between modeling languages should preserve identity: if a concept previously defined in vspec is migrated to another language, its URI should not change. A shared IR makes this possible.
The adapter is a thin, replaceable component. ModL's validation, minting, and audit logic does not need to change when a new modeling language is supported.

Why CSV and not SQLite or another format?

Git-friendly: CSV produces line-level diffs in git diff. A change to a single record is visible as a single changed line. Binary formats (SQLite, Parquet) produce opaque binary diffs.
Human-readable: CSV files can be opened directly in a spreadsheet or text editor. They are suitable as release artifacts that non-technical stakeholders can inspect.
No tooling dependency: reading a CSV requires no database engine, no schema migration, no driver. Any language or environment with a standard library can parse it.
Easy manipulation: pandas, polars, and the Python csv module all handle CSV natively.

SQLite may be offered as an optional release artifact in the future to support consumers who prefer to run SQL queries over the ledger.

Why append-only? Why are records never deleted?

The ledger is designed for transparent governance, traceability, and provenance in data modeling projects. Deleting or modifying a record would:

Break any downstream system that holds a reference to the deleted URI
Make it impossible to reconstruct the state of the model at a past point in time without relying solely on git history
Undermine the audit trail needed to answer questions like "what did this field mean at the time this data was produced?"

Records that are no longer current are marked SUPERSEDED or REMOVED. The full history remains readable. The git history provides point-in-time reproducibility at the repository level; the ledger tables provide it at the record level without requiring a git checkout.

Why `previous_labels` as a list column in `concepts.csv`?

An alternative is a separate rename-history table (e.g., concept_label_history.csv) with one row per rename event. That would be more normalised and queryable. The list column was chosen for simplicity: label history is rarely queried independently, and the added table would require its own schema validation, FK constraints, and serial management. A flat list in the concepts table is sufficient for the primary use case — knowing what a concept used to be called — without adding a fifth table to the ledger.

If richer label history (timestamps, attribution) becomes necessary, a dedicated table is the natural upgrade path.

Why co-release the ledger in the same repository as the model?

Each model release produces a snapshot. A diff between two snapshots produces a diff report. That diff report is passed directly to ModL to update the ledger. Keeping the ledger in the same repository means:

Every model release tag also tags the corresponding ledger state; consumers can check out any release and find a consistent pair
The ledger is a self-contained artifact: it does not require references to an external repository to be meaningful
CI/CD pipelines operate on a single repository checkout

A separate ledger repository would require coordinated releases across two repositories, introduce the risk of the ledger falling out of sync with the model, and require consumers to know about and access a second repository.
