COVESA/modl

Model Ledger

Model Ledger (ModL) is a tool for managing the identity of a domain data model over time — tracking what exists, what changed, and what that means for the systems that consume it.

Overview

As a domain data model evolves, producers and consumers need stable answers to questions like:

  • What concepts exist, and what do they mean?
  • What changed between releases, and when?
  • Has a change broken the data contract?
  • What is the stable runtime address for a given model element?

ModL answers these by maintaining four normalized, append-only CSV tables — the ledger — co-released alongside the data model in a version-controlled repository. Records in the ledger are never deleted, only superseded. The git history provides point-in-time reproducibility.

How It Works

ModL is language-agnostic. It does not parse model files directly. A language-specific adapter produces a diff report in modl's intermediate representation (IR), which is fed into modl along with the previous ledger state and a breaking change configuration.

flowchart LR
    A[Model snapshot] --> |Previous| DIFF[Model-specific diff]
    B[Model snapshot] --> |Current| DIFF
    DIFF --> |Diff report| ADAPTER[Intermediate Representation Adapter]
    ADAPTER --> |Diff in IR| SYNC
    LEDGER[Ledger] --> |Previous| SYNC
    F[Breaking change config] --> SYNC
    SYNC --> |Updated| LEDGER

Underlying pattern

The diff report describes changes using four element kinds:

| Kind | Also known as | Receives bindings? |
| --- | --- | --- |
| ENTITY | Container, branch, object type, class, feature of interest | No |
| PROPERTY | Field, attribute, signal, characteristic | Yes: one per instance, or one singleton |
| ENUMERATION_SET | Enum type, allowed values, expected values | No |
| ENUM_VALUE | Enum member, allowed value, listed option | No |

ENTITY and PROPERTY cover the structural model. ENUMERATION_SET and ENUM_VALUE cover shared vocabulary — types, units, and code lists that properties reference. All four kinds receive concept URIs, revisions, and contracts in the ledger. Only PROPERTY concepts are runtime-addressable and therefore receive bindings.
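
The kind rules above can be sketched in a few lines of Python. This is only an illustration: the `Kind` enum and both helper functions are hypothetical names, not part of modl's API.

```python
from enum import Enum


class Kind(Enum):
    """The four element kinds a diff report may use."""
    ENTITY = "ENTITY"
    PROPERTY = "PROPERTY"
    ENUMERATION_SET = "ENUMERATION_SET"
    ENUM_VALUE = "ENUM_VALUE"


def receives_bindings(kind: Kind) -> bool:
    """Only PROPERTY concepts are runtime-addressable, so only they get bindings."""
    return kind is Kind.PROPERTY


def is_vocabulary(kind: Kind) -> bool:
    """ENUMERATION_SET and ENUM_VALUE cover shared vocabulary."""
    return kind in (Kind.ENUMERATION_SET, Kind.ENUM_VALUE)
```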

graph TB
    E[ENTITY] --> |has property| P1[PROPERTY 1]
    E --> |has property| P2[PROPERTY 2]
    E --> |has property| PN[PROPERTY N]
    ES[ENUMERATION_SET] --> |has value| EV1[ENUM_VALUE 1]
    ES --> |has value| EV2[ENUM_VALUE 2]
    P1 -.->|references| ES

Example: model pattern

A Person that has a name and owns a Car

graph TB
    E[Person] --> |name| P1[String]
    E --> |ownsCar| P2[Car]

Notice that some properties can resolve to a primitive data type (e.g., name resolves to String), whereas others resolve to another entity (e.g., ownsCar resolves to Car). Hence, such a simple pattern resembles a graph when used systematically.

graph TB
    E[Person*] --> |name| P1[String]
    E --> |ownsCar| V[Vehicle*]
    V --> |speed| P3[Float]

* indicates that the element is an ENTITY.

Example: reported changes

A diff function (specific to the data modeling language and agnostic to ModL) must report changes to entities, properties, and vocabulary elements, indicating whether they were ADDED, REMOVED, or MODIFIED.

| Change | Example |
| --- | --- |
| ENTITY added | New Vehicle.Door branch |
| ENTITY removed | Vehicle.OldFeature deleted |
| ENTITY modified | Vehicle.Door instances list changed |
| PROPERTY added | Vehicle.Door.IsLocked added |
| PROPERTY removed | Vehicle.Door.IsOpen removed |
| PROPERTY modified | Vehicle.Speed datatype changed from Float to Int |
| ENUMERATION_SET added | New SpeedUnit vocabulary type |
| ENUM_VALUE added | New SpeedUnit.KMH member |
| ENUM_VALUE modified | SpeedUnit.KMH symbol changed |

Building Blocks

ModL tracks model identity across four dimensions: concepts, revisions, contracts, and bindings. The following example, written in vspec, is used throughout:

Vehicle:  # This is an ENTITY
    type: branch

Vehicle.Speed:  # This is a PROPERTY
    type: sensor
    datatype: Float

Vehicle.Door:  # This is an ENTITY
    type: branch
    instances: [Left, Right]

Vehicle.Door.IsOpen:  # This is a PROPERTY
    type: sensor
    datatype: Boolean

Concepts

A concept is the agreed meaning of a model element — what it is, independent of any implementation detail. Think of it as a dictionary entry.

| Kind | Label | Meaning |
| --- | --- | --- |
| ENTITY | Vehicle | A motorized thing used for transporting people or goods |
| ENTITY | Door | A hinged or sliding barrier at the entrance to a vehicle |
| PROPERTY | Vehicle.Speed | The rate at which a vehicle moves |
| PROPERTY | Door.IsOpen | Whether a door is open or closed |
| ENUMERATION_SET | SpeedUnit | A vocabulary type enumerating recognised speed units |
| ENUM_VALUE | SpeedUnit.KMH | The kilometre-per-hour speed unit |

Concepts are identified once and never reassigned. If a concept is renamed, the old label is recorded as a previous label — the concept identity does not change.

Revisions

A revision is assigned to every detected change, regardless of whether it is breaking. It is the raw audit log of what happened.

Examples of changes that trigger a new revision:

  • A typo fix (Vehicl → Vehicle)
  • A unit change (km/h → mph)
  • A description update
  • A field being added or removed
  • An instance list changing

Contracts

A contract captures the specific data structure agreement between producers and consumers. It assigns identity to a concrete variant of the model that is relevant to downstream users. What counts as "relevant" is user-defined via a configuration file. Any change to an essential attribute triggers a new contract. In other words, if a breaking change is detected, a new contract identity is minted.

For example, if datatype is declared essential for Vehicle.Speed:

| contract_uri | Snapshot | Status |
| --- | --- | --- |
| http://namespace.example/contracts/10 | Vehicle.Speed { datatype: Int } | SUPERSEDED |
| http://namespace.example/contracts/14 | Vehicle.Speed { datatype: Float } | ACTIVE |

These are two contracts for the same concept — each a distinct variant of the essential metadata. The meaning of "speed" has not changed, but the data contract has.

Contracts apply to both entities and fields. An entity's essential metadata (e.g., its type or instance list) defines its contract just as a field's datatype defines its own. Each element's contract is governed independently by its own essential attribute configuration.
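
As a rough illustration of the rule above, the sketch below mints a new contract only when a changed key is essential. The function name, the in-memory row dictionaries, and the concept URI used for Vehicle.Speed are assumptions made for the example. The real engine mints the contract URI itself, whereas here it is passed in.

```python
def apply_modification(contracts, concept_uri, changed_keys, essential_keys, new_contract_uri):
    """Mint a new contract only when an essential attribute changed.

    `contracts` is a list of dict rows; returns True if a contract was minted.
    """
    if not set(changed_keys) & set(essential_keys):
        return False  # non-breaking: the active contract stays as it is
    for row in contracts:
        if row["concept_uri"] == concept_uri and row["status"] == "ACTIVE":
            row["status"] = "SUPERSEDED"  # superseded, never deleted
    contracts.append(
        {"concept_uri": concept_uri, "contract_uri": new_contract_uri, "status": "ACTIVE"}
    )
    return True


SPEED = "http://namespace.example/concepts/1"  # assumed concept URI for Vehicle.Speed
contracts = [
    {"concept_uri": SPEED, "contract_uri": "http://namespace.example/contracts/10", "status": "ACTIVE"}
]
# A description edit is not essential here, so no contract is minted.
minted_for_description = apply_modification(contracts, SPEED, {"description"}, {"datatype"}, None)
# A datatype change is essential, so contracts/14 supersedes contracts/10.
minted_for_datatype = apply_modification(
    contracts, SPEED, {"datatype"}, {"datatype"}, "http://namespace.example/contracts/14"
)
```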

Bindings

Bindings assign a stable identity to every runtime-addressable path a property can appear on.

For Vehicle.Door with instances: [Left, Right], the field Door.IsOpen expands into:

| serial | binding_uri | Runtime path |
| --- | --- | --- |
| 24 | http://namespace.example/bindings/o | Vehicle.Door.Left.IsOpen |
| 25 | http://namespace.example/bindings/p | Vehicle.Door.Right.IsOpen |

A system can then write a compact payload like 24: true to mean "the left door is open", without encoding the full path.

For a property whose parent entity has no instances (e.g., Battery.StateOfCharge), one binding is still minted with no instance label:

| serial | binding_uri | Runtime path |
| --- | --- | --- |
| 42 | http://namespace.example/bindings/16 | Battery.StateOfCharge |

Note: Bindings are assigned to PROPERTY concepts only. ENTITY concepts are not directly addressable at runtime and therefore never receive bindings. Vocabulary kinds (ENUMERATION_SET, ENUM_VALUE) never receive bindings either. The engine reads the kind column of the concept row to enforce all three rules.
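
Instance expansion can be sketched as follows. The helper `expand_runtime_paths` is a hypothetical name and the path-joining rule is inferred from the examples above, so treat it as an illustration rather than the engine's implementation.

```python
def expand_runtime_paths(entity_label, property_name, instances):
    """Return (instance_label, runtime_path) pairs for one property."""
    if not instances:
        # No instances on the parent entity: a single binding with no instance label.
        return [(None, f"{entity_label}.{property_name}")]
    return [(inst, f"{entity_label}.{inst}.{property_name}") for inst in instances]


door_paths = expand_runtime_paths("Vehicle.Door", "IsOpen", ["Left", "Right"])
battery_paths = expand_runtime_paths("Battery", "StateOfCharge", [])

# A consumer can map binding serials back to paths to read a compact payload.
path_by_serial = {24: door_paths[0][1], 25: door_paths[1][1]}
payload = {24: True}
decoded = {path_by_serial[serial]: value for serial, value in payload.items()}
```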

Instance expansion behavior

When a new instance is added (e.g., Center), the behavior depends on the breaking change configuration:

| Config | Entity revision | Entity contract | Field revision | Field contract | New binding |
| --- | --- | --- | --- | --- | --- |
| Breaking | yes | yes | yes | yes (new) | yes, anchored to new contract |
| Non-breaking | yes | no | yes | no (unchanged) | yes, appended to existing contract |

In the non-breaking case, existing binding IDs remain stable and consumers are unaffected. Binding sets under a contract are append-only.

What each event writes to the ledger

The table below shows which rows modl sync creates or updates for each type of change event, given the breaking-change classification configured by the user.

| Event | concepts | revisions | contracts | bindings |
| --- | --- | --- | --- | --- |
| ENTITY ADDED | new row | new row | new row (initial contract) | |
| ENTITY MODIFIED, non-breaking (no instance change) | update current_label if renamed | new row | unchanged | |
| ENTITY MODIFIED, non-breaking (instances added) | update current_label if renamed | new row; new child revisions | unchanged | new bindings for added instances on child properties |
| ENTITY MODIFIED, breaking (non-instance) | update current_label if renamed | new row | new row | |
| ENTITY MODIFIED, breaking (instances changed) | update current_label if renamed | new row; new child revisions | new row; new child contracts | new bindings for all child properties (old bindings superseded) |
| ENTITY REMOVED | status → REMOVED | new row | status → REMOVED | status → REMOVED for child property bindings (via child REMOVED events) |
| PROPERTY ADDED | new row | new row | new row (initial contract) | new binding per instance; one singleton if no instances |
| PROPERTY MODIFIED, non-breaking | update current_label if renamed | new row | unchanged | |
| PROPERTY MODIFIED, breaking | update current_label if renamed | new row | new row | new bindings anchored to new contract (old bindings superseded) |
| PROPERTY REMOVED | status → REMOVED | new row | status → REMOVED | status → REMOVED |
| ENUMERATION_SET ADDED | new row | new row | new row (initial contract) | |
| ENUMERATION_SET MODIFIED | update current_label if renamed | new row | new row if breaking, unchanged if not | |
| ENUMERATION_SET REMOVED | status → REMOVED | new row | status → REMOVED | |
| ENUM_VALUE ADDED | new row | new row | new row (initial contract) | |
| ENUM_VALUE MODIFIED | update current_label if renamed | new row | new row if breaking, unchanged if not | |
| ENUM_VALUE REMOVED | status → REMOVED | new row | status → REMOVED | |

Key observations:

  • Every event produces a revision — the revision log is unconditional and unfiltered.
  • A contract is only created or superseded when a change is classified as breaking by the config. Non-breaking changes leave the active contract untouched, so any system holding a contract URI or binding URI is unaffected.
  • A rename never changes the concept URI. It updates current_label and appends the old label to previous_labels in the concept row.
  • ENUMERATION_SET and ENUM_VALUE events follow the same revision and contract rules as ENTITY and PROPERTY respectively, but never produce bindings regardless of configuration.
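
The rename rule can be illustrated with a small sketch. The function name is hypothetical, and previous_labels is held as a Python list here. How that column is encoded on disk is not specified in this section, so the list is an assumption.

```python
def rename_concept(row, new_label):
    """Rename in place: the URI is untouched, the old label is kept as history."""
    if new_label == row["current_label"]:
        return row
    row["previous_labels"].append(row["current_label"])
    row["current_label"] = new_label
    return row


concept = {
    "concept_uri": "http://namespace.example/concepts/1",
    "current_label": "Vehicle.Velocity",
    "previous_labels": [],
    "kind": "PROPERTY",
    "status": "ACTIVE",
}
rename_concept(concept, "Vehicle.Speed")
```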

The Ledger Tables

Serial and URI minting

Each record minted is assigned a serial number — a monotonically increasing non-negative integer, never reused. The serial is permanently baked into the record's Uniform Resource Identifier (URI):

uri = namespace + table_name + "/" + base36(serial)

Base36 uses the alphabet 0-9a-z (lowercase ASCII). Values 0–9 encode as single decimal digits and values 10–35 as single letters (a–z); larger values use multiple characters (e.g., serial 40 → 14, serial 103 → 2v).
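
A minimal sketch of the minting formula, assuming the namespace already ends with its separator. The function names are illustrative and not modl's API; decoding relies on Python's built-in `int(s, 36)`.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36(serial: int) -> str:
    """Encode a non-negative integer using the lowercase 0-9a-z alphabet."""
    if serial < 0:
        raise ValueError("serial must be non-negative")
    if serial == 0:
        return "0"
    digits = []
    while serial:
        serial, remainder = divmod(serial, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def mint_uri(namespace: str, table_name: str, serial: int) -> str:
    """uri = namespace + table_name + "/" + base36(serial)"""
    return f"{namespace}{table_name}/{base36(serial)}"


def serial_from_uri(uri: str) -> int:
    """Recover the serial from the last path segment of a ledger URI."""
    return int(uri.rsplit("/", 1)[1], 36)
```

For example, `mint_uri("http://namespace.example/", "bindings", 24)` yields `http://namespace.example/bindings/o`, matching the bindings table used in this document.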

Authorship rule: the ledger contains only records minted by the project owner. Every row has a serial number and a URI under the project namespace. Foreign Key (FK) columns (concept_uri, contract_uri, etc.) may reference URIs from other namespaces — those are foreign references, not rows authored here.

Cross-namespace imports (Work in progress): when a model references elements from an external project, the importing project ships its own ledger alongside a pruned copy of the external ledger containing only the referenced rows, annotated with provenance (source namespace, release URL, content hash).

concepts.csv

| serial | concept_uri | current_label | previous_labels | kind | status |
| --- | --- | --- | --- | --- | --- |
| 0 | http://namespace.example/concepts/0 | Vehicle | | ENTITY | ACTIVE |
| 1 | http://namespace.example/concepts/1 | Vehicle.Speed | Vehicle.Velocity | PROPERTY | ACTIVE |
| 2 | http://namespace.example/concepts/2 | Vehicle.Door | | ENTITY | ACTIVE |
| 8 | http://namespace.example/concepts/8 | Vehicle.Door.IsOpen | | PROPERTY | ACTIVE |
| 5 | http://namespace.example/concepts/5 | SpeedUnit | | ENUMERATION_SET | ACTIVE |
| 6 | http://namespace.example/concepts/6 | SpeedUnit.KMH | | ENUM_VALUE | ACTIVE |

The kind column records the structural kind of the concept permanently. Only PROPERTY concepts receive bindings. ENTITY, ENUMERATION_SET, and ENUM_VALUE concepts never do.

revisions.csv

| serial | concept_uri | revision_uri | previous_revision_uri | status |
| --- | --- | --- | --- | --- |
| 56 | http://namespace.example/concepts/0 | http://namespace.example/revisions/1k | | ACTIVE |
| 57 | http://namespace.example/concepts/8 | http://namespace.example/revisions/1l | | SUPERSEDED |
| 103 | http://namespace.example/concepts/8 | http://namespace.example/revisions/2v | http://namespace.example/revisions/1l | ACTIVE |

contracts.csv

| serial | concept_uri | contract_uri | revision_uri | status |
| --- | --- | --- | --- | --- |
| 40 | http://namespace.example/concepts/8 | http://namespace.example/contracts/14 | http://namespace.example/revisions/2v | ACTIVE |

bindings.csv

| serial | contract_uri | binding_uri | instance_label | status |
| --- | --- | --- | --- | --- |
| 24 | http://namespace.example/contracts/14 | http://namespace.example/bindings/o | Left | ACTIVE |
| 25 | http://namespace.example/contracts/14 | http://namespace.example/bindings/p | Right | ACTIVE |
| 42 | http://namespace.example/contracts/1e | http://namespace.example/bindings/16 | (null) | ACTIVE |

The third row is a singleton binding for Battery.StateOfCharge, whose parent has no instances. instance_label is null; the binding still provides a stable, versioned identity for the runtime path.

Table relationships

erDiagram
    concepts ||--o{ revisions : "tracked by"
    concepts ||--o{ contracts : "realized as"
    revisions ||--o{ contracts : "triggers"
    contracts ||--o{ bindings : "expanded into"

    concepts {
        int serial PK
        string concept_uri UK
        string current_label
        string previous_labels
        string kind
        string status
    }
    revisions {
        int serial PK
        string concept_uri FK
        string revision_uri UK
        string previous_revision_uri FK
        string status
    }
    contracts {
        int serial PK
        string concept_uri FK
        string contract_uri UK
        string revision_uri FK
        string status
    }
    bindings {
        int serial PK
        string contract_uri FK
        string binding_uri UK
        string instance_label
        string status
    }
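
To show how the foreign keys chain together, the sketch below resolves a binding URI to its concept label by walking bindings → contracts → concepts with Python's csv module. The comma delimiter and header row are assumptions about the on-disk layout, and the helper names are hypothetical. The rows are copied from the example tables above.

```python
import csv
import io

# Assumed layout: comma-separated, with a header row naming the columns.
CONCEPTS = """serial,concept_uri,current_label,previous_labels,kind,status
8,http://namespace.example/concepts/8,Vehicle.Door.IsOpen,,PROPERTY,ACTIVE
"""
CONTRACTS = """serial,concept_uri,contract_uri,revision_uri,status
40,http://namespace.example/concepts/8,http://namespace.example/contracts/14,http://namespace.example/revisions/2v,ACTIVE
"""
BINDINGS = """serial,contract_uri,binding_uri,instance_label,status
24,http://namespace.example/contracts/14,http://namespace.example/bindings/o,Left,ACTIVE
25,http://namespace.example/contracts/14,http://namespace.example/bindings/p,Right,ACTIVE
"""


def rows(text):
    """Parse CSV text into a list of dict rows keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def index(table, key):
    """Index rows by a unique column such as binding_uri."""
    return {row[key]: row for row in table}


def resolve_binding(binding_uri):
    """Follow binding -> contract -> concept and return label, instance, contract status."""
    binding = index(rows(BINDINGS), "binding_uri")[binding_uri]
    contract = index(rows(CONTRACTS), "contract_uri")[binding["contract_uri"]]
    concept = index(rows(CONCEPTS), "concept_uri")[contract["concept_uri"]]
    return concept["current_label"], binding["instance_label"], contract["status"]
```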

Usage

The following commands assume an active environment; see CONTRIBUTING for instructions on how to set it up.

modl sync

Synchronises the ledger with a diff report. If no ledger exists yet, it is created. If no diff report is provided, an empty ledger is initialised.

modl sync --ledger-dir PATH --model-metadata PATH --breaking-aspects PATH [--diff-report PATH] [--dry-run] [--strict]

| Option | Description |
| --- | --- |
| -d, --diff-report | Path to the diff report JSON file (optional). Omit to initialise an empty ledger. |
| -o, --ledger-dir | Directory where the four ledger CSV files are read from and written to. |
| -m, --model-metadata | Path to the model metadata YAML file (name, id, preferred_prefix). |
| -b, --breaking-aspects | Path to the breaking aspects config YAML file. |
| -n, --dry-run | Preview what would change without writing anything to disk. Exits with code 1 if changes would be made. |
| -s, --strict | Treat aspect keys in the diff report that are not declared in the config as errors instead of warnings. |

Model metadata file format

The model metadata file follows the s2dm metadata.yaml convention:

name: MyModel                           # human-readable model name
id: "http://namespace.example/"         # must end with '/' or '#'; full URIs are stored in the ledger
preferred_prefix: "ns"                  # optional display alias; used by inspection commands to shorten output

name and id are required; preferred_prefix is optional. id is used as the namespace base URI for minting ledger record identifiers.
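
The constraints on the metadata file can be checked with a few lines of Python. This sketch works on an already-parsed dictionary (YAML loading is left out) and `validate_metadata` is a hypothetical helper, not a modl function.

```python
def validate_metadata(meta: dict) -> list:
    """Return a list of problems; an empty list means the metadata looks valid."""
    errors = []
    for key in ("name", "id"):
        if not meta.get(key):
            errors.append(f"missing required field: {key}")
    namespace = meta.get("id") or ""
    if namespace and not namespace.endswith(("/", "#")):
        errors.append("id must end with '/' or '#'")
    return errors


ok = validate_metadata({"name": "MyModel", "id": "http://namespace.example/"})
bad = validate_metadata({"name": "MyModel", "id": "http://namespace.example"})
```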

Breaking aspects file format

entity:
  instances: true   # breaking — triggers a new contract
  type: true        # breaking — triggers a new contract
  name: false       # renames are non-breaking; suppresses --strict warnings

property:
  output_type: true  # breaking — triggers a new contract
  unit: true         # breaking — triggers a new contract
  accuracy: true     # user-defined domain-specific attribute
  description: false # known, non-breaking; suppresses --strict warnings

The entity and property sections default to empty — all changes are treated as non-breaking if omitted.

Each key takes one of three distinct states:

| Value | Meaning |
| --- | --- |
| true | Aspect is breaking: a change triggers a new contract. |
| false | Aspect is known but non-breaking: changes are silently accepted; no warning even with --strict. |
| (absent) | Aspect is unknown: treated as non-breaking but produces a warning (error with --strict). |

The reserved key name governs rename events (renamed_from set on a diff event). It never appears in aspects — it is checked separately via renamed_from.
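
The three-state lookup can be sketched as below. `evaluate_change` is a hypothetical name and the config is shown as an already-parsed dictionary. The sketch mirrors the table above but is not the engine's code, and it leaves out the separate handling of the reserved name key.

```python
def evaluate_change(section, changed_keys, strict=False):
    """Return (is_breaking, unknown_keys) for one MODIFIED event."""
    is_breaking, unknown = False, []
    for key in changed_keys:
        if key not in section:
            unknown.append(key)  # absent: non-breaking, but worth a warning
        elif section[key]:
            is_breaking = True  # true: triggers a new contract
        # false: known and non-breaking, silently accepted
    if strict and unknown:
        raise ValueError(f"undeclared aspect keys: {sorted(unknown)}")
    return is_breaking, unknown


property_config = {"output_type": True, "unit": True, "accuracy": True, "description": False}
```

With this config, a change to `description` alone is non-breaking, a change to `unit` is breaking, and a change to an undeclared key such as `comment` is non-breaking but reported (or rejected when `strict=True`).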

Diff report format

The diff report is a JSON file produced by a language-specific adapter (e.g. for vspec, GraphQL SDL). It describes what changed between two model snapshots using modl's intermediate representation.

Each change event covers either an entity (container, object type, branch) or a property (field, attribute, signal). Key fields:

| Field | Values |
| --- | --- |
| kind | ENTITY, PROPERTY, ENUMERATION_SET, or ENUM_VALUE |
| change_type | ADDED, REMOVED, or MODIFIED |
| aspects | On ADDED: full initial-state snapshot. On MODIFIED: delta of changed keys only. Absent on REMOVED. |
| renamed_from | Previous label when the element was renamed (MODIFIED only). |
| parent_label | Required for PROPERTY: the label of the owning entity. |
| content | ENTITY MODIFIED only: list of {label, change_type} for children that changed. |

See diff_report_template.md for the full field reference, rename semantics, examples, and an adapter implementation checklist.
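
An adapter author may want to sanity-check events before handing them to modl. The sketch below encodes only the rules listed in the table above. The `label` key and the `check_event` helper are assumptions for illustration; diff_report_template.md remains the authoritative field reference.

```python
KINDS = {"ENTITY", "PROPERTY", "ENUMERATION_SET", "ENUM_VALUE"}
CHANGE_TYPES = {"ADDED", "REMOVED", "MODIFIED"}


def check_event(event):
    """Return a list of rule violations for a single change event."""
    problems = []
    kind, change = event.get("kind"), event.get("change_type")
    if kind not in KINDS:
        problems.append(f"unknown kind: {kind!r}")
    if change not in CHANGE_TYPES:
        problems.append(f"unknown change_type: {change!r}")
    if change == "REMOVED" and "aspects" in event:
        problems.append("aspects must be absent on REMOVED")
    if "renamed_from" in event and change != "MODIFIED":
        problems.append("renamed_from is only valid on MODIFIED")
    if kind == "PROPERTY" and not event.get("parent_label"):
        problems.append("PROPERTY events require parent_label")
    if "content" in event and (kind, change) != ("ENTITY", "MODIFIED"):
        problems.append("content is only valid on ENTITY MODIFIED")
    return problems


good = {"label": "Vehicle.Speed", "kind": "PROPERTY", "change_type": "MODIFIED",
        "parent_label": "Vehicle", "aspects": {"datatype": "Int"}}
bad = {"label": "Vehicle.Door.IsOpen", "kind": "PROPERTY", "change_type": "REMOVED",
       "aspects": {}}
```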

Adoption Guide

1. Define your model and take a snapshot

Represent your domain model in your chosen modeling language (vspec, GraphQL SDL, JSON Schema, etc.). A snapshot is the complete state of the model at a point in time — a version-controlled file, a release artifact, or a git tag. The diff is always computed between two such snapshots: the previous release and the current one.

2. Produce a diff between two snapshots

Compare two snapshots of your model to enumerate what was added, removed, or modified. The mechanism depends on your modeling language:

  • Text-based formats (YAML, JSON): diff the files and post-process the output.
  • Structured formats with tooling (vspec, Protobuf): use the language's own comparison tool if one exists, or write a script that loads both snapshots and walks the element tree.
  • Schema registries: use the registry's diff API if available.

For the first release there is no previous snapshot — treat every element as ADDED.

3. Write an adapter that translates the diff into the ModL IR format

The adapter is a script or tool — typically a short Python or shell program — that takes your language-specific diff output and writes a diff.json file in the ModL intermediate representation. At its simplest:

# pseudocode
previous = load("model-v1.yaml")
current  = load("model-v2.yaml")
changes  = compare(previous, current)   # language-specific logic
write_json("diff.json", to_modl_ir(changes))

The adapter is a one-time investment per modeling language. See diff_report_template.md for the full field reference, rename semantics, and an adapter implementation checklist.
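
A slightly more concrete version of the pseudocode, as a sketch only. It assumes each snapshot has already been loaded into a dictionary of the form `{label: {"kind", "parent_label", "aspects"}}`, that events carry a `label` key, and that the report is a flat list of events; all three are assumptions, so check diff_report_template.md for the actual layout. Rename detection and aspect keys that disappear between snapshots are not handled.

```python
import json


def compare(previous, current):
    """Diff two snapshots shaped as {label: {"kind", "parent_label", "aspects"}}."""
    events = []
    for label in sorted(previous.keys() | current.keys()):
        old, new = previous.get(label), current.get(label)
        if old is None:
            change, aspects = "ADDED", dict(new["aspects"])  # full initial state
        elif new is None:
            change, aspects = "REMOVED", None  # no aspects on REMOVED
        else:
            # Delta of keys whose value differs between the two snapshots.
            aspects = {k: v for k, v in new["aspects"].items() if old["aspects"].get(k) != v}
            if not aspects:
                continue  # unchanged element, nothing to report
            change = "MODIFIED"
        element = new or old
        event = {"label": label, "kind": element["kind"], "change_type": change}
        if aspects is not None:
            event["aspects"] = aspects
        if element["kind"] == "PROPERTY":
            event["parent_label"] = element["parent_label"]
        events.append(event)
    return events


v1 = {
    "Vehicle": {"kind": "ENTITY", "parent_label": None, "aspects": {"type": "branch"}},
    "Vehicle.Speed": {"kind": "PROPERTY", "parent_label": "Vehicle",
                      "aspects": {"type": "sensor", "datatype": "Float"}},
}
v2 = {
    "Vehicle": v1["Vehicle"],
    "Vehicle.Speed": {"kind": "PROPERTY", "parent_label": "Vehicle",
                      "aspects": {"type": "sensor", "datatype": "Int"}},
}
first_release = compare({}, v1)   # no previous snapshot: everything is ADDED
next_release = compare(v1, v2)    # only the datatype delta is reported
diff_json = json.dumps(next_release, indent=2)
```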

4. Author a model metadata file and a breaking-aspects config

Create a metadata.yaml that declares your project's namespace following the s2dm convention. Only name and id are required:

name: MyModel
id: "https://myproject.org/model/"
preferred_prefix: "mp"

Create a breaking-aspects.yaml that lists which aspect keys constitute a breaking change. Start minimal — an empty file (or {}) is valid and treats all changes as non-breaking:

entity:
  instances: true
property:
  output_type: true
  unit: true

Use true for breaking aspects, false to explicitly mark a key as known-but-non-breaking (silences --strict warnings).

5. Validate with a dry run

Before touching the ledger, pass the diff report through modl sync with --dry-run and --strict:

modl sync --ledger-dir ledger/ --model-metadata metadata.yaml --breaking-aspects breaking.yaml --diff-report diff.json --dry-run --strict

Review any warnings about undeclared aspect keys. For each unknown key, decide: is it breaking (true) or intentionally non-breaking (false)? Update the config and re-run until the dry run is clean.

6. Initialise the ledger

On the first run, the ledger does not exist yet. modl sync creates it. For a first release where you want to capture the initial model state, pass the diff report that treats every element as ADDED. To start with an empty ledger and add history in subsequent syncs, omit --diff-report.

modl sync --ledger-dir ledger/ --model-metadata metadata.yaml --breaking-aspects breaking.yaml --diff-report initial_diff.json

Persist (e.g., release) the four generated CSV files (concepts.csv, revisions.csv, contracts.csv, bindings.csv) alongside your model.

7. Sync on every subsequent release

For each new model release, produce a diff between the previous and current snapshots, run the adapter, and sync:

modl sync --ledger-dir ledger/ --model-metadata metadata.yaml --breaking-aspects breaking.yaml --diff-report diff.json

Persist (e.g., release) the updated ledger files with the latest composed model. The ledger is append-only: existing records are never deleted; new rows are added and existing ones are marked SUPERSEDED or REMOVED.

modl sync is designed to be a CI/CD step that runs automatically on every release. A typical pipeline stage looks like:

1. validate model
2. run adapter → diff.json
3. modl sync --ledger-dir ledger/ --model-metadata metadata.yaml --breaking-aspects breaking.yaml --diff-report diff.json --strict
4. commit and tag updated ledger CSV files
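
If the pipeline is driven from Python, step 3 could be wrapped as in the sketch below. Only the flags documented in the Usage section are used. The helper names are hypothetical, and the meaning of exit codes other than the documented dry-run code 1 is not assumed.

```python
import subprocess


def build_sync_command(ledger_dir, metadata, breaking, diff_report=None,
                       dry_run=False, strict=False):
    """Assemble the modl sync argument list from the documented options."""
    cmd = ["modl", "sync",
           "--ledger-dir", ledger_dir,
           "--model-metadata", metadata,
           "--breaking-aspects", breaking]
    if diff_report:
        cmd += ["--diff-report", diff_report]
    if dry_run:
        cmd.append("--dry-run")
    if strict:
        cmd.append("--strict")
    return cmd


def run_sync(cmd):
    """Run the command and return its exit code without raising on failure."""
    return subprocess.run(cmd, check=False).returncode


ci_command = build_sync_command("ledger/", "metadata.yaml", "breaking.yaml",
                                diff_report="diff.json", strict=True)
```

With `dry_run=True`, a return code of 1 from `run_sync` indicates that changes would be made, as described for --dry-run above.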

8. Iterate on the config as the model evolves

When the adapter emits a new aspect key that is not yet in the breaking-aspects config, modl warns. Decide whether it is breaking or non-breaking and add it to breaking-aspects.yaml. Run the dry run again to confirm the warning is resolved before syncing.

Contributing

See CONTRIBUTING if you would like to contribute.

Design Decisions and Discarded Alternatives

This section documents the rationale behind key design decisions and the alternatives that were considered and rejected. It serves as a reference when the design is challenged.

Why four tables?

One could argue that concepts and contracts are sufficient: concepts capture identity, contracts capture the data contract. This is true only if what constitutes a breaking change is known a priori and applies uniformly to all downstream consumers. In practice, different teams have different definitions of "breaking". The four-table split reflects this:

  • concepts — stable identity; what a thing is, regardless of how it changes
  • revisions — a complete, unfiltered audit log of every detected change; does not judge whether a change is breaking
  • contracts — derived from revisions using a user-configurable set of essential attributes; two rows share a contract only if nothing essential to that project's definition of "breaking" changed
  • bindings — some modeling languages define entity instances (e.g., Door: [Left, Right]), which expand fields into multiple individually addressable runtime paths; bindings assign a stable identity to each such path

Merging revisions and contracts would either force a single global breaking-change policy or lose the audit trail. Merging bindings into contracts would require contracts to know about instance expansion, coupling two independent concepts.

Why URIs as identifiers?

The alternative is opaque integers or short labels. URIs were chosen because:

  • They are globally unique without coordination — two independent projects can mint records and their identifiers will never collide
  • They are self-describing: a URI encodes the namespace (who minted it), the table (what kind of record it is), and the serial (which record)
  • They are dereferenceable in principle — a namespace owner can publish human-readable documentation at the URI
  • They compose naturally across namespaces: FK columns in one project's ledger can reference URIs minted by another project without any registry or mapping table

Plain integers require a global registry to avoid collisions across projects. Short labels (CURIEs) require a prefix resolution context that must travel with every document that uses them.

Why full URIs stored in the CSV tables, not CURIEs?

CURIEs such as ns:0 are shorter but require the prefix-to-namespace map to be present and unambiguous at read time. A CSV file is a standalone artifact — it may be opened months later, sent to another team, or imported by a tool that has no knowledge of the original prefix declarations. Full URIs make each CSV self-contained: the namespace authority, the table name, and the serial are all recoverable from the value itself without external context.

The model metadata file's preferred_prefix field is an optional display alias used by inspection commands to shorten output. It is never stored in the ledger.

Why base36 for the URI suffix, not decimal?

The serial is a decimal integer internally. Decimal would be the simplest choice, but base36 was chosen for URI compactness. A model with tens of thousands of records would produce 5-digit decimal suffixes; in base36, serials up to 46,655 fit in 3 characters and the rest of that range fits in 4. Compact URIs matter in serialisation-heavy use cases (payloads, QR codes, logs).

Hexadecimal (base16) was rejected because it is less compact than base36 and gains nothing beyond familiarity.
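
The compactness argument is easy to check numerically. The helper below is illustrative only; it counts how many characters a serial needs in a given base.

```python
def suffix_length(serial: int, base: int) -> int:
    """Number of digits needed to write a non-negative serial in the given base."""
    length = 1
    while serial >= base:
        serial //= base
        length += 1
    return length


# 46655 is the largest serial that fits in three base36 characters (36**3 - 1).
lengths = {base: suffix_length(46655, base) for base in (10, 16, 36)}
```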

Why base36 and not base62 or base64?

Base62 (0-9A-Za-z) and base64 (0-9A-Za-z+/=) are more compact than base36 for the same integer range. They were rejected because:

  • Case ambiguity: base62 uses both uppercase and lowercase letters. URIs are technically case-sensitive, but in practice URLs are routinely lowercased by proxies, logs, and developers. Under base62, URIs such as .../revisions/1K and .../revisions/1k would decode to different serials, a silent data corruption hazard.
  • No stdlib decode: Python has no built-in base62 decoder. int(s, 36) is part of the language; base62 requires a third-party library or hand-rolled code.
  • URL safety: base64 uses +, /, and =, which require percent-encoding in URIs. Base64url replaces them with - and _, but introduces yet another non-standard alphabet.

Base36 uses only 0-9a-z — all characters that are unambiguous in URLs, universally lowercased, and directly supported by Python's int(s, 36).
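
The case-insensitivity point can be demonstrated directly with Python's built-in `int`. The `decode_serial` wrapper is just a hypothetical convenience name.

```python
def decode_serial(suffix: str) -> int:
    """Decode a base36 URI suffix; int() accepts either letter case for base 36."""
    return int(suffix, 36)


lower = decode_serial("1k")
upper = decode_serial("1K")
```

Both calls return the same serial, so a suffix that was accidentally upper- or lowercased in transit still resolves to the same record, which would not hold for a base62 alphabet.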

Why a language-agnostic intermediate representation?

ModL does not parse model files directly. A language-specific adapter produces a diff report in a simple JSON format, which ModL then processes. This separation exists because:

  • The identity ledger is valuable across modeling languages (vspec, GraphQL SDL, JSON Schema, etc.). The four-table structure and URI semantics are language-agnostic; only the diff production is language-specific.
  • Migrations and imports between modeling languages should preserve identity: if a concept previously defined in vspec is migrated to another language, its URI should not change. A shared IR makes this possible.
  • The adapter is a thin, replaceable component. ModL's validation, minting, and audit logic does not need to change when a new modeling language is supported.

Why CSV and not SQLite or another format?

  • Git-friendly: CSV produces line-level diffs in git diff. A change to a single record is visible as a single changed line. Binary formats (SQLite, Parquet) produce opaque binary diffs.
  • Human-readable: CSV files can be opened directly in a spreadsheet or text editor. They are suitable as release artifacts that non-technical stakeholders can inspect.
  • No tooling dependency: reading a CSV requires no database engine, no schema migration, no driver. Any language or environment with a standard library can parse it.
  • Easy manipulation: pandas, polars, and the Python csv module all handle CSV natively.

SQLite may be offered as an optional release artifact in the future to support consumers who prefer to run SQL queries over the ledger.

Why append-only? Why are records never deleted?

The ledger is designed for transparent governance, traceability, and provenance in data modeling projects. Deleting or modifying a record would:

  • Break any downstream system that holds a reference to the deleted URI
  • Make it impossible to reconstruct the state of the model at a past point in time without relying solely on git history
  • Undermine the audit trail needed to answer questions like "what did this field mean at the time this data was produced?"

Records that are no longer current are marked SUPERSEDED or REMOVED. The full history remains readable. The git history provides point-in-time reproducibility at the repository level; the ledger tables provide it at the record level without requiring a git checkout.

Why previous_labels as a list column in concepts.csv?

An alternative is a separate rename-history table (e.g., concept_label_history.csv) with one row per rename event. That would be more normalised and queryable. The list column was chosen for simplicity: label history is rarely queried independently, and the added table would require its own schema validation, FK constraints, and serial management. A flat list in the concepts table is sufficient for the primary use case — knowing what a concept used to be called — without adding a fifth table to the ledger.

If richer label history (timestamps, attribution) becomes necessary, a dedicated table is the natural upgrade path.

Why co-release the ledger in the same repository as the model?

Each model release produces a snapshot. A diff between two snapshots produces a diff report. That diff report is passed directly to ModL to update the ledger. Keeping the ledger in the same repository means:

  • Every model release tag also tags the corresponding ledger state; consumers can check out any release and find a consistent pair
  • The ledger is a self-contained artifact: it does not require references to an external repository to be meaningful
  • CI/CD pipelines operate on a single repository checkout

A separate ledger repository would require coordinated releases across two repositories, introduce the risk of the ledger falling out of sync with the model, and require consumers to know about and access a second repository.
