Model library, content hashing, and semantic versioning¶
How a version-controlled model library (for example the sibling repository
Advanced-Process-Control/model-library) fits the ecosystem, howmodel-parserparticipates without becoming a registry or deployment tool, and how hashes, SemVer, and git play different roles.
1. Three axes of “version”¶
Do not collapse these into one number. They answer different questions.
| Axis | Question | Mechanism | Analogy |
|---|---|---|---|
| Content hash | Are two scaffolds semantically identical? Did anything meaningful change? | provenance.content_hash in the IR (sha256 over the semantic body, excluding provenance) |
Content-addressed blob id |
| Semantic model version | Is this change backward-compatible for consumers of the scaffold API? | Human or tool-assigned SemVer (e.g. model.source_version plus release policy) |
Library SemVer |
| History / lineage | Who changed what, when, and why? How do we review, branch, and roll back? | Git commits, tags, PRs | Git history |
The content hash is identity and integrity, not history. It supports deduplication, cache keys, cross-repository references (“this parameter set was fitted against scaffold sha256:…”), and CI checks that regenerated artifacts still match the IR. It does not replace git, and it is a poor user-facing release label.
See also ir-specification.md (identity / content_hash) and storing-mtk-models.md (durable IR + generated .jl).
2. What the hash is good for¶
- Drift detection — Re-emit Julia (or future backends) in CI and assert outputs match expectations, or compare to a lockfile that records per-artifact digests.
- Cache keys — Compiled problems or other caches should key on
content_hash, target profile, and relevant toolchain versions (see Storing MTK models §4). - Pins without paths — Downstream contracts reference the scaffold by hash, not by repository path (“hashes over file paths” org vocabulary).
- Equivalence — Two authoring files that normalize to the same IR yield the same hash (useful for detecting accidental duplicates or proving a refactor was semantics-preserving).
3. Semantic diff and inferred bumps¶
Because expressions live in an explicit tagged tree in the IR (not strings; see ADR 0003), the tool can compare two IRs structurally and suggest a SemVer bump for the model (distinct from ir_version, which versions the IR schema).
The CLI exposes:
model-parser diff <old.ir.json> <new.ir.json> [--json]
model-parser bump <old.ir.json> <new.ir.json> [--json]
diff lists human-readable changes. bump prints a suggested bump level:
| Level | Typical meaning (conservative policy) |
|---|---|
none |
Semantic bodies are identical (content_hash match). |
patch |
Documentation-like or bootstrap tweaks only: e.g. ModelInfo description / source_version / metadata, variable unit / description / roles, profiles, or only numeric defaults on existing parameters (same names, same order). |
minor |
Additive scaffold API: new parameters appended after all previous parameters, with unchanged names and defaults for all previous parameters, and no change to states, inputs, outputs, locals, or equations. |
major |
Anything else: renamed model, changed state/input/output set, changed equations, changed local expressions, removed or reordered parameters, ir_version or independent_variable change, or any change not covered by patch / minor. |
Policies can evolve; treat bump as advisory until your library documents stricter rules. When in doubt, the implementation prefers major over false compatibility.
4. Where logic lives: parser vs library repository¶
Inside model-parser (IR contract, pure logic):
parse/emit/validate/ hashing / JSON Schema.diffandbump— semantic comparison and bump suggestion (no git, no catalog).
Inside model-library (orchestration, I/O):
- Directory layout, lockfile / catalog, CI that regenerates artifacts and fails on drift.
- Git tags, release notes, and human curation.
This keeps model-parser small and standalone (see model-parser.md §2 and Non-goals) while still enabling a “living” multi-representation library.
5. Suggested model-library layout¶
One directory per model; one authoring file is the human edit surface; IR and lowered views are generated and committed (or generated only in CI — your choice; the sibling repo defaults to committed artifacts for reviewability).
models/<name>/
model.ini # authoring (ExprTk INI today)
model.ir.json # canonical IR (generated by model-parser parse)
views/
model.jl # generated by model-parser emit julia
model_rhs.jl # generated by model-parser emit julia-rhs
library.lock.json # index: names, paths, content_hash, view SHA-256 digests
A thin driver (scripts/sync.sh, just, or Makefile) invokes the installed model-parser. CI runs the same pipeline with --strict checks (e.g. git diff --exit-code after sync).
6. End-to-end workflow¶
flowchart TD
A[Edit models/foo/model.ini] --> B[model-parser parse → model.ir.json]
B --> C{content_hash changed vs baseline?}
C -- no --> Z[No semantic change; optional doc-only commit]
C -- yes --> D[model-parser diff / bump vs previous release IR]
D --> E[Regenerate views: emit julia, emit julia-rhs …]
E --> F[Update library.lock.json and changelog]
F --> G[Commit; tag if releasing]
G --> H[CI: re-sync or re-emit; fail on drift]
Do not hand-edit generated .jl files; they are compilation products (see storing-mtk-models.md). Planned emit ini round-trip (product roadmap) will reduce the need to maintain two authoring encodings by hand.
7. Scaffold vs parameter sets vs scenarios (authoring files)¶
The IR intentionally describes the scaffold only: structure, equations, roles, units, parameter declarations (optional defaults for bootstrap). Parameter sets and scenarios are sibling contracts (fitted values, x0/u0, horizons) — not stored in the canonical IR as execution truth.
Current ExprTk INI¶
Today’s frontend documents that [x0] / [u0] are dropped with a warning because they are scenario data (model-parser.md §4). [Dimensions] is still required: it declares how many x*, u*, and y* slots exist before equations are parsed. In principle, counts could be inferred from the maximum indices appearing in dx*, y*, and u* references; that would be a frontend enhancement, not an IR change.
Should parameter values leave the INI?¶
Ideal split (org contracts):
- Scaffold — declares parameters (names, units, bounds later); optional defaults only for convenience / tests.
- Parameter set — JSON (or similar) referencing the scaffold by
content_hash, carrying the numeric values used for calibration or deployment. - Scenario — references scaffold (and optionally a parameter set), carries
x0,u0, horizon, etc.
Whether values also appear in the INI is a workflow choice:
- Keeping numeric defaults in INI is convenient for quick
parse → emitdemos and matches legacy MPC files. - Moving values entirely into parameter sets is cleaner for governance (one hash-pinned scaffold, many value sets) and avoids duplicating “truth” in two places.
model-parser does not yet ship a parameter-set file format; when the org standardizes one, the library should adopt it alongside the IR.
Optimal authoring format (if not tied to INI)¶
A greenfield layout often works well as two or three small files per model (or one folder):
scaffold.*— YAML or TOML: metadata, symbol tables, equations as structured expressions or as controlled strings parsed by the same expression grammar. YAML maps naturally to nested expression trees for tooling.parameters.default.json— optional; referencescontent_hashof the IR once emitted, or referencesmodel.name+ lockfile version for humans.scenario.*— initial conditions, setpoints, simulation horizon.
The important part is not the concrete syntax but the separation of concerns and a single semantic hub (the IR) so backends stay renderers, not re-parsers.
8. Reproducible provenance timestamps¶
For committed IR JSON, provenance.created_at would otherwise change on every regeneration. When SOURCE_DATE_EPOCH is set (Unix epoch seconds, per reproducible-builds.org), the INI frontend uses it for created_at instead of the wall clock. Library sync scripts can export this variable so IR files stay byte-stable when semantics are unchanged.
9. Larger roadmap (optional)¶
- Library lockfile — pins each model’s
content_hashand digests of generated views; CI verifies them. emit iniround-trip — product roadmap item; reduces manual dual maintenance of INI and other views.ir_versionmigrations — when the IR schema bumps, migration tooling plus ADRs (ir-specification.md§5).- Composite models — dependency graph of scaffolds; composite content hash derived from constituents (Merkle-style).
- Signing / extended provenance — optional fields for signer identity, git commit SHA, parent scaffold hash (policy and ADR when needed).
10. Related documents¶
model-parser.md— product scope and CLI overview.ir-specification.md— IR shape andcontent_hashdefinition.storing-mtk-models.md— IR + generated.jlas durable artifacts.language-strategy.md— Python/Julia split at the IR file boundary.