Running an organism

audience: operators

Operating an organism is, at the protocol level, identical to operating any mosaik-native committee organism: a committee, a state machine, a narrow public surface. This page is the delta that applies when the organism is composite — when it spans multiple members (lattices, standalone organisms, or a mix) — so the operator must track per-member health rather than one upstream.

Modules (Atlas, Almanac, Chronicle, Compute, Randomness) are coalition-scoped organisms; the same rules apply. Each module’s crate publishes its own page once commissioned.

For the base runbook — systemd units, dashboards, incident response — follow the builder operator runbooks. Each per-organism crate publishes its own page once commissioned.

What to add on top of a single-committee runbook

  • Per-member subscription health. A composite organism reads from multiple members’ public surfaces. The dashboard carries one subscription-lag metric per spanned member, not an aggregated metric. A composite organism that stalls typically stalls on one member first.

  • Per-member ticket health. Committee members hold tickets from each spanned member’s operator. When a per-member operator rotates their ticket-issuance root, the organism’s bonds into that member break until new tickets are issued to the committee. Monitor ticket validity horizons per member.

  • Member-identity monitoring. When a referenced member retires (stable id changes) or bumps its content hash (if the organism pinned content), the organism’s own Config fingerprint is stale; the organism must be redeployed under an updated OrganismConfig. Detect this before integrators do.

  • Cross-operator communications. A composite organism with members run by different operators requires a standing channel with each — at minimum a mailing list or chat channel for advance announcements of retirements and rotations.
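As an illustration of the per-member checks above, here is a minimal sketch of a dashboard rule that evaluates each spanned member separately. The `MemberHealth`, `Alert`, and `evaluate` names, and both thresholds, are hypothetical and not part of any mosaik API; how the lag and ticket horizon are measured is assumed to happen elsewhere.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical health sample for one spanned member (illustrative only).
#[derive(Debug, Clone)]
pub struct MemberHealth {
    /// Time since the last event arrived from this member's public surface.
    pub subscription_lag: Duration,
    /// Time left before the committee's ticket from this member's operator expires.
    pub ticket_validity_left: Duration,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Alert {
    SubscriptionLag { member: String, lag: Duration },
    TicketExpiring { member: String, left: Duration },
}

/// Checks every member on its own, so a single lagging member is named
/// in the alert instead of being folded into an aggregate.
pub fn evaluate(
    members: &BTreeMap<String, MemberHealth>,
    max_lag: Duration,
    min_ticket_left: Duration,
) -> Vec<Alert> {
    let mut alerts = Vec::new();
    for (name, health) in members {
        if health.subscription_lag > max_lag {
            alerts.push(Alert::SubscriptionLag {
                member: name.clone(),
                lag: health.subscription_lag,
            });
        }
        if health.ticket_validity_left < min_ticket_left {
            alerts.push(Alert::TicketExpiring {
                member: name.clone(),
                left: health.ticket_validity_left,
            });
        }
    }
    alerts
}
```

Each alert carries the member name, so the operator knows which per-member operator to contact on the standing channel.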

The lifecycle of a composite organism commit

  1. Committee driver watches each spanned member’s public collection / stream.
  2. An upstream event fires; the driver wraps it in an Observe* command with an evidence pointer back to the upstream commit.
  3. The organism’s Group commits the Observe* via Raft.
  4. Periodically (or on an apply-deadline timer), the driver issues an Apply command. The state machine reads accumulated observations and commits the organism’s own fact.
  5. The organism’s public surface serves the committed fact to integrators and to any downstream consumers (other organisms, including composite ones that fold this one in).

Every step is standard mosaik machinery. The organism-specific work is the driver’s multi-subscription logic.
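The observe-then-apply shape of steps 2 to 4 can be sketched as a small state machine. This is a toy model under stated assumptions: the `Command`, `EvidencePointer`, `Fact`, and `StateMachine` types are hypothetical, the "fact" is just a sum of observed payloads, and Raft replication is assumed to have already happened before `apply_committed` is called.

```rust
/// Pointer back to the upstream commit behind an observation (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EvidencePointer {
    pub member: String,
    pub commit_index: u64,
}

/// Commands as they come out of the replicated log, already committed.
#[derive(Debug, Clone)]
pub enum Command {
    /// Stands in for the Observe* family: one upstream event plus its evidence.
    Observe { evidence: EvidencePointer, payload: u64 },
    /// Issued periodically or on an apply-deadline timer.
    Apply,
}

/// The organism's own fact; here simply a total plus the evidence it rests on.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fact {
    pub total: u64,
    pub evidence: Vec<EvidencePointer>,
}

#[derive(Default)]
pub struct StateMachine {
    pending: Vec<(EvidencePointer, u64)>,
    pub facts: Vec<Fact>,
}

impl StateMachine {
    /// Observations accumulate; Apply folds them into one committed fact.
    /// Returns the new fact, or None if nothing was committed.
    pub fn apply_committed(&mut self, cmd: Command) -> Option<&Fact> {
        match cmd {
            Command::Observe { evidence, payload } => {
                self.pending.push((evidence, payload));
                None
            }
            Command::Apply => {
                if self.pending.is_empty() {
                    return None;
                }
                let drained = std::mem::take(&mut self.pending);
                let total: u64 = drained.iter().map(|(_, p)| *p).sum();
                let evidence = drained.into_iter().map(|(e, _)| e).collect();
                self.facts.push(Fact { total, evidence });
                self.facts.last()
            }
        }
    }
}
```

Because every committed fact keeps its evidence pointers, a replaying committee member can trace each fact back to the upstream commits that produced it.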

Rotations

A composite organism rotates like any other organism:

  • Committee member rotation. Add a new member under the same OrganismConfig; drain the old member; decommission. No fingerprint change.
  • Committee admission policy rotation (e.g. a TDX Measurements bump). Requires a new OrganismConfig fingerprint. Announce to any coalition operators referencing the organism and follow the rotations and upgrades sequence.
  • Spanned-member-set rotation. Adding or removing a spanned member changes the organism’s content fingerprint. This is a larger change; typically accompanied by a fresh OrganismConfig publication and a notice to referencing coalitions.

Rotations that do NOT break integrators

  • Committee member swaps under a stable OrganismConfig. Integrators see a brief latency bump during drain; no handle failures.

Rotations that DO break integrators

  • Any OrganismConfig fingerprint bump. Integrators compiled against the old config see ConnectTimeout on the organism handle until they recompile against the new OrganismConfig (and any coalition referencing it updates its CoalitionConfig). Announce ahead of time via the change channel.
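To make the breaking versus non-breaking distinction concrete, the sketch below models a fingerprint that covers the admission policy and the spanned-member set but not the committee roster. It is an assumption-laden toy: the real OrganismConfig fields and fingerprint derivation are not specified here, `FingerprintedConfig` and `Deployment` are hypothetical types, and the standard library's `DefaultHasher` stands in for whatever hash mosaik actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Illustrative stand-in for the fingerprinted part of an OrganismConfig.
#[derive(Hash, Clone)]
pub struct FingerprintedConfig {
    /// For example, an identifier for the accepted TDX measurements.
    pub admission_policy: String,
    /// Stable ids of the members the organism spans.
    pub spanned_members: BTreeSet<String>,
}

pub struct Deployment {
    pub config: FingerprintedConfig,
    /// Current committee roster; deliberately outside the fingerprint.
    pub committee: Vec<String>,
}

/// Toy fingerprint: a 64-bit std hash, not a cryptographic digest.
pub fn fingerprint(cfg: &FingerprintedConfig) -> u64 {
    let mut hasher = DefaultHasher::new();
    cfg.hash(&mut hasher);
    hasher.finish()
}

/// Integrators keep working only while the fingerprint is unchanged.
pub fn breaks_integrators(before: &Deployment, after: &Deployment) -> bool {
    fingerprint(&before.config) != fingerprint(&after.config)
}
```

In this model a committee member swap leaves `breaks_integrators` false, while an admission policy bump or a change to the spanned-member set makes it true, matching the two lists above.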

Retirement

When a composite organism’s committee is shutting down permanently, the committee emits a RetirementMarker as its final commit on each public primitive. The marker carries:

  • effective_at — the Almanac tick or wall-clock at which the committee ceases to commit;
  • replacement — an optional pointer to the replacement organism so integrators rebind cleanly rather than timing out.

If any referencing coalition ships a Chronicle, the retirement lands as ChronicleKind::OrganismRetired in the next Chronicle entry.
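A minimal sketch of how an integrator might react to the marker follows. Only the field names `effective_at` and `replacement` come from the description above; the concrete types, the `EffectiveAt` and `IntegratorAction` enums, and the `on_retirement` function are hypothetical.

```rust
/// When the committee ceases to commit (hypothetical encoding of the two options).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EffectiveAt {
    AlmanacTick(u64),
    WallClockUnixSecs(u64),
}

/// Illustrative shape of the marker; the real type may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RetirementMarker {
    pub effective_at: EffectiveAt,
    /// Stable id of the replacement organism, if one exists.
    pub replacement: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum IntegratorAction {
    /// Rebind the handle to the replacement instead of waiting for a timeout.
    RebindTo(String),
    /// No replacement was announced; stop reading after effective_at.
    StopReading,
}

pub fn on_retirement(marker: &RetirementMarker) -> IntegratorAction {
    match &marker.replacement {
        Some(id) => IntegratorAction::RebindTo(id.clone()),
        None => IntegratorAction::StopReading,
    }
}
```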

Incident response specific to organisms

Add two incident classes to your playbook, on top of the single-organism classes in builder incident response.

Evidence-pointer resolution fails on replay

Symptom: a committee member replaying the log fails to resolve an evidence pointer to an upstream member commit. Cause: the upstream member committed the fact, the organism observed it, but the member has since gone through a state compaction / reorg that removed the referenced commit from the public surface the organism reads.

Response:

  • The organism state machine is required to reject such replays, not tolerate them.
  • Confirm the issue is upstream-member retention, not organism state.
  • Coordinate with the per-member operator. The fix is usually member-side configuration: longer retention on the public surface the organism subscribes to.
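The reject-on-replay rule can be sketched as follows. This is a simplified model, not mosaik code: `EvidencePointer`, `UnresolvedEvidence`, and `replay` are hypothetical names, and the upstream member's retained commits are modelled as an in-memory set rather than a live public surface.

```rust
use std::collections::BTreeSet;

/// Pointer from an organism log entry to an upstream member commit (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EvidencePointer {
    pub member: String,
    pub commit_index: u64,
}

/// Replay failure: which log entry pointed at a commit that is no longer retained.
#[derive(Debug, PartialEq, Eq)]
pub struct UnresolvedEvidence {
    pub log_position: usize,
    pub pointer: EvidencePointer,
}

/// Replays the log and rejects at the first pointer that no longer resolves,
/// rather than skipping the entry and continuing.
/// `retained` holds the (member, commit_index) pairs still on the upstream surface.
pub fn replay(
    log: &[EvidencePointer],
    retained: &BTreeSet<(String, u64)>,
) -> Result<usize, UnresolvedEvidence> {
    for (log_position, pointer) in log.iter().enumerate() {
        let key = (pointer.member.clone(), pointer.commit_index);
        if !retained.contains(&key) {
            return Err(UnresolvedEvidence {
                log_position,
                pointer: pointer.clone(),
            });
        }
    }
    Ok(log.len())
}
```

The error names the member and commit index that failed to resolve, which is the information the per-member operator needs to check their retention settings.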

One spanned member goes dark

Symptom: no events from one of the spanned members’ public surfaces for multiple slots (or multiple publish ticks, if the member is an organism without a slot clock).

Response:

  • Confirm with the per-member operator whether the outage is on their side or on the subscription.
  • If on their side, fall back to the stall policy. An organism committing partial evidence yields degraded commits; one stalling per slot yields no commits until the member returns. Integrators were warned in the composition-hooks doc.
  • If on the subscription side, follow mosaik transport troubleshooting; nothing here is organism-specific.
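The two stall policies mentioned above can be compared with a small sketch. The `StallPolicy` and `SlotOutcome` enums and the `decide` function are hypothetical; the actual policy is whatever the organism's composition-hooks documentation promised integrators.

```rust
/// How the organism reacts when a spanned member reports nothing for a slot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StallPolicy {
    /// Commit with whatever evidence arrived; the commit is degraded.
    CommitPartial,
    /// Commit nothing for the slot until every member has reported.
    StallPerSlot,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SlotOutcome {
    Full,
    Degraded { missing: Vec<String> },
    NoCommit { missing: Vec<String> },
}

/// Decides the outcome of one slot from the set of spanned members
/// and the subset that actually reported.
pub fn decide(policy: StallPolicy, spanned: &[&str], reported: &[&str]) -> SlotOutcome {
    let missing: Vec<String> = spanned
        .iter()
        .filter(|m| !reported.contains(*m))
        .map(|m| m.to_string())
        .collect();
    if missing.is_empty() {
        return SlotOutcome::Full;
    }
    match policy {
        StallPolicy::CommitPartial => SlotOutcome::Degraded { missing },
        StallPolicy::StallPerSlot => SlotOutcome::NoCommit { missing },
    }
}
```

Either way the outcome records which members were missing, so the incident timeline shows exactly which slots were degraded or skipped and why.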

Cross-references