Running an organism
audience: operators
Operating an organism is, at the protocol level, identical to operating any mosaik-native committee organism: a committee, a state machine, a narrow public surface. This page is the delta that applies when the organism is composite — when it spans multiple members (lattices, standalone organisms, or a mix) — so the operator must track per-member health rather than one upstream.
Modules (Atlas, Almanac, Chronicle, Compute, Randomness) are coalition-scoped organisms; the same rules apply. Each module’s crate publishes its own page once commissioned.
For the base runbook — systemd units, dashboards, incident response — follow the builder operator runbooks. Each per-organism crate publishes its own page once commissioned.
What to add on top of a single-committee runbook
- Per-member subscription health. A composite organism reads from multiple members’ public surfaces. The dashboard carries one subscription-lag metric per spanned member, not an aggregated metric. A composite organism that stalls typically stalls on one member first.
- Per-member ticket health. Committee members hold tickets from each spanned member’s operator. When a per-member operator rotates their ticket-issuance root, the organism’s bonds into that member break until new tickets are issued to the committee. Monitor ticket validity horizons per member.
- Member-identity monitoring. When a referenced member retires (stable id changes) or bumps its content hash (if the organism pinned content), the organism’s own Config fingerprint is stale; the organism must be redeployed under an updated OrganismConfig. Detect this before integrators do.
- Cross-operator communications. A composite organism with members run by different operators requires a standing channel with each: at minimum a mailing list or chat channel for advance announcements of retirements and rotations.
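The per-member checks above can be sketched as a small evaluation pass over one health record per spanned member. This is a minimal illustration, not mosaik API: MemberHealth, Alert, evaluate, and both thresholds are hypothetical names chosen for the example, and the member ids are placeholders.

```rust
use std::collections::BTreeMap;

/// Health snapshot for one spanned member (all field names are hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MemberHealth {
    /// Slots (or publish ticks) the subscription is behind the member's head.
    pub subscription_lag: u64,
    /// Seconds until the committee's ticket from this member's operator expires.
    pub ticket_valid_for_secs: u64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Alert {
    SubscriptionLag { member: String, lag: u64 },
    TicketExpiring { member: String, remaining_secs: u64 },
}

/// Evaluate every member on its own; nothing is aggregated across members,
/// so a single lagging member is never hidden by healthy ones.
pub fn evaluate(
    health: &BTreeMap<String, MemberHealth>,
    max_lag: u64,
    min_ticket_secs: u64,
) -> Vec<Alert> {
    let mut alerts = Vec::new();
    for (member, h) in health {
        if h.subscription_lag > max_lag {
            alerts.push(Alert::SubscriptionLag {
                member: member.clone(),
                lag: h.subscription_lag,
            });
        }
        if h.ticket_valid_for_secs < min_ticket_secs {
            alerts.push(Alert::TicketExpiring {
                member: member.clone(),
                remaining_secs: h.ticket_valid_for_secs,
            });
        }
    }
    alerts
}
```

Keying the map by member id keeps the alert output aligned with the per-member dashboard panels, which is the point of not aggregating.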
The lifecycle of a composite organism commit
- Committee driver watches each spanned member’s public collection / stream.
- An upstream event fires; the driver wraps it in an Observe* command with an evidence pointer back to the upstream commit.
- The organism’s Group commits the Observe* via Raft.
- Periodically (or on an apply-deadline timer), the driver issues an Apply command. The state machine reads accumulated observations and commits the organism’s own fact.
- The organism’s public surface serves the committed fact to integrators and to any downstream consumers (other organisms, including composite ones that fold this one in).
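The observe-then-apply flow above can be sketched as a toy state machine. The shapes here are assumptions made for illustration: Command, EvidencePointer, Fact, and StateMachine are hypothetical, the single Observe variant stands in for the whole Observe* family, and summing the observed values is a placeholder for whatever fold a real organism performs. Raft replication is out of scope; apply is assumed to be called once per already-committed command, in log order.

```rust
/// Pointer back to the upstream commit that justifies an observation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EvidencePointer {
    pub member: String,
    pub commit_index: u64,
}

/// Commands replicated through the organism's log (hypothetical shapes).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Command {
    /// Stands in for one of the Observe* commands.
    Observe { evidence: EvidencePointer, value: u64 },
    /// Issued by the driver periodically or on an apply-deadline timer.
    Apply,
}

/// The organism's own committed fact, with the evidence it was derived from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fact {
    pub value: u64,
    pub evidence: Vec<EvidencePointer>,
}

#[derive(Debug, Default)]
pub struct StateMachine {
    pending: Vec<(EvidencePointer, u64)>,
    pub committed: Vec<Fact>,
}

impl StateMachine {
    /// Apply one committed command, in log order.
    pub fn apply(&mut self, cmd: Command) {
        match cmd {
            Command::Observe { evidence, value } => self.pending.push((evidence, value)),
            Command::Apply => {
                if self.pending.is_empty() {
                    return; // nothing observed since the last Apply
                }
                let (evidence, values): (Vec<_>, Vec<_>) = self.pending.drain(..).unzip();
                // Placeholder fold: a real organism defines its own reduction.
                self.committed.push(Fact { value: values.iter().sum(), evidence });
            }
        }
    }
}
```

Keeping the evidence pointers on the committed fact is what lets a replaying committee member re-check each observation against the upstream member.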
Every step is standard mosaik machinery. The organism-specific work is the driver’s multi-subscription logic.
Rotations
A composite organism rotates like any other organism:
- Committee member rotation. Add a new member under the same OrganismConfig; drain the old member; decommission. No fingerprint change.
- Committee admission policy rotation (e.g. a TDX Measurements bump). Requires a new OrganismConfig fingerprint. Announce to any coalition operators referencing the organism and follow the rotations and upgrades sequence.
- Spanned-member-set rotation. Adding or removing a spanned member changes the organism’s content fingerprint. This is a larger change, typically accompanied by a fresh OrganismConfig publication and a notice to referencing coalitions.
Rotations that do NOT break integrators
- Committee member swaps under a stable OrganismConfig. Integrators see a brief latency bump during drain; no handle failures.
Rotations that DO break integrators
- Any OrganismConfig fingerprint bump. Integrators compiled against the old config see ConnectTimeout on the organism handle until they recompile against the new OrganismConfig (and any coalition referencing it updates its CoalitionConfig). Announce ahead of time via the change channel.
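As a pre-rotation checklist aid, the rules above reduce to a small lookup from rotation kind to consequence. This is a sketch under stated assumptions: Rotation, Impact, and impact are hypothetical names, and the spanned-member-set case is treated as a fingerprint bump because the text above pairs it with a fresh OrganismConfig publication.

```rust
/// The three rotation kinds described above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rotation {
    CommitteeMember,
    AdmissionPolicy,
    SpannedMemberSet,
}

/// What an operator should expect from a rotation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Impact {
    /// The organism is redeployed under a new fingerprint.
    pub fingerprint_changes: bool,
    /// Integrators must recompile; announce on the change channel first.
    pub breaks_integrators: bool,
}

pub fn impact(rotation: Rotation) -> Impact {
    match rotation {
        // Swap under a stable OrganismConfig: brief latency bump only.
        Rotation::CommitteeMember => Impact {
            fingerprint_changes: false,
            breaks_integrators: false,
        },
        // Any fingerprint bump breaks integrators compiled against the old config.
        Rotation::AdmissionPolicy | Rotation::SpannedMemberSet => Impact {
            fingerprint_changes: true,
            breaks_integrators: true,
        },
    }
}
```

A deployment script could refuse to proceed with a breaking rotation unless an announcement reference is supplied.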
Retirement
When a composite organism’s committee is shutting down permanently, the committee emits a RetirementMarker as its final commit on each public primitive. The marker carries:
- effective_at: the Almanac tick or wall-clock time at which the committee ceases to commit;
- replacement: an optional pointer to the replacement organism so integrators rebind cleanly rather than timing out.
If any referencing coalition ships a Chronicle, the
retirement lands as ChronicleKind::OrganismRetired in
the next Chronicle entry.
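A rough picture of the marker and of how an integrator might react to it follows. Only the names RetirementMarker, effective_at, and replacement come from the text above; the field types, the EffectiveAt and NextStep enums, and on_retirement are assumptions made for this sketch, and the replacement pointer is modelled as a plain string id.

```rust
/// When the committee ceases to commit (variant payloads are assumed types).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EffectiveAt {
    AlmanacTick(u64),
    WallClockUnixSecs(u64),
}

/// Final commit on each public primitive of a retiring organism.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RetirementMarker {
    pub effective_at: EffectiveAt,
    /// Optional pointer to the replacement organism (a string id here, by assumption).
    pub replacement: Option<String>,
}

/// What an integrator does after reading the marker (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NextStep {
    Rebind(String),
    StopWithoutReplacement,
}

pub fn on_retirement(marker: &RetirementMarker) -> NextStep {
    match &marker.replacement {
        // Rebind cleanly instead of waiting for the old handle to time out.
        Some(id) => NextStep::Rebind(id.clone()),
        None => NextStep::StopWithoutReplacement,
    }
}
```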
Incident response specific to organisms
Two incident classes to add to your playbook on top of the single-organism classes in builder incident response.
Evidence-pointer resolution fails on replay
Symptom: a committee member replaying the log fails to resolve an evidence pointer to an upstream member commit. Cause: the upstream member committed the fact, the organism observed it, but the member has since gone through a state compaction / reorg that removed the referenced commit from the public surface the organism reads.
Response:
- The organism state machine is required to reject such replays, not tolerate them.
- Confirm the issue is upstream-member retention, not organism state.
- Coordinate with the per-member operator. The fix is usually member-side configuration: longer retention on the public surface the organism subscribes to.
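The reject-on-replay rule can be illustrated with a replay loop that fails on the first evidence pointer the upstream surface no longer serves. This is a sketch, not the organism's real replay path: EvidencePointer, ReplayError, and replay are hypothetical, and the upstream surface is modelled as an in-memory set of retained commits.

```rust
use std::collections::BTreeSet;

/// Pointer back to an upstream member commit (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct EvidencePointer {
    pub member: String,
    pub commit_index: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplayError {
    /// The upstream member no longer serves the referenced commit.
    UnresolvedEvidence(EvidencePointer),
}

/// Replay the evidence pointers in log order against the commits the upstream
/// surface still retains. Returns the number of entries replayed, or rejects
/// the replay at the first pointer that cannot be resolved.
pub fn replay(
    log: &[EvidencePointer],
    upstream_retained: &BTreeSet<EvidencePointer>,
) -> Result<usize, ReplayError> {
    for ptr in log {
        if !upstream_retained.contains(ptr) {
            return Err(ReplayError::UnresolvedEvidence(ptr.clone()));
        }
    }
    Ok(log.len())
}
```

The error carries the offending pointer, which gives the per-member operator the member and commit index to check against their retention settings.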
One spanned member goes dark
Symptom: no events from one of the spanned members’ public surfaces for multiple slots (or multiple publish ticks, if the member is an organism without a slot clock).
Response:
- Confirm with the per-member operator whether the outage is on their side or on the subscription.
- If on their side, fall back to the stall policy. An organism committing partial evidence yields degraded commits; one stalling per slot yields no commits until the member returns. Integrators were warned in the composition-hooks doc.
- If on the subscription side, follow mosaik transport troubleshooting; nothing here is organism-specific.
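The symptom and the two stall policies described above can be sketched as follows. All names (StallPolicy, SlotOutcome, is_dark, slot_outcome) and the dark_after_slots threshold are hypothetical; the document only says "multiple slots", so the threshold value is left to the operator.

```rust
/// How the organism behaves while a spanned member is silent (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StallPolicy {
    /// Commit with whatever evidence is available.
    CommitPartial,
    /// Skip the slot until the member returns.
    StallPerSlot,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SlotOutcome {
    FullCommit,
    DegradedCommit,
    NoCommit,
}

/// A member counts as dark once no event has arrived for `dark_after_slots`
/// slots (or publish ticks, for members without a slot clock).
pub fn is_dark(last_event_slot: u64, current_slot: u64, dark_after_slots: u64) -> bool {
    current_slot.saturating_sub(last_event_slot) >= dark_after_slots
}

/// Expected outcome for the current slot under the configured policy.
pub fn slot_outcome(any_member_dark: bool, policy: StallPolicy) -> SlotOutcome {
    match (any_member_dark, policy) {
        (false, _) => SlotOutcome::FullCommit,
        (true, StallPolicy::CommitPartial) => SlotOutcome::DegradedCommit,
        (true, StallPolicy::StallPerSlot) => SlotOutcome::NoCommit,
    }
}
```

Alerting on is_dark per member, rather than on the organism's own commit rate, distinguishes a dark member from a policy that is deliberately producing no commits.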