Provisioning on demand
audience: ai
Reads feed observation (market-reads); the fleet wraps the cloud APIs (wrapping); this chapter is the grant-handling work. The provider loop decrypts each grant’s shuffle envelope, cross-checks the image hash, routes through the fleet to a backend, and emits the usage log when the workload terminates.
web3’s matching chapter is inference on Compute — the searcher acquires compute from the Compute module, the bridge serves it. Both bond against the same module’s Config; both fail the same way if the composition does not clear.
The provider loop
From src/provider.rs:
pub async fn run(self) -> anyhow::Result<()> {
    let provider_id = self.register_provider_card().await?;
    let mut grants = coalition_compute::grants_for(
        &self.network, self.compute, provider_id,
    ).await?;
    let mut refresh = tokio::time::interval(
        Duration::from_secs(
            self.config.capacity_refresh_sec as u64,
        ),
    );
    loop {
        tokio::select! {
            Some(grant) = grants.next() => {
                if let Err(err) = self.handle_grant(&grant).await {
                    tracing::warn!(
                        request_id = ?grant.request_id,
                        error = %err,
                        "grant handling failed",
                    );
                }
            }
            _ = refresh.tick() => {
                let _ = self.refresh_provider_card(provider_id).await;
            }
            else => break,
        }
    }
    Ok(())
}
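Grant handling and card refresh share one select loop: each iteration runs whichever branch is ready first, so a grant and a refresh are never processed in the same pass. A failed grant is logged as a warning and the loop continues; a failed refresh is silently discarded.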
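The one-branch-per-iteration behaviour can be illustrated without an async runtime. The following is a minimal synchronous sketch, not the bridge's code: Event, LoopStats, and run_loop are hypothetical names, and the ok flag stands in for the result of handle_grant.

```rust
/// Hypothetical event type standing in for the two `select!` branches.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// A grant arrived; `ok` stands in for the result of `handle_grant`.
    Grant { request_id: u64, ok: bool },
    /// The capacity-refresh timer fired.
    RefreshTick,
}

/// Hypothetical counters so the loop's behaviour can be observed.
#[derive(Debug, Default, PartialEq)]
pub struct LoopStats {
    pub grants_handled: u32,
    pub grants_failed: u32,
    pub refreshes: u32,
}

/// Processes events one at a time, mirroring one branch per loop iteration.
/// The loop ends when the event source is exhausted.
pub fn run_loop(events: impl IntoIterator<Item = Event>) -> LoopStats {
    let mut stats = LoopStats::default();
    for event in events {
        match event {
            Event::Grant { ok: true, .. } => stats.grants_handled += 1,
            Event::Grant { request_id, ok: false } => {
                // A failed grant is reported and skipped; the loop keeps running.
                eprintln!("grant handling failed: request_id={}", request_id);
                stats.grants_failed += 1;
            }
            Event::RefreshTick => stats.refreshes += 1,
        }
    }
    stats
}
```

The point of the sketch is the error policy: a failing grant increments a counter and the loop moves on to the next event, which matches the warn-and-continue handling in run.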
Handling one grant
The handle_grant function in the same file:
async fn handle_grant(
    &self, grant: &ComputeGrant<'_>,
) -> anyhow::Result<()> {
    // 1. Decrypt the shuffle envelope to learn the
    //    requester's peer_id and image payload.
    let envelope = self.zipnet
        .resolve(&grant.bearer_pointer).await?;
    // 2. Cross-check the envelope's image hash against
    //    the grant's. Under honest unseal this should
    //    never fire; firing means the scheduler
    //    committee is compromised or the unseal
    //    quorum broke.
    if envelope.image_hash() != grant.image_hash {
        anyhow::bail!(
            "envelope/grant image hash mismatch"
        );
    }
    // 3. Route through the fleet.
    let instance = self.fleet
        .provision_for_grant(grant, &envelope).await?;
    // 4. Seal an SSH receipt to the requester's x25519
    //    public key; return via the shuffle.
    let receipt = SshAccessReceipt::build(&instance, grant)?;
    let sealed = receipt.seal_to(envelope.peer_x25519_public())?;
    self.zipnet.reply(&grant.request_id, sealed).await?;
    // 5. Spawn a watcher that emits the ComputeLog on
    //    instance exit or deadline.
    let fleet = self.fleet.clone();
    let network = self.network.clone();
    let request_id = grant.request_id;
    let valid_to = grant.valid_to;
    let instance_clone = instance.clone();
    let provider_id = instance.provider_id();
    tokio::spawn(async move {
        let usage = fleet
            .watch_until_exit(&instance_clone, valid_to)
            .await.unwrap_or_default();
        let log = ComputeLog {
            grant_id: request_id,
            provider: provider_id,
            window: UsageMetrics::window_for(&usage),
            usage: usage.clone(),
            evidence: None,
        };
        let _ = coalition_compute::append_log(&network, &log).await;
    });
    Ok(())
}
Five steps. Each is a self-check.
1. Decrypt the shuffle envelope
The grant carries a bearer_pointer — a blake3
pointer into the shuffle-sealed envelope the
requester submitted. The unseal committee,
majority-honest, makes cleartext available to the
addressed provider:
// src/zipnet_io.rs — Envelope shape
pub struct Envelope {
    peer_id: [u8; 32],
    peer_x25519: [u8; 32],
    image_hash: UniqueId,
    requested_region: Option<String>,
    image_pointer: Vec<u8>,
}
The bridge sees a rotating peer_id (so the
requester’s coalition identity stays hidden), the
requester’s x25519 public key for the receipt, the
image hash to serve, an optional region hint, and
a pointer to fetch the image contents. It does not
see the requester’s ClientId, the bid value, or
which other providers were considered before
clearing.
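Because the Envelope fields are private, the bridge reads them through accessors; handle_grant calls image_hash() and peer_x25519_public(). A sketch of what that read-only surface might look like follows. Only those two method names come from the source. The constructor, the other accessors, and the 32-byte UniqueId alias are assumptions made so the example compiles on its own.

```rust
/// Assumption: `UniqueId` is modelled as a 32-byte digest for this sketch.
pub type UniqueId = [u8; 32];

pub struct Envelope {
    peer_id: [u8; 32],
    peer_x25519: [u8; 32],
    image_hash: UniqueId,
    requested_region: Option<String>,
    image_pointer: Vec<u8>,
}

impl Envelope {
    /// Hypothetical constructor, present only so the sketch can be exercised.
    pub fn new(
        peer_id: [u8; 32],
        peer_x25519: [u8; 32],
        image_hash: UniqueId,
        requested_region: Option<String>,
        image_pointer: Vec<u8>,
    ) -> Self {
        Self { peer_id, peer_x25519, image_hash, requested_region, image_pointer }
    }

    /// Rotating peer id; not the requester's coalition identity.
    pub fn peer_id(&self) -> &[u8; 32] {
        &self.peer_id
    }

    /// Public key the SSH receipt is sealed to.
    pub fn peer_x25519_public(&self) -> &[u8; 32] {
        &self.peer_x25519
    }

    /// Image hash to compare against the grant's committed hash.
    pub fn image_hash(&self) -> UniqueId {
        self.image_hash
    }

    /// Optional region hint, borrowed as a string slice.
    pub fn requested_region(&self) -> Option<&str> {
        self.requested_region.as_deref()
    }

    /// Opaque pointer used to fetch the image contents.
    pub fn image_pointer(&self) -> &[u8] {
        &self.image_pointer
    }
}
```

Exposing only getters keeps the envelope immutable once unsealed, so the hash compared in step 2 is the same value used for provisioning in step 3.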
2. Image-hash cross-check
The envelope’s declared image_hash and the
grant’s committed image_hash must match. A
mismatch means either the unseal quorum returned
the wrong cleartext or the scheduler committee
broke. Neither is supposed to happen under honest
operation. The check is defensive. When it fires
the bridge aborts the grant without attempting to
provision.
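One way to make this abort distinguishable from other failures is a typed error instead of a string. The sketch below is an illustration under assumptions, not the bridge's implementation: GrantCheckError and cross_check_image_hash are hypothetical names, and UniqueId is modelled as a 32-byte array.

```rust
use std::fmt;

/// Assumption: `UniqueId` is modelled as a 32-byte digest for this sketch.
pub type UniqueId = [u8; 32];

/// Hypothetical typed error carrying both hashes for later diagnosis.
#[derive(Debug, PartialEq, Eq)]
pub enum GrantCheckError {
    ImageHashMismatch { envelope: UniqueId, grant: UniqueId },
}

impl fmt::Display for GrantCheckError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GrantCheckError::ImageHashMismatch { .. } => {
                write!(f, "envelope/grant image hash mismatch")
            }
        }
    }
}

impl std::error::Error for GrantCheckError {}

/// Returns Ok only when the envelope's hash equals the grant's committed hash.
pub fn cross_check_image_hash(
    envelope_hash: &UniqueId,
    grant_hash: &UniqueId,
) -> Result<(), GrantCheckError> {
    if envelope_hash == grant_hash {
        Ok(())
    } else {
        Err(GrantCheckError::ImageHashMismatch {
            envelope: *envelope_hash,
            grant: *grant_hash,
        })
    }
}
```

Since the type implements std::error::Error, it would still convert into anyhow::Error through the ? operator, so a caller written like handle_grant would not need to change.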
3. Fleet routing
The fleet picks the first backend whose
can_satisfy returns true (see
wrapping — the Fleet router).
The returned ProvisionedInstance.backend
identifies which backend served the grant. The
bridge records it on the dashboard:
self.dashboard.record(DashboardEvent::GrantAccepted {
    backend: instance.backend.to_string(),
}).await;
No requester identity. No instance id.
4. Sealed receipt
Chapter 6 (receipts) covers sealing. All that matters here is that the receipt is sealed to the requester’s x25519 public key published in the envelope, and that the shuffle reply channel carries the sealed blob.
5. Usage-log watcher
A tokio::spawned task runs
fleet.watch_until_exit for the duration of the
grant. When the instance terminates — either
because the workload completed or because
valid_to passed — the watcher collects metrics
and appends a ComputeLog to the Compute module’s
log stream.
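The two termination cases reduce to a small decision: metering stops at the earlier of workload exit and valid_to. A minimal sketch of that rule follows, assuming time is expressed as u64 ticks; ExitReason and metering_end are hypothetical names and do not describe the internals of watch_until_exit.

```rust
/// Hypothetical reason the watcher stopped metering.
#[derive(Debug, PartialEq, Eq)]
pub enum ExitReason {
    WorkloadExited,
    DeadlineReached,
}

/// Returns the tick at which metering stops and why.
/// `exit_tick` is None while the workload is still running at the deadline.
/// Assumption: ticks are plain u64 values on a shared clock.
pub fn metering_end(exit_tick: Option<u64>, valid_to: u64) -> (u64, ExitReason) {
    match exit_tick {
        // The workload finished on or before the deadline.
        Some(t) if t <= valid_to => (t, ExitReason::WorkloadExited),
        // Still running, or exited after the deadline: cap at valid_to.
        _ => (valid_to, ExitReason::DeadlineReached),
    }
}
```

For example, metering_end(Some(50), 100) stops at tick 50 because the workload exited, while metering_end(None, 100) stops at tick 100 because the deadline passed.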
The ComputeLog is what the scheduler committee
and the reputation organism both read:
pub struct ComputeLog<'a> {
    pub grant_id: UniqueId,
    pub provider: ProviderId,
    pub window: AlmanacRange,
    pub usage: UsageMetrics,
    pub evidence: Option<EvidencePointer<'a>>,
}
pub struct UsageMetrics {
    pub cpu_core_seconds: u64,
    pub ram_mib_seconds: u64,
    pub net_bytes: u64,
}
The log names the grant and the provider so the
committee can correlate cleared grants to
completed workloads. Grants that clear to a
provider but never appear in the log stream score
the provider down on the reputation organism. The
bridge reports measured cpu-seconds, ram-mib-seconds,
and network bytes: what the backend
actually observed. A bridge that over-reports is
auditable because the committee can cross-check
against the cloud API’s own usage records
(when the committee has been granted access;
otherwise the reputation organism is the only
feedback). The evidence field points into the
cloud’s API-side telemetry when available
(typically None for bare-metal, a signed
telemetry snapshot for clouds).
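The three UsageMetrics fields are time-integrated quantities, so a backend that samples periodically has to multiply each reading by the interval it covers. The following sketch shows that arithmetic under assumptions: Sample and accumulate are hypothetical, and how a real backend samples is not specified in this chapter.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
pub struct UsageMetrics {
    pub cpu_core_seconds: u64,
    pub ram_mib_seconds: u64,
    pub net_bytes: u64,
}

/// Hypothetical backend sample covering `duration_secs` of wall time.
pub struct Sample {
    pub duration_secs: u64,
    pub cores_busy: u64,
    pub ram_mib: u64,
    pub net_bytes: u64,
}

/// Integrates samples into totals. Saturating arithmetic avoids a panic or
/// wrap-around on pathological inputs.
pub fn accumulate(samples: &[Sample]) -> UsageMetrics {
    let mut total = UsageMetrics::default();
    for s in samples {
        total.cpu_core_seconds = total
            .cpu_core_seconds
            .saturating_add(s.cores_busy.saturating_mul(s.duration_secs));
        total.ram_mib_seconds = total
            .ram_mib_seconds
            .saturating_add(s.ram_mib.saturating_mul(s.duration_secs));
        total.net_bytes = total.net_bytes.saturating_add(s.net_bytes);
    }
    total
}
```

As a worked case, 4 busy cores and 8192 MiB for 60 seconds followed by 2 cores and 4096 MiB for 30 seconds gives 240 + 60 = 300 cpu-core-seconds and 491520 + 122880 = 614400 ram-MiB-seconds.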
Usage honesty
The ComputeLog stream is what keeps a bridge
running. Every active window produces logs; the
reputation organism (chapter 7) scores them; the
scheduler committee’s next clearing consults the
score.
Logs that arrive late (after the grant deadline) indicate dropped grants. Under-reporting, where the requester’s SSH session measures more cpu-seconds than the declared total, indicates cheating on billing. Over-reporting means inflated bills. None of these is caught inside Compute; the reputation organism reading the stream catches them. A bridge optimising for persistence optimises for log honesty.
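The three signals can be expressed as a small classifier over one log and one independent observation. This is a sketch under stated assumptions, not the reputation organism's scoring: LogFinding, LogObservation, classify, and the tolerance parameter are all hypothetical, and time is modelled as u64 ticks.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogFinding {
    Late,
    UnderReported,
    OverReported,
}

/// Hypothetical pairing of a reported log with an independent measurement,
/// such as the requester's own session accounting.
pub struct LogObservation {
    pub log_tick: u64,
    pub valid_to: u64,
    pub reported_cpu_core_seconds: u64,
    pub observed_cpu_core_seconds: u64,
}

/// Returns every finding that applies; an empty vector means nothing flagged.
/// `tolerance` absorbs measurement noise between the two sides.
pub fn classify(obs: &LogObservation, tolerance: u64) -> Vec<LogFinding> {
    let mut findings = Vec::new();
    if obs.log_tick > obs.valid_to {
        findings.push(LogFinding::Late);
    }
    let reported = obs.reported_cpu_core_seconds;
    let observed = obs.observed_cpu_core_seconds;
    if observed.saturating_sub(reported) > tolerance {
        findings.push(LogFinding::UnderReported);
    } else if reported.saturating_sub(observed) > tolerance {
        findings.push(LogFinding::OverReported);
    }
    findings
}
```

Returning a vector rather than a single value lets one log be both late and mis-reported, which a scorer would presumably want to weigh separately.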
Capacity refresh
Interleaved with grant handling in the same select loop, the provider refreshes its card on a timer:
async fn refresh_provider_card(
    &self, id: ProviderId,
) -> anyhow::Result<()> {
    let capabilities = self.fleet.capabilities().await?;
    let card = ProviderCard {
        provider_id: id,
        tdx_quote: self.tdx_quote.clone(),
        capabilities,
        declared_rates: self.config.declared_rates,
        zipnet_reply: self.zipnet.reply_pointer(),
        refreshed_at: self.network.almanac().tick(),
    };
    coalition_compute::register(
        &self.network, self.compute, card,
    ).await.map(|_| ())
}
capacity_refresh_sec in ProviderBootConfig sets
the cadence. Too slow and rate changes take long to
reach the market; the card carries stale capacity
during fast-moving demand. Too fast and every
refresh commits to the ProviderCard collection,
driving up the module’s read-side bandwidth.
Operators settle in the 30–120 second range.
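A boot-time check on the cadence can catch misconfiguration early. One concrete hazard is a value of zero: tokio::time::interval panics when given a zero period. The sketch below assumes capacity_refresh_sec is a u32 (the source only shows it cast to u64); the constants, Cadence, classify_cadence, and refresh_period are hypothetical names.

```rust
use std::time::Duration;

/// Bounds taken from the 30 to 120 second range mentioned above.
pub const SUGGESTED_MIN_SEC: u32 = 30;
pub const SUGGESTED_MAX_SEC: u32 = 120;

#[derive(Debug, PartialEq, Eq)]
pub enum Cadence {
    TooFast,
    InRange,
    TooSlow,
}

/// Classifies a configured cadence against the suggested range.
/// Assumption: `capacity_refresh_sec` fits in a u32.
pub fn classify_cadence(capacity_refresh_sec: u32) -> Cadence {
    if capacity_refresh_sec < SUGGESTED_MIN_SEC {
        Cadence::TooFast
    } else if capacity_refresh_sec > SUGGESTED_MAX_SEC {
        Cadence::TooSlow
    } else {
        Cadence::InRange
    }
}

/// Converts the setting to a Duration, rejecting zero so that a timer built
/// from it cannot be given an empty period.
pub fn refresh_period(capacity_refresh_sec: u32) -> Option<Duration> {
    if capacity_refresh_sec == 0 {
        None
    } else {
        Some(Duration::from_secs(u64::from(capacity_refresh_sec)))
    }
}
```

An operator tool might warn on TooFast or TooSlow and refuse to start on None, leaving the final choice inside the range to the operator.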
Grant failure modes
Handling that fails at any step produces a
missing or partial ComputeLog:
- Shuffle resolve failure: the unseal quorum did not return cleartext, or the bearer pointer was malformed. The bridge has nothing to serve; no log is emitted; the committee scores the bridge down only if this correlates with a reputation-organism observation that the bridge should have served the grant.
- Image-hash mismatch: step 2 fires. The bridge bails; no log.
- Fleet provision_for_grant fails: no backend could satisfy (cloud-side exhaustion; every eligible backend returns false). The bridge emits a log with empty usage to signal acknowledgement; reputation organisms can distinguish “tried but failed” from “never responded”.
- Workload crashes pre-SSH: same empty-usage log.
- Workload exceeds valid_to: the watcher terminates the instance and emits usage up to valid_to. The requester renews.
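The failure modes above map onto three log outcomes. A compact way to hold that mapping is an enum and a match, sketched here with hypothetical names (GrantFailure, LogOutcome, log_outcome); it restates the list and makes no claim about how the bridge encodes these cases.

```rust
/// Hypothetical enumeration of the failure modes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GrantFailure {
    ShuffleResolve,
    ImageHashMismatch,
    ProvisionFailed,
    CrashedPreSsh,
    DeadlineExceeded,
}

/// What, if anything, reaches the log stream for a given failure.
#[derive(Debug, PartialEq, Eq)]
pub enum LogOutcome {
    /// Nothing is appended.
    NoLog,
    /// A log with zeroed usage signals "tried but failed".
    EmptyUsage,
    /// Usage is reported, capped at the grant deadline.
    UsageUpToDeadline,
}

/// Maps each failure mode to its log outcome, following the list above.
pub fn log_outcome(failure: GrantFailure) -> LogOutcome {
    match failure {
        GrantFailure::ShuffleResolve | GrantFailure::ImageHashMismatch => LogOutcome::NoLog,
        GrantFailure::ProvisionFailed | GrantFailure::CrashedPreSsh => LogOutcome::EmptyUsage,
        GrantFailure::DeadlineExceeded => LogOutcome::UsageUpToDeadline,
    }
}
```

Because the match is exhaustive, adding a new failure variant would force a decision about its log outcome at compile time.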
No market-maker variant
The market-maker variant exists on the consumer
side because a market-maker’s inference cadence
differs from a searcher’s. The provider side does
not have one — the bridge serves whatever
distribution of grants the market clears to it;
rapid quote-resubmission grants and once-a-day
training grants go through the same handle_grant
path.