compute-bridge

audience: ai

A TDX-attested, backend-agnostic provider for the Compute basic service. One binary, four backends: AWS, GCP, Azure, bare-metal. The fleet router picks whichever backend can satisfy each incoming grant.

This page is the browsable root of the crate; every source file is rendered on its own sub-page below.

Status. Prototype / specification sketch. Signatures match the book; bodies are TODO-stubbed against upstream crates (the coalition meta-crate lands at v0.2, coalition-compute lands at v0.6 — see roadmap).

What it does

Watches a coalition’s Compute module for grants addressed to itself. For each grant, the fleet routes to whichever configured backend can satisfy the grant’s image manifest (CPU, RAM, TDX requirement, region) and returns SSH access to the requester via a zipnet-anonymised encrypted receipt.
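
A minimal sketch of that routing rule, assuming a Backend trait and Fleet shaped roughly like the ones backends/mod.rs and provider.rs will define; every name and field below is illustrative, not a signature from the book:

    /// Illustrative grant shape as the router sees it.
    pub struct Grant {
        pub cpu_millicores: u32,
        pub ram_mib: u64,
        pub region: String,
        pub requires_tdx: bool,
        pub image_hash: [u8; 32],
    }

    /// SSH connection details produced by a successful provision call.
    pub struct SshAccess {
        pub host: String,
        pub private_key_pem: String,
        pub valid_to: u64, // unix seconds
    }

    /// What each of aws.rs, gcp.rs, azure.rs and baremetal.rs would implement.
    pub trait Backend {
        /// True when this backend has the region, capacity and (if required)
        /// TDX support to take the grant.
        fn can_satisfy(&self, grant: &Grant) -> bool;
        /// Provision a workload and hand back SSH access to it.
        fn provision(&self, grant: &Grant) -> Result<SshAccess, String>;
    }

    pub struct Fleet {
        backends: Vec<Box<dyn Backend>>,
    }

    impl Fleet {
        /// First backend whose can_satisfy returns true takes the grant.
        pub fn route(&self, grant: &Grant) -> Option<&dyn Backend> {
            self.backends
                .iter()
                .map(|b| b.as_ref())
                .find(|b| b.can_satisfy(grant))
        }
    }

A backend that is configured but currently at capacity simply returns false from can_satisfy, so the router falls through to the next one.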

The provider’s honesty claim rests on its TDX measurement: anyone can verify the running binary’s MR_TD against the one declared in manifest/compute-manifest.toml, then inspect the source to conclude what that code actually does — including that the fleet router honestly reports the union of regions and capacities across every configured backend.
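
For an outside verifier that check reduces to comparing two byte strings. A minimal sketch, assuming the manifest exposes a single mr_td key and the quote's MR_TD has already been extracted as a hex string (both assumptions; the canonical logic belongs to tdx.rs):

    /// Compare the quote's MR_TD against the value declared in
    /// manifest/compute-manifest.toml. The key name and the hand parsing are
    /// assumptions; a real verifier would use a proper TOML parser.
    pub fn mr_td_matches(quote_mr_td_hex: &str, manifest_toml: &str) -> bool {
        let declared = manifest_toml
            .lines()
            .find_map(|line| line.trim().strip_prefix("mr_td"))
            .and_then(|rest| rest.split('"').nth(1));
        match declared {
            Some(d) => d
                .trim_start_matches("0x")
                .eq_ignore_ascii_case(quote_mr_td_hex.trim_start_matches("0x")),
            None => false,
        }
    }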

Flow

  1. Boot inside a TDX guest. Per-backend credentials (AWS keys, GCP service-account JSON, Azure service principal, bare-metal SSH private keys) are all measured into the boot; a curious host cannot exfiltrate them at runtime.
  2. Produce a TDX self-quote over the binary’s MR_TD plus the provider’s ed25519 public key (tdx.rs).
  3. Resolve the coalition’s Compute module (main.rs), open a zipnet channel (zipnet_io.rs), and build the Fleet from every enabled backend.
  4. Register a provider card with the quote, the union of regions and capacities, and a tdx_capable flag set when at least one backend can satisfy TDX-required grants (provider.rs).
  5. For each grant addressed to this provider:
    1. Resolve the zipnet envelope to learn the requester’s peer_id x25519 public key.
    2. Cross-check the envelope’s declared image hash against the grant’s committed image_hash.
    3. Fleet picks the first backend whose can_satisfy(grant) is true; that backend provisions a workload.
    4. Seal an SSH-access receipt (host, per-grant private key, valid_to) to the requester’s x25519 public key (receipt.rs) and reply via zipnet (sketched after this list).
    5. Spawn a watcher that emits the ComputeLog when the instance exits or the grant’s deadline passes.
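
A sketch of that per-grant path (envelope cross-check, backend routing, receipt sealing), reusing the illustrative Grant, Fleet and SshAccess types from the sketch under “What it does”; seal_to_x25519 is a placeholder for the real construction in receipt.rs:

    /// What the resolved zipnet envelope contributes (illustrative fields).
    pub struct Envelope {
        pub requester_x25519_pk: [u8; 32],
        pub declared_image_hash: [u8; 32],
    }

    /// Plaintext receipt before sealing: host, per-grant private key, valid_to.
    pub struct SshReceipt {
        pub host: String,
        pub private_key_pem: String,
        pub valid_to: u64,
    }

    /// Placeholder for the sealed-box construction in receipt.rs.
    fn seal_to_x25519(_recipient_pk: &[u8; 32], _receipt: &SshReceipt) -> Vec<u8> {
        unimplemented!("see receipt.rs")
    }

    /// One grant end to end: envelope cross-check, routing, provisioning, sealing.
    pub fn handle_grant(
        fleet: &Fleet,
        grant: &Grant,
        envelope: &Envelope,
    ) -> Result<Vec<u8>, String> {
        // The envelope's declared image hash must match the grant's commitment.
        if envelope.declared_image_hash != grant.image_hash {
            return Err("image hash mismatch between envelope and grant".into());
        }
        // First backend that can satisfy the grant provisions the workload.
        let backend = fleet
            .route(grant)
            .ok_or_else(|| "no configured backend can satisfy this grant".to_string())?;
        let access = backend.provision(grant)?;
        // Seal the SSH receipt to the requester's x25519 public key.
        let receipt = SshReceipt {
            host: access.host,
            private_key_pem: access.private_key_pem,
            valid_to: access.valid_to,
        };
        Ok(seal_to_x25519(&envelope.requester_x25519_pk, &receipt))
    }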

Layout

compute-bridge/
  Cargo.toml
  README.md
  manifest/
    compute-manifest.toml
  src/
    main.rs
    config.rs
    tdx.rs
    provider.rs
    zipnet_io.rs
    receipt.rs
    dashboard.rs
    backends/
      mod.rs
      aws.rs
      gcp.rs
      azure.rs
      baremetal.rs

Files

Inputs

Operator-supplied at boot and measured into the TDX quote:

  • Coalition configuration — the CoalitionConfig fingerprint to join and the Compute module’s ConfluenceConfig to register with.
  • Per-backend credentials — at least one of the four backends must be configured. See config.md for the exact shapes.

Outputs

Two outputs, both via zipnet:

  • Provider card — published to the Compute module’s Collection<ProviderId, ProviderCard> at boot and refreshed on capacity change. Folds the TDX quote, the per-backend capability summary, and the current real-time capacity telemetry.
  • SSH access receipts — one per grant, encrypted to the requester’s peer_id x25519 public key and posted to the zipnet reply stream matching the grant.
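
A rough shape for both outputs; the field names are illustrative, and the canonical definitions belong to provider.rs and receipt.rs:

    /// Published to Collection<ProviderId, ProviderCard> at boot and refreshed
    /// on capacity change.
    pub struct ProviderCard {
        pub provider_pk: [u8; 32],     // ed25519 key covered by the quote
        pub tdx_quote: Vec<u8>,        // self-quote over MR_TD + provider_pk
        pub regions: Vec<String>,      // union across configured backends
        pub tdx_capable: bool,         // true if any backend takes TDX-required grants
        pub capacity: CapacitySummary, // current real-time telemetry
    }

    pub struct CapacitySummary {
        pub cpu_millicores_free: u64,
        pub ram_mib_free: u64,
        pub max_concurrent_instances: u32,
    }

    /// One per grant; only the sealed bytes travel back over the zipnet reply stream.
    pub struct SealedReceipt {
        pub grant_id: [u8; 32],
        pub ciphertext: Vec<u8>, // SSH receipt encrypted to the requester's x25519 key
    }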

Launching the bridge — the operator’s flow

A compute-bridge operator’s total interface with the stack is: attach keys, watch a dashboard. Six steps, start to finish; after step 5, day-to-day operation is step 6 only.

  1. Build or download the TDX image for this crate. The build produces a post-build measurement hash (MR_TD); the operator notes it.

  2. Write backends.toml with whichever subset of the four backends the operator is enabling:

    # aws (optional) — keys + regions + instance families
    [aws]
    access_key_id     = "…"
    secret_access_key = "…"
    regions           = ["us-east-1", "eu-west-1"]
    instance_families = ["m6i", "c6i"]
    max_concurrent_instances = 20
    
    # gcp (optional) — service-account JSON path + regions + machine types
    [gcp]
    service_account_key_path = "/keys/gcp.json"
    project_id               = "infer-corp-prod"
    regions                  = ["europe-west1"]
    machine_families         = ["n2-standard-4"]
    tdx_machine_types        = ["c3-standard-4"]  # optional, enables TDX
    max_concurrent_instances = 20
    
    # azure (optional) — service principal + RG + VM sizes
    [azure]
    tenant_id       = "…"
    client_id       = "…"
    client_secret   = "…"
    subscription_id = "…"
    resource_group  = "infer-bridge"
    regions         = ["westus2"]
    vm_sizes        = ["Standard_D4s_v5"]
    tdx_vm_sizes    = ["Standard_DC4ads_v5"]
    max_concurrent_instances = 20
    
    # baremetal (optional) — SSH root to operator hosts
    [baremetal]
    machines = [
      { host = "10.0.1.5", port = 22, user = "root",
        ssh_key_path = "/keys/bare-01",
        region = "home-01",
        cpu_millicores = 32000, ram_mib = 131072,
        tdx_capable = true, tdx_mrtd_compat = "0x…" },
    ]
    

    The operator enables one or many; the bridge refuses to start if none are configured (a sketch of the matching Rust config structs follows this list). The only secrets involved are cloud API keys or bare-metal SSH private keys — no other credential store, no runtime secret fetch.

  3. Write coalition.toml pointing at the coalition to register with. This is the CoalitionConfig the operator got from the coalition’s handshake page; one file, compiled in verbatim.

  4. Write dashboard.toml with a port and a per-backend rate table (operators who want cost estimates on the dashboard supply the $/core-hour and $/GiB-hour rates they expect to pay on each cloud; bare-metal typically defaults to 0).

  5. Boot inside TDX. The TDX loader measures every config file above into the MR_TD; the bridge registers its provider card with the Compute module, attaches the TDX quote, and starts serving grants. Subscribers in the coalition see the new provider card immediately.

    export COALITION_CONFIG_PATH=/etc/compute-bridge/coalition.toml
    export BACKENDS_CONFIG_PATH=/etc/compute-bridge/backends.toml
    export DASHBOARD_CONFIG_PATH=/etc/compute-bridge/dashboard.toml
    compute-bridge
    
  6. Watch the dashboard. Port-forward and open:

    ssh -L 8080:localhost:8080 operator@tdx-host
    open http://localhost:8080/dashboard
    

    From this point on the operator’s job is only to watch. The dashboard updates live; the bridge handles every incoming grant, every provisioning call, every usage log. See src/dashboard.rs for what the dashboard exposes.
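
For orientation, a hedged sketch of the Rust structs that step 2’s backends.toml might deserialise into, assuming serde-style derives, together with the refuse-to-start check; field names mirror the example above, while the authoritative shapes live in config.rs and config.md:

    use serde::Deserialize;

    /// Mirrors the backends.toml example from step 2. Every section is optional,
    /// but the bridge refuses to start when none is present.
    #[derive(Deserialize)]
    struct BackendsConfig {
        aws: Option<AwsConfig>,
        gcp: Option<GcpConfig>,
        azure: Option<AzureConfig>,
        baremetal: Option<BaremetalConfig>,
    }

    #[derive(Deserialize)]
    struct AwsConfig {
        access_key_id: String,
        secret_access_key: String,
        regions: Vec<String>,
        instance_families: Vec<String>,
        max_concurrent_instances: u32,
    }

    #[derive(Deserialize)]
    struct GcpConfig {
        service_account_key_path: String,
        project_id: String,
        regions: Vec<String>,
        machine_families: Vec<String>,
        tdx_machine_types: Option<Vec<String>>, // presence enables TDX grants
        max_concurrent_instances: u32,
    }

    #[derive(Deserialize)]
    struct AzureConfig {
        tenant_id: String,
        client_id: String,
        client_secret: String,
        subscription_id: String,
        resource_group: String,
        regions: Vec<String>,
        vm_sizes: Vec<String>,
        tdx_vm_sizes: Option<Vec<String>>,
        max_concurrent_instances: u32,
    }

    #[derive(Deserialize)]
    struct BaremetalConfig {
        machines: Vec<BaremetalMachine>,
    }

    #[derive(Deserialize)]
    struct BaremetalMachine {
        host: String,
        port: u16,
        user: String,
        ssh_key_path: String,
        region: String,
        cpu_millicores: u32,
        ram_mib: u64,
        tdx_capable: bool,
        tdx_mrtd_compat: Option<String>,
    }

    impl BackendsConfig {
        /// The bridge refuses to start when no backend section is configured.
        fn validate(&self) -> Result<(), String> {
            if self.aws.is_none()
                && self.gcp.is_none()
                && self.azure.is_none()
                && self.baremetal.is_none()
            {
                return Err("no backend configured; enable at least one".into());
            }
            Ok(())
        }
    }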

Active subscriptions — agents that bonded this bridge

An “active subscriber” is a coalition agent whose own driver has bonded this bridge’s provider card in the Compute module’s Collection<ProviderId, ProviderCard>. Subscribing means: the agent trusts this bridge’s claims (MR_TD match, regions, capacity), is willing to route grants to it, and may settle the coordination market’s clearings in its favour.

The bridge’s responsibility toward subscribers:

  • Stay registered. Refresh the provider card on a declared cadence so subscribers do not see it go stale.
  • Honour declared capacity. The fleet’s can_satisfy(grant) must not over-commit; the dashboard surfaces when the provider is approaching its limit so the operator can enable another backend or decline grants.
  • Emit honest usage logs. Each grant’s ComputeLog lets subscribers that score providers (via a reputation organism) do so accurately.
  • Emit a retirement marker on shutdown. Subscribers rebind to a successor bridge, if named, rather than timing out.

The subscriber count shown on the dashboard is the operator’s signal of product-market fit for their specific bridge: more regions + more TDX capacity + lower declared cost usually attracts more subscribers over time.

Dashboard — what the operator watches

The operator’s dashboard is specified on its own page: src/dashboard.rs. Summary of what appears:

  • Provider registration status + MR_TD match.
  • Active subscriber count.
  • Active grants / lifetime total.
  • Per-backend capacity vs current utilisation.
  • Window usage (cpu-hours, GiB-hours, network GB).
  • Window cost estimate (from the operator’s rate table).
  • Window revenue settled (from market clearings).
  • Ring buffer of recent events, all redacted of requester identity.
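
The window cost estimate is plain arithmetic over the step-4 rate table; a minimal sketch, assuming the table carries only the $/core-hour and $/GiB-hour rates mentioned above:

    /// Window cost for one backend from the operator's rate table (dashboard.toml).
    pub struct RateEntry {
        pub usd_per_core_hour: f64,
        pub usd_per_gib_hour: f64,
    }

    pub fn window_cost_usd(cpu_core_hours: f64, ram_gib_hours: f64, rate: &RateEntry) -> f64 {
        cpu_core_hours * rate.usd_per_core_hour + ram_gib_hours * rate.usd_per_gib_hour
    }

Bare-metal backends typically carry zero rates (step 4), so their windows show a cost estimate of 0.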

What the dashboard deliberately does not show: requester ClientId, peer_id, prompts, image contents, or per-settlement attribution. The zipnet privacy contract applies to the operator too — the operator sees flow, not identities.

Adding a bare-metal machine

“Adding a bare-metal machine” is exactly: the operator gives the bridge SSH root access to a host and declares its shape in the boot config. No cloud call is involved. The bridge maintains control-plane SSH sessions to each machine for the duration of the process; grants are satisfied by systemd-run on bare VMs or virsh plus qemu-tdx on bare-TDX hosts.
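
A hedged sketch of the non-TDX provisioning call, assuming the backend shells out over SSH and relies on systemd-run resource properties; the unit name, properties, and workload command are illustrative, and the real flow is specified in backends/baremetal.rs and backend-baremetal.md:

    use std::process::{Command, ExitStatus};

    /// Launch a non-TDX workload on a bare-metal host over the control-plane
    /// SSH session, bounded by the shape declared in the boot config.
    pub fn provision_baremetal(
        host: &str,
        user: &str,
        ssh_key_path: &str,
        grant_id: &str,
        cpu_millicores: u32,
        ram_mib: u64,
        workload_cmd: &str,
    ) -> std::io::Result<ExitStatus> {
        let target = format!("{user}@{host}");
        let unit = format!("--unit=grant-{grant_id}");
        // CPUQuota is a percentage of one core: 1000 millicores == 100%.
        let cpu = format!("--property=CPUQuota={}%", cpu_millicores / 10);
        let mem = format!("--property=MemoryMax={ram_mib}M");
        Command::new("ssh")
            .arg("-i")
            .arg(ssh_key_path)
            .arg(&target)
            .arg("systemd-run")
            .arg(&unit)
            .arg(&cpu)
            .arg(&mem)
            .arg(workload_cmd)
            .status()
    }

TDX-required grants take the virsh plus qemu-tdx path instead, which is not sketched here.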

Bare-TDX hosts are the route for TDX-required grants in environments where cloud TDX is not yet available — AWS-only operators, operators running their own data centres, operators with early-access TDX hardware. The nested guest’s MR_TD flows back to the requester as part of the SSH receipt.

See backend-baremetal.md for the full provision flow.

Why zipnet for I/O

The requester’s identity is a mosaik ClientId. The provider receives a grant identifying the requester only by a zipnet-rotated sealed envelope; it never learns the requester’s coalition-level identity, only the peer_id inside the envelope. The encrypted receipt goes back through the same channel.

Consequences:

  • A curious provider cannot enumerate which coalition members are running what workloads.
  • A compromised provider cannot selectively deny service to specific coalition members by their coalition identity — they are not visible at the provider layer.
  • A coalition can run multiple compute-bridge instances under different operators without cross-provider identity leakage.

Threat model (short version)

  • Operator runs the binary in TDX. If not, the provider card’s claimed MR_TD does not match the running image; the Compute module rejects registration.
  • Per-backend credentials are operator-scoped. Grants in flight when credentials are revoked fail to provision; the Compute committee observes the usage-log absence and scores the provider down.
  • Zipnet carries requester identity. The provider never learns the coalition ClientId, only the ephemeral peer_id. Receipt encryption is to the peer_id’s x25519 public key, published in the zipnet envelope.
  • Cloud and bare-host infrastructure is not trusted with requester data unless the backend is TDX-capable and the grant required TDX. Non-TDX workloads treat the backend as a semi-trusted host; TDX workloads verify the nested MR_TD before trusting outputs.

A full threat analysis belongs in the book when this example graduates from prototype.