src/dashboard.rs

audience: ai

The operator’s read-only view of the bridge. Binds to 127.0.0.1:<port> inside the TDX guest. The operator reaches it by SSH-forwarding the port from their workstation; no authentication beyond the SSH login. No external port is ever opened.

What the dashboard shows

Aggregate state only. No per-grant requester identity, no peer_id, no prompt or image contents, no market settlement hashes beyond count. The dashboard enforces the zipnet privacy contract on the operator just as the provider enforces it on the submitter: the operator sees how their infrastructure is used, not who is using it.

Fields exposed per snapshot:

  • Provider status — registered / alive / mrtd_ok.
  • Coalition name and Compute module version — so the operator can confirm they booted against the right handshake.
  • Active subscribers — how many coalition agents have bonded this bridge’s provider card in the Compute module’s Collection<ProviderId, ProviderCard>. Trends up as word spreads; trends down when agents retire or unpin.
  • Active grants vs the provider’s declared concurrency cap.
  • Lifetime grants served.
  • Per-backend capacity and utilisation — for each enabled backend: declared capacity, active grant count, window cpu-core-seconds, window ram-mib-seconds, window network bytes.
  • Window cost estimate — sum over backends of core-hours × $/core-hour + GiB-hours × $/GiB-hour using the operator’s rate table.
  • Window revenue settled — sum of coordination-market clearings credited to this bridge.
  • Recent events (ring buffer, last 50) — all redacted of requester identity: grant accepted (backend only), grant completed (usage delta only), provider card refreshed, subscriber joined / left.
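The window cost arithmetic above can be sketched as follows. This mirrors the rate-table formula (core-hours × $/core-hour + GiB-hours × $/GiB-hour); the rates and usage figures are illustrative, not taken from the document:

```rust
// Sketch of the per-backend window-cost formula:
// core-hours x $/core-hour + GiB-hours x $/GiB-hour.
// Rates and usage figures are illustrative only.
fn window_cost_usd(
    cpu_core_seconds: u64,
    ram_mib_seconds: u64,
    core_hour_usd: f64,
    gib_hour_usd: f64,
) -> f64 {
    let core_h = cpu_core_seconds as f64 / 3600.0;
    let gib_h = (ram_mib_seconds as f64 / 1024.0) / 3600.0;
    core_h * core_hour_usd + gib_h * gib_hour_usd
}

fn main() {
    // 2 core-hours and 1 GiB-hour at $0.05/core-h and $0.01/GiB-h.
    let cost = window_cost_usd(7_200, 3_686_400, 0.05, 0.01);
    assert!((cost - 0.11).abs() < 1e-9);
}
```

The window total is this sum taken over every enabled backend, each priced at its own row of the operator's rate table.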

ASCII mock

  compute-bridge dashboard
  ════════════════════════════════════════════════════════════════

  Provider:   <provider_id_prefix>          Status:   registered
  Coalition:  infer.corp                    Module:   Compute v0.6
  Uptime:     4d 02h                        MR_TD:    ok

  Subscribers (coalition agents bonded to this card):  27
  Active grants:                                        3 / 20
  Total grants served (lifetime):                   1,842

  ───────────────────────────────  Backends  ────────────────────────
  aws         [eu-west-1, us-east-1]     cap=80c/320G  used=30c/128G
  gcp         [europe-west1]             cap=32c/128G  used=0
  azure       (disabled)
  baremetal   [home-01 tdx-capable]      cap=32c/128G  used=32c/128G

  ─────────────────────────────  Last 24h  ─────────────────────────
  Grants served:        412
  Usage:   cpu=18,420 core-h   ram=73,680 GiB-h   net=412 GB
  Idle:                 24%
  Estimated cost:       $147.23   (aws $94, gcp $0, azure $0, bare $0)
  Settled revenue:      $228.71   (market: infer-market clearings)
  Net:                  +$81.48   (55% margin)

  ──────────────────────────  Recent events  ──────────────────────
  [t-2m]   Grant accepted on aws:us-east-1
  [t-5m]   Grant completed; usage 40 c-min, 160 GiB-min
  [t-1h]   Provider card refreshed
  [t-3h]   Subscriber joined
  ...

Two HTTP routes

  • GET /dashboard — HTML view (the ASCII layout above but rendered). No JavaScript; the page refreshes every capacity_refresh_sec seconds.
  • GET /snapshot.json — the full DashboardSnapshot as JSON. Operators who want to plumb the bridge into their own observability stack poll this.
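For orientation, a /snapshot.json body might look like the following. The values are illustrative and the `capabilities` object is elided (its shape comes from the backends module, not shown here); the field names follow the DashboardSnapshot struct in dashboard.rs:

```json
{
  "provider_status": "registered",
  "coalition_name": "infer.corp",
  "mrtd_ok": true,
  "subscribers": 27,
  "active_grants": 3,
  "lifetime_grants": 1842,
  "backends": [
    {
      "name": "aws",
      "capabilities": {},
      "active_grants": 2,
      "window_cpu_core_seconds": 7200,
      "window_ram_mib_seconds": 3686400,
      "window_net_bytes": 412000000
    }
  ],
  "window_revenue_usd": 228.71,
  "window_cost_estimated_usd": 147.23,
  "last_card_refresh": 1700000000
}
```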

What is deliberately not exposed

  • The requester’s ClientId, peer_id, or x25519 public key.
  • The image post-hash of any individual workload.
  • The prompt / input / output of any workload.
  • The coordination market’s per-settlement attribution (which requester paid what for which grant). The dashboard sees only aggregate revenue per window.

These are never in the dashboard’s memory; the bridge never pulls them out of the zipnet envelope or the market’s commit stream into process-local state.

Attaching the dashboard to other observability

Operators who run fleet-wide monitoring (many bridges under one operator) can:

  • Scrape /snapshot.json over the SSH tunnel.
  • Export a small set of Prometheus counters from the same process (not yet implemented in this prototype; see dashboard.rs for the extension point).
  • Forward aggregate metrics to an external TSDB without breaking the privacy contract — the exported metrics match the dashboard’s fields exactly.
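The Prometheus export is not implemented in this prototype. As a minimal sketch of what the extension point could emit — assuming a hand-rolled renderer over the snapshot fields above, with no client library — the text exposition format is simple enough to build with `format!`. Nothing below exists in dashboard.rs; metric names are hypothetical:

```rust
// Hypothetical sketch: render a few dashboard counters in the
// Prometheus text exposition format. Field values mirror
// DashboardSnapshot; the prototype does not implement this yet.
fn render_metrics(
    subscribers: u32,
    active_grants: u32,
    lifetime_grants: u64,
    revenue_usd: f64,
) -> String {
    let mut out = String::new();
    out.push_str("# TYPE bridge_subscribers gauge\n");
    out.push_str(&format!("bridge_subscribers {subscribers}\n"));
    out.push_str("# TYPE bridge_active_grants gauge\n");
    out.push_str(&format!("bridge_active_grants {active_grants}\n"));
    out.push_str("# TYPE bridge_lifetime_grants counter\n");
    out.push_str(&format!("bridge_lifetime_grants {lifetime_grants}\n"));
    out.push_str("# TYPE bridge_window_revenue_usd gauge\n");
    out.push_str(&format!("bridge_window_revenue_usd {revenue_usd}\n"));
    out
}

fn main() {
    let text = render_metrics(27, 3, 1_842, 228.71);
    assert!(text.contains("bridge_active_grants 3"));
    print!("{text}");
}
```

Because these counters are the same aggregates the dashboard already shows, exporting them keeps the privacy contract intact by construction.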
//! Local operator dashboard.
//!
//! Boots alongside the provider loop and serves a small
//! read-only view at `http://127.0.0.1:<dashboard.port>`
//! from **inside** the TDX guest. The operator reaches
//! it by SSH-forwarding the port:
//!
//! ```shell
//! ssh -L 8080:localhost:8080 operator@tdx-host
//! open http://localhost:8080/dashboard
//! ```
//!
//! ## What it shows
//!
//! Aggregate state only. No per-grant identity, no
//! `peer_id`, no settlement hashes beyond count. The
//! dashboard enforces the zipnet privacy contract on
//! the *operator* just as the provider enforces it on
//! the submitter: the operator sees how their
//! infrastructure is used, not who is using it.
//!
//! Fields exposed:
//!
//! - Provider status: registered / alive / MR_TD match.
//! - Active grant count + lifetime total.
//! - Active-subscriber count (coalition agents bonded
//!   to this bridge's provider card in the Compute
//!   module's `Collection<ProviderId, ProviderCard>`).
//! - Per-backend capacity vs utilised.
//! - Last-window usage (cpu-hours, ram GiB-hours,
//!   network GB).
//! - Last-window cost estimate (per backend at the
//!   operator-supplied rate table).
//! - Last-window settled revenue (sum of market-
//!   clearing amounts; no per-requester detail).
//! - Ring-buffer of recent events, each redacted of
//!   requester identity.
//!
//! What is **not** shown: who submitted, what prompt
//! ran, which image contents executed. Those never
//! reached the dashboard's memory.

use std::sync::Arc;

use serde::Serialize;
use tokio::sync::RwLock;

use crate::backends::{Capabilities, Fleet};
use crate::config::DashboardBootConfig;

pub struct Dashboard {
    cfg: DashboardBootConfig,
    state: Arc<RwLock<DashboardState>>,
    fleet: Fleet,
}

impl Dashboard {
    pub fn new(cfg: &DashboardBootConfig, fleet: Fleet) -> Self {
        Self {
            cfg: cfg.clone(),
            state: Arc::new(RwLock::new(DashboardState::default())),
            fleet,
        }
    }

    /// Spawn the dashboard HTTP server. Binds to
    /// 127.0.0.1:<port>; the TDX guest's firewall is
    /// configured to refuse external traffic to this
    /// port, so an operator's SSH port-forward is the
    /// only access path.
    pub async fn spawn(&self) -> anyhow::Result<()> {
        let addr = format!("127.0.0.1:{}", self.cfg.port);
        tracing::info!(addr = %addr, "dashboard listening");
        // TODO: bring up an axum / hyper server with two
        // routes:
        //   GET /dashboard       → HTML view
        //   GET /snapshot.json   → DashboardSnapshot JSON
        //
        // Every request is unauthenticated because the
        // bind is 127.0.0.1 and the only path in is the
        // operator's SSH tunnel.
        anyhow::bail!(
            "Dashboard::spawn is a prototype stub; wire up a \
             localhost-only axum server with two routes"
        )
    }

    /// Called by the provider loop every time something
    /// observable happens. The dashboard's state is the
    /// only in-memory sink; raw events are never
    /// persisted.
    pub async fn record(&self, ev: DashboardEvent) {
        let mut s = self.state.write().await;
        match ev {
            DashboardEvent::SubscriberJoined          => s.subscribers = s.subscribers.saturating_add(1),
            DashboardEvent::SubscriberLeft            => s.subscribers = s.subscribers.saturating_sub(1),
            DashboardEvent::GrantAccepted { backend } => {
                s.active_grants = s.active_grants.saturating_add(1);
                s.lifetime_grants = s.lifetime_grants.saturating_add(1);
                s.bump_backend(&backend, |b| b.active_grants += 1);
            }
            DashboardEvent::GrantCompleted { backend, usage } => {
                s.active_grants = s.active_grants.saturating_sub(1);
                s.bump_backend(&backend, |b| {
                    if b.active_grants > 0 { b.active_grants -= 1; }
                    b.window_cpu_core_seconds += usage.cpu_core_seconds;
                    b.window_ram_mib_seconds  += usage.ram_mib_seconds;
                    b.window_net_bytes        += usage.net_bytes;
                });
            }
            DashboardEvent::RevenueSettled { usd } => {
                s.window_revenue_usd += usd;
            }
            DashboardEvent::ProviderCardRefreshed => {
                s.last_card_refresh = Some(chrono_like_now());
            }
        }
    }

    /// Build a snapshot for the HTTP view.
    pub async fn snapshot(&self) -> anyhow::Result<DashboardSnapshot> {
        let s = self.state.read().await;
        let caps = self.fleet.capabilities().await?;

        let mut backends = Vec::with_capacity(caps.len());
        for (name, c) in caps {
            let b = s.backends.iter().find(|b| b.name == name);
            backends.push(BackendSnapshot {
                name,
                capabilities: c,
                active_grants:          b.map(|b| b.active_grants).unwrap_or(0),
                window_cpu_core_seconds: b.map(|b| b.window_cpu_core_seconds).unwrap_or(0),
                window_ram_mib_seconds:  b.map(|b| b.window_ram_mib_seconds).unwrap_or(0),
                window_net_bytes:        b.map(|b| b.window_net_bytes).unwrap_or(0),
            });
        }

        Ok(DashboardSnapshot {
            // Prototype placeholders: the provider loop does not
            // yet report live status or MR_TD state into
            // DashboardState.
            provider_status: "registered".into(),
            coalition_name:  self.cfg.coalition_name.clone(),
            mrtd_ok:         true,
            subscribers:     s.subscribers,
            active_grants:   s.active_grants,
            lifetime_grants: s.lifetime_grants,
            backends,
            window_revenue_usd: s.window_revenue_usd,
            window_cost_estimated_usd: s.estimated_cost_usd(&self.cfg.rate_table),
            last_card_refresh: s.last_card_refresh,
        })
    }
}

// -----------------------------------------------------
// State & events
// -----------------------------------------------------

#[derive(Default)]
struct DashboardState {
    subscribers:        u32,
    active_grants:      u32,
    lifetime_grants:    u64,
    window_revenue_usd: f64,
    last_card_refresh:  Option<u64>,
    backends:           Vec<PerBackend>,
}

impl DashboardState {
    fn bump_backend(&mut self, name: &str, f: impl FnOnce(&mut PerBackend)) {
        if let Some(b) = self.backends.iter_mut().find(|b| b.name == name) {
            f(b);
            return;
        }
        let mut b = PerBackend { name: name.into(), ..Default::default() };
        f(&mut b);
        self.backends.push(b);
    }

    fn estimated_cost_usd(&self, rate_table: &RateTable) -> f64 {
        self.backends.iter()
            .map(|b| rate_table.cost_for(&b.name, b.window_cpu_core_seconds, b.window_ram_mib_seconds))
            .sum()
    }
}

#[derive(Default)]
struct PerBackend {
    name: String,
    active_grants: u32,
    window_cpu_core_seconds: u64,
    window_ram_mib_seconds: u64,
    window_net_bytes: u64,
}

pub enum DashboardEvent {
    SubscriberJoined,
    SubscriberLeft,
    GrantAccepted { backend: String },
    GrantCompleted { backend: String, usage: UsageDelta },
    RevenueSettled { usd: f64 },
    ProviderCardRefreshed,
}

pub struct UsageDelta {
    pub cpu_core_seconds: u64,
    pub ram_mib_seconds:  u64,
    pub net_bytes:        u64,
}

// -----------------------------------------------------
// Snapshot — JSON shape returned by /snapshot.json
// -----------------------------------------------------

#[derive(Serialize)]
pub struct DashboardSnapshot {
    pub provider_status: String,
    pub coalition_name:  String,
    pub mrtd_ok:         bool,
    pub subscribers:     u32,
    pub active_grants:   u32,
    pub lifetime_grants: u64,
    pub backends:        Vec<BackendSnapshot>,
    pub window_revenue_usd: f64,
    pub window_cost_estimated_usd: f64,
    pub last_card_refresh: Option<u64>,
}

#[derive(Serialize)]
pub struct BackendSnapshot {
    pub name: String,
    pub capabilities: Capabilities,
    pub active_grants: u32,
    pub window_cpu_core_seconds: u64,
    pub window_ram_mib_seconds: u64,
    pub window_net_bytes: u64,
}

// -----------------------------------------------------
// Rate table (operator-supplied)
// -----------------------------------------------------

#[derive(Clone, Debug, Default, serde::Deserialize)]
pub struct RateTable {
    /// $/core-hour per backend. Missing key defaults to 0
    /// (bare-metal = 0 marginal cost unless the operator
    /// declares their power/cooling amortisation).
    pub core_hour_usd: std::collections::BTreeMap<String, f64>,
    /// $/GiB-hour per backend.
    pub ram_gib_hour_usd: std::collections::BTreeMap<String, f64>,
}

impl RateTable {
    fn cost_for(&self, backend: &str, cpu_s: u64, ram_mib_s: u64) -> f64 {
        let core_h = cpu_s as f64 / 3600.0;
        let gib_h  = (ram_mib_s as f64 / 1024.0) / 3600.0;
        let core_rate = *self.core_hour_usd.get(backend).unwrap_or(&0.0);
        let gib_rate  = *self.ram_gib_hour_usd.get(backend).unwrap_or(&0.0);
        core_h * core_rate + gib_h * gib_rate
    }
}

// Placeholder for SystemTime::now-ish; the real impl uses
// the Almanac tick if one is available, falling back to
// SystemTime.
fn chrono_like_now() -> u64 {
    std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

Up: compute-bridge.