
Operations / Workers

Overview

Operations owns backend entrypoints, local commands, worker process shape, migrations, heartbeat reporting, and health diagnostics. Business rules stay in the modules each worker calls.

Use backend/RUNBOOK.md for step-by-step local commands and incident checks.

Responsibilities

  • Route CLI commands from backend/src/index.ts.
  • Run SQLite migrations before API/worker work.
  • Provide long-running and run-once worker entrypoints.
  • Record worker heartbeats for schedulers that use instrumentRun.
  • Expose internal worker health and ingestion health surfaces.
  • Keep worker cadence and process defaults in environment config.

Boundary Rules

  • Workers orchestrate module services; they should not contain durable product rules.
  • Worker heartbeat state belongs to Operations; module state belongs to module-owned tables.
  • Worker commands should be safe to run locally after npm run db:migrate.
  • Run-once commands should reuse the same service path as long-running workers.

Runtime Flow

Most workers follow this shape:

```text
Load env
-> open SQLite
-> run migrations
-> construct module dependencies
-> run one cycle immediately
-> repeat on interval unless run-once
-> log cycle summary or error
-> close database on shutdown
```
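The shape above can be sketched in TypeScript. This is a minimal sketch, not the code in backend/src/workers. The names runWorkerLoop and WorkerOptions, and the runCycle and close callbacks, are hypothetical. It assumes that loading env, opening SQLite, running migrations, and constructing module dependencies all happen before the loop starts.

```typescript
// Hypothetical options; real workers take their cadence from env config.
interface WorkerOptions {
  name: string;
  intervalMs: number;
  once: boolean;
  runCycle: () => Promise<string>; // resolves to a one-line cycle summary
  close: () => void; // e.g. closes the SQLite handle opened at startup
  log?: (line: string) => void;
}

// Runs one cycle immediately, then repeats on the interval unless `once` is set.
// Resolves to an async stop function suitable for a shutdown signal handler.
async function runWorkerLoop(opts: WorkerOptions): Promise<() => Promise<void>> {
  const log = opts.log ?? console.log;

  const cycle = async (): Promise<void> => {
    try {
      const summary = await opts.runCycle();
      log(`[${opts.name}] cycle ok: ${summary}`);
    } catch (err) {
      // A failed cycle is logged; the schedule continues.
      const message = err instanceof Error ? err.message : String(err);
      log(`[${opts.name}] cycle failed: ${message}`);
    }
  };

  await cycle(); // first cycle runs immediately
  if (opts.once) {
    opts.close();
    return async () => {};
  }

  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  let inFlight: Promise<void> = Promise.resolve();

  // Chained timeouts (not setInterval) so a slow cycle never overlaps the next one.
  const schedule = (): void => {
    timer = setTimeout(() => {
      inFlight = cycle().then(() => {
        if (!stopped) schedule();
      });
    }, opts.intervalMs);
  };
  schedule();

  return async () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
    await inFlight; // let an in-progress cycle finish before closing the database
    opts.close();
  };
}
```

Chaining setTimeout calls instead of using setInterval means a cycle that outlasts its interval delays the next cycle rather than overlapping it. The real workers may or may not make the same choice.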

ingest is the exception: it starts multiple source schedulers and stream clients, then stops them on shutdown. It records source health in ingestion_runs, not worker_heartbeats.
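Ingest's per-source scheduling can be sketched alongside the in-process running flag mentioned under Failure Behavior. This is a minimal sketch built on a hypothetical SourceScheduler class and startAll helper; neither comes from backend/src/workers/ingest.ts. Each scheduler skips a tick while its previous run is still in progress. A shutdown handler would call the single stop hook that startAll returns.

```typescript
// Hypothetical per-source scheduler; ingest would start one per source/channel.
class SourceScheduler {
  readonly source: string;
  private readonly intervalMs: number;
  private readonly runOnce: () => Promise<void>; // would also write an ingestion_runs row
  private running = false; // in-process flag that prevents overlapping runs
  private timer: ReturnType<typeof setInterval> | undefined;
  skipped = 0; // ticks dropped because the previous run was still in progress
  lastError: string | undefined;

  constructor(source: string, intervalMs: number, runOnce: () => Promise<void>) {
    this.source = source;
    this.intervalMs = intervalMs;
    this.runOnce = runOnce;
  }

  async tick(): Promise<void> {
    if (this.running) {
      this.skipped++;
      return;
    }
    this.running = true;
    try {
      await this.runOnce();
    } catch (err) {
      // The real schedulers record the failure in ingestion_runs instead.
      this.lastError = err instanceof Error ? err.message : String(err);
    } finally {
      this.running = false;
    }
  }

  start(): void {
    void this.tick(); // first run immediately
    this.timer = setInterval(() => void this.tick(), this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }
}

// Starts every scheduler and returns one shutdown hook that stops them all.
function startAll(schedulers: SourceScheduler[]): () => void {
  for (const s of schedulers) s.start();
  return () => schedulers.forEach((s) => s.stop());
}
```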

Main Code Paths

| Path | Purpose |
| --- | --- |
| backend/src/index.ts | CLI command router for API, migrations, workers, status checks, and one-off commands. |
| backend/src/app/env.ts | Worker interval defaults and runtime config parsing. |
| backend/src/observability/worker-heartbeats.ts | Heartbeat insert/update helper and health read model. |
| backend/src/workers/ingest.ts | Long-running ingestion process. |
| backend/src/workers/decisions.ts | Clone-scoped debug decision worker. |
| backend/src/workers/prop-decisions.ts | Prop-account decision worker. |
| backend/src/workers/mark-to-market.ts | Prop-account mark-to-market worker. |
| backend/src/workers/backtesting.ts | Backtest queue worker. |
| backend/src/workers/audition-evaluator.ts | Due-audition evaluator worker. |
| backend/src/workers/arena-entry.ts | Promotion Queue / Arena Entry sync worker. |
| backend/src/db/migrator.ts | SQLite migration runner. |

State And Tables

| Table | Owner | Purpose |
| --- | --- | --- |
| worker_heartbeats | Operations | Last start/completion/success/error, status, summary, latency, consecutive errors, total runs/errors. |
| ingestion_runs | Data Ingestion | Source/channel run status and freshness for source schedulers. |
| backtest_jobs | Backtesting | Queue lease, progress, attempt, result, and failure state for async backtests. |

Workers also write module-owned tables such as clone_decision_runs, prop_*, audition_runs, promotion_queue_entries, and arena_clone_memberships.

Routes And Workers

For local single-terminal runtime work, npm run workers:all starts the non-ingestion runtime workers: prop decisions, mark-to-market, backtests, audition evaluation, and Arena Entry sync.

| Worker | Command | Default cadence | Unit of work | Health behavior |
| --- | --- | --- | --- | --- |
| Ingestion | npm run ingest / npm run ingest:local | Source-specific | Hyperliquid stream/features/maintenance, DefiLlama, Polymarket | Records ingestion_runs; no worker_heartbeats row. |
| Clone decisions | npm run decisions:worker | 60s | Active clone decision subjects, or a --clone-id subset | Whole-cycle heartbeat as clone_decisions. |
| Prop decisions | npm run prop:decisions:worker | 60s | Active prop accounts, default limit 100 | Whole-cycle heartbeat as prop_decisions; per-account failures are logged and isolated. |
| Mark-to-market | npm run prop:mark-to-market:worker | 60s | Active prop accounts, default limit 100 | Heartbeat as prop_mark_to_market; per-account failures become ledger/audit outcomes where possible. |
| Backtests | npm run backtests:worker | 15s | Leased queued backtest jobs | Heartbeat as backtesting; leases default to 5 minutes, max 2 concurrent jobs, max 1 per user. |
| Audition evaluator | npm run lifecycle:auditions:worker | 5m | Due audition runs | Heartbeat as lifecycle_audition_evaluator. |
| Arena Entry | npm run lifecycle:arena-entry:worker | 5m | Passed auditions into Queue/Arena slots | Heartbeat as lifecycle_arena_entry. |
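Cadence defaults live in environment config (backend/src/app/env.ts). The sketch below parses the default intervals from the table above and allows per-worker overrides. The WORKER_INTERVAL_<NAME> variable names, the accepted duration syntax, and the helper names are all hypothetical; this is not the actual env.ts contract.

```typescript
// Defaults copied from the cadence table; env variable names are hypothetical.
const DEFAULT_INTERVALS = {
  clone_decisions: "60s",
  prop_decisions: "60s",
  prop_mark_to_market: "60s",
  backtesting: "15s",
  lifecycle_audition_evaluator: "5m",
  lifecycle_arena_entry: "5m",
} as const;

type WorkerName = keyof typeof DEFAULT_INTERVALS;

const UNIT_MS: Record<string, number> = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000 };

// Accepts "250ms", "15s", "5m", "1h", or a bare integer number of milliseconds.
function parseDurationMs(raw: string): number {
  const match = /^(\d+)(ms|s|m|h)?$/.exec(raw.trim());
  if (!match) throw new Error(`invalid duration: ${raw}`);
  const unit = match[2] ?? "ms";
  const ms = Number(match[1]) * UNIT_MS[unit];
  if (ms <= 0) throw new Error(`duration must be positive: ${raw}`);
  return ms;
}

// Reads e.g. WORKER_INTERVAL_PROP_DECISIONS, falling back to the table default.
function workerIntervalMs(worker: WorkerName, env: Record<string, string | undefined>): number {
  const key = `WORKER_INTERVAL_${worker.toUpperCase()}`;
  return parseDurationMs(env[key] ?? DEFAULT_INTERVALS[worker]);
}
```

In an entrypoint, the env argument would typically be process.env. Ingestion is left out because its cadence is source-specific.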

Run-once commands:

```bash
cd backend
npm run decisions:run-once -- --clone-id=42 --force
npm run prop:decisions:run-once
npm run prop:mark-to-market:once
npm run backtests:run-once
npm run lifecycle:auditions:evaluate-due
npm run lifecycle:arena-entry:run-once
```
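The clone decision commands accept --clone-id to limit a cycle to specific clones, and the run-once example also passes --force. As an illustration only, the sketch below parses those two flags with Node's built-in util.parseArgs (available since Node 18.3). Whether backend/src/index.ts parses arguments this way is an assumption. The name parseDecisionFlags is hypothetical, and the sketch passes --force through without interpreting it.

```typescript
import { parseArgs } from "node:util";

interface DecisionFlags {
  cloneIds: number[]; // empty means "all active clone decision subjects"
  force: boolean; // passed through unchanged; its meaning is worker-specific
}

// Hypothetical flag parser; strict mode rejects unknown flags and positionals.
function parseDecisionFlags(argv: string[]): DecisionFlags {
  const { values } = parseArgs({
    args: argv,
    options: {
      "clone-id": { type: "string", multiple: true }, // repeatable for a subset
      force: { type: "boolean" },
    },
    strict: true,
  });

  const cloneIds = (values["clone-id"] ?? []).map((raw) => {
    const id = Number(raw);
    if (!Number.isInteger(id) || id <= 0) throw new Error(`invalid --clone-id: ${raw}`);
    return id;
  });

  return { cloneIds, force: values.force ?? false };
}
```

The -- in npm run decisions:run-once -- --clone-id=42 tells npm to forward the remaining arguments to the script, so the parser would see ["--clone-id=42", "--force"].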

Health routes:

| Route | Auth | Purpose |
| --- | --- | --- |
| GET /api/v1/internal/health/workers | Service token | Worker heartbeat health and staleness. |
| GET /api/v1/ingestion/health | Bearer auth | Source/channel ingestion health. |
| GET /api/v1/ingestion/dashboard | Currently public | Source dashboard summary. |
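The workers health route reports staleness and consecutive errors. One way to classify a heartbeat row is sketched below: a worker counts as stale when its last completion is older than a grace multiple of its interval. The HeartbeatRow field names, the 3x grace factor, the health labels, and classifyWorker are all hypothetical. The route's actual thresholds and response shape are not documented here.

```typescript
// Hypothetical projection of a worker_heartbeats row.
interface HeartbeatRow {
  workerName: string;
  status: "running" | "succeeded" | "failed";
  lastCompletedAtMs: number | null;
  consecutiveErrors: number;
}

type Health = "ok" | "degraded" | "stale" | "never_ran";

// Staleness wins over errors: a worker that stopped reporting is the bigger problem.
function classifyWorker(row: HeartbeatRow, intervalMs: number, nowMs: number, graceFactor = 3): Health {
  if (row.lastCompletedAtMs === null) return "never_ran";
  if (nowMs - row.lastCompletedAtMs > intervalMs * graceFactor) return "stale";
  if (row.consecutiveErrors > 0) return "degraded";
  return "ok";
}
```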

Failure Behavior

  • instrumentRun marks a worker as running at cycle start and as succeeded or failed at completion.
  • Failed cycles increment consecutive_errors and preserve the last error string.
  • Backtest workers requeue expired running jobs when leases lapse.
  • Prop decision worker catches failures per account so one bad account does not stop the cycle.
  • Prop mark-to-market catches account-level failures and records failure details when possible.
  • Source ingestion records source/channel failures in ingestion_runs; source schedulers use in-process running flags to avoid overlapping runs.
  • There is no general dead-letter queue yet.
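The sketch below models the heartbeat bookkeeping from the first two bullets in memory, together with the per-account isolation the prop decision worker uses. The names instrumentRunSketch, HeartbeatState, runPropCycle, and the heartbeats map are hypothetical stand-ins. The real instrumentRun persists these fields to worker_heartbeats in SQLite; its exact column names, and whether it rethrows, are not shown here.

```typescript
// Hypothetical in-memory mirror of a worker_heartbeats row.
interface HeartbeatState {
  status: "running" | "succeeded" | "failed";
  lastStartedAtMs: number;
  lastCompletedAtMs?: number;
  lastSuccessAtMs?: number;
  lastErrorAtMs?: number;
  lastError?: string;
  summary?: string;
  latencyMs?: number;
  consecutiveErrors: number;
  totalRuns: number;
  totalErrors: number;
}

const heartbeats = new Map<string, HeartbeatState>();

// Marks the worker running, runs one cycle, then records success or failure.
async function instrumentRunSketch(
  worker: string,
  cycle: () => Promise<string>,
  now: () => number = Date.now,
): Promise<void> {
  const startedAt = now();
  const state: HeartbeatState = {
    consecutiveErrors: 0,
    totalRuns: 0,
    totalErrors: 0,
    ...heartbeats.get(worker), // carry counters and the last error string forward
    status: "running",
    lastStartedAtMs: startedAt,
  };
  heartbeats.set(worker, state);
  try {
    state.summary = await cycle();
    state.status = "succeeded";
    state.lastSuccessAtMs = now();
    state.consecutiveErrors = 0; // lastError is intentionally preserved
  } catch (err) {
    // Swallowed here so the scheduler keeps running; the error stays in lastError.
    state.status = "failed";
    state.lastError = err instanceof Error ? err.message : String(err);
    state.lastErrorAtMs = now();
    state.consecutiveErrors += 1;
    state.totalErrors += 1;
  } finally {
    state.lastCompletedAtMs = now();
    state.latencyMs = state.lastCompletedAtMs - startedAt;
    state.totalRuns += 1;
  }
}

// Per-account isolation: one bad account is logged and skipped; the cycle still succeeds.
async function runPropCycle(accountIds: number[], decide: (id: number) => Promise<void>): Promise<string> {
  let ok = 0;
  let failed = 0;
  for (const id of accountIds) {
    try {
      await decide(id);
      ok++;
    } catch (err) {
      failed++;
      console.warn(`account ${id} failed:`, err instanceof Error ? err.message : err);
    }
  }
  return `accounts=${accountIds.length} ok=${ok} failed=${failed}`;
}
```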

Debugging Notes

  • npm run cadence prints ingestion cadence definitions.
  • npm run ingestion:status and npm run ingestion:check summarize source health.
  • GET /api/v1/internal/health/workers shows worker staleness and consecutive errors.
  • If a worker looks alive but no module state changes, check the module's due/lease/idempotency condition before assuming the process is stuck.
  • For backtests, check backtest_jobs.leased_by, lease_expires_at_ms, attempt_count, status, and progress_pct.
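The backtest lease rules can be modeled as one pure function over in-memory rows. Those rules are 5-minute leases, at most 2 concurrent jobs, at most 1 per user, and requeueing of expired running jobs. In this sketch, leasedBy, leaseExpiresAtMs, and attemptCount mirror the backtest_jobs columns listed above. Everything else is an assumption: the userId and createdAtMs fields, the FIFO ordering, the claimJobs name, and requeueing before claiming. The real worker applies its leases against SQLite.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";

interface BacktestJob {
  id: number;
  userId: number; // assumed column backing the per-user cap
  status: JobStatus;
  leasedBy: string | null; // backtest_jobs.leased_by
  leaseExpiresAtMs: number | null; // backtest_jobs.lease_expires_at_ms
  attemptCount: number; // backtest_jobs.attempt_count
  createdAtMs: number; // assumed, used for FIFO order
}

const LEASE_MS = 5 * 60_000;
const MAX_CONCURRENT = 2;
const MAX_PER_USER = 1;

// Mutates `jobs` in place and returns the jobs newly leased to `workerId`.
function claimJobs(jobs: BacktestJob[], workerId: string, nowMs: number): BacktestJob[] {
  // 1. Requeue running jobs whose lease has lapsed.
  for (const job of jobs) {
    if (job.status === "running" && job.leaseExpiresAtMs !== null && job.leaseExpiresAtMs <= nowMs) {
      job.status = "queued";
      job.leasedBy = null;
      job.leaseExpiresAtMs = null;
    }
  }

  // 2. Count live leases globally and per user.
  const running = jobs.filter((j) => j.status === "running");
  const perUser = new Map<number, number>();
  for (const j of running) perUser.set(j.userId, (perUser.get(j.userId) ?? 0) + 1);

  // 3. Lease the oldest queued jobs that fit under both caps.
  const claimed: BacktestJob[] = [];
  const queued = jobs.filter((j) => j.status === "queued").sort((a, b) => a.createdAtMs - b.createdAtMs);
  for (const job of queued) {
    if (running.length + claimed.length >= MAX_CONCURRENT) break;
    if ((perUser.get(job.userId) ?? 0) >= MAX_PER_USER) continue;
    job.status = "running";
    job.leasedBy = workerId;
    job.leaseExpiresAtMs = nowMs + LEASE_MS;
    job.attemptCount += 1;
    perUser.set(job.userId, (perUser.get(job.userId) ?? 0) + 1);
    claimed.push(job);
  }
  return claimed;
}
```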

Tests

Worker tests are intentionally split by blast radius:

| Layer | Scope | Default use |
| --- | --- | --- |
| Entry smoke | Boot each run-once worker against an empty migrated SQLite database. | Fast guard that worker composition still loads. |
| Worker smoke E2E | Seed one realistic domain condition, run the worker once, then assert module-owned state and the heartbeat row. | Main local/CI confidence check for worker behavior. |
| Process smoke | Start API and/or long-running worker processes as child processes, wait for readiness or heartbeat, then shut down. | Slower optional check for CLI/process wiring; not required for the default suite. |

Current worker smoke E2E scenarios:

| Worker | Given | When | Then |
| --- | --- | --- | --- |
| Prop mark-to-market | Active prop account with an open BTC position and a newer mid price. | runMarkToMarketWorker({ once: true }) | Balance equity is repriced and prop_mark_to_market heartbeat succeeds. |
| Prop decisions | Active prop account with a default strategy draft and noop model provider. | runPropDecisionWorker({ once: true }) | A prop-account decision run is recorded with a skipped hold action and prop_decisions heartbeat succeeds. |
| Audition evaluator | Running audition whose window has elapsed. | runAuditionEvaluatorWorker({ once: true }) | Audition transitions to passed and lifecycle_audition_evaluator heartbeat succeeds. |
| Arena Entry | Passed audition with an approved public profile. | runArenaEntryWorker({ once: true }) | Clone is promoted into Arena membership and lifecycle_arena_entry heartbeat succeeds. |
| Backtests | Queued backtest job with a default strategy graph. | runBacktestWorker({ once: true }) | Job completes through the queue path, metrics are written, and backtesting heartbeat succeeds. |

To typecheck and run the worker smoke suite:

```bash
cd backend
npm run typecheck
npm run test:workers:smoke
```

Known Gaps

  • Production process supervision and alert routing.
  • Generic retry/dead-letter handling for non-backtest workers.
  • Backup/restore checks and reconciliation dashboards.
  • Pending-purchase and payout reconciliation workers for future Economy V2.