Data Ingestion Module

Overview

The data-ingestion module owns global market memory and source health. It is the backend's source-of-truth layer for Hyperliquid, Polymarket, and DefiLlama context used by AI Trading and Prop Trading.

It does not own clone strategy, prompt execution, account execution truth, or lifecycle outcomes.

Responsibilities

  • Store the Hyperliquid asset registry, candles, optional mid prices, and derived features.
  • Store Polymarket events/markets and asset-linked market context.
  • Store DefiLlama capital, flow, throughput, and related subject metrics.
  • Record per-source/channel ingestion runs.
  • Provide source health and dashboard summaries.
  • Serve the latest market context for clone asset data lookups and AI context resolution.
  • Run retention/rollup maintenance for high-volume market rows.

Boundary Rules

  • Source clients and source cadence belong here.
  • AI Trading reads compact context from this module; it does not call external market APIs directly.
  • Prop Trading reads latest execution prices from this module; it does not open market streams.
  • Ingestion health is source/channel health, not worker-process health.

Runtime Flow

The long-running ingestion process (backend/src/workers/ingest.ts) starts several schedulers:

```text
Start Hyperliquid price stream
-> subscribe to configured 1-minute candle channels
-> write candles and periodic ingestion_runs
-> run Hyperliquid feature ingestor every 5 minutes
-> run Hyperliquid retention/rollup maintenance every 60 seconds
-> run DefiLlama hourly pollers
-> run Polymarket discovery and feature pollers
```

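Each scheduler in the flow above runs independently. A minimal sketch of one loop, assuming every source/channel gets its own interval timer, an in-flight guard, and a callback that records the run. The names `guardedRun`, `startChannelLoop`, and `RunRecorder` are hypothetical, not the module's actual API.

```typescript
// Hypothetical status values; the real ingestion_runs schema may differ.
type RunStatus = "running" | "ok" | "failed";

// Hypothetical recorder: in the real module this would write ingestion_runs rows.
type RunRecorder = (channel: string, status: RunStatus, error?: string) => void;

// Wraps a channel task so overlapping invocations are skipped and failures are
// recorded instead of escaping. The returned function reports whether a run started.
function guardedRun(
  channel: string,
  task: () => Promise<void>,
  record: RunRecorder,
): () => boolean {
  let running = false;
  return () => {
    if (running) return false; // previous run still in flight: skip this tick
    running = true;
    record(channel, "running");
    // Promise.resolve().then(task) also turns a synchronous throw into a rejection.
    Promise.resolve()
      .then(task)
      .then(
        () => record(channel, "ok"),
        (err: unknown) => record(channel, "failed", String(err)),
      )
      .finally(() => {
        running = false;
      });
    return true;
  };
}

// Starts one loop per channel. A failing channel cannot stop the others because
// each rejection is caught and recorded inside guardedRun.
function startChannelLoop(
  channel: string,
  cadenceSeconds: number,
  task: () => Promise<void>,
  record: RunRecorder,
): ReturnType<typeof setInterval> {
  const run = guardedRun(channel, task, record);
  run(); // run once immediately, then on every cadence tick
  return setInterval(run, cadenceSeconds * 1000);
}
```

The guard skips a tick rather than queueing it, so a slow DefiLlama or Polymarket poll delays the next run instead of stacking concurrent ones.
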
The shipped Hyperliquid worker currently subscribes only to 1-minute candle streams. Parsing and storage for allMids exist, but the worker does not subscribe to that stream, so Prop Trading usually falls back to the latest candle for its execution price.
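
The fallback can be pictured as a pure function: prefer a fresh mid price, otherwise use the close of the latest 1-minute candle. The row shapes, the `maxMidAgeMs` freshness window, and the function name below are assumptions for illustration, not the module's actual types.

```typescript
// Hypothetical row shapes loosely modeled on hyperliquid_mid_prices / hyperliquid_candles.
interface MidPrice { symbol: string; price: number; observedAt: number } // epoch ms
interface Candle { symbol: string; close: number; closeTime: number }    // epoch ms

interface ExecutionPrice {
  source: "mid" | "candle";
  price: number;
  asOf: number;
}

// Prefer a recent mid; otherwise fall back to the latest candle close.
// With the shipped worker not subscribed to allMids, `mid` is usually undefined.
function resolveExecutionPrice(
  mid: MidPrice | undefined,
  latestCandle: Candle | undefined,
  nowMs: number,
  maxMidAgeMs = 30_000, // assumed freshness window, not a documented value
): ExecutionPrice | null {
  if (mid && nowMs - mid.observedAt <= maxMidAgeMs) {
    return { source: "mid", price: mid.price, asOf: mid.observedAt };
  }
  if (latestCandle) {
    return { source: "candle", price: latestCandle.close, asOf: latestCandle.closeTime };
  }
  return null; // no price at all: callers should treat this as missing data
}
```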

Main Code Paths

| Path | Purpose |
| --- | --- |
| `backend/src/modules/data-ingestion/ingestion/cadence.ts` | Source/channel cadence definitions. |
| `backend/src/modules/data-ingestion/ingestion/dashboard.ts` | Aggregated dashboard data. |
| `backend/src/modules/data-ingestion/ingestion/monitoring.ts` | Source/channel health and stale logic. |
| `backend/src/modules/data-ingestion/ingestion/sources/hyperliquid/*` | Hyperliquid clients, stream ingestion, features, storage policy, maintenance. |
| `backend/src/modules/data-ingestion/ingestion/sources/polymarket/*` | Polymarket discovery, matching, and ingestion. |
| `backend/src/modules/data-ingestion/ingestion/sources/defillama/*` | DefiLlama client and ingestion. |
| `backend/src/modules/data-ingestion/repositories/*` | SQLite repositories for source data and ingestion runs. |
| `backend/src/workers/ingest.ts` | Long-running ingestion process. |

State And Tables

| Source | Tables |
| --- | --- |
| Hyperliquid | hyperliquid_assets, hyperliquid_asset_categories, hyperliquid_asset_category_taxonomy, hyperliquid_candles, hyperliquid_mid_prices, hyperliquid_funding_pressure, hyperliquid_liquidity_profiles, hyperliquid_positioning_pressure, hyperliquid_technical_indicators, hyperliquid_volatility_profiles |
| Polymarket | polymarket_events, polymarket_markets, polymarket_market_asset_links, polymarket_immediate_markets, polymarket_later_markets, polymarket_recently_closed_markets |
| DefiLlama | defillama_capital_base, defillama_capital_flows, defillama_economic_throughput |
| Health | ingestion_runs |

Retention policy for Hyperliquid storage:

| Data | Retention |
| --- | --- |
| Mid prices | 30 minutes at 5-second granularity |
| 1-minute candles | 24 hours |
| 1-hour rolled-up candles | 7 days |
| Derived feature rows | 7 days |
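
The maintenance pass can be pictured as two steps: roll 1-minute candles up into 1-hour candles, then compute per-data-class cutoffs from the retention policy and delete older rows. The sketch below covers only the in-memory logic, under assumed field names (`openTime`, `open`, `high`, `low`, `close`, `volume`) and hypothetical function names; the real job runs against SQLite and may treat partial hours differently.

```typescript
interface Candle {
  openTime: number; // epoch ms, start of the candle interval
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Retention windows from the policy table, in milliseconds.
const RETENTION_MS = {
  midPrices: 30 * 60 * 1000,
  oneMinuteCandles: 24 * HOUR_MS,
  oneHourCandles: 7 * 24 * HOUR_MS,
  featureRows: 7 * 24 * HOUR_MS,
} as const;

type RetentionKey = keyof typeof RETENTION_MS;

// Rows timestamped strictly before the cutoff are eligible for deletion.
function retentionCutoffs(nowMs: number): Record<RetentionKey, number> {
  const out = {} as Record<RetentionKey, number>;
  for (const key of Object.keys(RETENTION_MS) as RetentionKey[]) {
    out[key] = nowMs - RETENTION_MS[key];
  }
  return out;
}

// Aggregates 1-minute candles (in any order) into 1-hour OHLCV candles.
function rollUpToHourly(minuteCandles: Candle[]): Candle[] {
  const sorted = [...minuteCandles].sort((a, b) => a.openTime - b.openTime);
  const buckets = new Map<number, Candle>(); // keyed by hour start; insertion order is ascending
  for (const c of sorted) {
    const hourStart = Math.floor(c.openTime / HOUR_MS) * HOUR_MS;
    const agg = buckets.get(hourStart);
    if (!agg) {
      buckets.set(hourStart, { ...c, openTime: hourStart }); // earliest minute sets the open
    } else {
      agg.high = Math.max(agg.high, c.high);
      agg.low = Math.min(agg.low, c.low);
      agg.close = c.close; // latest minute sets the close
      agg.volume += c.volume;
    }
  }
  return [...buckets.values()];
}
```

Rolling up before pruning matters: 1-minute rows are only kept for 24 hours, so any hour that has not been rolled up by then loses its detail for good.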

Routes And Workers

| Route | Auth | Purpose |
| --- | --- | --- |
| GET /api/v1/assets | Bearer | Hyperliquid asset registry and category data. |
| GET /api/v1/ingestion/health | Bearer | Source/channel health. |
| GET /api/v1/ingestion/dashboard | Public (currently) | Ingestion dashboard summary. |
| GET /api/v1/polymarket/markets/search | Bearer | Polymarket market search. |
| GET /api/v1/defillama/subjects/search | Bearer | DefiLlama subject search. |
| POST /api/v1/clones/:id/assets/:symbol/latest-data | Owner auth | Clone-aware latest data lookup for selected asset. |
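
For a quick scripted check, a Bearer-authenticated read of the health route might look like the sketch below. The helper names, base URL, and token source are assumptions, and because the response body is not documented here the sketch leaves it as `unknown`. It relies on the global `fetch`/`Request` available in Node 18+.

```typescript
// Builds a GET request for a Bearer-protected route such as /api/v1/ingestion/health.
function buildAuthedGet(baseUrl: string, path: string, token: string): Request {
  const url = new URL(path, baseUrl);
  return new Request(url, {
    method: "GET",
    headers: { Authorization: `Bearer ${token}`, Accept: "application/json" },
  });
}

// Hypothetical helper: fetches source/channel health; response shape intentionally untyped.
async function getIngestionHealth(baseUrl: string, token: string): Promise<unknown> {
  const res = await fetch(buildAuthedGet(baseUrl, "/api/v1/ingestion/health", token));
  if (!res.ok) throw new Error(`ingestion health request failed: ${res.status}`);
  return res.json();
}
```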

Worker and one-off commands:

```bash
cd backend
npm run ingest:local
npm run ingest
npm run cadence
npm run ingestion:status
npm run ingestion:check
npm run hyperliquid:maintain
npm run defillama:ingest
npm run polymarket:discover
```

Cadence

| Source/channel | Default cadence |
| --- | --- |
| Hyperliquid OHLCV stream run recording | 60s when rows arrive |
| Hyperliquid technical indicators | 300s |
| Hyperliquid funding pressure | 300s |
| Hyperliquid liquidity profiles | 300s |
| Hyperliquid volatility profiles | 300s |
| Hyperliquid positioning pressure | 300s |
| Hyperliquid maintenance | 60s |
| Polymarket market discovery | 3600s |
| Polymarket immediate market context | 300s |
| Polymarket later market context | 1800s |
| DefiLlama capital base | 3600s |
| DefiLlama capital flows | 3600s |
| DefiLlama economic throughput | 3600s |
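
The "60s when rows arrive" cadence suggests stream runs are recorded per time window rather than per message. A minimal sketch, assuming an injected clock and a hypothetical `onFlush` callback standing in for the ingestion_runs write: rows are counted as candles arrive, and at most one run is recorded per window, only if at least one row arrived.

```typescript
// Hypothetical callback standing in for writing an ingestion_runs row.
type FlushFn = (rows: number, windowStartMs: number, windowEndMs: number) => void;

class StreamRunRecorder {
  private rowsSinceFlush = 0;
  private windowStartMs: number;
  private readonly windowMs: number;
  private readonly onFlush: FlushFn;

  constructor(windowMs: number, onFlush: FlushFn, nowMs: number) {
    this.windowMs = windowMs;
    this.onFlush = onFlush;
    this.windowStartMs = nowMs;
  }

  // Call for every batch of candle rows written from the stream.
  addRows(count: number): void {
    this.rowsSinceFlush += count;
  }

  // Call periodically (e.g. from a timer); returns true if a run was recorded.
  tick(nowMs: number): boolean {
    if (nowMs - this.windowStartMs < this.windowMs) return false; // window still open
    const rows = this.rowsSinceFlush;
    const start = this.windowStartMs;
    this.rowsSinceFlush = 0;
    this.windowStartMs = nowMs;
    if (rows === 0) return false; // quiet window: nothing is recorded
    this.onFlush(rows, start, nowMs);
    return true;
  }
}
```

Under this design a silent stream produces no runs at all, so the channel goes stale in health checks instead of reporting misleading "ok" rows.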

Health marks a channel stale after roughly max(cadence * 2, cadence + 5 minutes).
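
As a worked example of the threshold with cadence c in seconds: the 5-minute floor dominates for short cadences and the doubling dominates for long ones. The sketch below encodes that rule; the function names are hypothetical, measuring age from the latest recorded run is an assumption, and monitoring.ts may adjust the value (the text says "roughly").

```typescript
// Stale threshold in seconds: max(2 * cadence, cadence + 5 minutes).
function staleAfterSeconds(cadenceSeconds: number): number {
  return Math.max(cadenceSeconds * 2, cadenceSeconds + 5 * 60);
}

// Assumed rule: stale when the latest run is older than the channel's threshold.
function isStale(lastRunAtMs: number, cadenceSeconds: number, nowMs: number): boolean {
  return nowMs - lastRunAtMs > staleAfterSeconds(cadenceSeconds) * 1000;
}
```

Applied to the default cadences: 60s gives 360s (6 minutes), 300s gives 600s, 1800s gives 3600s, and 3600s gives 7200s. A 300s channel is where the two terms meet.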

Failure Behavior

  • Source failures record failed ingestion_runs rows per source/channel.
  • DefiLlama and Polymarket schedulers use a running guard to avoid overlapping runs.
  • Hyperliquid feature ingestion catches and records failures per channel.
  • The ingestion process keeps other source loops alive when one scheduled channel fails.
  • Health status can be missing, failed, stale, running, or ok.

Debugging Notes

  • Use npm run ingestion:check for a quick source health check.
  • If Prop Trading lacks prices, check latest hyperliquid_candles first, then hyperliquid_mid_prices.
  • If a source appears stale but the worker is running, inspect ingestion_runs for that source/channel rather than worker_heartbeats.
  • ingest:local caps subscriptions and delays to keep local runs bounded.
  • Matching issues for Polymarket usually involve asset search terms, event discovery pages, or polymarket_market_asset_links.
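
To see why search terms drive most matching issues, here is a minimal sketch of term-based linking. It assumes each asset carries a list of lowercase search terms and that a market matches when any term appears as a whole word in its text. The term lists and function names are hypothetical; the real matcher may also use event metadata.

```typescript
// Hypothetical asset search terms; the real lists live with the Polymarket matcher.
const ASSET_TERMS: Record<string, string[]> = {
  BTC: ["btc", "bitcoin"],
  ETH: ["eth", "ethereum", "ether"],
};

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns asset symbols whose search terms appear as whole words in the text.
function matchAssets(text: string, terms: Record<string, string[]> = ASSET_TERMS): string[] {
  const lower = text.toLowerCase();
  return Object.entries(terms)
    .filter(([, words]) =>
      words.some((w) => new RegExp(`\\b${escapeRegExp(w)}\\b`).test(lower)),
    )
    .map(([symbol]) => symbol);
}
```

Whole-word matching keeps "Tether" from linking to ETH via "ether", while a term list missing a common alias would leave markets silently unlinked in polymarket_market_asset_links.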

Tests

```bash
cd backend
npm run typecheck
node --import tsx --test test/hyperliquid-*.test.ts test/polymarket-*.test.ts test/defillama-*.test.ts test/ingestion-runs.test.ts
```

Known Gaps

  • Production alert routing, backups, and cadence tuning are tracked in Production Readiness.
  • Hyperliquid allMids stream storage exists but is not wired into the shipped ingestion worker.
  • Polymarket matching and DefiLlama subject policy should evolve with product context requirements.