Data Ingestion Module
Overview
The data-ingestion module owns global market memory and source health. It is the backend's source-of-truth layer for Hyperliquid, Polymarket, and DefiLlama context used by AI Trading and Prop Trading.
It does not own clone strategy, prompt execution, account execution truth, or lifecycle outcomes.
Responsibilities
- Store Hyperliquid asset registry, candles, optional mids, and derived features.
- Store Polymarket events/markets and asset-linked market context.
- Store DefiLlama capital, flow, throughput, and related subject metrics.
- Record per-source/channel ingestion runs.
- Provide source health and dashboard summaries.
- Provide latest market context reads for clone asset data and AI context resolution.
- Run retention/rollup maintenance for high-volume market rows.
Boundary Rules
- Source clients and source cadence belong here.
- AI Trading reads compact context from this module; it does not call external market APIs directly.
- Prop Trading reads latest execution prices from this module; it does not open market streams.
- Ingestion health is source/channel health, not worker-process health.
Runtime Flow
Long-running ingestion starts multiple schedulers:
```text
Start Hyperliquid price stream
-> subscribe to configured 1-minute candle channels
-> write candles and periodic ingestion_runs
-> run Hyperliquid feature ingestor every 5 minutes
-> run Hyperliquid retention/rollup maintenance every 60 seconds
-> run DefiLlama hourly pollers
-> run Polymarket discovery and feature pollers
```

The shipped Hyperliquid worker currently subscribes to 1-minute candle streams. `allMids` parsing/storage exists, but the worker does not currently subscribe to that stream, so Prop Trading commonly falls back to the latest candle for execution price.
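The fallback described above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the row shapes, the `resolveExecutionPrice` name, and the 15-second freshness window are all assumptions.

```typescript
// Hypothetical row shapes; the real repository types may differ.
interface MidPrice {
  symbol: string;
  price: number;
  observedAtMs: number;
}

interface Candle {
  symbol: string;
  close: number;
  closeTimeMs: number;
}

type ExecutionPrice =
  | { source: "mid" | "candle"; price: number; asOfMs: number }
  | null;

// Assumed freshness window for mid prices; not a documented setting.
const MAX_MID_AGE_MS = 15_000;

// Prefer a fresh mid price, otherwise fall back to the latest candle close.
function resolveExecutionPrice(
  mid: MidPrice | undefined,
  candle: Candle | undefined,
  nowMs: number,
): ExecutionPrice {
  if (mid && nowMs - mid.observedAtMs <= MAX_MID_AGE_MS) {
    return { source: "mid", price: mid.price, asOfMs: mid.observedAtMs };
  }
  if (candle) {
    return { source: "candle", price: candle.close, asOfMs: candle.closeTimeMs };
  }
  return null; // No usable price for this symbol.
}
```

With no `allMids` subscription in the shipped worker, the `mid` argument would normally be missing or old, so a lookup like this would end on the candle branch.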
Main Code Paths
| Path | Purpose |
|---|---|
| `backend/src/modules/data-ingestion/ingestion/cadence.ts` | Source/channel cadence definitions. |
| `backend/src/modules/data-ingestion/ingestion/dashboard.ts` | Aggregated dashboard data. |
| `backend/src/modules/data-ingestion/ingestion/monitoring.ts` | Source/channel health and stale logic. |
| `backend/src/modules/data-ingestion/ingestion/sources/hyperliquid/*` | Hyperliquid clients, stream ingestion, features, storage policy, maintenance. |
| `backend/src/modules/data-ingestion/ingestion/sources/polymarket/*` | Polymarket discovery, matching, and ingestion. |
| `backend/src/modules/data-ingestion/ingestion/sources/defillama/*` | DefiLlama client and ingestion. |
| `backend/src/modules/data-ingestion/repositories/*` | SQLite repositories for source data and ingestion runs. |
| `backend/src/workers/ingest.ts` | Long-running ingestion process. |
State And Tables
| Source | Tables |
|---|---|
| Hyperliquid | hyperliquid_assets, hyperliquid_asset_categories, hyperliquid_asset_category_taxonomy, hyperliquid_candles, hyperliquid_mid_prices, hyperliquid_funding_pressure, hyperliquid_liquidity_profiles, hyperliquid_positioning_pressure, hyperliquid_technical_indicators, hyperliquid_volatility_profiles |
| Polymarket | polymarket_events, polymarket_markets, polymarket_market_asset_links, polymarket_immediate_markets, polymarket_later_markets, polymarket_recently_closed_markets |
| DefiLlama | defillama_capital_base, defillama_capital_flows, defillama_economic_throughput |
| Health | ingestion_runs |
Retention policy for Hyperliquid storage:
| Data | Retention |
|---|---|
| Mid prices | 30 minutes at 5-second granularity |
| 1-minute candles | 24 hours |
| 1-hour rolled-up candles | 7 days |
| Derived feature rows | 7 days |
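As a rough sketch of how the retention windows above could translate into a delete cutoff and a 1-minute to 1-hour rollup: the `Candle` shape, `RETENTION_MS`, `retentionCutoffMs`, and `rollUpToHourly` are hypothetical names for illustration, and the shipped maintenance job may aggregate differently.

```typescript
const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;
const DAY_MS = 24 * HOUR_MS;

// Retention windows copied from the table above.
const RETENTION_MS = {
  midPrices: 30 * MINUTE_MS,
  candles1m: DAY_MS,
  candles1h: 7 * DAY_MS,
  features: 7 * DAY_MS,
} as const;

// Hypothetical candle shape; the real table columns may differ.
interface Candle {
  symbol: string;
  openTimeMs: number;
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// Rows older than this timestamp fall outside the retention window.
function retentionCutoffMs(kind: keyof typeof RETENTION_MS, nowMs: number): number {
  return nowMs - RETENTION_MS[kind];
}

// Aggregate 1-minute candles into one candle per symbol per clock hour.
function rollUpToHourly(minuteCandles: Candle[]): Candle[] {
  const sorted = [...minuteCandles].sort((a, b) => a.openTimeMs - b.openTimeMs);
  const buckets = new Map<string, Candle>();
  for (const c of sorted) {
    const hourStart = Math.floor(c.openTimeMs / HOUR_MS) * HOUR_MS;
    const key = `${c.symbol}:${hourStart}`;
    const bucket = buckets.get(key);
    if (!bucket) {
      // First candle of the hour sets the open price.
      buckets.set(key, { ...c, openTimeMs: hourStart });
    } else {
      bucket.high = Math.max(bucket.high, c.high);
      bucket.low = Math.min(bucket.low, c.low);
      bucket.close = c.close; // Last candle of the hour sets the close.
      bucket.volume += c.volume;
    }
  }
  return [...buckets.values()];
}
```

A maintenance pass in this style would roll up 1-minute candles before deleting those older than `retentionCutoffMs("candles1m", Date.now())`, so the hourly rows survive for their own 7-day window.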
Routes And Workers
| Route | Auth | Purpose |
|---|---|---|
| `GET /api/v1/assets` | Bearer | Hyperliquid asset registry and category data. |
| `GET /api/v1/ingestion/health` | Bearer | Source/channel health. |
| `GET /api/v1/ingestion/dashboard` | Public currently | Ingestion dashboard summary. |
| `GET /api/v1/polymarket/markets/search` | Bearer | Polymarket market search. |
| `GET /api/v1/defillama/subjects/search` | Bearer | DefiLlama subject search. |
| `POST /api/v1/clones/:id/assets/:symbol/latest-data` | Owner auth | Clone-aware latest data lookup for selected asset. |
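For the Bearer-protected GET routes, a client call might be assembled as below. This is a minimal sketch: `buildIngestionGet` is a hypothetical helper, the base URL is an assumption, and the response body is deliberately left unparsed because its shape is not documented here.

```typescript
interface RequestSpec {
  url: string;
  init: { method: "GET"; headers: Record<string, string> };
}

// Build the URL and headers for a GET route; pass no token for public routes.
function buildIngestionGet(baseUrl: string, path: string, token?: string): RequestSpec {
  const url = `${baseUrl.replace(/\/+$/, "")}${path}`;
  const headers: Record<string, string> = { Accept: "application/json" };
  if (token) {
    headers.Authorization = `Bearer ${token}`;
  }
  return { url, init: { method: "GET", headers } };
}

// Example: the health route with a token (base URL is an assumed local address).
const healthRequest = buildIngestionGet(
  "http://localhost:3000",
  "/api/v1/ingestion/health",
  "example-token",
);
```

The resulting `url` and `init` can be passed straight to `fetch(healthRequest.url, healthRequest.init)`.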
Worker and one-off commands:
```bash
cd backend
npm run ingest:local
npm run ingest
npm run cadence
npm run ingestion:status
npm run ingestion:check
npm run hyperliquid:maintain
npm run defillama:ingest
npm run polymarket:discover
```
Cadence
| Source/channel | Default cadence |
|---|---|
| Hyperliquid OHLCV stream run recording | 60s when rows arrive |
| Hyperliquid technical indicators | 300s |
| Hyperliquid funding pressure | 300s |
| Hyperliquid liquidity profiles | 300s |
| Hyperliquid volatility profiles | 300s |
| Hyperliquid positioning pressure | 300s |
| Hyperliquid maintenance | 60s |
| Polymarket market discovery | 3600s |
| Polymarket immediate market context | 300s |
| Polymarket later market context | 1800s |
| DefiLlama capital base | 3600s |
| DefiLlama capital flows | 3600s |
| DefiLlama economic throughput | 3600s |
Health marks a channel stale after roughly max(cadence * 2, cadence + 5 minutes).
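The staleness rule can be written out as a small function. This is a sketch of the stated formula only; `staleAfterMs` and `isStale` are hypothetical names, and the real check in `monitoring.ts` may add tolerances ("roughly") that are not modeled here.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Threshold from the rule above: max(cadence * 2, cadence + 5 minutes).
function staleAfterMs(cadenceSeconds: number): number {
  const cadenceMs = cadenceSeconds * 1000;
  return Math.max(cadenceMs * 2, cadenceMs + FIVE_MINUTES_MS);
}

// A channel is stale when its last successful run is older than the threshold.
function isStale(lastSuccessMs: number, cadenceSeconds: number, nowMs: number): boolean {
  return nowMs - lastSuccessMs > staleAfterMs(cadenceSeconds);
}
```

Applied to the default cadences, a 60s channel goes stale after about 360s, a 300s channel after 600s, an 1800s channel after 3600s, and a 3600s channel after 7200s. Below a 300s cadence the five-minute term dominates; above it the doubling does.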
Failure Behavior
- Source failures record failed `ingestion_runs` rows per source/channel.
- DefiLlama and Polymarket schedulers use a `running` guard to avoid overlapping runs.
- Hyperliquid feature ingestion catches and records failures per channel.
- The ingestion process keeps other source loops alive when one scheduled channel fails.
- Health status can be `missing`, `failed`, `stale`, `running`, or `ok`.
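The guard and per-channel failure capture described above might look like the following. It is a minimal sketch: `createGuardedRunner` and the `record` callback are hypothetical, and in the real module the recording step would presumably write an `ingestion_runs` row.

```typescript
type RunOutcome = "ok" | "failed" | "skipped";

// Wrap a scheduled task so that overlapping ticks are skipped and a thrown
// error is recorded instead of propagating into the scheduler loop.
function createGuardedRunner(
  task: () => Promise<void>,
  record: (outcome: RunOutcome, error?: unknown) => void,
): () => Promise<RunOutcome> {
  let running = false;
  return async function tick(): Promise<RunOutcome> {
    if (running) {
      record("skipped");
      return "skipped";
    }
    running = true;
    try {
      await task();
      record("ok");
      return "ok";
    } catch (error) {
      // Swallow the error after recording it so other channels keep running.
      record("failed", error);
      return "failed";
    } finally {
      running = false;
    }
  };
}
```

Each channel would get its own runner, for example driven by `setInterval(() => void tick(), cadenceMs)`, so a slow or failing poll in one channel neither stacks up nor stops the others.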
Debugging Notes
- Use `npm run ingestion:check` for a quick source health check.
- If Prop Trading lacks prices, check latest `hyperliquid_candles` first, then `hyperliquid_mid_prices`.
- If a source appears stale but the worker is running, inspect `ingestion_runs` for that source/channel rather than `worker_heartbeats`.
- `ingest:local` caps subscriptions and delays to keep local runs bounded.
- Matching issues for Polymarket usually involve asset search terms, event discovery pages, or `polymarket_market_asset_links`.
Tests
```bash
cd backend
npm run typecheck
node --import tsx --test test/hyperliquid-*.test.ts test/polymarket-*.test.ts test/defillama-*.test.ts test/ingestion-runs.test.ts
```
Known Gaps
- Production alert routing, backups, and cadence tuning are tracked in Production Readiness.
- Hyperliquid `allMids` stream storage exists but is not wired into the shipped ingestion worker.
- Polymarket matching and DefiLlama subject policy should evolve with product context requirements.