SQLite + R2 Market Memory Plan

This note captures the current direction for replacing the backend market data stream and clone decision context path. It is a planning document, not an implementation contract.

For review-level docs, start with Backend Ownership And Runtime Contract, which holds the implementation contract for clone ownership, runtime context, pipeline configuration, and trading execution.

Goal

Replace the current Supabase/Pinecone-heavy backend path with a simpler structured data pipeline:

text
source APIs and streams
  -> normalized structured observations
  -> structured features
  -> deterministic decision context
  -> one decision engine call
  -> trading actions and ledger

The clone decision flow should not use AI-driven data retrieval for normal trading. The backend should assemble the relevant context first, then feed it into the model.

System Split

Use SQLite as the backend source of truth:

  • Normalized market observations
  • Feature buckets
  • Clone profiles
  • Trading and context configuration
  • Workflow subscriptions and context profiles
  • Decision runs and action indexes
  • Orders, fills, positions, ledger entries
  • Replay/backtest-oriented metadata
  • Frontend-facing summary state when useful

Supabase can remain temporarily for auth and migration compatibility, but the target architecture should not store clone configuration, market memory, generated analysis artifacts, or trading state in Supabase. The frontend should eventually read and write backend state through the backend API, which persists to SQLite.

Use R2 as the cheap immutable archive:

  • Raw source payloads when useful for audit or replay
  • Full prompt/context/response payloads
  • Large reason/debug traces
  • SQLite backups or cold monthly archives

SQLite should store compact pointers to R2 objects, not large blobs.
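A minimal sketch of the pointer convention; the key layout shown (`decisions/<cloneId>/<runId>/<kind>.json`) is a hypothetical naming scheme for illustration, not a committed contract:

```typescript
// Sketch: build a deterministic R2 object key for a decision-run artifact.
// SQLite stores only the key string (tens of bytes), never the payload.
// The layout below is illustrative, not a fixed naming scheme.
type ArtifactKind = "prompt" | "context" | "response" | "reason";

function r2KeyFor(cloneId: number, runId: string, kind: ArtifactKind): string {
  return `decisions/${cloneId}/${runId}/${kind}.json`;
}

const promptKey = r2KeyFor(42, "run_0001", "prompt");
// "decisions/42/run_0001/prompt.json" goes into a *_r2_key column
```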

Deployment Shape

Start with one backend box. Keep separate processes, but colocate them with SQLite:

text
backend-api
ingestion-worker
decision-scheduler
decision-worker
trade-executor
backup-sync

This keeps operations simple and keeps SQLite close to the writers. Split onto more machines later only when there is real pressure from ingestion, decision-worker concurrency, trading isolation, SQLite contention, or availability requirements.

Source Coverage

The current product model already names these source/channel families:

Hyperliquid:

  • Liquidity Profile
  • Positioning Pressure
  • Volatility Profile
  • Funding Pressure

Polymarket:

  • Recently Closed Markets
  • Immediate Market
  • Later Market

DefiLlama:

  • Capital Base
  • Capital Flows
  • Economic Throughput

These should be stored as structured feature rows, not prose blobs.

Retention And Granularity

Market data is global, not per user. Clone subscriptions should filter shared data instead of duplicating data per clone.

Suggested Hyperliquid price hot retention policy:

text
last 30 minutes: 5-second allMids midpoint buckets
last 24 hours: 1-minute source candles
last 7 days: 1-hour granularity
older than 7 days: delete or archive to R2

For Hyperliquid price rows across all currently streamed symbols, this is still modest. At roughly 500-600 symbols:

text
5s mids for 30m: 360 buckets * 600 = 216,000 rows
1m for 24h: 1,440 buckets * 600 = 864,000 rows
1h for 7d: 168 buckets * 600 = 100,800 rows
total: about 1,180,800 price rows

With fixed-point OHLC columns and indexes, that should usually stay in the low hundreds of MB, not multi-GB. The exact size depends on symbol count, indexes, WAL/checkpoint behavior, and whether liquidity/orderbook-derived features are stored compactly or with large JSON payloads.
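The arithmetic above can be re-derived with a small sketch (the 600-symbol figure is the upper end of the 500-600 planning range, not a measured count):

```typescript
// Sketch: hot-row counts for the suggested retention tiers at an assumed
// 600-symbol universe (upper end of the planning range).
const SYMBOLS = 600;

function bucketsFor(windowSec: number, granularitySec: number): number {
  return Math.floor(windowSec / granularitySec);
}

const mids5s = bucketsFor(30 * 60, 5) * SYMBOLS;             // 360 * 600 = 216,000
const candles1m = bucketsFor(24 * 3600, 60) * SYMBOLS;       // 1,440 * 600 = 864,000
const candles1h = bucketsFor(7 * 24 * 3600, 3600) * SYMBOLS; // 168 * 600 = 100,800
const totalHotRows = mids5s + candles1m + candles1h;         // 1,180,800
```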

Feature families can follow the same idea. Avoid storing full raw Hyperliquid L2/orderbook updates in SQLite forever. Store derived liquidity profiles and periodic/change-threshold snapshots. Archive raw payloads to R2 only when useful.

Hyperliquid v1 channel origin:

text
source-backed:
- mid prices: allMids WebSocket stream bucketed at 5-second granularity
- candles: 1-minute candle WebSocket subscriptions
- liquidity profile: l2Book REST snapshots, stored as compact spread/depth/imbalance rows
- funding pressure: metaAndAssetCtxs + predictedFundings REST polls

derived:
- volatility profile: local rollup from hyperliquid_candles
- positioning pressure: local composite from price momentum, funding pressure, and liquidity imbalance
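As an illustration of the derived channels, a volatility rollup can be computed locally from stored candle closes; this is a sketch using the sample standard deviation of log returns, not necessarily the worker's actual formula:

```typescript
// Sketch: realized volatility over a candle window, computed as the sample
// standard deviation of log returns on close prices. The production rollup
// from hyperliquid_candles may weight or annualize differently; this only
// illustrates that the channel is a local derivation, not a source feed.
function realizedVol(closes: number[]): number {
  if (closes.length < 3) return 0; // need at least 2 returns for a sample stddev
  const returns: number[] = [];
  for (let i = 1; i < closes.length; i++) {
    returns.push(Math.log(closes[i] / closes[i - 1]));
  }
  const mean = returns.reduce((a, b) => a + b, 0) / returns.length;
  const variance =
    returns.reduce((a, r) => a + (r - mean) ** 2, 0) / (returns.length - 1);
  return Math.sqrt(variance);
}
```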

Data Model Direction

Use one SQLite table per source/channel family. This keeps each feature family explicit, gives us room for channel-specific typed columns, and still allows a common bucket shape across tables.

Current v1 table families:

text
hyperliquid_assets
hyperliquid_asset_categories
hyperliquid_mid_prices
hyperliquid_candles
hyperliquid_liquidity_profiles
hyperliquid_positioning_pressure
hyperliquid_volatility_profiles
hyperliquid_funding_pressure
polymarket_events
polymarket_markets
polymarket_market_asset_links
polymarket_recently_closed_markets
polymarket_immediate_markets
polymarket_later_markets
defillama_capital_base
defillama_capital_flows
defillama_economic_throughput

hyperliquid_assets is the supported-market registry. It records whether each symbol is a perp, spot-style @... market, or unknown stream-discovered market, plus coverage flags for mid prices, candles, liquidity, funding, volatility, positioning, and trading. Other services should read this registry instead of inferring support from raw observation rows.

The registry should be usable immediately after a fresh SQLite migration. Seed it from a checked-in Hyperliquid metadata snapshot, then let runtime ingestion keep it current:

text
metaAndAssetCtxs + spotMeta + allMids + candle WebSocket
  -> generated SQL seed migration
  -> hyperliquid_assets
  -> runtime universe and price stream refreshes

The registry also carries display_name, search_terms_json, and category metadata for UX selection. Hyperliquid perp metadata is ticker-oriented, while spot metadata exposes token fullName; for external matching we seed curated aliases for major perp assets and use symbol fallback for the long tail. Polymarket matching should use these terms instead of assuming ticker strings appear in prediction-market titles.

Asset selection should follow Hyperliquid's visible category structure:

text
High level: All, Perps, Spot, Crypto, Tradfi, Trending
Crypto: All, AI, Defi, Gaming, Layer 1, Layer 2, Meme
Tradfi: All, Stocks, Indices, Commodities, FX, Pre-IPO
Spot, if exposed: All, USDC, USDH, USDT
Out of scope: generic HIP-3 category exposure

The official Hyperliquid Info API should remain the source of truth for which perp and spot markets exist, but it should not be treated as the source of truth for category tags. Keep category/subcategory membership in the SQLite registry as a curated snapshot that can be regenerated intentionally.

Polymarket v1 discovery should be registry-first. Crawl Gamma's Crypto-tagged events hourly, then run throttled public-search requests for every safe Hyperliquid asset search term. Store polymarket_events, polymarket_markets, and polymarket_market_asset_links as the durable registry, then derive immediate/later/recently-closed summaries from that local registry. CLOB price/orderbook enrichment should stay limited to selected high-signal linked markets.

DefiLlama v1 should stay macro-first. Fetch broad hourly snapshots from the free APIs for chain TVL, protocol TVL, stablecoin supply, DEX volume, fees, revenue, and open interest. Store compact global, chain, protocol, and stablecoin subjects in defillama_capital_base, defillama_capital_flows, and defillama_economic_throughput. This is the market-regime layer for clone strategies, not an attempt to force DefiLlama coverage across every Hyperliquid asset.

For asset-specific decision context, DefiLlama reads should filter to relevant macro subjects before applying the requested datapoint limit: global defi, matching chain rows by metrics.tokenSymbol, and matching stablecoin subject rows by symbol. This lets BTC receive multiple global/Bitcoin rows across its lookback window without treating every protocol row as BTC-specific context.
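That filtering rule can be sketched as a pre-limit pass (the row shape and the `tokenSymbol` field follow the text above; the helper itself and its field names are illustrative):

```typescript
// Sketch: pick DefiLlama macro rows relevant to one asset before applying
// the requested datapoint limit. Row shape is illustrative; subject types
// follow the stored subjects (global, chain, protocol, stablecoin).
interface LlamaRow {
  subjectType: "global" | "chain" | "protocol" | "stablecoin";
  subjectId: string;
  metrics: { tokenSymbol?: string };
}

function relevantMacroRows(
  rows: LlamaRow[],
  assetSymbol: string,
  limit: number
): LlamaRow[] {
  const matches = rows.filter(
    (r) =>
      r.subjectType === "global" ||
      (r.subjectType === "chain" && r.metrics.tokenSymbol === assetSymbol) ||
      (r.subjectType === "stablecoin" && r.subjectId === assetSymbol)
  );
  return matches.slice(0, limit);
}
```

Under this rule, BTC picks up every global row plus Bitcoin chain rows, while protocol rows stay out of its context unless explicitly linked.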

Most channel tables should use this common subject/bucket key:

sql
subject_type TEXT NOT NULL,      -- asset, market, chain, protocol, global
subject_id TEXT NOT NULL,        -- BTC, market id, ethereum, uniswap, __global__
granularity_sec INTEGER NOT NULL,
bucket_ts INTEGER NOT NULL,
metrics_json TEXT NOT NULL,
PRIMARY KEY (subject_type, subject_id, granularity_sec, bucket_ts)
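The `bucket_ts` component of that key is derived deterministically from the observation timestamp; a sketch:

```typescript
// Sketch: align an observation timestamp (unix seconds) to its bucket start
// for a given granularity, matching the (granularity_sec, bucket_ts) key.
function bucketTs(observedAtSec: number, granularitySec: number): number {
  return Math.floor(observedAtSec / granularitySec) * granularitySec;
}

// A 5-second mid-price observed at t=1700000003 lands in bucket 1700000000;
// the same observation in the 1-minute tier lands in bucket 1699999980.
```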

Candle tables can use more specific columns for fast queries:

text
symbol
granularity_sec
bucket_ts
open_price_e8
high_price_e8
low_price_e8
close_price_e8
volume_base
volume_quote_usd
sample_count

Use fixed-point price_e8 for Hyperliquid prices instead of cents. Some listed markets trade below one cent, so cent storage loses precision and can round to zero.
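A minimal sketch of the e8 encoding, showing why cent storage fails for sub-cent markets:

```typescript
// Sketch: fixed-point e8 encoding for prices. A sub-cent price survives the
// round trip, while cent storage rounds it to zero.
function toE8(price: number): number {
  return Math.round(price * 1e8);
}

function fromE8(priceE8: number): number {
  return priceE8 / 1e8;
}

const subCent = 0.004321;                 // a listed price below one cent
const asCents = Math.round(subCent * 100); // 0: precision lost entirely
const asE8 = toE8(subCent);               // 432100: precision kept
```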

The live v1 worker stores short-horizon midpoint buckets from Hyperliquid allMids and 1-minute OHLC candles from Hyperliquid WebSocket candle subscriptions. Larger candle windows should be derived from 1-minute rows instead of subscribing to every interval.
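Deriving larger windows from 1-minute rows is a plain aggregation; a sketch over the e8 candle shape (field names mirror the candle table columns, camel-cased for illustration):

```typescript
// Sketch: roll 1-minute candles up into one larger candle. Open comes from
// the earliest row, close from the latest, high/low from the extremes, and
// quote volume is summed. Input ordering is not assumed; rows are sorted.
interface Candle {
  bucketTs: number;
  openE8: number;
  highE8: number;
  lowE8: number;
  closeE8: number;
  volumeQuoteUsd: number;
}

function rollup(minuteCandles: Candle[], granularitySec: number): Candle {
  const sorted = [...minuteCandles].sort((a, b) => a.bucketTs - b.bucketTs);
  const first = sorted[0];
  const last = sorted[sorted.length - 1];
  return {
    bucketTs: Math.floor(first.bucketTs / granularitySec) * granularitySec,
    openE8: first.openE8,
    highE8: Math.max(...sorted.map((c) => c.highE8)),
    lowE8: Math.min(...sorted.map((c) => c.lowE8)),
    closeE8: last.closeE8,
    volumeQuoteUsd: sorted.reduce((a, c) => a + c.volumeQuoteUsd, 0),
  };
}
```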

Do not add a separate generated insight store in v1. Trading prompts should consume compact context packets assembled directly from the structured market tables, plus R2 links for large prompt/context/response audit payloads.

Clone Decision Flow

The clone decision path should be deterministic before the model call:

text
clone config from SQLite
  -> selected assets/channels/timeframe
  -> market context service queries SQLite
  -> prompt/context renderer
  -> decision worker creates a run row
  -> one Claude or other model call through a pluggable engine
  -> parse strict JSON actions
  -> validate actions against candidate asset, enabled state, and configured execution controls
  -> create dry-run order and ledger rows
  -> execute trades after safety gates
  -> record compact action/ledger rows in SQLite
  -> store full reasons/context in R2
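The validation step in the flow above can be sketched as a pure check before any order row is created; field names and the control shape are illustrative, not the final schema:

```typescript
// Sketch: validate one parsed model action against the run's candidate
// symbols, enabled state, and execution controls. Returns a rejection
// reason, or null when the action may proceed to dry-run order creation.
interface ParsedAction {
  symbol: string;
  action: "buy" | "sell" | "hold";
  confidence: number;
  notionalUsd?: number;
}

interface ExecutionControls {
  enabled: boolean;
  maxNotionalUsd: number;
  minConfidence: number;
}

function validateAction(
  a: ParsedAction,
  candidateSymbols: string[],
  controls: ExecutionControls
): string | null {
  if (!controls.enabled) return "trading disabled";
  if (!candidateSymbols.includes(a.symbol)) return "symbol not in candidate set";
  if (a.confidence < controls.minConfidence) return "confidence below threshold";
  if (a.action !== "hold" && (a.notionalUsd ?? 0) > controls.maxNotionalUsd)
    return "notional above cap";
  return null; // valid: proceed to order/ledger rows
}
```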

During local development, the worker defaults to a no-op hold engine to exercise cadence, context, and storage without placing trades or requiring model credentials. Set DECISION_MODEL_PROVIDER=anthropic and ANTHROPIC_API_KEY to run the model-backed dry-run engine without order placement.

Example API shape:

ts
getAgentMarketContext({
  cloneId,
  assets: ["BTC", "ETH", "SOL"],
  horizon: "intraday",
  sources: ["hyperliquid", "polymarket", "defillama"],
  asOf: Date.now()
});

The context returned to the prompt renderer should be compact structured data, for example:

ts
{
  BTC: {
    liquidityProfile: { depthUsd1Pct: 12000000, spreadBps: 1.2, imbalance: 0.18 },
    fundingPressure: { fundingRate8h: 0.012, openInterestChange1hPct: 4.1, regime: "crowded_long" },
    volatilityProfile: { realizedVol1h: 0.034, regime: "high" },
    positioningPressure: { direction: "long_pressure", score: 0.72, confidence: 0.64 },
    polymarket: { immediateBias: "risk_on", probabilityShift1h: 0.06 },
    capitalFlows: { stablecoinFlow24hUsd: 82000000, chainFlowBias: "inflow" }
  }
}

Trading State

Paper and prop trading state should live in SQLite if it needs deterministic audit/replay:

  • Trade decisions
  • Orders
  • Fills
  • Position events
  • Wallet ledger
  • Portfolio snapshots

For real/prop trading, the exchange or broker remains the canonical source of external balances and fills. SQLite stores our local audit trail, observed reconciliations, and decision/action history.

Keep bulky reasons and full model artifacts in R2:

sql
CREATE TABLE clone_decision_runs (
  id TEXT PRIMARY KEY,
  clone_id INTEGER NOT NULL,
  branch_id TEXT NOT NULL,
  symbol TEXT NOT NULL,
  status TEXT NOT NULL,
  trigger TEXT NOT NULL,
  scheduled_for INTEGER NOT NULL,
  started_at INTEGER,
  completed_at INTEGER,
  candidate_symbols_json TEXT NOT NULL DEFAULT '[]',
  model TEXT NOT NULL,
  prompt_r2_key TEXT,
  response_r2_key TEXT,
  context_r2_key TEXT,
  error_message TEXT,
  metadata_json TEXT NOT NULL DEFAULT '{}'
);

CREATE TABLE clone_decision_actions (
  id TEXT PRIMARY KEY,
  run_id TEXT NOT NULL,
  clone_id INTEGER NOT NULL,
  symbol TEXT NOT NULL,
  action TEXT NOT NULL,
  confidence REAL NOT NULL,
  status TEXT NOT NULL,
  quantity REAL,
  notional_usd REAL,
  limit_price REAL,
  reason_summary TEXT,
  order_id TEXT,
  reason_r2_key TEXT,
  error_message TEXT,
  metadata_json TEXT NOT NULL DEFAULT '{}',
  created_at INTEGER NOT NULL
);

Size Expectations

With global ingestion and 7-day hot retention, market data should stay small:

text
Hyperliquid price hot data: low hundreds of MB to low single-digit GB
Hyperliquid full v1 hot data: likely 1-2 GB
Planning headroom for Hyperliquid: about 5 GB

The full v1 estimate assumes 5-second mid-price retention for the short tail, 1-minute/1-hour candle retention, 5-minute liquidity/funding/positioning/volatility profiles, and active feature pruning. The main growth risk is not market data. It is decision run logging. Keep SQLite compact:

  • Store action rows, ledger rows, status, model, timestamps, and R2 keys in SQLite.
  • Store full prompts, full contexts, full responses, and long reasons in R2.

Compact decision action storage estimate:

text
about 0.5-3 KB per decision run in SQLite
288 runs/day per enabled clone-asset if running every 5 minutes
7-day hot storage: about 1-6 MB per enabled clone-asset
100 clones x 5 enabled assets: about 0.5-3 GB
1,000 clones x 5 enabled assets: about 5-30 GB

Those numbers are manageable if full prompts and context blobs stay out of SQLite.
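The estimates can be re-derived with a quick sketch (the 300-second cadence and the 0.5-3 KB per-run range are the planning assumptions stated above):

```typescript
// Sketch: decision-run storage estimates at a 5-minute cadence with
// 0.5-3 KB stored per run in SQLite (full artifacts in R2).
const runsPerDay = (24 * 3600) / 300;     // 288 runs/day per clone-asset
const runsPerWeek = runsPerDay * 7;       // 2,016 runs in 7-day hot storage
const lowMb = (runsPerWeek * 0.5) / 1024; // ~1 MB per clone-asset
const highMb = (runsPerWeek * 3) / 1024;  // ~6 MB per clone-asset
const fleetGb = (1000 * 5 * runsPerWeek * 3) / (1024 * 1024); // ~29 GB at 1,000 clones x 5 assets
```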

First Implementation Slice

Build this as a parallel replacement, not an in-place refactor:

  1. Add SQLite connection, migrations, and repository layer.
  2. Port Hyperliquid allMids and 1-minute candle WebSocket ingestion into the new worker.
  3. Add rollup/compaction jobs for 1m and 1h buckets.
  4. Add Hyperliquid liquidity/funding/volatility features.
  5. Add marketContextService.getContext(...).
  6. Switch the trading decision flow to read deterministic SQLite context instead of Supabase price history plus Pinecone snapshots.
  7. Dual-run old and new context generation until output quality is acceptable.
  8. Add Polymarket and DefiLlama structured pipelines.
  9. Retire Pinecone/Supabase market-memory and configuration paths after cutover.