Backtesting Module
Overview
Backtesting owns guarded replay runs for clone strategies. It is backend-complete for the v1 module: durable async jobs, usage metering, queue leases, comparable metrics, artifact retention, and preflight policy live under Backtesting even though API routes remain clone-scoped for compatibility.
A backtest result is strategy-evaluation evidence, not production prop-account truth.
Responsibilities
- Create queued backtest jobs.
- Authorize starts against daily quota and per-run caps.
- Estimate replay steps, active assets, decision calls, and token budget before execution.
- Process queued jobs with leases and concurrency limits.
- Run replay clocks over historical market context.
- Reuse AI Trading context/decision primitives and Prop Trading fill math in isolation.
- Store summaries, metrics, progress, and artifacts.
- Prune expiring prompt/context/response artifacts.
Boundary Rules
- Clone Service owns clone identity, ownership, strategy graphs, and entitlements.
- AI Trading owns prompt/model execution internals.
- Prop Trading owns production account truth; backtests must not write production prop tables.
- Economy owns final pricing/payment policy; current Backtesting limits are guardrail defaults.
- Backtesting routes are clone-scoped for API compatibility, but state is module-owned.
Runtime Flow
Start:
```text
POST decision-backtest/preflight
-> estimate quota/caps without consuming usage
POST decision-backtest
-> authorize quota and per-run caps
-> increment usage-day counter
-> create queued backtest job
```
Worker:
```text
Prune expired artifacts
-> requeue expired leases
-> claim next queued job within global/per-user limits
-> extend lease as progress updates
-> resolve replay context at timestamps
-> sample decision ticks within decision-call cap
-> simulate actions in isolated account state
-> store metrics/artifacts
-> mark completed or failed
```
The default replay shape is the last 6 hours at a 5-minute cadence, which gives 72 replay steps, with up to 12 inline decision calls unless capped by policy.
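The default replay shape can be sketched as a small planning helper. This is a minimal illustration, not the contents of `planning.ts`: the `planReplay` name, the `ReplayPlan` shape, and the even-spacing rule for decision ticks are all assumptions.

```typescript
// Hypothetical plan shape; the real planning helpers may differ.
interface ReplayPlan {
  stepTimestampsMs: number[];    // one timestamp per replay step
  decisionStepIndexes: number[]; // steps at which a decision call is made
}

function planReplay(
  endMs: number,
  windowMs: number = 6 * 60 * 60 * 1000, // last 6 hours
  cadenceMs: number = 5 * 60 * 1000,     // 5-minute cadence
  maxDecisionCalls: number = 12,         // default inline decision calls
): ReplayPlan {
  const stepCount = Math.floor(windowMs / cadenceMs); // 72 with the defaults
  const startMs = endMs - windowMs;

  // One replay step per cadence interval; the last step lands on endMs.
  const stepTimestampsMs = Array.from(
    { length: stepCount },
    (_, i) => startMs + (i + 1) * cadenceMs,
  );

  // Assumption: decision ticks are spread evenly across the window so the
  // run never exceeds the decision-call cap.
  const calls = Math.min(maxDecisionCalls, stepCount);
  const decisionStepIndexes: number[] = [];
  for (let k = 0; k < calls; k++) {
    decisionStepIndexes.push(Math.floor((k * stepCount) / calls));
  }

  return { stepTimestampsMs, decisionStepIndexes };
}
```

With the defaults this yields 72 steps and a decision call every sixth step. Passing a lower cap, for example `planReplay(endMs, undefined, undefined, 4)`, keeps the 72 steps but samples only four decision ticks.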
Main Code Paths
| Path | Purpose |
|---|---|
| `backend/src/modules/backtesting/policy.ts` | Tier-aware preflight, daily quota enforcement, per-run cap policy. |
| `backend/src/modules/backtesting/planning.ts` | Replay window, cadence, and decision-sampling helpers. |
| `backend/src/modules/backtesting/service.ts` | Replay execution service built on AI Trading and Prop Trading primitives. |
| `backend/src/modules/backtesting/job-manager.ts` | Async job orchestration, queue claiming, progress/result updates. |
| `backend/src/modules/backtesting/artifact-store.ts` | Backtest artifact storage wrapper and retention policy. |
| `backend/src/modules/backtesting/repositories/backtest-jobs.ts` | Job and metrics repository. |
| `backend/src/modules/backtesting/repositories/backtest-artifacts.ts` | Artifact index repository. |
| `backend/src/modules/backtesting/repositories/backtest-usage.ts` | Daily accepted-run and estimated usage counters. |
| `backend/src/workers/backtesting.ts` | One-shot and long-running backtest queue worker. |
State And Tables
| Migration | Tables |
|---|---|
| `0017_backtesting.sql` | `backtest_jobs`, `backtest_job_metrics`, `backtest_usage_days`, `backtest_artifacts` |
| `0018_backtesting_worker_artifacts.sql` | Lease fields, attempt count, artifact expiration metadata. |

| Table | Technical significance |
|---|---|
| `backtest_jobs` | Queue status, progress, lease, attempt, failure, result summary. |
| `backtest_job_metrics` | Comparable run metrics for history/comparison UI. |
| `backtest_usage_days` | Daily accepted-run and estimated usage counters by user/clone/day. |
| `backtest_artifacts` | Stored summary/metrics/prompt/context/response artifact index and expiration. |
Routes And Workers
| Surface | Purpose |
|---|---|
| `POST /api/v1/clone/clones/:id/decision-backtest/preflight` | Estimate replay cost, caps, quota, and blockers without consuming usage. |
| `POST /api/v1/clone/clones/:id/decision-backtest` | Start an async backtest job. |
| `GET /api/v1/clone/clones/:id/decision-backtest/jobs` | List recent jobs with lightweight metrics. |
| `GET /api/v1/clone/clones/:id/decision-backtest/jobs/:jobId` | Read job status/result. |
| `npm run backtests:run-once` | Process one queue cycle. |
| `npm run backtests:worker` | Long-running queue worker. |
Worker defaults:
| Setting | Default |
|---|---|
| Worker interval | 15s |
| Lease duration | 5 minutes |
| Max concurrent jobs | 2 |
| Max concurrent jobs per user | 1 |
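To make the lease and concurrency defaults concrete, here is an in-memory sketch of one claim cycle. It is not the `job-manager.ts` implementation: the `Job` shape, the function names, and the first-queued-first-claimed ordering are assumptions, and the camelCase fields only mirror the lease columns for illustration.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";

// Hypothetical in-memory job row.
interface Job {
  id: string;
  userId: string;
  status: JobStatus;
  leasedBy: string | null;
  leaseExpiresAtMs: number | null;
  attemptCount: number;
}

const LEASE_MS = 5 * 60 * 1000; // lease duration: 5 minutes
const MAX_RUNNING = 2;          // max concurrent jobs
const MAX_RUNNING_PER_USER = 1; // max concurrent jobs per user

// Put running jobs whose lease has lapsed back on the queue.
function requeueExpired(jobs: Job[], nowMs: number): void {
  for (const job of jobs) {
    if (
      job.status === "running" &&
      job.leaseExpiresAtMs !== null &&
      job.leaseExpiresAtMs <= nowMs
    ) {
      job.status = "queued";
      job.leasedBy = null;
      job.leaseExpiresAtMs = null;
    }
  }
}

// Claim the first queued job that fits the global and per-user caps.
function claimNext(jobs: Job[], workerId: string, nowMs: number): Job | null {
  requeueExpired(jobs, nowMs);
  const running = jobs.filter((j) => j.status === "running");
  if (running.length >= MAX_RUNNING) return null;

  const next = jobs.find(
    (j) =>
      j.status === "queued" &&
      running.filter((r) => r.userId === j.userId).length < MAX_RUNNING_PER_USER,
  );
  if (!next) return null;

  next.status = "running";
  next.leasedBy = workerId;
  next.leaseExpiresAtMs = nowMs + LEASE_MS;
  next.attemptCount += 1;
  return next;
}

// Progress updates push the lease expiry forward.
function extendLease(job: Job, nowMs: number): void {
  job.leaseExpiresAtMs = nowMs + LEASE_MS;
}
```

A real worker would perform the claim atomically in the database so that two workers cannot lease the same job; the sketch leaves that out.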
Cost And Quota Policy
Current guardrail defaults are not final pricing promises.
| Tier | Daily accepted runs | Max decision calls / run | Max active assets / run | Max estimated tokens / run |
|---|---|---|---|---|
| `starter` | 3 | 24 | 3 | 120,000 |
| `builder` | 10 | 100 | 6 | 500,000 |
| `pro` | 25 | 200 | 10 | 1,000,000 |
All tiers currently use the last_6h window, 72 replay steps at 5-minute cadence, a 1,200-token minimum input estimate per decision call, and a 500-token output estimate per decision call.
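The caps and token estimates above combine into a preflight check along these lines. This is a sketch under assumptions, not the logic in `policy.ts`: the `preflight` function, the request and result shapes, and the blocker strings are hypothetical. Only the numeric caps and the 1,200/500 token estimates come from this page.

```typescript
type Tier = "starter" | "builder" | "pro";

interface TierCaps {
  dailyRuns: number;
  maxDecisionCalls: number;
  maxActiveAssets: number;
  maxEstimatedTokens: number;
}

// Guardrail defaults copied from the tier table.
const TIER_CAPS: Record<Tier, TierCaps> = {
  starter: { dailyRuns: 3, maxDecisionCalls: 24, maxActiveAssets: 3, maxEstimatedTokens: 120_000 },
  builder: { dailyRuns: 10, maxDecisionCalls: 100, maxActiveAssets: 6, maxEstimatedTokens: 500_000 },
  pro: { dailyRuns: 25, maxDecisionCalls: 200, maxActiveAssets: 10, maxEstimatedTokens: 1_000_000 },
};

const MIN_INPUT_TOKENS_PER_CALL = 1_200;
const OUTPUT_TOKENS_PER_CALL = 500;

// Hypothetical request/result shapes.
interface PreflightRequest {
  tier: Tier;
  decisionCalls: number;
  activeAssets: number;
  runsAcceptedToday: number;
  inputTokensPerCall?: number; // floored at the 1,200-token minimum
}

interface PreflightResult {
  estimatedTokens: number;
  blockers: string[];
  allowed: boolean;
}

// Pure estimate: nothing is written, so usage is not consumed.
function preflight(req: PreflightRequest): PreflightResult {
  const caps = TIER_CAPS[req.tier];
  const inputTokens = Math.max(
    MIN_INPUT_TOKENS_PER_CALL,
    req.inputTokensPerCall ?? MIN_INPUT_TOKENS_PER_CALL,
  );
  const estimatedTokens = req.decisionCalls * (inputTokens + OUTPUT_TOKENS_PER_CALL);

  const blockers: string[] = [];
  if (req.runsAcceptedToday >= caps.dailyRuns) blockers.push("daily_quota_exhausted");
  if (req.decisionCalls > caps.maxDecisionCalls) blockers.push("decision_calls_cap");
  if (req.activeAssets > caps.maxActiveAssets) blockers.push("active_assets_cap");
  if (estimatedTokens > caps.maxEstimatedTokens) blockers.push("estimated_tokens_cap");

  return { estimatedTokens, blockers, allowed: blockers.length === 0 };
}
```

For the default 12 decision calls at the minimum input estimate, the token estimate is 12 × (1,200 + 500) = 20,400, well under the starter cap of 120,000.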
Metrics
Backtest summaries are designed to be comparable without loading full artifacts:
- Starting and ending equity.
- Realized and unrealized PnL.
- Max drawdown.
- Win/loss count and win rate.
- Number of decisions, accepted actions, rejected actions, and no-ops.
- Fees, slippage, turnover, and exposure by asset.
- Model calls, estimated cost, token usage, and average decision latency.
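Two of these metrics can be derived from very little data, as the sketch below shows. The definitions are assumptions rather than the module's own: max drawdown is taken as the largest fractional drop from a running equity peak, and win rate counts only trades with non-zero realized PnL. The function and type names are hypothetical.

```typescript
// Hypothetical closed-trade record.
interface ClosedTrade {
  realizedPnl: number;
}

// Largest peak-to-trough decline, as a fraction of the running peak.
function maxDrawdown(equityCurve: number[]): number {
  let peak = -Infinity;
  let worst = 0;
  for (const equity of equityCurve) {
    peak = Math.max(peak, equity);
    if (peak > 0) worst = Math.max(worst, (peak - equity) / peak);
  }
  return worst;
}

// Wins, losses, and win rate; break-even trades are ignored (assumption).
function winLoss(trades: ClosedTrade[]): { wins: number; losses: number; winRate: number } {
  const wins = trades.filter((t) => t.realizedPnl > 0).length;
  const losses = trades.filter((t) => t.realizedPnl < 0).length;
  const decided = wins + losses;
  return { wins, losses, winRate: decided === 0 ? 0 : wins / decided };
}
```

For an equity curve of 100, 120, 90, 110 the running peak is 120 and the trough is 90, so the drawdown is 30 / 120 = 0.25.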
Failure Behavior
- `authorizeStart` increments `backtest_usage_days` only when a start is accepted.
- Cap/quota failures reject before job creation.
- Worker requeues expired running jobs by lease expiry.
- Claimed jobs respect global and per-user running caps.
- Progress updates extend the lease.
- Failed jobs clear lease state and preserve error details.
- Prompt/context/response artifacts expire after the retention window; summary/metrics artifacts are durable by default.
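The retention rule in the last bullet can be sketched as follows. The entry shape, function names, and `retentionMs` parameter are hypothetical; this page does not state the actual retention window, so none is assumed here.

```typescript
type ArtifactKind = "summary" | "metrics" | "prompt" | "context" | "response";

// Hypothetical artifact index entry.
interface ArtifactIndexEntry {
  id: string;
  kind: ArtifactKind;
  expiresAtMs: number | null; // null means durable
}

// Only prompt/context/response artifacts expire.
const EXPIRING_KINDS: ReadonlySet<ArtifactKind> = new Set<ArtifactKind>([
  "prompt",
  "context",
  "response",
]);

// Expiry assigned at write time; retentionMs is supplied by the caller.
function expiryFor(kind: ArtifactKind, createdAtMs: number, retentionMs: number): number | null {
  return EXPIRING_KINDS.has(kind) ? createdAtMs + retentionMs : null;
}

// Split the index into entries to keep and entries to prune.
function pruneExpired(
  entries: ArtifactIndexEntry[],
  nowMs: number,
): { kept: ArtifactIndexEntry[]; pruned: ArtifactIndexEntry[] } {
  const kept: ArtifactIndexEntry[] = [];
  const pruned: ArtifactIndexEntry[] = [];
  for (const entry of entries) {
    const expired = entry.expiresAtMs !== null && entry.expiresAtMs <= nowMs;
    (expired ? pruned : kept).push(entry);
  }
  return { kept, pruned };
}
```

Keeping summary and metrics entries durable is what allows run history and comparisons to keep working after the heavier prompt, context, and response payloads are gone.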
Debugging Notes
- If a job remains queued, inspect global/per-user running caps and active leases.
- If a job looks stuck running, inspect `lease_expires_at_ms`, `leased_by`, `attempt_count`, and worker heartbeat.
- If preflight blocks a run, compare estimated decision calls, active assets, replay steps, and tokens to tier caps.
- If quota seems wrong after an upgrade, note that usage already recorded for the UTC day is kept, while later starts are checked against the new entitlement's higher limit.
- If results differ from production Prop Trading, confirm the backtest neither read nor wrote production prop-account state; only the reusable fill math should be shared.
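The quota note about tier upgrades is easier to reason about with a small model of the usage counter. The sketch below is an assumption-laden illustration, not the real `authorizeStart`: the `UsageLedger` class, its signature, and enforcing the limit per user/clone/day key are all hypothetical.

```typescript
// UTC day key such as "2024-01-01".
function utcDayKey(nowMs: number): string {
  return new Date(nowMs).toISOString().slice(0, 10);
}

// Hypothetical in-memory stand-in for the daily usage counters.
class UsageLedger {
  private counts = new Map<string, number>();

  // dailyLimit comes from the caller's current entitlement at start time.
  authorizeStart(
    userId: string,
    cloneId: string,
    dailyLimit: number,
    nowMs: number,
  ): { accepted: boolean; used: number } {
    const key = `${userId}:${cloneId}:${utcDayKey(nowMs)}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= dailyLimit) {
      // Rejected starts leave the counter untouched.
      return { accepted: false, used };
    }
    this.counts.set(key, used + 1);
    return { accepted: true, used: used + 1 };
  }
}
```

Under this model, a starter user who has used 3 of 3 runs is rejected, but after upgrading to builder the same counter is compared against 10, so the next start is accepted as run 4 of 10. The counter starts from zero again on the next UTC day.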
Tests
```bash
cd backend
npm run typecheck
npm run test
```
Current coverage includes usage recording, daily quota exhaustion, tier-upgrade behavior, preflight cap rejection, async progress, route authorization, queue claiming, lease expiry, artifact retention, and worker boot.
Known Gaps
- Final pricing policy for included runs, paid overages, model-tier access, and cost-unit accounting.
- Per-provider concurrency limits if model providers need separate budgets.
- Cancellation and explicit retry/dead-letter policy.
- Frontend history, run result, metrics comparison, and capped configuration UI.