Backtesting Module

Overview

Backtesting owns guarded replay runs for clone strategies. The v1 backend is complete: durable async jobs, usage metering, queue leases, comparable metrics, artifact retention, and preflight policy all live under Backtesting, even though the API routes remain clone-scoped for compatibility.

A backtest result is strategy-evaluation evidence, not production prop-account truth.

Responsibilities

  • Create queued backtest jobs.
  • Authorize starts against daily quota and per-run caps.
  • Estimate replay steps, active assets, decision calls, and token budget before execution.
  • Process queued jobs with leases and concurrency limits.
  • Run replay clocks over historical market context.
  • Reuse AI Trading context/decision primitives and Prop Trading fill math in isolation.
  • Store summaries, metrics, progress, and artifacts.
  • Prune expiring prompt/context/response artifacts.

Boundary Rules

  • Clone Service owns clone identity, ownership, strategy graphs, and entitlements.
  • AI Trading owns prompt/model execution internals.
  • Prop Trading owns production account truth; backtests must not write production prop tables.
  • Economy owns final pricing/payment policy; current Backtesting limits are guardrail defaults.
  • Backtesting routes are clone-scoped for API compatibility, but state is module-owned.

Runtime Flow

Start:

```text
POST decision-backtest/preflight
-> estimate quota/caps without consuming usage

POST decision-backtest
-> authorize quota and per-run caps
-> increment usage-day counter
-> create queued backtest job
```
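
The difference between the two start routes can be sketched as follows. This is a minimal in-memory illustration, not the module's implementation: `StartRequest`, `Policy`, `evaluate`, `preflight`, `start`, and the blocker strings are hypothetical names, and real state lives in the Backtesting tables rather than in maps.

```typescript
// Hypothetical in-memory sketch; names and shapes are illustrative assumptions.
type StartRequest = { userId: string; decisionCalls: number };
type Policy = { dailyRuns: number; maxDecisionCalls: number };
type Verdict = { allowed: boolean; blockers: string[] };

const usageToday = new Map<string, number>(); // accepted runs per user for the current day
const queuedJobs: { id: number; userId: string; status: "queued" }[] = [];

// Shared check used by both routes: it reads usage but never writes it.
function evaluate(req: StartRequest, policy: Policy): Verdict {
  const blockers: string[] = [];
  if ((usageToday.get(req.userId) ?? 0) >= policy.dailyRuns) blockers.push("daily_quota_exhausted");
  if (req.decisionCalls > policy.maxDecisionCalls) blockers.push("decision_call_cap_exceeded");
  return { allowed: blockers.length === 0, blockers };
}

// POST .../decision-backtest/preflight: estimate only, no usage consumed.
function preflight(req: StartRequest, policy: Policy): Verdict {
  return evaluate(req, policy);
}

// POST .../decision-backtest: authorize, then count usage, then enqueue.
function start(req: StartRequest, policy: Policy): Verdict & { jobId?: number } {
  const verdict = evaluate(req, policy);
  if (!verdict.allowed) return verdict; // rejected before any job is created
  usageToday.set(req.userId, (usageToday.get(req.userId) ?? 0) + 1);
  const jobId = queuedJobs.length + 1;
  queuedJobs.push({ id: jobId, userId: req.userId, status: "queued" });
  return { ...verdict, jobId };
}
```

Keeping the check in one side-effect-free function means a preflight and a start that see the same state always agree on the blockers.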

Worker:

```text
Prune expired artifacts
-> requeue expired leases
-> claim next queued job within global/per-user limits
-> extend lease as progress updates
-> resolve replay context at timestamps
-> sample decision ticks within decision-call cap
-> simulate actions in isolated account state
-> store metrics/artifacts
-> mark completed or failed
```
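
The lease steps of the worker cycle can be sketched in memory as below. The `Job` shape and the helper names are hypothetical, and incrementing the attempt count on every claim is an assumption; only the default values (5-minute lease, 2 concurrent jobs, 1 per user) come from this document. A database-backed worker would need to make the claim atomic, which this sketch does not attempt.

```typescript
// Hypothetical in-memory job shape; the real rows live in backtest_jobs.
type Job = {
  id: string;
  userId: string;
  status: "queued" | "running" | "completed" | "failed";
  leasedBy: string | null;
  leaseExpiresAtMs: number | null;
  attemptCount: number;
};

const LEASE_MS = 5 * 60 * 1000; // documented default lease duration
const MAX_RUNNING = 2;          // documented global concurrency default
const MAX_RUNNING_PER_USER = 1; // documented per-user concurrency default

// Put running jobs whose lease has lapsed back on the queue.
function requeueExpired(jobs: Job[], nowMs: number): number {
  let requeued = 0;
  for (const job of jobs) {
    if (job.status === "running" && job.leaseExpiresAtMs !== null && job.leaseExpiresAtMs <= nowMs) {
      job.status = "queued";
      job.leasedBy = null;
      job.leaseExpiresAtMs = null;
      requeued++;
    }
  }
  return requeued;
}

// Claim the first queued job that fits under both running caps.
function claimNext(jobs: Job[], workerId: string, nowMs: number): Job | null {
  const running = jobs.filter((j) => j.status === "running");
  if (running.length >= MAX_RUNNING) return null;
  const next = jobs.find(
    (j) =>
      j.status === "queued" &&
      running.filter((r) => r.userId === j.userId).length < MAX_RUNNING_PER_USER,
  );
  if (!next) return null;
  next.status = "running";
  next.leasedBy = workerId;
  next.leaseExpiresAtMs = nowMs + LEASE_MS;
  next.attemptCount += 1; // assumption: each claim counts as one attempt
  return next;
}

// Progress updates push the lease forward so active jobs are not requeued.
function extendLease(job: Job, nowMs: number): void {
  job.leaseExpiresAtMs = nowMs + LEASE_MS;
}
```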

The default replay covers the last 6 hours at a 5-minute cadence, which gives 72 replay steps (6 hours × 12 steps per hour), with up to 12 inline decision calls unless capped by policy.
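
The arithmetic behind the default shape can be sketched as below. The helpers `replayTimestamps` and `sampleDecisionSteps` are hypothetical, and evenly spaced sampling is an assumption; the actual strategy in planning.ts may differ.

```typescript
// Hypothetical helpers; the real ones live in planning.ts and may differ.
function replayTimestamps(endMs: number, windowMs: number, cadenceMs: number): number[] {
  const steps = Math.floor(windowMs / cadenceMs);
  // Oldest step first, with the last step landing on endMs.
  return Array.from({ length: steps }, (_, i) => endMs - (steps - 1 - i) * cadenceMs);
}

// Pick at most maxCalls evenly spaced step indices, starting at step 0.
function sampleDecisionSteps(stepCount: number, maxCalls: number): number[] {
  const calls = Math.min(stepCount, maxCalls);
  if (calls <= 0) return [];
  const stride = stepCount / calls;
  return Array.from({ length: calls }, (_, i) => Math.floor(i * stride));
}

const HOUR_MS = 60 * 60 * 1000;
const MINUTE_MS = 60 * 1000;
const replaySteps = replayTimestamps(Date.UTC(2024, 0, 1, 6), 6 * HOUR_MS, 5 * MINUTE_MS);
const decisionTicks = sampleDecisionSteps(replaySteps.length, 12);
console.log(replaySteps.length, decisionTicks); // 72 steps; ticks at indices 0, 6, ..., 66
```

With 72 steps and a 12-call budget, the stride is 6, so one decision call is made every 30 minutes of replayed time.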

Main Code Paths

| Path | Purpose |
| --- | --- |
| backend/src/modules/backtesting/policy.ts | Tier-aware preflight, daily quota enforcement, per-run cap policy. |
| backend/src/modules/backtesting/planning.ts | Replay window, cadence, and decision-sampling helpers. |
| backend/src/modules/backtesting/service.ts | Replay execution service built on AI Trading and Prop Trading primitives. |
| backend/src/modules/backtesting/job-manager.ts | Async job orchestration, queue claiming, progress/result updates. |
| backend/src/modules/backtesting/artifact-store.ts | Backtest artifact storage wrapper and retention policy. |
| backend/src/modules/backtesting/repositories/backtest-jobs.ts | Job and metrics repository. |
| backend/src/modules/backtesting/repositories/backtest-artifacts.ts | Artifact index repository. |
| backend/src/modules/backtesting/repositories/backtest-usage.ts | Daily accepted-run and estimated usage counters. |
| backend/src/workers/backtesting.ts | One-shot and long-running backtest queue worker. |

State And Tables

| Migration | Tables |
| --- | --- |
| 0017_backtesting.sql | backtest_jobs, backtest_job_metrics, backtest_usage_days, backtest_artifacts |
| 0018_backtesting_worker_artifacts.sql | Lease fields, attempt count, artifact expiration metadata. |

| Table | Technical significance |
| --- | --- |
| backtest_jobs | Queue status, progress, lease, attempt, failure, result summary. |
| backtest_job_metrics | Comparable run metrics for history/comparison UI. |
| backtest_usage_days | Daily accepted-run and estimated usage counters by user/clone/day. |
| backtest_artifacts | Stored summary/metrics/prompt/context/response artifact index and expiration. |

Routes And Workers

| Surface | Purpose |
| --- | --- |
| POST /api/v1/clone/clones/:id/decision-backtest/preflight | Estimate replay cost, caps, quota, and blockers without consuming usage. |
| POST /api/v1/clone/clones/:id/decision-backtest | Start an async backtest job. |
| GET /api/v1/clone/clones/:id/decision-backtest/jobs | List recent jobs with lightweight metrics. |
| GET /api/v1/clone/clones/:id/decision-backtest/jobs/:jobId | Read job status/result. |
| npm run backtests:run-once | Process one queue cycle. |
| npm run backtests:worker | Long-running queue worker. |

Worker defaults:

| Setting | Default |
| --- | --- |
| Worker interval | 15s |
| Lease duration | 5 minutes |
| Max concurrent jobs | 2 |
| Max concurrent jobs per user | 1 |

Cost And Quota Policy

Current guardrail defaults are not final pricing promises.

| Tier | Daily accepted runs | Max decision calls / run | Max active assets / run | Max estimated tokens / run |
| --- | --- | --- | --- | --- |
| starter | 3 | 24 | 3 | 120,000 |
| builder | 10 | 100 | 6 | 500,000 |
| pro | 25 | 200 | 10 | 1,000,000 |

All tiers currently use the last_6h window, 72 replay steps at 5-minute cadence, a 1,200-token minimum input estimate per decision call, and a 500-token output estimate per decision call.
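
A rough sketch of how the per-run token estimate and tier caps combine is shown below. The `TIER_CAPS` values are read from the tier table in this section; `estimateTokens`, `capBlockers`, and the blocker strings are hypothetical names, and the real policy in policy.ts may weigh additional inputs.

```typescript
type Tier = "starter" | "builder" | "pro";
type TierCaps = { dailyRuns: number; maxDecisionCalls: number; maxActiveAssets: number; maxTokens: number };

// Guardrail defaults as listed in the tier table; not final pricing.
const TIER_CAPS: Record<Tier, TierCaps> = {
  starter: { dailyRuns: 3, maxDecisionCalls: 24, maxActiveAssets: 3, maxTokens: 120_000 },
  builder: { dailyRuns: 10, maxDecisionCalls: 100, maxActiveAssets: 6, maxTokens: 500_000 },
  pro: { dailyRuns: 25, maxDecisionCalls: 200, maxActiveAssets: 10, maxTokens: 1_000_000 },
};

const MIN_INPUT_TOKENS_PER_CALL = 1_200; // documented minimum input estimate
const OUTPUT_TOKENS_PER_CALL = 500;      // documented output estimate

// Tokens per run = calls x (input estimate + output estimate), with the input floored at the minimum.
function estimateTokens(decisionCalls: number, inputTokensPerCall = MIN_INPUT_TOKENS_PER_CALL): number {
  const input = Math.max(inputTokensPerCall, MIN_INPUT_TOKENS_PER_CALL);
  return decisionCalls * (input + OUTPUT_TOKENS_PER_CALL);
}

// Collect every per-run cap the request would exceed (hypothetical blocker names).
function capBlockers(tier: Tier, decisionCalls: number, activeAssets: number): string[] {
  const caps = TIER_CAPS[tier];
  const blockers: string[] = [];
  if (decisionCalls > caps.maxDecisionCalls) blockers.push("decision_calls");
  if (activeAssets > caps.maxActiveAssets) blockers.push("active_assets");
  if (estimateTokens(decisionCalls) > caps.maxTokens) blockers.push("estimated_tokens");
  return blockers;
}

console.log(estimateTokens(12));            // 20400 = 12 x (1200 + 500)
console.log(capBlockers("starter", 12, 2)); // no blockers
console.log(capBlockers("starter", 30, 4)); // blocked on decision_calls and active_assets
```

At the minimum input estimate, even a run that uses every allowed decision call stays under its tier's token cap (for example 24 × 1,700 = 40,800 on starter), so the token cap only binds when the per-call input estimate grows well beyond the minimum.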

Metrics

Backtest summaries are designed to be comparable without loading full artifacts:

  • Starting and ending equity.
  • Realized and unrealized PnL.
  • Max drawdown.
  • Win/loss count and win rate.
  • Number of decisions, accepted actions, rejected actions, and no-ops.
  • Fees, slippage, turnover, and exposure by asset.
  • Model calls, estimated cost, token usage, and average decision latency.
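
Two of the metrics above can be illustrated with a short sketch. The definitions used here are common conventions and are assumptions about this module: max drawdown is taken as the largest peak-to-trough decline as a fraction of the peak, and break-even trades are excluded from the win rate. The function names are hypothetical.

```typescript
// Max drawdown: largest peak-to-trough decline, as a fraction of the running peak.
function maxDrawdown(equity: number[]): number {
  let peak = -Infinity;
  let worst = 0;
  for (const value of equity) {
    peak = Math.max(peak, value);
    if (peak > 0) worst = Math.max(worst, (peak - value) / peak);
  }
  return worst;
}

// Win rate over closed trades; zero-PnL trades count as neither win nor loss (assumption).
function winRate(tradePnls: number[]): { wins: number; losses: number; winRate: number } {
  const wins = tradePnls.filter((p) => p > 0).length;
  const losses = tradePnls.filter((p) => p < 0).length;
  const decided = wins + losses;
  return { wins, losses, winRate: decided === 0 ? 0 : wins / decided };
}

const equityCurve = [10_000, 10_400, 9_880, 10_100, 10_600];
console.log(maxDrawdown(equityCurve));   // 0.05, from the 10,400 peak down to 9,880
console.log(winRate([120, -40, 0, 75])); // 2 wins, 1 loss
```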

Failure Behavior

  • authorizeStart increments backtest_usage_days only when a start is accepted.
  • Cap/quota failures reject before job creation.
  • The worker requeues running jobs whose lease has expired.
  • Claimed jobs respect global and per-user running caps.
  • Progress updates extend the lease.
  • Failed jobs clear lease state and preserve error details.
  • Prompt/context/response artifacts expire after the retention window; summary/metrics artifacts are durable by default.
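
The retention rule in the last bullet can be sketched as below. The `Artifact` shape and helper names are hypothetical, and the retention window is passed in as a parameter because this document does not state its length.

```typescript
type ArtifactKind = "summary" | "metrics" | "prompt" | "context" | "response";
type Artifact = { id: string; kind: ArtifactKind; createdAtMs: number; expiresAtMs: number | null };

// Only prompt/context/response artifacts expire; summary/metrics are durable by default.
const EXPIRING_KINDS: ReadonlySet<ArtifactKind> = new Set<ArtifactKind>(["prompt", "context", "response"]);

// A null expiry means the artifact is kept indefinitely.
function expiresAt(kind: ArtifactKind, createdAtMs: number, retentionMs: number): number | null {
  return EXPIRING_KINDS.has(kind) ? createdAtMs + retentionMs : null;
}

// Keep durable artifacts and any expiring artifact still inside its window.
function pruneExpired(artifacts: Artifact[], nowMs: number): Artifact[] {
  return artifacts.filter((a) => a.expiresAtMs === null || a.expiresAtMs > nowMs);
}
```

Storing the expiry on the artifact row at write time, rather than recomputing it from the kind at prune time, means a later change to the retention window does not silently alter artifacts that were already stored.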

Debugging Notes

  • If a job remains queued, inspect global/per-user running caps and active leases.
  • If a job looks stuck running, inspect lease_expires_at_ms, leased_by, attempt_count, and worker heartbeat.
  • If preflight blocks a run, compare estimated decision calls, active assets, replay steps, and tokens to tier caps.
  • If quota seems wrong after an upgrade, remember that usage already recorded for the current UTC day is kept, while later starts are checked against the new entitlement's higher limit.
  • If results differ from production Prop Trading, confirm the backtest did not write or read production prop-account state beyond reusable math.
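
The quota-after-upgrade note can be worked through with a small sketch. The daily limits are the guardrail defaults from the tier table; `usageDayKey`, `remainingRuns`, and the `YYYY-MM-DD` key format are hypothetical illustrations rather than the repository's actual API.

```typescript
// Daily accepted-run limits per tier (guardrail defaults from the tier table).
const DAILY_RUNS = { starter: 3, builder: 10, pro: 25 } as const;
type TierName = keyof typeof DAILY_RUNS;

// Bucket usage by UTC calendar day; the key format is an assumption.
function usageDayKey(nowMs: number): string {
  return new Date(nowMs).toISOString().slice(0, 10);
}

// Remaining starts = current tier's limit minus runs already accepted today.
function remainingRuns(currentTier: TierName, acceptedToday: number): number {
  return Math.max(0, DAILY_RUNS[currentTier] - acceptedToday);
}

const lateEvening = Date.UTC(2024, 2, 1, 23, 59);
console.log(usageDayKey(lateEvening));     // "2024-03-01"
console.log(remainingRuns("starter", 3));  // 0: quota exhausted on starter
console.log(remainingRuns("builder", 3));  // 7: same-day usage kept, higher limit applies
```

A user who exhausted starter's three runs and then upgrades to builder on the same UTC day therefore sees seven remaining starts, not ten.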

Tests

```bash
cd backend
npm run typecheck
npm run test
```

Current coverage includes usage recording, daily quota exhaustion, tier-upgrade behavior, preflight cap rejection, async progress, route authorization, queue claiming, lease expiry, artifact retention, and worker boot.

Known Gaps

  • Final pricing policy for included runs, paid overages, model-tier access, and cost-unit accounting.
  • Per-provider concurrency limits if model providers need separate budgets.
  • Cancellation and explicit retry/dead-letter policy.
  • Frontend history, run result, metrics comparison, and capped configuration UI.