Backtesting Module

Overview

Backtesting owns guarded replay runs for clone strategies. The v1 module is backend-complete: durable async jobs, usage metering, queue leases, comparable metrics, artifact retention, and preflight policy all live under Backtesting, even though the API routes remain clone-scoped for compatibility.

A backtest result is strategy-evaluation evidence, not production prop-account truth.

Responsibilities

Create queued backtest jobs.
Authorize starts against daily quota and per-run caps.
Estimate replay steps, active assets, decision calls, and token budget before execution.
Process queued jobs with leases and concurrency limits.
Run replay clocks over historical market context.
Reuse AI Trading context/decision primitives and Prop Trading fill math in isolation.
Store summaries, metrics, progress, and artifacts.
Prune expiring prompt/context/response artifacts.

Boundary Rules

Clone Service owns clone identity, ownership, strategy graphs, and entitlements.
AI Trading owns prompt/model execution internals.
Prop Trading owns production account truth; backtests must not write production prop tables.
Economy owns final pricing/payment policy; current Backtesting limits are guardrail defaults.
Backtesting routes are clone-scoped for API compatibility, but state is module-owned.

Runtime Flow

Start:

```text
POST decision-backtest/preflight
-> estimate quota/caps without consuming usage

POST decision-backtest
-> authorize quota and per-run caps
-> increment usage-day counter
-> create queued backtest job
```
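The difference between preflight and start can be sketched as a check that is pure for preflight and followed by a usage increment for start. This is a minimal in-memory sketch, not the module's implementation: `checkStart`, `startBacktest`, the `StartState` shape, and the per-user/clone/day usage key are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real module persists usage in backtest_usage_days
// and jobs in backtest_jobs rather than in memory.
type StartRequest = { userId: string; cloneId: string; decisionCalls: number };
type StartPolicy = { dailyRuns: number; maxDecisionCalls: number };
type StartRejection = { ok: false; reason: "cap_exceeded" | "quota_exhausted" };
type StartResult = { ok: true; jobId: string } | StartRejection;
type StartState = { usage: Map<string, number>; queuedJobIds: string[] };

// Assumed key layout: one counter per user, clone, and UTC day.
function usageKey(req: StartRequest, utcDay: string): string {
  return `${req.userId}:${req.cloneId}:${utcDay}`;
}

// Preflight: evaluates caps and quota without touching any counter.
function checkStart(
  state: StartState,
  req: StartRequest,
  policy: StartPolicy,
  utcDay: string,
): { ok: true } | StartRejection {
  if (req.decisionCalls > policy.maxDecisionCalls) {
    return { ok: false, reason: "cap_exceeded" };
  }
  const used = state.usage.get(usageKey(req, utcDay)) ?? 0;
  if (used >= policy.dailyRuns) {
    return { ok: false, reason: "quota_exhausted" };
  }
  return { ok: true };
}

// Start: same checks, then increment usage and enqueue only on acceptance.
function startBacktest(
  state: StartState,
  req: StartRequest,
  policy: StartPolicy,
  utcDay: string,
): StartResult {
  const decision = checkStart(state, req, policy, utcDay);
  if (!decision.ok) return decision; // rejected before job creation
  const key = usageKey(req, utcDay);
  state.usage.set(key, (state.usage.get(key) ?? 0) + 1);
  const jobId = `job-${state.queuedJobIds.length + 1}`;
  state.queuedJobIds.push(jobId);
  return { ok: true, jobId };
}
```

Sharing one check between both entry points keeps preflight answers consistent with what a real start would decide at the same moment.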

Worker:

```text
Prune expired artifacts
-> requeue expired leases
-> claim next queued job within global/per-user limits
-> extend lease as progress updates
-> resolve replay context at timestamps
-> sample decision ticks within decision-call cap
-> simulate actions in isolated account state
-> store metrics/artifacts
-> mark completed or failed
```
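The lease-related steps of the worker cycle (requeue, claim, extend) can be illustrated with a small in-memory queue. The sketch below assumes the documented worker defaults (5-minute lease, 2 concurrent jobs, 1 per user); the `QueueJob` shape and the function names are hypothetical, and a real implementation would perform the claim atomically in the database.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";
type QueueJob = {
  id: string;
  userId: string;
  status: JobStatus;
  leasedBy: string | null;
  leaseExpiresAtMs: number | null;
  attemptCount: number;
};

const LEASE_MS = 5 * 60 * 1000; // lease duration default
const MAX_RUNNING = 2; // global concurrency default
const MAX_RUNNING_PER_USER = 1; // per-user concurrency default

// Running jobs whose lease has lapsed go back to the queue.
function requeueExpiredLeases(jobs: QueueJob[], nowMs: number): number {
  let requeued = 0;
  for (const job of jobs) {
    const expired = job.leaseExpiresAtMs !== null && job.leaseExpiresAtMs <= nowMs;
    if (job.status === "running" && expired) {
      job.status = "queued";
      job.leasedBy = null;
      job.leaseExpiresAtMs = null;
      requeued += 1;
    }
  }
  return requeued;
}

// Claims the first queued job that fits both the global and per-user caps.
function claimNextJob(jobs: QueueJob[], workerId: string, nowMs: number): QueueJob | null {
  const running = jobs.filter((j) => j.status === "running");
  if (running.length >= MAX_RUNNING) return null;
  const next = jobs.find(
    (j) =>
      j.status === "queued" &&
      running.filter((r) => r.userId === j.userId).length < MAX_RUNNING_PER_USER,
  );
  if (!next) return null;
  next.status = "running";
  next.leasedBy = workerId;
  next.leaseExpiresAtMs = nowMs + LEASE_MS;
  next.attemptCount += 1;
  return next;
}

// Each progress update pushes the lease forward by a full lease duration.
function recordProgress(job: QueueJob, nowMs: number): void {
  job.leaseExpiresAtMs = nowMs + LEASE_MS;
}
```

In this sketch a job that stops reporting progress becomes claimable again after at most one lease duration, at the cost of a possible duplicate attempt, which is why the attempt counter is incremented on every claim.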

Default replay shape is the last 6 hours, 5-minute cadence, 72 replay steps, and up to 12 inline decision calls unless capped by policy.
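The default shape follows from simple arithmetic: 6 hours at a 5-minute cadence is 6 × 12 = 72 steps, and 12 decision calls spread evenly over 72 steps means one call every 6th step. The sketch below shows one way to derive both; `planReplay` is a hypothetical helper, and even-stride sampling and end-inclusive steps are assumptions rather than the documented behavior of `planning.ts`.

```typescript
type ReplayPlan = { startMs: number; steps: number[]; decisionTicks: number[] };

const HOUR_MS = 60 * 60 * 1000;
const MINUTE_MS = 60 * 1000;

function planReplay(
  endMs: number,
  maxDecisionCalls = 12,
  windowMs = 6 * HOUR_MS,
  cadenceMs = 5 * MINUTE_MS,
): ReplayPlan {
  const stepCount = Math.floor(windowMs / cadenceMs); // 72 with the defaults
  const startMs = endMs - stepCount * cadenceMs;
  // Assumption: steps are the cadence boundaries after the start, ending at endMs.
  const steps = Array.from({ length: stepCount }, (_, i) => startMs + (i + 1) * cadenceMs);
  // Never sample more decision ticks than there are steps.
  const calls = Math.max(0, Math.min(Math.floor(maxDecisionCalls), stepCount));
  const stride = calls > 0 ? stepCount / calls : 0;
  const picked = new Set(Array.from({ length: calls }, (_, i) => Math.floor(i * stride)));
  return { startMs, steps, decisionTicks: steps.filter((_, i) => picked.has(i)) };
}
```

Lowering `maxDecisionCalls` (for example when policy caps the run) widens the stride instead of shortening the replay window, so the replay clock still covers the full 6 hours.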

Main Code Paths

Path	Purpose
`backend/src/modules/backtesting/policy.ts`	Tier-aware preflight, daily quota enforcement, per-run cap policy.
`backend/src/modules/backtesting/planning.ts`	Replay window, cadence, and decision-sampling helpers.
`backend/src/modules/backtesting/service.ts`	Replay execution service built on AI Trading and Prop Trading primitives.
`backend/src/modules/backtesting/job-manager.ts`	Async job orchestration, queue claiming, progress/result updates.
`backend/src/modules/backtesting/artifact-store.ts`	Backtest artifact storage wrapper and retention policy.
`backend/src/modules/backtesting/repositories/backtest-jobs.ts`	Job and metrics repository.
`backend/src/modules/backtesting/repositories/backtest-artifacts.ts`	Artifact index repository.
`backend/src/modules/backtesting/repositories/backtest-usage.ts`	Daily accepted-run and estimated usage counters.
`backend/src/workers/backtesting.ts`	One-shot and long-running backtest queue worker.

State And Tables

Migration	Tables and columns
`0017_backtesting.sql`	`backtest_jobs`, `backtest_job_metrics`, `backtest_usage_days`, `backtest_artifacts`
`0018_backtesting_worker_artifacts.sql`	Lease fields, attempt count, artifact expiration metadata.

Table	Technical significance
`backtest_jobs`	Queue status, progress, lease, attempt, failure, result summary.
`backtest_job_metrics`	Comparable run metrics for history/comparison UI.
`backtest_usage_days`	Daily accepted-run and estimated usage counters by user/clone/day.
`backtest_artifacts`	Stored summary/metrics/prompt/context/response artifact index and expiration.

Routes And Workers

Surface	Purpose
`POST /api/v1/clone/clones/:id/decision-backtest/preflight`	Estimate replay cost, caps, quota, and blockers without consuming usage.
`POST /api/v1/clone/clones/:id/decision-backtest`	Start an async backtest job.
`GET /api/v1/clone/clones/:id/decision-backtest/jobs`	List recent jobs with lightweight metrics.
`GET /api/v1/clone/clones/:id/decision-backtest/jobs/:jobId`	Read job status/result.
`npm run backtests:run-once`	Process one queue cycle.
`npm run backtests:worker`	Long-running queue worker.

Worker defaults:

Setting	Default
Worker interval	15s
Lease duration	5 minutes
Max concurrent jobs	2
Max concurrent jobs per user	1

Cost And Quota Policy

Current guardrail defaults are not final pricing promises.

Tier	Daily accepted runs	Max decision calls / run	Max active assets / run	Max estimated tokens / run
`starter`	3	24	3	120,000
`builder`	10	100	6	500,000
`pro`	25	200	10	1,000,000

All tiers currently use the `last_6h` window, 72 replay steps at 5-minute cadence, a 1,200-token minimum input estimate per decision call, and a 500-token output estimate per decision call.
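To make the caps concrete: at the minimum estimate, each decision call costs 1,200 + 500 = 1,700 estimated tokens, so the default 12 calls come to 20,400 tokens and a `starter` run at its 24-call cap comes to 40,800, well under the 120,000-token limit. The sketch below encodes the table and that arithmetic; the function names, blocker strings, and the idea that a larger per-call input estimate simply replaces the minimum are assumptions for illustration.

```typescript
type Tier = "starter" | "builder" | "pro";
type TierCaps = {
  dailyRuns: number;
  maxDecisionCalls: number;
  maxActiveAssets: number;
  maxEstimatedTokens: number;
};

// Values copied from the guardrail table above.
const TIER_CAPS: Record<Tier, TierCaps> = {
  starter: { dailyRuns: 3, maxDecisionCalls: 24, maxActiveAssets: 3, maxEstimatedTokens: 120000 },
  builder: { dailyRuns: 10, maxDecisionCalls: 100, maxActiveAssets: 6, maxEstimatedTokens: 500000 },
  pro: { dailyRuns: 25, maxDecisionCalls: 200, maxActiveAssets: 10, maxEstimatedTokens: 1000000 },
};

const MIN_INPUT_TOKENS_PER_CALL = 1200;
const OUTPUT_TOKENS_PER_CALL = 500;

// The input estimate never drops below the documented per-call minimum.
function estimateRunTokens(
  decisionCalls: number,
  inputTokensPerCall: number = MIN_INPUT_TOKENS_PER_CALL,
): number {
  const input = Math.max(inputTokensPerCall, MIN_INPUT_TOKENS_PER_CALL);
  return decisionCalls * (input + OUTPUT_TOKENS_PER_CALL);
}

type RunEstimate = { decisionCalls: number; activeAssets: number; inputTokensPerCall?: number };

// Returns the list of caps the estimated run would exceed (empty means allowed).
function preflightBlockers(tier: Tier, run: RunEstimate): string[] {
  const caps = TIER_CAPS[tier];
  const blockers: string[] = [];
  if (run.decisionCalls > caps.maxDecisionCalls) blockers.push("decision_calls");
  if (run.activeAssets > caps.maxActiveAssets) blockers.push("active_assets");
  const tokens = estimateRunTokens(run.decisionCalls, run.inputTokensPerCall);
  if (tokens > caps.maxEstimatedTokens) blockers.push("estimated_tokens");
  return blockers;
}
```

With these numbers the token cap only binds when the per-call input estimate is well above the minimum; for example, 24 calls at 5,000 input tokens each estimate to 132,000 tokens and would be blocked on `starter`.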

Metrics

Backtest summaries are designed to be comparable without loading full artifacts:

Starting and ending equity.
Realized and unrealized PnL.
Max drawdown.
Win/loss count and win rate.
Number of decisions, accepted actions, rejected actions, and no-ops.
Fees, slippage, turnover, and exposure by asset.
Model calls, estimated cost, token usage, and average decision latency.
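Two of these metrics have standard definitions that are easy to state in code. The sketch below computes max drawdown as the largest peak-to-trough decline of an equity curve, expressed as a fraction of the peak, and win rate as wins over decided trades. Whether the module stores drawdown as a fraction or an absolute amount is not stated here, so treat the units and function names as assumptions.

```typescript
// Largest fractional decline from any running peak to a later trough.
function maxDrawdown(equityCurve: number[]): number {
  let peak = Number.NEGATIVE_INFINITY;
  let worst = 0;
  for (const equity of equityCurve) {
    peak = Math.max(peak, equity);
    if (peak > 0) {
      worst = Math.max(worst, (peak - equity) / peak);
    }
  }
  return worst;
}

// Wins divided by decided trades; defined as 0 when there are no trades.
function winRate(wins: number, losses: number): number {
  const total = wins + losses;
  return total === 0 ? 0 : wins / total;
}
```

For the curve `[100, 120, 90, 110, 80]` the running peak is 120 and the lowest later value is 80, so the drawdown is 40 / 120, or about 33%.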

Failure Behavior

`authorizeStart` increments `backtest_usage_days` only when a start is accepted.
Cap/quota failures reject before job creation.
The worker requeues running jobs whose lease has expired.
Claimed jobs respect global and per-user running caps.
Progress updates extend the lease.
Failed jobs clear lease state and preserve error details.
Prompt/context/response artifacts expire after the retention window; summary/metrics artifacts are durable by default.
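The retention rule in the last item can be sketched as an expiry timestamp that is set only for prompt, context, and response artifacts. The retention window length is not specified in this document, so the sketch takes it as a parameter; `ArtifactRow`, `expiresAt`, and `pruneExpiredArtifacts` are hypothetical names.

```typescript
type ArtifactKind = "summary" | "metrics" | "prompt" | "context" | "response";
type ArtifactRow = { id: string; kind: ArtifactKind; expiresAtMs: number | null };

// Summary and metrics artifacts are durable by default, so they get no expiry.
const DURABLE_KINDS: ReadonlySet<ArtifactKind> = new Set<ArtifactKind>(["summary", "metrics"]);

function expiresAt(kind: ArtifactKind, createdAtMs: number, retentionMs: number): number | null {
  return DURABLE_KINDS.has(kind) ? null : createdAtMs + retentionMs;
}

// Splits the index into rows to keep and ids whose stored payload should be deleted.
function pruneExpiredArtifacts(
  rows: ArtifactRow[],
  nowMs: number,
): { kept: ArtifactRow[]; prunedIds: string[] } {
  const kept: ArtifactRow[] = [];
  const prunedIds: string[] = [];
  for (const row of rows) {
    if (row.expiresAtMs !== null && row.expiresAtMs <= nowMs) {
      prunedIds.push(row.id);
    } else {
      kept.push(row);
    }
  }
  return { kept, prunedIds };
}
```

Storing the expiry on the index row at write time means a later change to the retention window affects only new artifacts, which keeps pruning a simple timestamp comparison.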

Debugging Notes

If a job remains queued, inspect global/per-user running caps and active leases.
If a job looks stuck running, inspect `lease_expires_at_ms`, `leased_by`, `attempt_count`, and worker heartbeat.
If preflight blocks a run, compare estimated decision calls, active assets, replay steps, and tokens to tier caps.
If quota seems wrong after an upgrade, remember that usage already recorded for the current UTC day is kept, and later starts that day are checked against the new entitlement's higher limit.
If results differ from production Prop Trading, confirm that the backtest neither read nor wrote production prop-account state; only the fill math should be shared.
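The upgrade case is easiest to check with numbers. A user who has used 3 runs on `starter` (limit 3) and then upgrades to `builder` (limit 10) has 7 runs left for the rest of that UTC day, not 10. A minimal sketch, with hypothetical helper names, of the day key and the remaining-run calculation:

```typescript
// Usage counters roll over at UTC midnight, so the key is the UTC calendar date.
function utcDayKey(nowMs: number): string {
  return new Date(nowMs).toISOString().slice(0, 10); // "YYYY-MM-DD"
}

// Usage recorded before an upgrade still counts against the new limit.
function remainingRunsToday(usedToday: number, dailyRunLimit: number): number {
  return Math.max(0, dailyRunLimit - usedToday);
}
```

When a reported quota looks off, comparing the user's local date with `utcDayKey` for the same instant is a quick way to rule out a day-boundary mismatch.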

Tests

```bash
cd backend
npm run typecheck
npm run test
```

Current coverage includes usage recording, daily quota exhaustion, tier-upgrade behavior, preflight cap rejection, async progress, route authorization, queue claiming, lease expiry, artifact retention, and worker boot.

Known Gaps

Final pricing policy for included runs, paid overages, model-tier access, and cost-unit accounting.
Per-provider concurrency limits if model providers need separate budgets.
Cancellation and explicit retry/dead-letter policy.
Frontend history, run result, metrics comparison, and capped configuration UI.
