x-forecast · paper portfolio
Research · Beta · vN.3 · offline · human-gated

Agent Pipeline · Research Orchestration

How a small DAG of single-purpose agents turns the current macro regime into a reviewable strategy proposal — fully audited, offline, and stopped at a human gate.

This is the offline research pipeline (vN.3) behind the portfolio — NOT an auto-trader. It reuses the tested backtest engine (vN.1) and bounded search (vN.2), then layers a red-team critic and a human gate on top. Every provider call is logged to an audit trail; the pipeline writes ONLY under research/proposals/ and never touches the live book in data/.

The pipeline

Read it like an orchestrator-workers system: a control plane (the Orchestrator) drives a data plane of single-purpose agents left to right, every call is logged, and the only decision point is a human gate — the machine never writes to the live book itself.

%%{init: {"theme":"base","themeVariables":{"fontFamily":"ui-sans-serif, system-ui, -apple-system, Segoe UI, sans-serif","fontSize":"14px","background":"#faf9f5","lineColor":"#8a8576","primaryTextColor":"#141413"},"flowchart":{"htmlLabels":true,"nodeSpacing":50,"rankSpacing":58,"padding":16,"useMaxWidth":true,"curve":"basis"}}}%%
flowchart TB
  IN(["data/regime.yaml + price panel<br/>offline inputs"]):::store
  subgraph CTRL["control plane"]
    ORCH["Orchestrator · the conductor<br/>drives every call<br/>logs audit · hashes proposal_id"]:::det
  end
  subgraph DP["single-purpose agents · data plane (offline)"]
    direction LR
    A1["RegimeAgent<br/>pure read → views<br/>no LLM · no RNG"]:::det
    A2["Signals<br/>factor z-scores"]:::det
    A3["HypothesisAgent<br/>falsifiable thesis<br/>LLM-optional"]:::llm
    A4["Generator + Search<br/>bounded OOS search"]:::det
    A5["CriticAgent · red team<br/>DSR gate + stress re-sim<br/>produces accept flag"]:::redteam
    A6["CuratorAgent<br/>base / bull / bear drafts"]:::det
    A1 --> A2 --> A3 --> A4 --> A5 --> A6
  end
  ART[("research/proposals/ID/<br/>5 artifacts + audit.jsonl<br/>never writes data/")]:::store
  GATE{"HUMAN GATE<br/>reviews drafts + verdict"}:::human
  LIVE(["human manually copies<br/>weights → data/ live book"]):::term
  ARCH(["archived · zero deploy"]):::term
  IN --> A1
  A6 --> ART --> GATE
  GATE -- approve --> LIVE
  GATE -- decline --> ARCH
  ORCH -. drives + logs .-> DP
  classDef det fill:#efede4,stroke:#a39e8f,color:#141413;
  classDef llm fill:#dbe8f4,stroke:#6a9bcc,color:#234a68,stroke-width:2px,stroke-dasharray:5 3;
  classDef redteam fill:#f6ddd0,stroke:#d97757,color:#8a3a1d,stroke-width:2px;
  classDef store fill:#ece8dc,stroke:#b0aea5,color:#57534b;
  classDef human fill:#e1e8d3,stroke:#788c5d,color:#3c4a28,stroke-width:2px;
  classDef term fill:#faf9f5,stroke:#cdc9bc,color:#57534b;
  style CTRL fill:#f4f2ea,stroke:#dcd8cb,color:#57534b;
  style DP fill:#f4f2ea,stroke:#dcd8cb,color:#57534b;
Figure A · Orchestration topology. Node colour marks each step's trust level (key below); the dashed arrow is control + logging, solid arrows are data flow.
  • deterministic: rule-based, no LLM, no RNG
  • LLM-optional: rule-based by default; Claude only with a key
  • red team: the critic that tries to reject
  • store / artifact: audited, written only under research/proposals/
  • human gate: the only decision point; machine never auto-accepts

Step by step

  1. RegimeAgent
    PURE read of data/regime.yaml → coarse-class views (no LLM, no RNG).
  2. Signals
    Real cross-asset factor z-scores (momentum, defensive) from the price panel.
  3. HypothesisAgent
    Regime + signals → an explicit, falsifiable thesis (orientation, statement, falsification).
  4. Generator + Search
    Build a bounded vN.2 search space from the thesis, run the OOS walk-forward search.
  5. CriticAgent
    DSR gate (deflated Sharpe) + REAL asset-shock stress re-sim against the finalist.
  6. Curator + Orchestrator
    Compile drafts, write proposal + audit; hand off to a HUMAN.

Single-purpose agents

RegimeAgent

owns: the numbers

Aggregates the fine tactical matrix (OW=+1/N=0/UW=−1) up to coarse-class scores. Deterministic — owns no prose.
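
The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `coarse_views`, the fine-to-coarse mapping, and the sample matrix are all hypothetical; only the OW=+1 / N=0 / UW=−1 encoding and the per-class mean come from this page.

```python
from collections import defaultdict
from statistics import mean

# Score encoding stated in the text: OW=+1, N=0, UW=-1.
SCORE = {"OW": 1, "N": 0, "UW": -1}

def coarse_views(tactical_matrix, fine_to_coarse):
    """Average fine-grained OW/N/UW calls into one score per coarse class."""
    buckets = defaultdict(list)
    for fine_asset, call in tactical_matrix.items():
        buckets[fine_to_coarse[fine_asset]].append(SCORE[call])
    # Sorted keys keep the output order deterministic (no RNG, no LLM).
    return {coarse: mean(scores) for coarse, scores in sorted(buckets.items())}

# Hypothetical inputs, for illustration only.
matrix = {"us_equity": "UW", "em_equity": "UW", "gold": "OW", "energy": "N"}
mapping = {"us_equity": "equities", "em_equity": "equities",
           "gold": "commodities", "energy": "commodities"}
views = coarse_views(matrix, mapping)
print(views)  # {'commodities': 0.5, 'equities': -1}
```

Because the function is a pure mapping from its inputs, the same regime file always yields the same views.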

HypothesisAgent

owns: the thesis

Turns the regime view + real signal z-scores into an explicit orientation per class, a statement, and 4–6 falsification conditions. Audited.
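
One way to picture the thesis object is a small immutable record that refuses to exist unless it is falsifiable. The class name `Hypothesis`, its field names, and the sample falsifiers are assumptions made for illustration; the three parts (orientation, statement, falsification) and the 4–6 range are taken from the description above.

```python
from dataclasses import dataclass

ALLOWED = {"OW", "N", "UW"}

@dataclass(frozen=True)
class Hypothesis:
    orientation: dict      # coarse class -> "OW" | "N" | "UW"
    statement: str
    falsification: tuple   # conditions that would refute the thesis

    def __post_init__(self):
        # Reject orientations outside the OW/N/UW vocabulary.
        bad = set(self.orientation.values()) - ALLOWED
        if bad:
            raise ValueError(f"unknown orientation(s): {sorted(bad)}")
        # The text calls for 4 to 6 falsification conditions.
        if not 4 <= len(self.falsification) <= 6:
            raise ValueError("a thesis needs 4 to 6 falsification conditions")

# Hypothetical example values.
thesis = Hypothesis(
    orientation={"commodities": "OW", "credit": "N", "equities": "UW", "rates": "OW"},
    statement="Illustrative statement for a Q4 regime.",
    falsification=("condition A", "condition B", "condition C", "condition D"),
)
```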

Generator + Search

owns: the candidates

Derives a bounded search space from the thesis (sign fixed, magnitude searched), then ranks trials by an out-of-sample objective.
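
"Sign fixed, magnitude searched" could look something like the sketch below, where the sign of each coarse view decides which side of zero the tilt may live on and the search is free only within that side. The function name `tilt_bounds`, the `max_tilt` cap of 0.10, and the numeric view scores are hypothetical.

```python
def tilt_bounds(views, max_tilt=0.10):
    """Per-class (low, high) tilt bounds: the view's sign fixes the side,
    the magnitude inside that side is left to the search."""
    bounds = {}
    for asset_class, score in views.items():
        if score > 0:
            bounds[asset_class] = (0.0, max_tilt)    # overweight only
        elif score < 0:
            bounds[asset_class] = (-max_tilt, 0.0)   # underweight only
        else:
            bounds[asset_class] = (0.0, 0.0)         # neutral: no tilt searched
    return bounds

# Signs follow the OW/N/UW orientation of the worked example on this page;
# the magnitudes are assumed for illustration.
views = {"commodities": 1.0, "credit": 0.0, "equities": -1.0, "rates": 1.0}
bounds = tilt_bounds(views)
```

The point of the design is that the search can never flip the thesis: it may conclude that the best equity tilt is zero, but it cannot propose an equity overweight.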

CriticAgent

owns: the red team

A strict DSR gate (null/negative evidence is rejected) plus a real stress re-sim: r_group = Σ wᵢ·shockᵢ on the finalist’s own weights.
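
The two checks can be sketched in a few lines. The stress formula is the one given above; the function names, the 0.95 threshold, and the sample weights and shocks are assumptions for illustration and are not taken from the project.

```python
def stress_return(weights, shocks):
    """r_group = sum_i w_i * shock_i, using the finalist's own weights.
    Assets without a shock entry are treated as unshocked (an assumption)."""
    return sum(w * shocks.get(asset, 0.0) for asset, w in weights.items())

def strict_gate(dsr, threshold=0.95):
    """Strict gate: missing (None) or below-threshold evidence is rejected.
    The threshold value here is hypothetical."""
    return dsr is not None and dsr >= threshold

# Hypothetical finalist weights and scenario shocks.
weights = {"equities": 0.6, "rates": 0.4}
shocks = {"equities": -0.20, "rates": -0.10}
r_group = stress_return(weights, shocks)
```

With these numbers `r_group` is about -0.16, and `strict_gate(None)` and `strict_gate(0.0)` both return `False`, which mirrors the "null/negative evidence is rejected" rule.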

CuratorAgent

owns: the drafts

Compiles base/bull/bear weights (each sums to 1, passes the constraints) and decision-time fields. Reuses the hypothesis’ statement + falsification.
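
A draft check along these lines would catch weights that do not sum to 1 before they reach a reviewer. The function name, the long-only assumption, the `max_weight` cap, and the sample drafts are all hypothetical; the page only states that each draft sums to 1 and passes the constraints.

```python
import math

def validate_draft(weights, max_weight=1.0):
    """Check that a draft sums to 1 and every weight lies in [0, max_weight].
    The long-only bound and the cap are assumed constraints."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1")
    for asset, w in weights.items():
        if not 0.0 <= w <= max_weight:
            raise ValueError(f"{asset} weight {w} violates constraints")
    return True

# Hypothetical base / bull / bear drafts.
drafts = {
    "base": {"equities": 0.5, "rates": 0.3, "commodities": 0.2},
    "bull": {"equities": 0.6, "rates": 0.2, "commodities": 0.2},
    "bear": {"equities": 0.3, "rates": 0.5, "commodities": 0.2},
}
all_valid = all(validate_draft(w) for w in drafts.values())
```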

Orchestrator

owns: provenance

Hashes a reproducible proposal_id, replays the audit log, writes the 5 artifacts, appends the leaderboard. Writes ONLY under research/proposals/.
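
A reproducible ID is typically a hash over a canonical serialization of everything that determines the run. The sketch below assumes SHA-256, a 12-character prefix (matching the length of the ID shown in the worked example), and a hypothetical set of inputs; the project's actual hashing scheme is not described on this page.

```python
import hashlib
import json

def proposal_id(config, code_rev, data_rev, length=12):
    """Hash the run's inputs into a short, stable identifier."""
    payload = json.dumps(
        {"config": config, "code": code_rev, "data": data_rev},
        sort_keys=True,            # key order must not change the hash
        separators=(",", ":"),     # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]

# Same inputs, different key order: the ID is identical.
id_a = proposal_id({"provider": "rulebased", "seed": 0}, "c2baf21", "4def167")
id_b = proposal_id({"seed": 0, "provider": "rulebased"}, "c2baf21", "4def167")
```

Nothing time-dependent or random goes into the payload, so rerunning on the same code and data revisions yields the same ID.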

Invariant 1 · Human gate

The pipeline writes ONLY under research/proposals/ — it never creates or edits anything under data/. A test asserts `git status data/` is unchanged across a run. A human reviews the drafts and, if accepted, manually copies weights into the live book.
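
The page describes a test built on `git status data/`. A git-free variant of the same idea, shown here as a swapped-in technique rather than the project's test, is to hash every file under the directory before and after a run and compare. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

def snapshot(root):
    """Map each file under root to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def assert_untouched(root, run):
    """Call run() and fail if any file under root was added, removed, or changed."""
    before = snapshot(root)
    run()
    after = snapshot(root)
    assert before == after, "pipeline modified the live book"
```

In a test, `root` would be the `data/` directory and `run` a callable that executes one full pipeline pass.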

Invariant 2 · Offline

No network is required. The default provider is deterministic and rule-based; the Anthropic SDK is imported only inside the Claude provider when a key is present. CI runs the deterministic path, so a proposal_id is reproducible.
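
The "imported only inside the Claude provider" pattern can be sketched as a lazy import inside the provider's constructor. The class and function names below are hypothetical, and reading the key from the `ANTHROPIC_API_KEY` environment variable is an assumption about how the key is configured.

```python
import os

class RuleBasedProvider:
    """Deterministic default: no network, no SDK import."""
    name = "rulebased"

    def complete(self, prompt):
        return f"[rule-based] {prompt}"

class ClaudeProvider:
    name = "claude"

    def __init__(self, api_key):
        # Lazy import: the SDK is only needed when this provider is built,
        # so the offline path works without the package installed.
        import anthropic
        self._client = anthropic.Anthropic(api_key=api_key)

def select_provider(env=None):
    """Return the Claude provider only when a key is present."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY")
    return ClaudeProvider(key) if key else RuleBasedProvider()

provider = select_provider(env={})  # no key: deterministic path
```

Passing `env` explicitly keeps the selection testable in CI without touching the real environment.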

Worked example (latest proposal)

proposal_id cc02400ae096 · provider rulebased · grid/seed · deterministic=true · code c2baf21 · data 4def167

Regime → Hypothesis

Regime quadrant Q4 (growth momentum -0.519, inflation momentum +0.275). Overweight tilt orientation: commodities, rates. Underweight tilt orientation: equities. Coarse-class views are aggregated from the fine tactical_matrix (OW=+1/N=0/UW=-1, mean per coarse class); the sign sets the search bound orientation, the magnitude is searched.

commodities: OW · credit: N · equities: UW · rates: OW
live signal factors: defensive, momentum
Finalist
base_allocator: 60_40
tilt_strength: 0.5
OOS Sharpe: 0.4870 on 77 obs · 3 splits · 162 trials
Critic verdict (red team)
Deflated Sharpe: 0.0000 (SR0 2.70)
accept: false
stress basis: finalist_asset_shock_resim
stress flags: inf2022
In this run the critic re-simulated all 5 historical scenarios against the finalist’s own weights and flagged inf2022; the strict DSR gate returned accept=false. That is the system working as intended — on thin, single-regime data it is supposed to withhold approval. The drafts still wait at the human gate.
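
To see why a Sharpe of 0.4870 against a benchmark SR0 of 2.70 deflates to 0.0000, the standard deflated-Sharpe statistic of Bailey and López de Prado can be evaluated by hand. This is a sketch under loud assumptions: that the reported Sharpe and SR0 are in the same per-observation units, that the 77 observations are the sample length, and that returns are roughly normal (skew 0, kurtosis 3). The page does not say how the engine computes the figure.

```python
from math import sqrt
from statistics import NormalDist

def deflated_sharpe(sr, sr0, n_obs, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the benchmark sr0.
    Standard form; skew and kurt default to the normal-returns case."""
    denom = sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    z = (sr - sr0) * sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)

# Figures quoted in the worked example above.
dsr = deflated_sharpe(sr=0.4870, sr0=2.70, n_obs=77)
print(f"{dsr:.4f}")  # 0.0000
```

Under these assumptions the z-score is roughly -18, so the probability is indistinguishable from zero and a strict gate has nothing to accept.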

Audit trail · what the orchestrator actually ran

Every provider call the orchestrator drove for this proposal, in order, replayed from audit.jsonl. model is empty because this run used the deterministic rule-based provider — the Claude provider is only imported when a key is configured. RegimeAgent (a pure read) and the Orchestrator (writes artifacts only) make no provider call, so they do not appear here.

%%{init: {"theme":"base","themeVariables":{"fontFamily":"ui-sans-serif, system-ui, -apple-system, Segoe UI, sans-serif","fontSize":"16px","background":"#faf9f5","actorBkg":"#efede4","actorBorder":"#a39e8f","actorTextColor":"#141413","actorLineColor":"#c4c0b2","signalColor":"#8a8576","signalTextColor":"#3c382f","noteBkg":"#dbe8f4","noteBorderColor":"#6a9bcc","noteTextColor":"#234a68","activationBkgColor":"#f6ddd0","activationBorderColor":"#d97757","sequenceNumberColor":"#faf9f5","labelBoxBkgColor":"#e1e8d3","labelBoxBorderColor":"#788c5d","labelTextColor":"#3c4a28","loopTextColor":"#3c4a28"},"sequence":{"useMaxWidth":false,"actorMargin":60,"boxMargin":14,"noteMargin":12,"messageMargin":42,"mirrorActors":true}}}%%
sequenceDiagram
  autonumber
  participant O as Orchestrator
  participant R as RegimeAgent
  participant S as Signals
  participant H as HypothesisAgent
  participant G as Generator+Search
  participant C as CriticAgent
  participant U as CuratorAgent
  participant L as audit.jsonl
  participant Hum as Human
  Note over O,L: offline · deterministic by default · provider=rulebased · model=none
  O->>R: read regime to coarse views
  R-->>O: views (no provider call, not logged)
  O->>S: compute factor z-scores
  S-->>O: momentum / defensive
  O->>H: state hypothesis + falsification
  H-->>L: log regime_summary, falsification
  H-->>O: thesis + 4 to 6 falsifiers
  O->>G: bounded OOS walk-forward search
  G-->>L: log search_space
  G-->>O: finalist params
  O->>C: critique (DSR + stress re-sim)
  C-->>L: log critique → accept=false, flag inf2022
  C-->>O: verdict
  O->>U: compile drafts
  U-->>L: log rationale
  U-->>O: base / bull / bear drafts
  O->>O: hash proposal_id · write 5 artifacts
  O->>Hum: hand off drafts + verdict
  Note over Hum: machine NEVER auto-accepts
  alt human approves
    Hum->>Hum: manually copy weights → data/
  else human declines
    Hum->>Hum: archive · zero deploy
  end
Figure B · The same run as a sequence trace (scroll horizontally to follow all nine lanes). Blue note = run mode; the orange activation bar marks the critic; only the four agents that hit a provider write to audit.jsonl; the run ends at the human gate (approve / decline).

The logged calls

  1. hypothesis · regime_summary provider=rulebased model=none
    state the macro hypothesis from the regime view
  2. hypothesis · falsification provider=rulebased model=none
    falsification conditions for the hypothesis
  3. generator · search_space provider=rulebased model=none
    generate vN.2 search_spec for the current regime
  4. critic · critique provider=rulebased model=none
    critique the finalist from DSR + stress context
  5. curator · rationale provider=rulebased model=none
    decision rationale prose
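
An append-only JSONL log that can be replayed in order is simple to sketch. The record fields below (agent, task, provider, model) mirror the columns shown in the list above, but the exact schema of audit.jsonl is an assumption, as are the function names.

```python
import json
from pathlib import Path

def log_call(path, agent, task, provider, model=None):
    """Append one provider call as a single JSON line."""
    record = {"agent": agent, "task": task, "provider": provider, "model": model}
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")

def replay(path):
    """Read the log back in the order the calls were made."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending one line per call means a crashed run still leaves a readable prefix, and replaying is just reading the file top to bottom.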

Honesty

  • The committed price history is thin and single-regime (~120 trading days, one Q4 macro regime). There is no bull/bear transition in-sample, so the regime tilt cannot be validated out-of-regime.
  • Stress shocks are window-magnitude estimates (from each scenario’s benchmark line), not per-ETF actuals — framework validation only.
  • These proposals are illustrative, NOT robust. Do not deploy on this evidence alone — which is exactly why everything stops at a human gate.
Source: research/agents/* (orchestrator + agents), research/engine/* (vN.1 engine + signals), research/search/* (vN.2 search). Every proposal ships 5 artifacts — proposal.md, rebalance_draft.yaml, decision_draft.yaml, audit.jsonl, config.yaml — committed to the public repo with a reproducible proposal_id.