
SUPREME EDGE ENHANCEMENT

Master Plan — 5-Agent + Peer Chatlog + Buffy + DB Health Synthesis
RESEARCH SURFACE — NOT FINANCIAL ADVICE
Authoritative synthesis of 5 multi-agent plans + peer Claude Opus 4.7 session chatlog. No live capital from this page without explicit user greenlight + real-money gate clearance.
Generated 2026-05-11 · Branch feat/audit-dashboard-enhancements-hermes-2026-05-09 · Skill .claude/skills/money-maker-ready/SKILL.md v1.0 · Peer chatlog updates/2026-05-11-session-chatlog-claude-opus-47.md · PR #904 ready to merge at 6d7ccd928fd

TL;DR

Edge-stability sidecar (peer Phase G) verifies 2 stable classes + 2 decaying + 4 too-thin. COMMODITY and EQUITY are STABLE_EDGE with PF 3.61 / 2.04. CRYPTO and FOREX DECAYING_EDGE at PF 1.39 / 0.57. BOND, ETF, FUTURES, INDEX INSUFFICIENT_DATA.

Real-money posture (synthesis): Codex all-classes-first state machine + Kimi 7-check Go/No-Go gate + Cursor measurable Tier-2 criteria + Copilot 2-consecutive-weekly confirmation. No class trades live until all six major classes ≥ SHADOW.

Immediate (next 24h): Merge PR #904 (4 P1 swarm-fixes shipped, SSRF guard added, 16 smoke tests, 2 fabricated review claims rejected). Then execute 7-item P0 cluster.

Source inputs (what fed this master plan)

# | Source | Author / model | Key contribution
1 | Claude Code plan | Opus 4.7 (this session) | Flagged claude_gainer_st winner-vs-blacklist contradiction; identified multi_asset_cot PF=19.19 needing DB verification
2 | Cursor plan | Cursor Plan Mode | Canonical per-class baseline numbers (n=408/443/100/7875/1825/11); 4-phase fast-track with measurable n-targets
3 | Copilot plan | GitHub Copilot Chat | 2-consecutive-weekly Tier-2 promotion gate; ETF+COMMODITY rollout order + CRYPTO curated sleeve parallel
4 | Kimi plan | Moonshot Kimi (IDE) | 7-check Go/No-Go gate; named symbol-level edges (cot_positioning_CT_locked 89.8% WR, rs-breakout-scout 77.8% WR); 1-hour P0 fixes list
5 | Codex plan | OpenAI ChatGPT Codex | Class state-machine BLOCKED→REHAB→OOS_READY→SHADOW→LIVE_ELIGIBLE; readiness.by_class payload contract; truth-layer-first policy
6 | Peer Claude chatlog | Claude Opus 4.7 (1M context, peer session) | 9-phase work A-I; edge-stability sidecar with VERIFIED per-class verdicts; PR #904 swarm-reviewed + ready to merge; 14-item remaining backlog

Per-class action items (specific, not generic)

COMMODITY STABLE_EDGE

Edge-stability verdict: PF 3.61 / WR 55.7% / n=167 (peer Phase G). Cursor reported live asset_class_health PF 3.92 / WR 67.4% / n=408 — discrepancy is sample-window difference (edge-stability uses rolling window; asset_class_health cumulative).

Specific edges named:

  • Kimi: cot_positioning_CT_locked LONG 89.8% WR, PF 13.1 (n=49)
  • System-level: multi_asset_cot PF 19.19 / n=130 (Claude Code flagged for DB verification — implausibly high)

Actions:

  1. P0 Verify multi_asset_cot PF=19.19 via DB query against ejaguiar1_stocks — data integrity smoke test. If real, name as Tier-1 seed.
  2. P0 Disclose CT=F / KC=F symbol concentration in dashboard (Codex) — current PF may be one-symbol artifact.
  3. P1 Add COMMODITY to alpha_engine/walkforward_validator.py output path (Cursor — currently missing from walkforward.by_class). Verify surfacing in audit_trail/dashboard_generator.py.
  4. P1 Wire real CFTC COT data to validate the 89.8% WR cot_positioning_CT_locked pocket (Kimi).
  5. P2 Add term-structure / inventory / seasonality features (Codex).
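
The concentration disclosure in action 2 could be computed along these lines. This is a minimal sketch, not the dashboard's implementation: the function name `top_symbol_share` and the input shape (a mapping of symbol to summed PnL) are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def top_symbol_share(pnl_by_symbol: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (symbol, share of total PnL) for the largest contributor.

    Returns None when there are no symbols or total PnL is zero, because
    the share is undefined in that case. Hypothetical helper.
    """
    total = sum(pnl_by_symbol.values())
    if not pnl_by_symbol or total == 0:
        return None
    symbol = max(pnl_by_symbol, key=lambda s: pnl_by_symbol[s])
    return symbol, pnl_by_symbol[symbol] / total
```

A share close to 1.0 for a single symbol such as CT=F would support the "one-symbol artifact" concern. Note that the share can exceed 1.0 when the remaining symbols lose money in aggregate.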

Promotion gate (master): walk-forward by_class block emits real folds + concentration disclosed + DB-verified PF + sustained Tier-2 (PF≥1.5 / WR≥50 / MDD≤20) for 2 consecutive weekly snapshots → OOS_READY. Then 14-30d SHADOW.
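
The persistence part of this gate (Tier-2 floor held for 2 consecutive weekly snapshots) can be sketched as below. The `WeeklySnapshot` shape and function names are hypothetical; only the thresholds (PF≥1.5 / WR≥50 / MDD≤20) come from the gate text above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WeeklySnapshot:
    """One weekly per-class reading (assumed shape, not the real payload)."""
    profit_factor: float
    win_rate_pct: float
    max_drawdown_pct: float


def meets_tier2(s: WeeklySnapshot) -> bool:
    # Tier-2 floor as quoted in the gate: PF >= 1.5, WR >= 50, MDD <= 20.
    return (s.profit_factor >= 1.5
            and s.win_rate_pct >= 50.0
            and s.max_drawdown_pct <= 20.0)


def sustained_tier2(snapshots: Sequence[WeeklySnapshot], weeks: int = 2) -> bool:
    """True when the most recent `weeks` snapshots all clear the Tier-2 floor.

    `snapshots` is ordered oldest to newest.
    """
    if len(snapshots) < weeks:
        return False
    return all(meets_tier2(s) for s in snapshots[-weeks:])
```

Checking only the trailing window means one bad week resets the clock, which matches the "2 consecutive" wording.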

EQUITY STABLE_EDGE

Edge-stability verdict: PF 2.04 / WR 57.4% / n=272 (peer Phase G). Cursor cumulative: PF 1.60 / WR 54.0% / n=443. Convergent — EQUITY is the strongest broad candidate per all 5 plans + verified by peer sidecar.

Specific edges named:

  • Kimi: rs-breakout-scout LONG 77.8% WR, PF 6.7 (n=18)
  • Kimi: Breakout Momentum LONG 57.9% WR / PF 1.53 (n=38)
  • System: aggregated_picks (77.3% WR / PF 6.42 / n=441) — Claude Code flagged "aggregator artifact suspect"
  • System: claude_gainer_st (78.5% WR / PF 6.12 / n=3472) — Claude Code flagged contradiction: in BLACKLISTED_STRATEGIES at alpha_engine/config.py:216 yet tops the leaderboard

Actions:

  1. P0 Reconcile claude_gainer_st winner-vs-blacklist contradiction (Claude Code). Confirm enforcement at exec gate, not just intake — memory feedback_gate_at_execution_not_generation.
  2. P0 Verify capped-vs-raw PnL gap (Kimi flagged 680% MDD anomaly; Codex made it a payload-contract field capped_vs_raw_pnl_gap).
  3. P1 Bottom-symbol pruning + High-Conviction parity (Codex). Audit HC filter against confidence-inversion on ETF/CRYPTO per memory project_performance_reality.
  4. P1 Add earnings-drift + sector-relative-strength + breadth features (Codex).
  5. P2 Push n from 443 toward 600+ with same risk-adjusted PF (Cursor target).

Promotion gate (master): capped MDD verified + claude_gainer_st reconciled + bottom-symbol pruning improves recent PF without breaking OOS consistency ≥ 80% → OOS_READY. Then 14-30d SHADOW.

CRYPTO DECAYING_EDGE

Edge-stability verdict: PF 1.39 / WR 46.5% / n=1521 (peer Phase G). Cursor cumulative: PF 1.39 / WR 47.4% / n=7875. Kimi: PF 1.26 / WR 44.8% / n=8166. All converge — class is decaying.

Specific draggers named:

  • kimi_signal_tracking: -954% PnL / PF 0.26 — named in Claude Code + Kimi + Codex as P0 quarantine target
  • baby_strats:crypto_soc_*: 12 overfit flags in fwd_vs_bt_divergence.rows — 66% BT WR vs 32% live (Kimi -33.6% decay) — surgical quarantine proposal exists at reports/baby_strats_overfit_quarantine_proposal_2026_05_10.md
  • quan_engine: 18% volume share at PF 0.70 (Kimi — cap to 12% of CRYPTO volume)
  • alpha_engine_fast: PF 0.62 / -127% PnL (Claude Code; current CRYPTO drag #1 per memory project_strategy_state_2026_05_03)

Specific edges (sleeves) inside the mediocre aggregate:

  • Kimi: st_fear_greed_contrarian 94% WR (promote to High-Conviction gating)
  • Codex: score CRYPTO by sleeve/subsystem quality rather than gross class aggregate

Actions:

  1. P0 Blacklist kimi_signal_tracking at alpha_engine/config.py:216. Verify enforcement at exec gate.
  2. P0 Ship existing baby_strats:crypto_soc_* quarantine proposal (peer Phase C already added unit tests for blocklist enforcement; 94/94 tests pass per peer chatlog Phase C-E-D1).
  3. P0 Cap quan_engine to 12% CRYPTO volume share (Kimi).
  4. P1 Promote st_fear_greed_contrarian to High-Conviction gating (Kimi).
  5. P1 Wire decay-replacement pipeline (peer remaining task): when edge_stability_CRYPTO.json::consistency_verdict == DECAYING_EDGE, trigger P1 swarm targeting "what replaces strategy X?".
  6. P2 Add funding / basis / open-interest / on-chain flow features (Codex).
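
Action 5 (the decay-replacement trigger) might look roughly like this. It is a sketch under assumptions: the sidecar is taken to be a JSON object with a top-level `consistency_verdict` key, and the returned task dictionary is an invented shape, not the swarm dispatcher's real contract.

```python
import json
from typing import Optional


def decay_replacement_task(sidecar_json: str, asset_class: str) -> Optional[dict]:
    """Return a P1 swarm-task stub when the sidecar reports DECAYING_EDGE.

    Returns None for any other verdict. The task fields are hypothetical.
    """
    payload = json.loads(sidecar_json)
    if payload.get("consistency_verdict") != "DECAYING_EDGE":
        return None
    return {
        "priority": "P1",
        "asset_class": asset_class,
        "question": f"What replaces the decaying {asset_class} strategies?",
    }
```

In practice the JSON string would come from reading edge_stability_CRYPTO.json. Keeping the function free of file I/O makes the trigger logic easy to unit-test.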

Promotion gate (master): all 3 P0 quarantines shipped + post-quarantine forward window PF ≥ 1.5 on rolling 30d + decay-replacement pipeline live → OOS_READY. Then 14-30d SHADOW.

FOREX DECAYING_EDGE

Edge-stability verdict: PF 0.57 / WR 40.7% / n=1424 (peer Phase G). Cursor cumulative: PF 0.28 / WR 41.8% / n=1825. Kimi: PF 0.28 / WR 45.6% / n=1249. Sub-tier across all reads. Per memory CLAUDE.md FOREX is "genuinely sub-floor — apply mutate-before-kill protocol, do NOT silently kill".

Actions:

  1. P0 Hard-cap FOREX sizing at 0 until PF ≥ 0.8 (Kimi + Cursor). Explicit per-class gate, not silent kill.
  2. P0 PR #876 pnl_pct anomaly clamp [-100, 200]% — verify merged or merge it. "Kills forex unit corruption" per PR title. Some FOREX PF degradation may be unit-conversion bug, not real decay.
  3. P1 Spawn FOREX deep-dive subagent per CLAUDE.md major-goal mandate: reports/deep_dive_forex_*.md with per-source autopsy + external-replication options (DBMF/KMLM/MyFXBook) + 30/60/90-day rescue plan + acceptance criteria. Memory feedback_noncrypto_resolver_live_close_bug: outcome_resolver.py:384-405 closes at yfinance spot every run with 1bp WIN threshold — ~1700 picks mislabeled, root cause of every polluted non-crypto kill claim.
  4. P1 Wire COT / DXY-beta / carry-rate-differential / news-blackout features (Codex). Use mutate-before-kill per docs/MUTATION_THREE_AXIS_PROTOCOL.md.
  5. P2 Rebuild FOREX from scratch with new TP/SL caps + session timing (Kimi — Month 2 work).
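
For action 2, an anomaly clamp of the kind PR #876 describes could be sketched as follows. Only the [-100, 200]% bounds come from the PR title; the function name and the choice to return an anomaly flag are assumptions, and the actual PR may work differently.

```python
from typing import Tuple


def clamp_pnl_pct(pnl_pct: float,
                  lo: float = -100.0,
                  hi: float = 200.0) -> Tuple[float, bool]:
    """Clamp a per-trade pnl_pct into [lo, hi].

    Returns (clamped_value, was_anomalous) so that out-of-range rows can be
    counted and investigated rather than silently rewritten.
    """
    clamped = max(lo, min(hi, pnl_pct))
    return clamped, clamped != pnl_pct
```

Returning the flag alongside the value keeps the unit-conversion hypothesis testable: a high anomaly count concentrated in FOREX rows would point at corruption rather than real decay.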

Promotion gate (master): resolver bug fixed + COT/DXY/carry wired + PF ≥ 1.2 + WR ≥ 45 for sustained rolling window → REHAB. Cannot reach OOS_READY without ≥ 1 full quarter of clean post-fix data.

ETF INSUFFICIENT_DATA

Edge-stability verdict: INSUFFICIENT_DATA (peer Phase G; n below floor). Cursor cumulative: PF 1.48 / WR 60.0% / n=100. Kimi: PF 1.20 / WR 53.4% / n=88. Class is at the n floor — promotion gate blocked on sample.

Actions:

  1. P1 Expand ETF universe (XLF, XLE, XLK) to reach n=120 → 180 (Kimi + Cursor target).
  2. P1 Block leveraged ETFs (Codex — risk policy).
  3. P2 Add sector-theme concentration caps + AUM + expense-ratio filters (Codex).
  4. P2 Controlled emitter expansion only for ETF strategies with positive OOS decay + acceptable drawdown (Cursor).

Promotion gate (master): n ≥ 100 AND PF ≥ 1.5 AND consistency ≥ 80% on edge-stability sidecar → OOS_READY. ETF has the cleanest OOS profile per Codex, so this should be the fastest to promote once n threshold cleared.

BOND INSUFFICIENT_DATA

Edge-stability verdict: INSUFFICIENT_DATA. Cursor cumulative: PF 0.66 / WR 54.5% / n=11. Kimi: PF 1.72 / WR 55.6% / n=18. All plans agree: n too thin for any promotion verdict.

Actions:

  1. P1 Add BOND to alpha_engine/walkforward_validator.py output path (Cursor — currently missing from walkforward.by_class).
  2. P2 Expand BOND universe + add duration / risk filters (Codex).
  3. P2 Focus on data generation quality before optimization (Cursor).

Promotion gate (master): n ≥ 100 (multi-month effort) → BOND can re-enter the rotation. Keep paper-only until then.

FUTURES INSUFFICIENT_DATA

Edge-stability verdict: INSUFFICIENT_DATA. Memory project_futures_kill_without_replacement: Futures module silent-dead (5.9% WR, -96% PnL after 2 strategies killed + no replacements added).

Actions:

  1. P1 Document FUTURES current pipeline state — is anything emitting? If yes, why no signals? If no, was the kill intentional?
  2. P2 Per Codex: "FUTURES stay excluded until enough data to join the framework."
  3. P3 If reactivating, apply mutate-before-kill protocol on the killed strategies before recommending re-emission.

INDEX / OTHER INSUFFICIENT_DATA

Peer edge-stability sidecar reports INDEX class with INSUFFICIENT_DATA. Not enumerated in any of the 5 agent plans. Treat as "not yet a class."

Action: Defer until edge-stability sidecar emits real metrics for INDEX category.

Cross-cutting P0 cluster (next 24h, all-class impact)

# | Action | Effort | Source plan(s) | Memory ref
1 | Merge PR #904 (research orchestrator + edge stability sidecar) — already MERGEABLE/CLEAN at 6d7ccd928fd | 0.1h | peer chatlog Phase I |
2 | Blacklist kimi_signal_tracking at alpha_engine/config.py:216 + verify exec-gate enforcement | 1h | Claude Code, Kimi, Codex | feedback_gate_at_execution_not_generation
3 | Ship baby_strats:crypto_soc_* quarantine via per-strategy BLOCKED_ASSET_STRATEGY_PAIRS at audit_trail/quality_gates.py:1499 | 1h | 4/5 plans | reports/baby_strats_overfit_quarantine_proposal_2026_05_10.md
4 | Hard-cap FOREX sizing at 0 until PF ≥ 0.8 — explicit per-class gate (NOT silent kill) | 1h | Kimi, Cursor | CLAUDE.md FOREX directive; docs/MUTATION_THREE_AXIS_PROTOCOL.md
5 | Verify multi_asset_cot PF=19.19 via DB query against ejaguiar1_stocks | 1h | Claude Code |
6 | Reconcile claude_gainer_st winner-vs-blacklist contradiction | 1-2h | Claude Code | feedback_gate_at_execution_not_generation
7 | Verify max-drawdown calc uses capped PnL (Kimi 680% MDD smell-test) | 1h | Kimi, Codex |
8 | Cap quan_engine to 12% CRYPTO volume share | 1h | Kimi |

P1 cluster (week 1, structural)

  1. Implement Codex's readiness.by_class payload contract (class state-machine fields + capped_vs_raw_pnl_gap + single_symbol_concentration + leaders.by_class + draggers.by_class).
  2. Add walk-forward coverage for COMMODITY + BOND in alpha_engine/walkforward_validator.py; surface in audit_trail/dashboard_generator.py.
  3. Fix drift detector — hf_stats.concept_drift.KS_D uncomputed-zero bug + refresh 19-day stale hf_stats. Wire drift → auto-pause sizing when D > 0.10 (peer remaining task "Drift-pause activation Phase 1").
  4. Reconcile /audit threshold text with docs/PERFORMANCE_CHARTER.md v1.0.
  5. Add last_signal_date to systems payload (Claude Code — absent for all top-6 winners).
  6. Peer high-priority backlog: v3b LLM-driven signal translator (peer Phase H §A) — likely flips several NO_EDGE → MIXED/GO once spec-faithful signals replace SMA proxy. ~$1/run dispatcher.
  7. Peer high-priority backlog: Re-fire P5 swarms with v3a numbers (~$0.35) — current P5 verdicts cached from pre-v3a stub numbers.
  8. Investigate-then-quarantine remaining top draggers (mercury2_fast, ml_bg_system_b, copy_trader_highscore) per docs/MUTATION_THREE_AXIS_PROTOCOL.md.

P2 cluster (weeks 2-4, per-class rehab in parallel)

See per-class action lists above. Cross-cutting:

  1. E-D3 — dry_run kwarg on smart_picks_engine + production_scanner + dashboard_generator (peer remaining backlog medium).
  2. CPCV upgrade — swap walk-forward for CPCV in p3_backtest_runner.py (peer remaining backlog; closes project_cpcv_gap_2026_04_28).
  3. Per-asset-class deep-dive swarm questions — 35 specific questions across 7 classes captured in DAILY_IDEAS.MD §B (peer remaining backlog).
  4. Re-run cross-permutations after data-trust + walk-forward fixes; verdict on per-class edge existence (Claude Code).
  5. Audit HC filter against confidence-inversion on ETF/CRYPTO per memory project_performance_reality.

P3 / P4 / P5 (longer horizon)

Open PR triage (24 open as of 2026-05-11 21:00 UTC)

PR | Title | Status | Master-plan action
#904 | research orchestrator + edge-stability sidecar | MERGEABLE / CLEAN at 6d7ccd928fd (per peer chatlog Phase I) | P0 MERGE NOW — unblocks the verdict-grade verdicts this master plan rests on
#903 | chore(loop): 2026-05-11 run findings | open | review after #904 merges (docs-only)
#902 | feat(b13): per-class regime filter sidecar + quality_gates.py hook — COMPLETE (supersedes #868/#872/#889/#895/#900) | 1 failed check | P1 investigate failed check; this is the regime-filter sidecar that gates strategies on regime — directly enables the Codex state-machine REHAB→OOS_READY transition
#901 | audit hourly 05Z | open | auto-merge if hourly cron
#898 | fix(B15): cross-asset correlation works without numpy | 1 failed check | P2 investigate; needed for cross-class verdicts
#893 | orphan_resolver_dryrun.py — 1,366 orphan closed_at preview | 1 failed check | P1 dry-run preview only; aligns with E-D3 work; investigate failed check
#892 | safe_db_archive.py — Hermes rule #1 gate | open | P1 needed for any DB write that touches blocklist updates
#891 | mysql_sync entry_time/exit_time fallback — repairs 87% NULL closed_at orphans | 2 failed checks | P0 directly addresses orphan rows — part of Codex truth-layer P0; investigate failed checks
#887 | WIN_RATE_TRAP_BLACKLIST — 6 crypto traps + 2 equity bombs | 1 failed check | P0 CRYPTO + EQUITY draggers — overlap with master-plan P0 #2 #3 #6; verify no duplicate blocks before merge
#885 | risk_policy v2 — tighten crypto per-symbol cap 10→5, per-trade 5→3 | 2 failed checks | P1 CRYPTO de-concentration; investigate failed checks
#884 | mysql_sync infer category for NULL/empty rows | open | P0 class-attribution backfill — part of Codex truth-layer
#883 | quality_gates swarm-batch-1 source score retunings (5 sources) | open | P1 review
#881 | tv-orchestrator LL1 fill-relative TP/SL | open | P2 TradingView paper-trade execution improvement
#879 | audit-dashboard Hermes 5-phase enhancements + 5000-round audit corpus | open | P2 peer chatlog notes branch is 3152 commits behind main + 83 ahead, conflicting; rebase or close+cherry-pick selectively
#878 | short_engine BULL-regime gate | open | P1 CRYPTO regime filter; complements #902
#877 | mysql_sync elite_score backfill | open | P2
#876 | mysql_sync pnl_pct anomaly clamp [-100, 200]% — kills forex unit corruption | open | P0 FOREX — see FOREX P0 #2 above; this directly fixes the unit-corruption side of FOREX PF degradation
#873 | chore(loop): B13 status | open | docs-only
#862 | DB query bank: forex pnl corruption + 50 untested live pairs + JPY-cross 100% losers | open | P1 FOREX investigation evidence; aligns with FOREX P1 deep-dive
#849 | Edge action plan + swarm peer-review harness (draft) | draft | upgrade-or-close; this master plan supersedes the action-plan portion
#846 | Shadow Probation panel on /audit Overview tab | open | P1 directly supports Codex state-machine SHADOW state visualization
#900 / #895 | b13 regime filter earlier iterations | open | close in favor of #902 (the COMPLETE version that supersedes per its title)

Recommended PR merge order (master-plan P0 priority):

  1. #904 (research orchestrator + edge stability) — unblocks verdicts
  2. #876 (FOREX pnl_pct clamp) — fixes FOREX unit corruption before FOREX cap goes in
  3. #891 + #884 (mysql_sync closed_at + category fixes) — truth-layer for Codex's data-trust gate
  4. #887 (WIN_RATE_TRAP_BLACKLIST) — overlaps with P0 #2 + #3; verify no duplicate blocks
  5. #902 (b13 regime filter) — enables Codex REHAB→OOS_READY transition; investigate the 1 failed check first
  6. #892 (safe_db_archive Hermes gate) — gates any subsequent DB write
  7. #893 (orphan_resolver_dryrun) — read-only preview, aligns with E-D3
  8. #846 (Shadow Probation panel) — UI for state-machine visualization
  9. #878 (short_engine BULL regime gate) — CRYPTO regime filter
  10. #885 (risk_policy v2 crypto caps) — CRYPTO de-concentration

Real-money gate (master) — synthesis

Adopt Codex's class state machine as governance scaffold, with Kimi's 7-check Go/No-Go as per-class checklist, Cursor's Tier-2 measurable criteria as the numerical floor, and Copilot's 2-consecutive-weekly as the persistence test.

Class state machine

Per-class current state (master-plan declaration)

Class | State | Reason
COMMODITY | REHAB | edge-stability STABLE but walk-forward missing + concentration not disclosed
EQUITY | REHAB | edge-stability STABLE but capped-PnL not verified + claude_gainer_st contradiction
CRYPTO | BLOCKED | edge-stability DECAYING + 3 named draggers (kimi_signal_tracking, baby_strats:crypto_soc_*, quan_engine over-share)
FOREX | BLOCKED | edge-stability DECAYING + resolver bug + unit corruption (PR #876)
ETF | REHAB | n at floor; expand universe to clear sample threshold
BOND | BLOCKED | n=12 in verbatim chatlog read — way below n≥100 floor; cannot reach REHAB until sample expanded (multi-month effort)
FUTURES | BLOCKED | silent-dead per memory; need re-emission plan
INDEX | REHAB | insufficient data

Current LIVE_ELIGIBLE count: 0/6. Earliest LIVE_ELIGIBLE target: not before week 8 given REHAB→OOS_READY (2-4 weeks) + SHADOW (2-4 weeks) for the fastest class.
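
The state machine and the all-classes-first rule can be sketched as below. The state names and their order come from the Codex plan as cited in this document; the single-step promotion rule, the treatment of unknown classes as BLOCKED, and all function names are assumptions for illustration.

```python
from enum import IntEnum
from typing import Iterable, Mapping


class ClassState(IntEnum):
    """Ordered class states: BLOCKED -> REHAB -> OOS_READY -> SHADOW -> LIVE_ELIGIBLE."""
    BLOCKED = 0
    REHAB = 1
    OOS_READY = 2
    SHADOW = 3
    LIVE_ELIGIBLE = 4


def can_promote(current: ClassState, target: ClassState) -> bool:
    # Assumption: promotions advance exactly one state at a time.
    return int(target) == int(current) + 1


def live_trading_allowed(states: Mapping[str, ClassState],
                         major_classes: Iterable[str]) -> bool:
    """True only when every major class has reached at least SHADOW.

    Classes missing from `states` are treated as BLOCKED (assumption).
    """
    return all(states.get(c, ClassState.BLOCKED) >= ClassState.SHADOW
               for c in major_classes)
```

With the states declared in the table above, `live_trading_allowed` returns False for any choice of major classes, which is consistent with the current LIVE_ELIGIBLE count of 0.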

Test plan (per-class + cross-cutting)

Cross-cutting

Per-class

Peer plan v2 (swarm-revised) — already shipped

Peer Claude-B session pushed plan v2 + production code mid-document. Tracked here so this master plan stays current.

Plan v2 swarm-revised changes (commit 57d267a28e6)

  1. Phase 4 ↔ Phase 2 reorder (unanimous swarm) — measurement infrastructure before per-class scaling.
  2. NEW Phase 1.5 drift-clearance gate — must hold drift_alert=false for 7 consecutive days before Phase 2 advances.
  3. Walk-forward by_class downgraded to ADVISORY while drift hot — its numbers are themselves drifting in real-time during regime collapse. Phase 1.5.3 wire-in tags cards advisory_only=true; demotions deferred.
  4. 3 baseline numbers corrected by red-team:
    • EQUITY consistency: 87.5% (not 75% — Cursor's original cite was 12.5pp understated)
    • CRYPTO consistency: 68% (not 84% — original was 16pp overstated)
    • FOREX consistency: 48.1% (1.8pp correction, ≈ unchanged)
    The red-team read used the 2026-05-11T21Z snapshot and claude-b's read used the 2026-05-10T04Z snapshot. The discrepancy implies walkforward.by_class is itself drifting, which argues for snapshotting at known cutoffs rather than live-reading.
  5. 3 risks added to plan v2 risk register:
    • Regime-overfit — Tier-1 promotion gate trained on a single high-VIX regime period may collapse when regime shifts.
    • No rollback — Opt B Tier-1 demotion has no rollback ledger; if a wrong demotion happens, no easy undo.
    • Plan cites non-existent funcs: walk_forward_by_strategy() is PROPOSED-NEW, does NOT exist yet. Master plan must not assume it.
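
The Phase 1.5 drift-clearance rule (item 2 above) reduces to a trailing-window check. A minimal sketch, assuming one drift_alert flag is recorded per day and the history is ordered oldest to newest; the function name is hypothetical.

```python
from typing import Sequence


def drift_cleared(daily_drift_alerts: Sequence[bool], required_days: int = 7) -> bool:
    """True when drift_alert was False on each of the last `required_days` days.

    Fewer than `required_days` recorded days never clears the gate.
    """
    if len(daily_drift_alerts) < required_days:
        return False
    return not any(daily_drift_alerts[-required_days:])
```

This depends on the daily snapshot scaffolding existing first, since a live read of the flag carries no history.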

Production code shipped this session (peer)

Commit | Scope | File(s)
cf4e924744a | Opt B walk-forward Tier-1 promotion gate (consistency≥60 + sharpe>0). FOREX blocked from T1 per current data | audit_trail/dashboard_generator.py
cf229ea31ba | W4 benchmark-relative trailing-30d return per system (primary_asset_class / pnl_30d_pct / trades_30d / benchmark_30d_pct / excess_return_30d_pct) | audit_trail/dashboard_generator.py
4ea32d227cf | Opt A TA-baseline panel on /audit (6-strategy benchmark cards per class) + _load_latest_ta_baseline() + renderTaBaseline() + nav hook | audit_trail/dashboard_generator.py + audit_dashboard/template.html
82a34bc0fdb | tools/live_market_fetcher.py foundation (yfinance VIX/DXY/BTC/ETH/SPY/QQQ/GLD/TLT/oil + regime classifier + 1h cache) | new file
5e4bc1efe63 | Block A fix (freebuff INDEX collision: ASSET_CLASSES "INDEX" → "INDEX_STOCK" + defense-in-depth in write_index() + delete stale INDEX.json + 9-class regen) + Phase 1.5.3 drift-advisory wire-in (Opt B re-tier loop now reads concept_drift.drift_alert; advisory_only=true when hot, demotions deferred) | tools/edge/edge_stability.py + audit_trail/dashboard_generator.py
f740ace5c34 | CLAUDE2.MD A9-A13 + concept_drift root cause report + 37h quarantine verification (0 of 60 active picks match 30 quarantined pairs) | docs + report

Concept-drift root cause (peer T4 verdict)

VIX -44.64% / 30d real regime collapse since 2026-04-22 — confirmed not pipeline noise. KS_D 0.31 vs 0.047 critical. Source: reports/concept_drift_root_cause_2026-05-11.md. This validates Codex's "fix truth layer first" position: drift is real and regime-driven, not a metrics artifact.

Master plan ripple-effects

Buffy enhancements (deepseek-v4-pro, PENDING) — queue into P1

Buffy session at docs/chatlogs/chatlog_2026-05-11_buffy_review.md + docs/chatlogs/progress_2026-05-11_buffy_enhancements.md. Code review of Opt-A / Opt-B / W4 found 3 issues + 4 enhancement opportunities. All status PENDING (not yet committed).

ID | Enhancement | Problem (verified by buffy code review) | Master-plan ranking
E1 | Add FOREX benchmark (DXY) to benchmark_return() | live_market_fetcher.py:155-162 benchmark_map omits FOREX. FOREX systems (n=1801, PF 0.27) get benchmark_30d_pct=None in dashboard despite DXY already fetched as "DX-Y.NYB". One-line dict-add. | P1 FOREX — directly supports FOREX deep-dive (master plan FOREX P1 action #3)
E2 | Cache drift-pause check with 60s TTL | quality_gates.py:4143 _drift_pause_active() reads dashboard_data.json from disk on every passes_active_gate() call. 60+ active picks = 60+ disk hits/cycle. Synchronous I/O in hot path. | P1 perf — gates the eventual drift-pause flip; performance matters when active list grows
E4 | Excess return alert (< -5%) | W4 code at dashboard_generator.py:12700-12747 computes excess_return_30d_pct per system. Data exists; no monitor. Addresses Step 9 of 10-investigations plan. New w4_alerts key in dashboard_data.json. | P1 monitoring — early warning system per class, complements edge-stability sidecar
E5 | DRIFT staging dry-run mode | DRIFT_AUTO_PAUSE_ENABLED is binary (0=advisory, 1=hard pause). No staging. Per CLAUDE2.MD: "Don't flip without staging-first per swarm consensus." Add DRIFT_STAGING_MODE=1 env var that logs would-block picks without actually blocking. | P1 safety — prerequisite for ever flipping DRIFT_AUTO_PAUSE_ENABLED=1; aligns with peer T1 contingent task
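
E2's 60-second cache could take a shape like the following. This is a sketch, not the planned patch: `TtlCachedFlag` is a hypothetical wrapper, and the loader passed in stands for whatever currently reads dashboard_data.json inside _drift_pause_active().

```python
import time
from typing import Callable, Optional


class TtlCachedFlag:
    """Caches a boolean loader result for `ttl_seconds` (hypothetical helper)."""

    def __init__(self,
                 loader: Callable[[], bool],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._value: Optional[bool] = None
        self._expires_at = 0.0

    def get(self) -> bool:
        now = self._clock()
        # Reload on first use or once the cached value has expired.
        if self._value is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

With this wrapper, 60+ calls to the gate within one cycle would trigger a single disk read. The trade-off is that a drift_alert flip can take up to one TTL to be observed.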

Buffy methodology note: each enhancement follows before/change/after/verify/evidence pattern with line-range citations + swarm cross-check round + typecheck + code review log. Three rows in tracking tables currently empty — work hasn't started yet.

Issues found but not yet enhanced (queue):

Action: Buffy to proceed with E1/E2/E4/E5. Each lands as separate small PR with co-author + commit message citing buffy-progress file. Expected to clear the buffy queue within 1-2 days.

Peer remaining backlog — incorporated into master plan

Per peer reports/session_summary_2026-05-11.md:

P0 (immediate, blocking)

  1. Confirm Block A INDEX→INDEX_STOCK fix didn't break peer's PR #904 — verify before next force-push window.
  2. Verify drift_alert precondition runs in production: on the next hourly .github/workflows/audit-dashboard.yml regen, check tier2_proven_strategies.cards[*].walkforward_gate.advisory_only.

P1 (this week)

  1. Phase 1.5.1 — 7-consecutive-day drift_alert=false history check (needs daily snapshot scaffolding).
  2. Phase 2.1 — extend alpha_engine/walkforward_validator.py to add COMMODITY + BOND + ETF to by_class output (currently 4 of 7 classes).
  3. Phase 2.2 — implement walk_forward_by_strategy() (PROPOSED-NEW; does NOT exist yet). Pairs with peer's per-strategy edge_stability table.
  4. Phase 4.1 — env-gate DRIFT_AUTO_PAUSE_DRY_RUN=1; 7d "would-pause" logging. Buffy E5 overlaps.
  5. Phase 4.2 — class_capital_gate(asset_class) with capital_gate_log.jsonl. PROPOSED-NEW.

P2 (next 2 weeks)

  1. Phase 5 Wave 1 (#1 + #2) — rolling-window profiling at 7/30/90/365/1095d + edge-decay heatmap.
  2. Phase 5 Wave 2 (#3 + #5) — cross-symbol std-dev block + regime tag persistence on closed picks (closes regime_validation.regime_wr_breakdown all-zero rows).
  3. Phase 5 Wave 3 (#7) — top-N portfolio Monte Carlo simulator (settles concentration debate before any real-money sizing — directly mitigates multi_asset_cot PF 19.19 / n=130 outlier risk).

Contingent (user-gated)

  1. T1 — Flip DRIFT_AUTO_PAUSE_ENABLED=1 (HIGH risk; Phase 4.1 must be clean for 7d first).
  2. T2 — Mutation autopsy on 14 quarantined strategies (LOW risk, read-only; due 2026-05-17, 7d post-quarantine).
  3. T3 — Verify multi_asset_cot PF 19.19 via MySQL (LOW risk; needs DB_STOCKS_PASSWORD). Master plan P0 #5.

Areas worth further investigation (from peer summary)

✓ Wave 1 SHIPPED — circuit-breaker HALT state removed

Commit 81bd0b86388 on main 2026-05-11 21:30Z. Removed alpha_engine/data/circuit_breaker_state.json (HALT, max_picks=0, 48d stale from 2026-03-24).

Verification PASSED: direct query against ejaguiar1_stocks::bt_backtest_trades shows MAX(imported_at) WHERE status IN ('WON','LOST') = 2026-05-11 20:00:59 with n=1,819,839. Forward validator unfrozen; WON/LOST writes resumed within 1 hour. Wave 1.5 independent fixes (lm_signals expire-cron, signal_tier writer, at_consensus_picks time-travel) still queued per kilo carveout.

Subsequent shipped commits

SHA | Change
4a2d337a5dc | P0 #2 + #3 + #4 — blacklist kimi_signal_tracking + 3 named crypto_soc_* draggers to BLOCKED_ASSET_STRATEGY_PAIRS + raise elite-score floors (FOREX 50→70, COMMODITY 50→65, EQUITY 50→60)

⚠ Kimi swarm audit (4-agent, 2026-05-11) — RAW-DB read contradicts dashboard verdicts

Archive: reports/kimi_edge_audit_2026-05-11/ (32 files — comprehensive_analysis_report.md, edge_audit_report.md, industry_standards_research.md, 7 rolling PNGs, 7 CSVs).

Kimi top-line claim: "4x inflated dashboard vs RAW DB"

Metric | Dashboard (post-filter) | Kimi RAW DB
Win Rate (all classes) | 34-43% | 11.13% (6,178 / 55,510)
Total PnL | +949% claimed | -3.56% avg/trade × 55,510 = -197,487%
Sharpe Ratio | varies | -2.34 annualized
Profit Factor | 1.49 | 0.46
ML accuracy | | 32.6% (worse than coin flip)
Calibration | | 96% conf → 0.9% actual WR (INVERTED)
Backtest vs Live (CRYPTO) | | 42.4% BT WR / 11.3% live → -31.3pp gap

Reconciliation: same DB, different filters

The gap is real but BOTH reads are technically correct:

Master-plan adjustment: the dashboard "STABLE_EDGE" verdicts on COMMODITY + EQUITY still hold for the currently-active sleeve. But Kimi's WR 11.1% / Sharpe -2.34 RAW floor must be disclosed on /audit alongside the post-filter numbers — currently it isn't, which is the "4x inflation" Kimi flagged. Codex's readiness.by_class payload contract (P1) is the right place to surface both reads side-by-side.
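
Surfacing both reads side by side amounts to computing the same metrics twice, once on all rows and once on the filtered sleeve. The sketch below uses the standard definitions (win rate = wins / trades; profit factor = gross profit / gross loss). The row shape with a `pnl_pct` key and the `is_active` predicate are assumptions, not the real schema or the dashboard's actual filter.

```python
from typing import Callable, Dict, Sequence


def win_rate_pct(pnls: Sequence[float]) -> float:
    """Percentage of trades with positive PnL; 0.0 for an empty sample."""
    if not pnls:
        return 0.0
    return 100.0 * sum(1 for p in pnls if p > 0) / len(pnls)


def profit_factor(pnls: Sequence[float]) -> float:
    """Gross profit divided by gross loss; inf when there are no losses."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def both_reads(trades: Sequence[dict],
               is_active: Callable[[dict], bool]) -> Dict[str, Dict[str, float]]:
    """Compute RAW and post-filter metrics from the same rows."""
    raw = [t["pnl_pct"] for t in trades]
    kept = [t["pnl_pct"] for t in trades if is_active(t)]
    return {
        "raw": {"n": len(raw), "win_rate_pct": win_rate_pct(raw),
                "profit_factor": profit_factor(raw)},
        "post_filter": {"n": len(kept), "win_rate_pct": win_rate_pct(kept),
                        "profit_factor": profit_factor(kept)},
    }
```

Emitting `n` next to each read makes it visible how much of the sample the filter discards, which is the core of the inflation concern.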

Kimi findings that ARE net-new (not in current plan)

  1. ML calibration system-wide inverted (not just ETF/CRYPTO per memory). 96% confidence → 0.9% actual WR. Promote calibration verification from P2 → P0 #9. Memory project_performance_reality confirmed it on ETF/CRYPTO subset — Kimi extends to all classes.
  2. alpha_engine/anti_overfit_validator.py EXISTS but orphan. 13,886 bytes, last modified 2026-05-02. Contains CPCV / PBO / DSR code per Kimi inspection. P1 wire-up per CLAUDE.md Wire-Up Rule: production caller in calculate_smart_score / passes_smart_gate / score_pick. Once wired, automatically rejects strategies with PBO > 0.05 — the most critical defense against the 26-baby_strats-overfit pattern.
  3. Empty model files: random_forest.py, xgboost_model.py, lstm_model.py are 14-byte placeholders per Kimi. P2 verification — confirm + remove or implement.
  4. paper_trades table is EMPTY. Codex's SHADOW state requires 14-30d forward shadow tracking. Without paper_trades populated, SHADOW cannot be measured. P1 add — wire paper-trade variance recorder before any class can reach SHADOW.
  5. Concentration risk: USDCHF=X = 215% of total PnL impact; INJUSDT = 89% of alpha_engine PnL. USDCHF=X almost certainly the FOREX unit-corruption bug (PR #876 pnl_pct clamp [-100, 200]%). P0 dependency — merge #876 before any FOREX verdict.
  6. Day-of-week pattern (CRYPTO): Tuesday -3.37%, Friday -9.42%, Wednesday -13.85%, Monday -16.92%. Wednesday + Monday are worst. Future enhancement: time-of-day / day-of-week gate.
  7. EQUITY t-test p=0.115 closest to significance (Sharpe +0.67) but RAW WR 1.84% (15 wins / 814 picks). Means the EQUITY POST-FILTER numbers (Cursor 54%, peer 57.4%) come from a small sliver of "good" strategies — the other 799 unfiltered picks are noise. Filter-criteria audit needed before EQUITY promotion. P0 #10.

Kimi DSR / PBO / WFE framework — adopt for real-money gate

Metric | Minimum | Source
Deflated Sharpe Ratio (DSR) | > 0.95 | Lopez de Prado AFML
Probability of Backtest Overfitting (PBO) | < 0.05 | Bailey + Lopez de Prado
Walk-Forward Efficiency (WFE) | > 60% | Pardo
Min Track Record Length | > 2 years | AFML
Live Sharpe Ratio | > 0.5 | Industry standard
Max Drawdown | < 20% | Charter Tier 2
Win Rate | > 50% | Codex T2 (or PF>1.5 substitute)

Adopt Kimi's 10-step validation pipeline as the real-money gate before LIVE_ELIGIBLE: pre-register hypothesis → in-sample → WFA → CPCV → DSR → structural break tests → sensitivity analysis → transaction cost analysis → 3-6mo paper trading → graduated deployment (5% → 25% → 100%).
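
The threshold table above can be expressed as a single check that lists which criteria a class fails. A sketch only: the metric key names are invented for illustration, and a missing metric is treated as a failure by assumption. It covers the numeric floors, not the 10-step pipeline itself.

```python
import operator
from typing import Callable, Dict, List, Tuple

# (comparison, limit) per metric, mirroring the threshold table.
THRESHOLDS: Dict[str, Tuple[Callable[[float, float], bool], float]] = {
    "dsr": (operator.gt, 0.95),
    "pbo": (operator.lt, 0.05),
    "wfe_pct": (operator.gt, 60.0),
    "track_record_years": (operator.gt, 2.0),
    "live_sharpe": (operator.gt, 0.5),
    "max_drawdown_pct": (operator.lt, 20.0),
}


def failed_checks(metrics: Dict[str, float]) -> List[str]:
    """Return the names of gate criteria that are missing or not met."""
    failed = [name for name, (cmp, limit) in THRESHOLDS.items()
              if name not in metrics or not cmp(metrics[name], limit)]
    # Win Rate > 50% may be substituted by PF > 1.5 per the table.
    wr_ok = metrics.get("win_rate_pct", 0.0) > 50.0
    pf_ok = metrics.get("profit_factor", 0.0) > 1.5
    if not (wr_ok or pf_ok):
        failed.append("win_rate_pct_or_profit_factor")
    return failed
```

Returning the list of failures, rather than a bare boolean, gives the /audit page something concrete to display per class.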

Net: Kimi's verdict is harsher than master plan but doesn't invalidate the COMMODITY/EQUITY STABLE_EDGE finding for the FILTERED sleeve. It does add 3 new P0 actions (calibration verify, EQUITY filter-criteria audit, PR #876 dependency surfaced) + 1 P1 (anti_overfit_validator wire-up) + a stricter real-money gate (10-step / DSR / PBO / WFE).

▶ /audit Decay Alerts — Action Required (2026-05-12)

14 HIGH rolling-7d WR drop alerts (>20pp baseline decay) + 3 MEDIUM staleness alerts surfaced on /audit. Per CLAUDE.md MUTATION_THREE_AXIS_PROTOCOL: mutation-before-kill applies; REDUCE-not-BLOCK is the soft action.

Sev | Strategy | 7d WR vs baseline | Status this session | Master-plan action
HIGH | myfxbook_retail_contrarian | 19% vs 46% (-27pp) | BLOCKED commit a64e80e70d1 | Done
HIGH | forex_rsi2_mean_reversion | 10% vs 44% (-34pp) | BLOCKED via fx_kill_switch (commit a64e80e70d1) | Done
HIGH | cta_cross_asset_tsmom | 28% vs 46% (-18pp) | Open | P1 Add to BLOCKED_ASSET_STRATEGY_PAIRS (FUTURES); alert is class-level drag
HIGH | ig_contrarian_sentiment | 20% vs 45% (-25pp) | Open | P1 CRYPTO sentiment-contrarian; quarantine pending mutation-axis analysis
HIGH | futures_momentum | 4% vs 42% (-38pp) | Open | P0 Largest drop. FUTURES is BLOCKED per master plan: add to BLOCKED_ASSET_STRATEGY_PAIRS (FUTURES) outright
HIGH | st_multi_day_momentum | 47% vs 68% (-21pp) | Open | P1 Soft-demote: still positive Sharpe likely; reduce sizing 50%
HIGH | macd_rsi_m048 | 53% vs 73% (-20pp) | Open | P1 Boundary; monitor 7d before action
HIGH | ema_momentum_m006 | 36% vs 56% (-20pp) | Open | P1 Below 50% sub-floor; quarantine
HIGH | hs_lb_None | 0% vs 34% (-34pp) | Open | P0 0% recent WR = total decay. hs_lb_None = head-shoulders + None-leverage suffix = likely a parsing bug emitting null variants. Investigate before block.
HIGH | crypto_rsi_whaleconfirmed_v1 | 18% vs 55% (-37pp) | Open | P1 CRYPTO momentum-with-whale-confirm; large drop suggests whale signal degraded
HIGH | keltner_compression_expansion_eth_v1 | 29% vs 51% (-22pp) | Open | P1 ETH-specific Keltner; consider symbol-axis mutation per MUTATION_THREE_AXIS_PROTOCOL
HIGH | vwap_deviation_reversion_sol_v1 | 27% vs 47% (-20pp) | Open | P1 SOL-specific VWAP; similar mutation analysis
HIGH | MeanReversionBB | 25% vs 60% (-35pp) | Open | P0 Large baseline-to-recent drop; BB mean-reversion likely regime-broken (drift_alert TRUE confirms)
HIGH | claude_ml_moderate_mut | 42% vs 68% (-26pp) | Open | P1 Mutation-axis name suggests genetic-evolved variant; check anti_overfit_audit.json for DSR before action
MED | copy_trader_clones | silent 93h | Open | P2 Likely Wave 1 will unblock; monitor 24h post-Wave-1 then escalate
MED | stocksunify2 | silent 96h | Open | P2 Same; monitor
MED | kimi_live_signals | silent 98h | Open | P2 Distinct from blacklisted kimi_signal_tracking; verify if intentionally stopped (peer chatlog signal_tier writer pause) or accidental
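The percentage-point drops in the table are plain differences between the baseline and the rolling-7d win rate. A minimal sketch of that arithmetic follows; `classify_wr_decay` is a hypothetical helper, and the 20pp default is an assumption taken from the alert description. The live alert rule may differ, since the table lists one HIGH row at -18pp.

```python
def classify_wr_decay(recent_wr_pct: float,
                      baseline_wr_pct: float,
                      high_threshold_pp: float = 20.0) -> tuple[str, float]:
    """Return (severity, drop in percentage points) for a rolling-7d WR alert.

    Inputs are win rates in percent (e.g. 19 for 19%). The threshold is a
    parameter because the exact production rule is not stated in the source.
    """
    drop_pp = baseline_wr_pct - recent_wr_pct
    severity = "HIGH" if drop_pp >= high_threshold_pp else "OK"
    return severity, drop_pp
```

For example, `classify_wr_decay(4, 42)` reproduces the futures_momentum row as a 38pp drop, while `classify_wr_decay(28, 46)` only reaches HIGH if the threshold is lowered to 18pp or below.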

Recommended next-session work

▶ HIGH CONVICTION filter audit — 2026-05-12 vs session data

Current /audit HC panel shows per-class thresholds from 2026-04-15 data. Re-verified against session work (Kimi RAW DB read, anti-overfit DSR sidecar, P0 #10 EQUITY filter trace).

Class | Current HC verdict | Session data check | Action required
CRYPTO | EDGE: FWD WR≥60% + Score≥55 + Trust≥4 → "WR 60.3% on N=562 (+9.7pp)" | Anti-overfit DSR sidecar (post-Wave-1): 4 EDGE_LIKELY_REAL ml_enhanced sleeves (INJUSDT/FETUSDT/DYDXUSDT/RENDERUSDT 1d+1h variants). 33 OVERFIT_LIKELY in same family (mostly _15m_*). | P1 Add anti-overfit gate: HC pass requires strategy IN anti_overfit_audit.json::strategies WHERE verdict='EDGE_LIKELY_REAL'. Auto-rejects the 33 OVERFIT_LIKELY sleeves even if FWD WR ≥ 60% passes.
EQUITY | EDGE: FWD WR≥55% + Score≥50 + Trust≥5 → "WR 68.1% on N=72 (+29pp)" | P0 #10 verify: dashboard 54% WR is honest for tagged-EQUITY subset. stocks_rsi2_pullback (n=70, WR 62.9%, avg +0.78%) is the real EQUITY edge sleeve. n=72 HC cohort likely overlaps this. Note: n=72 still below master-plan n≥100 charter floor. | P2 Verify HC cohort overlap with stocks_rsi2_pullback. Either rename verdict to "EDGE (thin n=72)" or widen FWD WR floor to admit more samples.
FOREX | EDGE: FWD WR≥45% + Score≥50 + Trust≥5 → "WR 65.8% on N=73" | RED FLAG. Anti-overfit DSR sidecar has ZERO FOREX EDGE_LIKELY_REAL. Kimi RAW FOREX WR 9.9% n=605, PF 0.28 (stat-significantly LOSING). Master plan FOREX state = BLOCKED per Codex state machine + elite_score floor raised to 70 (commit 4a2d337a5dc) + 3 toxic strategies re-blocked (commit a64e80e70d1). Showing "EDGE" with N=73 sample is small-sample artifact masking systematic decay. | P0 Downgrade FOREX HC verdict from EDGE to BLOCKED or DEAD. The "WR 65.8% on N=73" is statistically vulnerable: at N=73 a 95% binomial interval is roughly ±11pp wide, and the cohort conflicts with the n=605 raw read. Cite master plan FOREX BLOCKED state. This is the most dangerous current verdict on the /audit page.
COMMODITY | WEAK: Trust≥5 → "PF 1.28 on n=273" | DSR sidecar: cot_positioning n=104, WR 86.5%, Sharpe +1.377, DSR=1.0000 (highest of any strategy). Antigravity audit confirmed cot_positioning_CT_locked LONG = 89.8% WR / PF 13.1 (n=49). COMMODITY aggregate is mediocre because it includes non-COT strategies dragging the average; the COT sleeve specifically has REAL edge. | P1 Add COMMODITY HC carve-out: strategy IN (cot_positioning, cftc_cot_commercial_signal, cot_positioning_CT_locked) AND elite_score ≥ 65. Mark as EDGE via that filter. Class aggregate stays WEAK without the carve-out, which is correct.
BOND | NO DATA: n=8 | Kimi BOND: n=18 PF 1.72 WR 55.6%. Sub-floor (n<100 master plan charter). | P2 Update n from 8 → 18 (live count) + note "needs n≥100 multi-month accumulation per Wave 1 unfreeze." No filter change.
ETF | DEAD: PF 0.28 n=19 | Kimi ETF: n=88-100 / WR 53.4% / PF 1.20. Master plan ETF state = REHAB (not DEAD). Class is at sample floor, not dead. Cleanest OOS profile per Codex. | P1 Reclassify ETF from DEAD → THIN_REHAB. Update text "PF 0.28 n=19" → "PF 1.20 n=88-100, needs n≥150 for OOS_READY promotion (Codex state machine)". The DEAD verdict is stale + miscalibrated.
FUTURES | DEAD: WR 5.9% n=17 | Per memory project_futures_kill_without_replacement: silent-dead, 5.9% WR / -96% PnL post-2-kills. Master plan FUTURES state = BLOCKED. Verdict accurate. | P3 Keep DEAD verdict; flag for re-emission plan or formal retire from /audit per master plan FUTURES section.
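Small-sample win rates such as the FOREX HC cohort (N=73) can be sanity-checked with a binomial confidence interval before they are labelled EDGE. The sketch below uses the Wilson score interval as one reasonable choice (the source mentions a bootstrap CI; Wilson is swapped in here because it needs no resampling). It assumes independent trades, which real trade streams may violate, and `wilson_interval` is an illustrative helper, not an existing function in the codebase.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate of `wins` out of `n` trades.

    z=1.96 corresponds to a two-sided 95% interval. Assumes independent
    trades, so the true uncertainty can be wider for correlated picks.
    """
    if n <= 0 or not 0 <= wins <= n:
        raise ValueError("need n > 0 and 0 <= wins <= n")
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width
```

`wilson_interval(48, 73)` covers the FOREX HC cohort (48 wins is the integer count implied by 65.8% of 73), and `wilson_interval(60, 605)` covers the raw FOREX read (9.9% of 605). Comparing the two intervals shows how far apart the filtered and raw samples sit.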

Summary — required HC filter updates

  1. P0 most urgent: FOREX downgrade from "EDGE" to BLOCKED/DEAD. The current EDGE claim is the single most misleading verdict on /audit and could lead a user to size into a sub-floor (PF 0.28) class.
  2. P1: ETF reclassify DEAD → THIN_REHAB (verdict was based on stale n=19 data).
  3. P1: COMMODITY carve-out — admit cot_positioning family as EDGE while keeping class aggregate WEAK.
  4. P1: CRYPTO + EQUITY + COMMODITY add anti-overfit DSR gate (require DSR ≥ 0.95 per anti_overfit_audit.json).
  5. P2: Bump "2026-04-15" refresh date to "2026-05-12" once gates rewired.
  6. P2: Refresh BOND n=8 → 18 (live).
  7. P2: Display DSR verdict-counts inline (e.g. "CRYPTO: 4 EDGE_LIKELY_REAL / 33 OVERFIT_LIKELY") sourced from anti_overfit_audit.json::verdict_counts.
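Items 3 and 4 in the list above combine into one decision rule. The following is a minimal sketch, not the actual hc_filter.js logic. It assumes that anti_overfit_audit.json loads to a dict with a `strategies` mapping from strategy name to an entry carrying a `verdict` field; that shape, the function names, and the choice to return False for every class outside CRYPTO/EQUITY/COMMODITY are all assumptions for illustration.

```python
# Strategy names and the elite_score floor come from the COMMODITY carve-out row.
COT_CARVE_OUT = frozenset({
    "cot_positioning",
    "cftc_cot_commercial_signal",
    "cot_positioning_CT_locked",
})
# Classes that item 4 says should receive the anti-overfit gate.
GATED_CLASSES = frozenset({"CRYPTO", "EQUITY", "COMMODITY"})


def has_real_edge(strategy: str, audit: dict) -> bool:
    """True if the audit file marks the strategy EDGE_LIKELY_REAL (assumed schema)."""
    entry = audit.get("strategies", {}).get(strategy, {})
    return entry.get("verdict") == "EDGE_LIKELY_REAL"


def hc_pass(asset_class: str, strategy: str, base_filter_ok: bool,
            elite_score: float, audit: dict) -> bool:
    """Hypothetical HC gate: existing per-class filter AND anti-overfit verdict."""
    if asset_class not in GATED_CLASSES:
        return False  # assumption: FOREX/BOND/ETF/FUTURES never show EDGE here
    if not has_real_edge(strategy, audit):
        return False  # rejects OVERFIT_LIKELY sleeves even when FWD WR passes
    if asset_class == "COMMODITY":
        # Carve-out: only the COT family with elite_score >= 65 is admitted.
        return strategy in COT_CARVE_OUT and elite_score >= 65
    return base_filter_ok
```

Placing the anti-overfit check ahead of the per-class filter means a sleeve with a strong forward win rate but an OVERFIT_LIKELY verdict never reaches the threshold comparison.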

Implementation surface: audit_dashboard/hc_filter.js (per-asset-class HC gates per CLAUDE.md) + audit_dashboard/template.html:~1203-1300 (HC overlay text). If config/hc_thresholds.json exists, each verdict update is a small JSON-config tweak in that file.

▶ DB Health red-tier — remediation status (refreshed 2026-05-12 03:30Z)

Per-metric progress on the 6 red-tier items in the DB Health — 2026-05-08T15:00Z panel. "Action required" commentary added inline. Updated 2026-05-12 03:30Z with sign-coherence guard, ghost-row triple-axis block, and CI commit-list fix.

Metric | Original value | Action shipped this session | Expected post-fix | Status
Forward Validator Freshness | 840h since last WON/LOST (2026-04-02) | Wave 1 commit 81bd0b86388: rm alpha_engine/data/circuit_breaker_state.json (HALT max_picks=0 from 2026-03-24) | VERIFIED: bt_backtest_trades.MAX(imported_at) = 2026-05-11 20:00:59 (n=1.8M) | RESOLVED
WON-vs-PnL contradiction | YES (avg pnl per status, writer bug) | 2026-05-12 03:00Z, direct writer fix: commit 22b677c1167 adds sign-coherence guard to both atomic status+pnl writers, alpha_engine/outcome_resolver.py:1670 (resolver path) and audit_trail/mysql_client.py:628 (mysql_close_trade canonical write). When source supplies exit_reason=TP + pnl_pct < 0 (or SL + pnl > 0) the guard now trusts the pnl sign and logs won_pnl_contradiction: WARNING. Plus earlier confidence-normalizer migration (613c65cb, all 9/9 callsites). | Stops new contradiction rows; existing contradicted rows still in DB until backfill SQL pass. | PARTIAL (forward-fixed)
PnL Integrity (sampled) | 42.0% (58k/100k mismatch >1pp) | PR #876 merged 818ff966222: writer-side clamp [-100, 200]% in mysql_trading_sync.py (kills USDCHF=X -106,700% outlier); P0 #7 1c535a19105 read-side clamp at dashboard_generator.py:9309 max_dd cumulation | Future rows clamped; legacy poisoned rows still in DB until backfill | PARTIAL (FORWARD-FIXED)
Phantom EXPIRED rows | 100.0% (1 class, worst-case) | Wave 1 unfreeze + PR #891 merged 486f7bf2989: mysql_sync entry_time/exit_time fallback (repairs 87% NULL closed_at orphans on future syncs) | Resolver lag still present; expected reduction over next 3-7 cron cycles; Wave 1.5a/b/c (lm_signals + signal_tier + at_consensus_picks) still queued | PARTIAL
Raw-Pick Outcome Coverage | 0.09% (121/136,374 resolved) | 2026-05-12 03:00Z root cause confirmed: Resolver itself is correct. It correctly returns 0 because all 8,151 entries in closed_picks.json already have terminal status (per Investigator B). The real bug is upstream: NO writer reads ACTIVE rows from at_raw_picks, detects TP/SL/time-exit, and feeds new entries into closed_picks.json. Existing references in alpha_engine/outcome_resolver.py:1931, :2317, crypto_risk_gates.py:179, scanner.py:4781 all RE-write the same 8,151 entries. mysql_client.py:601 mysql_close_trade() exists but has no caller from TP/SL detection. Independent of Wave 1 unfreeze. | Needs a new sync_active_mysql_picks_to_json() that reads ACTIVE at_raw_picks, computes per-class TP/SL hit logic, writes terminal entries to closed_picks.json + back to at_raw_picks. Queued as P0 follow-up. | DIAGNOSED, IMPLEMENTATION QUEUED
Ghost Rows (constant pnl_pct) | 655,000 (18 cohorts, n>1000, distinct<5) | 2026-05-12 03:15Z, symbol-axis quarantine shipped: commit 597819d79c7 introduces BLOCKED_ASSET_STRATEGY_SYMBOL_TRIPLES in audit_trail/quality_gates.py with 5 documented cohorts: (CRYPTO, quan_engine, MATICUSDT) 215k rows, (CRYPTO, KIMI_signal_tracker, ETHUSDT/BTCUSDT) multi-bucket, (CRYPTO, irb_hoffman, ADAUSDT), (CRYPTO, funding_rate_carry, ROBOUSDT). Enforced at passes_active_gate (kills new emissions) AND dashboard_generator.py::_is_historical_blocked_pick (excludes from historical aggregates). meta_strategy 1.6M-row template family deferred until db_health.json::ghost_rows.top_cohorts repopulates (currently [] on the 2026-05-08 snapshot). | Expected ~220k+ rows excluded from CRYPTO aggregates on next generator run; total 655k → ~440k. Remaining ~430k = meta_strategy family + small long-tail. | PARTIAL (5 of ~18 cohorts blocked)
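Two of the writer-side fixes in the table, the sign-coherence guard and the [-100, 200]% clamp, are small enough to sketch. This is an illustration of the described behaviour, not the code in commit 22b677c1167 or PR #876: the function names, the logger name, and the handling of an exactly flat trade (the hypothetical "FLAT" label) are assumptions.

```python
import logging

log = logging.getLogger("won_pnl_guard")  # hypothetical logger name

_STATUS_BY_EXIT = {"TP": "WON", "SL": "LOST"}


def coherent_status(exit_reason: str, pnl_pct: float) -> str:
    """Derive WON/LOST, trusting the PnL sign over the exit reason on conflict."""
    expected = _STATUS_BY_EXIT.get(exit_reason)
    if pnl_pct == 0:
        # Assumption: the source does not say how flat trades are labelled.
        return expected or "FLAT"
    by_sign = "WON" if pnl_pct > 0 else "LOST"
    if expected is not None and expected != by_sign:
        log.warning("won_pnl_contradiction: exit_reason=%s pnl_pct=%.2f",
                    exit_reason, pnl_pct)
    return by_sign


def clamp_pnl_pct(pnl_pct: float, low: float = -100.0, high: float = 200.0) -> float:
    """Clamp a PnL percentage into [low, high]; bounds taken from the table."""
    return max(low, min(high, pnl_pct))
```

With these definitions a take-profit exit that arrives with a negative PnL is stored as LOST and logged, and an outlier such as -106,700% is stored as -100%.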

Action required (master-plan commentary — refreshed 2026-05-12 03:30Z)

Recommended next checkpoint (post-fix re-snapshot)

Re-pull db_health.json at 2026-05-12 06:00Z after 2-3 hourly cron cycles. Expected deltas if fixes hold:

⚠ DB Health red-tier crisis — original P0 remediation plan (Wave 0.5 → 4)

The live /audit DB Health panel (snapshot 2026-05-08T15Z; the values have persisted since) shows 6/6 red metrics. This is the truth layer that Codex says must be fixed first. Remediation is already documented at reports/db_evidence_graded_final_2026-05-08.md (222 lines).

Metric | Value | Severity | Root cause
PnL Integrity (sampled) | 42.0% (58k / 100k mismatch >1pp) | RED | resolver stalled (F1 cascade)
Ghost Rows (constant pnl_pct) | 655,000 (18 cohorts n>1000, distinct_entries<5) | RED | F2: synthetic stamping in writer
Forward Validator Freshness | 840h since last WON/LOST (2026-04-02 12:00) | RED | F1: circuit_breaker_state.json HALT persisted from 2026-03-24, MAX_ACTIVE_PICKS=0 chokes validator
Phantom EXPIRED rows | 100.0% (1 class, worst-case) | RED | F3 race condition: resolver doesn't run before expire-cron
Raw-Pick Outcome Coverage | 0.09% (121 / 136,374 resolved) | RED | downstream of F1 cascade
WON-vs-PnL contradiction | YES (avg pnl per status, writer bug) | RED | F5 confidence inversion: STRONGEST evidence (kilo + deepseek confirm)
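The ghost-row criterion in the table (cohorts with n>1000 and fewer than 5 distinct values) can be expressed as a grouping pass. The sketch below is an assumption-heavy illustration: it counts distinct pnl_pct values per cohort, although the table's "distinct_entries" may refer to a different column, and `find_ghost_cohorts` and the cohort-key shape are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable


def find_ghost_cohorts(rows: Iterable[tuple[Hashable, float]],
                       min_rows: int = 1000,
                       max_distinct: int = 5) -> dict[Hashable, int]:
    """Return {cohort_key: row_count} for cohorts that look synthetic.

    `rows` yields (cohort_key, pnl_pct) pairs, where cohort_key could be an
    (asset_class, strategy, symbol) triple. A cohort is flagged when it has
    more than `min_rows` rows but fewer than `max_distinct` distinct values.
    """
    counts: dict[Hashable, int] = defaultdict(int)
    distinct: dict[Hashable, set] = defaultdict(set)
    for key, pnl_pct in rows:
        counts[key] += 1
        distinct[key].add(pnl_pct)
    return {
        key: n for key, n in counts.items()
        if n > min_rows and len(distinct[key]) < max_distinct
    }
```

The returned keys have the same shape as the entries of a triple-based blocklist, so a flagged cohort can be reviewed and then added to the quarantine by hand.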

Wave 0.5 — Pre-deploy verification (5 min, READ-ONLY)

  1. Confirm alpha_engine/data/circuit_breaker_state.json is git-committed (last touched commit fa9b6b38109 2026-03-28).
  2. Run staleness SQL: SELECT MAX(imported_at) FROM bt_backtest_trades WHERE status IN ('WON','LOST') — expect 2026-04-02.
  3. Gating-check grep: grep -n "circuit_breaker\|is_locked" alpha_engine/{forward_validator,outcome_resolver,production_scanner}.py.
  4. Survivor signal_tier check: SELECT signal_tier, MIN(ts), MAX(ts) FROM at_discord_notifications WHERE signal_tier IS NOT NULL GROUP BY signal_tier.
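The staleness query in step 2 can be wrapped in a small freshness check. The sketch below runs the same SQL against an in-memory SQLite table as a stand-in for the production MySQL database. The table contents are synthetic, the `hours_since_last_resolution` helper is hypothetical, and the "now" value is chosen only to illustrate the 840h figure.

```python
import sqlite3
from datetime import datetime

# Same query as step 2 of the checklist.
STALENESS_SQL = (
    "SELECT MAX(imported_at) FROM bt_backtest_trades "
    "WHERE status IN ('WON','LOST')"
)


def hours_since_last_resolution(conn: sqlite3.Connection, now: datetime):
    """Hours between `now` and the newest WON/LOST row, or None if there is none."""
    (last_raw,) = conn.execute(STALENESS_SQL).fetchone()
    if last_raw is None:
        return None
    last = datetime.fromisoformat(last_raw)
    return (now - last).total_seconds() / 3600.0


# Synthetic demo data: SQLite stands in for MySQL, rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bt_backtest_trades (status TEXT, imported_at TEXT)")
conn.executemany(
    "INSERT INTO bt_backtest_trades VALUES (?, ?)",
    [
        ("WON", "2026-04-01 09:00:00"),
        ("LOST", "2026-04-02 12:00:00"),
        ("ACTIVE", "2026-05-06 08:00:00"),  # ignored: not a terminal status
    ],
)
stale_hours = hours_since_last_resolution(conn, datetime(2026, 5, 7, 12, 0, 0))
```

Filtering on terminal statuses matters here: newer ACTIVE rows would otherwise hide the freeze, which is why the query restricts to WON and LOST.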

Wave 1 — Unfreeze (5 min)

  1. rm alpha_engine/data/circuit_breaker_state.json
  2. Commit: fix(circuit-breaker): rm 2026-03-24 stale HALT state — unfreezes forward_validator (35d freeze)
  3. Push to main; watch one cycle of .github/workflows/audit-dashboard.yml (hourly cron).
  4. Verify: MAX(imported_at) for WON/LOST advances past 2026-05-11.

Precedent: PR #497 fixed R3 stale-state on 2026-04-27 (same class of bug). Same 2026-03-24 leak still live — referenced in freeze_2026_04_02_root_cause_2026-05-08.md:115. Memory ref: feedback_circuit_breaker_stale_state_leak.

Wave 1.5 — Independent pipeline checks (~30 min each)

Kilo carveout: lm_signals + at_consensus_picks + at_discord_notifications fail independently — NOT auto-fixed by circuit-breaker deletion. Each has its own cron.

  1. lm_signals expire-cron: exit_price=0 in 96.2% of expire-cron rows (F10) — patch cron to skip expire-write when exit_price unset.
  2. at_discord_notifications.signal_tier: 99.99% NULL (F8) — locate writer; backfill schema.
  3. at_consensus_picks time-travel: 57.3% rows with future-dated entries (F4) — writer guard against future timestamps.
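Items 1 and 3 above both describe writer-side guards, which might look like the following. This is a minimal sketch under assumptions: the function names are hypothetical, "unset" is taken to mean NULL or a non-positive price, and the five-minute clock-skew tolerance is an illustrative default rather than a value from the source.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_write_expire(exit_price: Optional[float]) -> bool:
    """Skip the expire-write when no usable exit price exists (F10 guard)."""
    return exit_price is not None and exit_price > 0


def is_future_dated(entry_time: datetime,
                    now: Optional[datetime] = None,
                    tolerance: timedelta = timedelta(minutes=5)) -> bool:
    """True when an entry timestamp lies beyond `now` plus a skew tolerance (F4 guard).

    Both datetimes are expected to be timezone-aware (UTC) so they compare safely.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return entry_time > now + tolerance
```

A writer would call `should_write_expire` before stamping an EXPIRED row and reject or log any pick for which `is_future_dated` returns True.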

Wave 2-4 — Schema + route fixes

Tie-in to SUPREME EDGE ENHANCEMENT

Verbatim chatlog cross-check (peer docs/chatlog_verbatim_2026-05-11.md, commit 77f42fa5c3e)

Subagent scan of peer's 526-line verbatim chatlog identified 3 mandatory blockers + 1 class downgrade:

  1. Phase reorder mandatory (swarm-unanimous): Phase 4 (measurement infrastructure) MUST precede Phase 2 (fast-track classes). Already captured in §"Peer plan v2" above; reaffirmed here. No per-class scaling until measurement gates exist.
  2. Phase 1.5 drift-clearance gate: require drift_alert=false for 7 consecutive days before Phase 2 kickoff. Treat walk-forward as advisory-only while drift hot. Halt + recalibrate if drift persists > 30d.
  3. BOND state downgrade: verbatim chatlog reads n=12 (not n=11 from Cursor or n=18 from Kimi). Either way well below n≥100 floor. Master plan BOND state changed from REHAB → BLOCKED above.
  4. INDEX_STOCK class rename adopted: 8 classes now (CRYPTO/EQUITY/COMMODITY/FOREX/ETF/BOND/FUTURES/INDEX_STOCK). Peer Block A commit 5e4bc1efe63 ships the rename + defense-in-depth.
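The Phase 1.5 drift-clearance rule in item 2 reduces to counting trailing runs in a daily drift_alert history. A minimal sketch follows; the `drift_gate` name and the three return labels are hypothetical, and the input is assumed to be one boolean per day, oldest first.

```python
def drift_gate(daily_drift_alerts: list[bool],
               clear_days: int = 7,
               halt_days: int = 30) -> str:
    """Map a daily drift_alert history (oldest first) to a Phase 2 decision.

    True means drift_alert fired that day. Thresholds follow item 2:
    7 consecutive clear days to proceed, more than 30 hot days to halt.
    """
    def trailing_run(value: bool) -> int:
        run = 0
        for flag in reversed(daily_drift_alerts):
            if flag != value:
                break
            run += 1
        return run

    if trailing_run(True) > halt_days:
        return "HALT_RECALIBRATE"
    if trailing_run(False) >= clear_days:
        return "CLEAR_FOR_PHASE_2"
    return "HOLD_ADVISORY_ONLY"  # walk-forward stays advisory-only
```

Only the most recent run counts, so a single hot day resets the seven-day clearance clock.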

COMMODITY / EQUITY — STABLE_EDGE but NOT auto-promoted to OOS_READY

Verbatim chatlog edge-stability verdicts (COMMODITY STABLE_EDGE n=167, EQUITY STABLE_EDGE n=272) do NOT auto-promote these classes to OOS_READY. Per Codex truth-layer-first policy + master-plan promotion gates, these classes hold at REHAB until:

The edge-stability sidecar verdict alone is necessary but NOT sufficient for promotion.

User policy quotes (from verbatim chatlog)

Buffy enhancements — review confirmation

Full-file re-read of both buffy artifacts (chatlog 43 lines + progress 134 lines). Confirmed:

Why this plan is not generic (verification checklist)