
SUPREME EDGE ENHANCEMENT

Master Plan — 5-Agent + Peer Chatlog + Buffy + DB Health Synthesis

Generated 2026-05-11 · Branch feat/audit-dashboard-enhancements-hermes-2026-05-09 · Skill .claude/skills/money-maker-ready/SKILL.md v1.0 · Peer chatlog updates/2026-05-11-session-chatlog-claude-opus-47.md · PR #904 ready to merge at 6d7ccd928fd

TL;DR

Edge-stability sidecar (peer Phase G) verifies 2 stable classes + 2 decaying + 4 too-thin. COMMODITY and EQUITY are STABLE_EDGE with PF 3.61 / 2.04. CRYPTO and FOREX DECAYING_EDGE at PF 1.39 / 0.57. BOND, ETF, FUTURES, INDEX INSUFFICIENT_DATA.

Real-money posture (synthesis): Codex all-classes-first state machine + Kimi 7-check Go/No-Go gate + Cursor measurable Tier-2 criteria + Copilot 2-consecutive-weekly confirmation. No class trades live until all six major classes ≥ SHADOW.

Immediate (next 24h): Merge PR #904 (4 P1 swarm-fixes shipped, SSRF guard added, 16 smoke tests, 2 fabricated review claims rejected). Then execute the remaining 7 items of the 8-item cross-cutting P0 cluster.

Source inputs (what fed this master plan)

#	Source	Author / model	Key contribution
1	Claude Code plan	Opus 4.7 (this session)	Flagged `claude_gainer_st` winner-vs-blacklist contradiction; identified `multi_asset_cot` PF=19.19 needing DB verification
2	Cursor plan	Cursor Plan Mode	Canonical per-class baseline numbers (n=408/443/100/7875/1825/11); 4-phase fast-track with measurable n-targets
3	Copilot plan	GitHub Copilot Chat	2-consecutive-weekly Tier-2 promotion gate; ETF+COMMODITY rollout order + CRYPTO curated sleeve parallel
4	Kimi plan	Moonshot Kimi (IDE)	7-check Go/No-Go gate; named symbol-level edges (`cot_positioning_CT_locked` 89.8% WR, `rs-breakout-scout` 77.8% WR); 1-hour P0 fixes list
5	Codex plan	OpenAI ChatGPT Codex	Class state-machine BLOCKED→REHAB→OOS_READY→SHADOW→LIVE_ELIGIBLE; `readiness.by_class` payload contract; truth-layer-first policy
6	Peer Claude chatlog	Claude Opus 4.7 (1M context, peer session)	9-phase work A-I; edge-stability sidecar with VERIFIED per-class verdicts; PR #904 swarm-reviewed + ready to merge; 14-item remaining backlog

Per-class action items (specific, not generic)

COMMODITY STABLE_EDGE

Edge-stability verdict: PF 3.61 / WR 55.7% / n=167 (peer Phase G). Cursor reported live asset_class_health PF 3.92 / WR 67.4% / n=408 — discrepancy is sample-window difference (edge-stability uses rolling window; asset_class_health cumulative).

Specific edges named:

Kimi: cot_positioning_CT_locked LONG 89.8% WR, PF 13.1 (n=49)
System-level: multi_asset_cot PF 19.19 / n=130 (Claude Code flagged for DB verification — implausibly high)

Actions:

P0 Verify multi_asset_cot PF=19.19 via DB query against ejaguiar1_stocks — data integrity smoke test. If real, name as Tier-1 seed.
P0 Disclose CT=F / KC=F symbol concentration in dashboard (Codex) — current PF may be one-symbol artifact.
P1 Add COMMODITY to alpha_engine/walkforward_validator.py output path (Cursor — currently missing from walkforward.by_class). Verify surfacing in audit_trail/dashboard_generator.py.
P1 Wire real CFTC COT data to validate the 89.8% WR cot_positioning_CT_locked pocket (Kimi).
P2 Add term-structure / inventory / seasonality features (Codex).

Promotion gate (master): walk-forward by_class block emits real folds + concentration disclosed + DB-verified PF + sustained Tier-2 (PF≥1.5 / WR≥50 / MDD≤20) for 2 consecutive weekly snapshots → OOS_READY. Then 14-30d SHADOW.
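
The "sustained Tier-2 for 2 consecutive weekly snapshots" clause of this gate can be sketched as below. This is a minimal illustration, not the repo's implementation: `WeeklySnapshot`, `meets_tier2`, and `tier2_sustained` are hypothetical names, and the thresholds are the PF≥1.5 / WR≥50 / MDD≤20 floor quoted in the gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeeklySnapshot:
    """One weekly per-class reading (hypothetical field names)."""
    pf: float       # profit factor
    wr_pct: float   # win rate, percent
    mdd_pct: float  # max drawdown, percent


def meets_tier2(s: WeeklySnapshot) -> bool:
    # Tier-2 floor as quoted in the gate: PF>=1.5 / WR>=50 / MDD<=20.
    return s.pf >= 1.5 and s.wr_pct >= 50.0 and s.mdd_pct <= 20.0


def tier2_sustained(history: list[WeeklySnapshot], weeks: int = 2) -> bool:
    """True when the most recent `weeks` snapshots all meet Tier-2."""
    if len(history) < weeks:
        return False  # not enough history counts as not sustained
    return all(meets_tier2(s) for s in history[-weeks:])
```

Only the trailing snapshots are checked, so one bad week resets the clock. That is the point of Copilot's persistence test.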

EQUITY STABLE_EDGE

Edge-stability verdict: PF 2.04 / WR 57.4% / n=272 (peer Phase G). Cursor cumulative: PF 1.60 / WR 54.0% / n=443. Convergent — EQUITY is the strongest broad candidate per all 5 plans + verified by peer sidecar.

Specific edges named:

Kimi: rs-breakout-scout LONG 77.8% WR, PF 6.7 (n=18)
Kimi: Breakout Momentum LONG 57.9% WR / PF 1.53 (n=38)
System: aggregated_picks (77.3% WR / PF 6.42 / n=441) — Claude Code flagged "aggregator artifact suspect"
System: claude_gainer_st (78.5% WR / PF 6.12 / n=3472) — Claude Code flagged contradiction: in BLACKLISTED_STRATEGIES at alpha_engine/config.py:216 yet tops the leaderboard

Actions:

P0 Reconcile claude_gainer_st winner-vs-blacklist contradiction (Claude Code). Confirm enforcement at exec gate, not just intake — memory feedback_gate_at_execution_not_generation.
P0 Verify capped-vs-raw PnL gap (Kimi flagged 680% MDD anomaly; Codex made it a payload-contract field capped_vs_raw_pnl_gap).
P1 Bottom-symbol pruning + High-Conviction parity (Codex). Audit HC filter against confidence-inversion on ETF/CRYPTO per memory project_performance_reality.
P1 Add earnings-drift + sector-relative-strength + breadth features (Codex).
P2 Push n from 443 toward 600+ with same risk-adjusted PF (Cursor target).

Promotion gate (master): capped MDD verified + claude_gainer_st reconciled + bottom-symbol pruning improves recent PF without breaking OOS consistency ≥ 80% → OOS_READY. Then 14-30d SHADOW.

CRYPTO DECAYING_EDGE

Edge-stability verdict: PF 1.39 / WR 46.5% / n=1521 (peer Phase G). Cursor cumulative: PF 1.39 / WR 47.4% / n=7875. Kimi: PF 1.26 / WR 44.8% / n=8166. All converge — class is decaying.

Specific draggers named:

kimi_signal_tracking: -954% PnL / PF 0.26 — named in Claude Code + Kimi + Codex as P0 quarantine target
baby_strats:crypto_soc_*: 12 overfit flags in fwd_vs_bt_divergence.rows — 66% BT WR vs 32% live (Kimi -33.6% decay) — surgical quarantine proposal exists at reports/baby_strats_overfit_quarantine_proposal_2026_05_10.md
quan_engine: 18% volume share at PF 0.70 (Kimi — cap to 12% of CRYPTO volume)
alpha_engine_fast: PF 0.62 / -127% PnL (Claude Code; current CRYPTO drag #1 per memory project_strategy_state_2026_05_03)

Specific edges (sleeves) inside the mediocre aggregate:

Kimi: st_fear_greed_contrarian 94% WR (promote to High-Conviction gating)
Codex: score CRYPTO by sleeve/subsystem quality rather than gross class aggregate

Actions:

P0 Blacklist kimi_signal_tracking at alpha_engine/config.py:216. Verify enforcement at exec gate.
P0 Ship existing baby_strats:crypto_soc_* quarantine proposal (peer Phase C already added unit tests for blocklist enforcement; 94/94 tests pass per peer chatlog Phase C-E-D1).
P0 Cap quan_engine to 12% CRYPTO volume share (Kimi).
P1 Promote st_fear_greed_contrarian to High-Conviction gating (Kimi).
P1 Wire decay-replacement pipeline (peer remaining task): when edge_stability_CRYPTO.json::consistency_verdict == DECAYING_EDGE, trigger P1 swarm targeting "what replaces strategy X?".
P2 Add funding / basis / open-interest / on-chain flow features (Codex).
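
The 12% volume-share cap on quan_engine needs a share measurement first. A minimal sketch, assuming the input is simply one source name per CRYPTO pick in the window (the function names are hypothetical, and the real pipeline may measure volume differently):

```python
from collections import Counter


def volume_shares(pick_sources: list[str]) -> dict[str, float]:
    """Fraction of picks contributed by each source in the window."""
    counts = Counter(pick_sources)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {src: n / total for src, n in counts.items()}


def over_cap(pick_sources: list[str], cap: float = 0.12) -> list[str]:
    """Sources whose share of class volume strictly exceeds `cap`."""
    shares = volume_shares(pick_sources)
    return sorted(src for src, share in shares.items() if share > cap)
```

A source at 18% share is flagged and one at exactly 12% is not. Whether the cap is enforced by dropping picks or by down-weighting size is left open here.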

Promotion gate (master): all 3 P0 quarantines shipped + post-quarantine forward window PF ≥ 1.5 on rolling 30d + decay-replacement pipeline live → OOS_READY. Then 14-30d SHADOW.

FOREX DECAYING_EDGE

Edge-stability verdict: PF 0.57 / WR 40.7% / n=1424 (peer Phase G). Cursor cumulative: PF 0.28 / WR 41.8% / n=1825. Kimi: PF 0.28 / WR 45.6% / n=1249. Sub-tier across all reads. Per memory CLAUDE.md FOREX is "genuinely sub-floor — apply mutate-before-kill protocol, do NOT silently kill".

Actions:

P0 Hard-cap FOREX sizing at 0 until PF ≥ 0.8 (Kimi + Cursor). Explicit per-class gate, not silent kill.
P0 PR #876 pnl_pct anomaly clamp [-100, 200]% — verify merged or merge it. "Kills forex unit corruption" per PR title. Some FOREX PF degradation may be unit-conversion bug, not real decay.
P1 Spawn FOREX deep-dive subagent per CLAUDE.md major-goal mandate: reports/deep_dive_forex_*.md with per-source autopsy + external-replication options (DBMF/KMLM/MyFXBook) + 30/60/90-day rescue plan + acceptance criteria. Memory feedback_noncrypto_resolver_live_close_bug: outcome_resolver.py:384-405 closes at yfinance spot every run with 1bp WIN threshold — ~1700 picks mislabeled, root cause of every polluted non-crypto kill claim.
P1 Wire COT / DXY-beta / carry-rate-differential / news-blackout features (Codex). Use mutate-before-kill per docs/MUTATION_THREE_AXIS_PROTOCOL.md.
P2 Rebuild FOREX from scratch with new TP/SL caps + session timing (Kimi — Month 2 work).
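
One plausible reading of the PR #876 clamp is sketched below. The actual mysql_sync change is not reproduced in this plan, so `clamp_pnl_pct` is a hypothetical helper. It bounds a per-pick pnl_pct to [-100, 200] and reports whether the raw value was anomalous, so unit-corrupted rows can be counted rather than silently altered.

```python
import math


def clamp_pnl_pct(pnl_pct: float, lo: float = -100.0, hi: float = 200.0) -> tuple[float, bool]:
    """Return (clamped value, was_anomalous) for one pick's pnl_pct."""
    if math.isnan(pnl_pct):
        # NaN cannot be ordered, so refuse to guess a bound for it.
        raise ValueError("pnl_pct is NaN; cannot clamp")
    clamped = max(lo, min(hi, pnl_pct))
    return clamped, clamped != pnl_pct
```

Keeping the anomaly flag next to the clamped value lets the FOREX deep-dive separate how much of the PF degradation is unit corruption and how much is real decay.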

Promotion gate (master): resolver bug fixed + COT/DXY/carry wired + PF ≥ 1.2 + WR ≥ 45 for sustained rolling window → REHAB. Cannot reach OOS_READY without ≥ 1 full quarter of clean post-fix data.

ETF INSUFFICIENT_DATA

Edge-stability verdict: INSUFFICIENT_DATA (peer Phase G; n below floor). Cursor cumulative: PF 1.48 / WR 60.0% / n=100. Kimi: PF 1.20 / WR 53.4% / n=88. Class is at the n floor — promotion gate blocked on sample.

Actions:

P1 Expand ETF universe (XLF, XLE, XLK) to reach n=120 → 180 (Kimi + Cursor target).
P1 Block leveraged ETFs (Codex — risk policy).
P2 Add sector-theme concentration caps + AUM + expense-ratio filters (Codex).
P2 Controlled emitter expansion only for ETF strategies with positive OOS decay + acceptable drawdown (Cursor).

Promotion gate (master): n ≥ 100 AND PF ≥ 1.5 AND consistency ≥ 80% on edge-stability sidecar → OOS_READY. ETF has the cleanest OOS profile per Codex, so this should be the fastest to promote once n threshold cleared.

BOND INSUFFICIENT_DATA

Edge-stability verdict: INSUFFICIENT_DATA. Cursor cumulative: PF 0.66 / WR 54.5% / n=11. Kimi: PF 1.72 / WR 55.6% / n=18. All plans agree: n too thin for any promotion verdict.

Actions:

P1 Add BOND to alpha_engine/walkforward_validator.py output path (Cursor — currently missing from walkforward.by_class).
P2 Expand BOND universe + add duration / risk filters (Codex).
P2 Focus on data generation quality before optimization (Cursor).

Promotion gate (master): n ≥ 100 (multi-month effort) → BOND can re-enter the rotation. Keep paper-only until then.

FUTURES INSUFFICIENT_DATA

Edge-stability verdict: INSUFFICIENT_DATA. Memory project_futures_kill_without_replacement: Futures module silent-dead (5.9% WR, -96% PnL after 2 strategies killed + no replacements added).

Actions:

P1 Document FUTURES current pipeline state — is anything emitting? If yes, why no signals? If no, was the kill intentional?
P2 Per Codex: "FUTURES stay excluded until enough data to join the framework."
P3 If reactivating, apply mutate-before-kill protocol on the killed strategies before recommending re-emission.

INDEX / OTHER INSUFFICIENT_DATA

Peer edge-stability sidecar reports INDEX class with INSUFFICIENT_DATA. Not enumerated in any of the 5 agent plans. Treat as "not yet a class."

Action: Defer until edge-stability sidecar emits real metrics for INDEX category.

Cross-cutting P0 cluster (next 24h, all-class impact)

#	Action	Effort	Source plan(s)	Memory ref
1	Merge PR #904 (research orchestrator + edge stability sidecar) — already MERGEABLE/CLEAN at `6d7ccd928fd`	0.1h	peer chatlog Phase I	—
2	Blacklist `kimi_signal_tracking` at `alpha_engine/config.py:216` + verify exec-gate enforcement	1h	Claude Code, Kimi, Codex	`feedback_gate_at_execution_not_generation`
3	Ship `baby_strats:crypto_soc_*` quarantine via per-strategy BLOCKED_ASSET_STRATEGY_PAIRS at `audit_trail/quality_gates.py:1499`	1h	4/5 plans	`reports/baby_strats_overfit_quarantine_proposal_2026_05_10.md`
4	Hard-cap FOREX sizing at 0 until PF ≥ 0.8 — explicit per-class gate (NOT silent kill)	1h	Kimi, Cursor	CLAUDE.md FOREX directive; `docs/MUTATION_THREE_AXIS_PROTOCOL.md`
5	Verify `multi_asset_cot` PF=19.19 via DB query against `ejaguiar1_stocks`	1h	Claude Code	—
6	Reconcile `claude_gainer_st` winner-vs-blacklist contradiction	1-2h	Claude Code	`feedback_gate_at_execution_not_generation`
7	Verify max-drawdown calc uses capped PnL (Kimi 680% MDD smell-test)	1h	Kimi, Codex	—
8	Cap `quan_engine` to 12% CRYPTO volume share	1h	Kimi	—

P1 cluster (week 1, structural)

Implement Codex's readiness.by_class payload contract (class state-machine fields + capped_vs_raw_pnl_gap + single_symbol_concentration + leaders.by_class + draggers.by_class).
Add walk-forward coverage for COMMODITY + BOND in alpha_engine/walkforward_validator.py; surface in audit_trail/dashboard_generator.py.
Fix drift detector — hf_stats.concept_drift.KS_D uncomputed-zero bug + refresh 19-day stale hf_stats. Wire drift → auto-pause sizing when D > 0.10 (peer remaining task "Drift-pause activation Phase 1").
Reconcile /audit threshold text with docs/PERFORMANCE_CHARTER.md v1.0.
Add last_signal_date to systems payload (Claude Code — absent for all top-6 winners).
Peer high-priority backlog: v3b LLM-driven signal translator (peer Phase H §A) — likely flips several NO_EDGE → MIXED/GO once spec-faithful signals replace SMA proxy. ~$1/run dispatcher.
Peer high-priority backlog: Re-fire P5 swarms with v3a numbers (~$0.35) — current P5 verdicts cached from pre-v3a stub numbers.
Investigate-then-quarantine remaining top draggers (mercury2_fast, ml_bg_system_b, copy_trader_highscore) per docs/MUTATION_THREE_AXIS_PROTOCOL.md.
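
For the drift-detector item, the two-sample Kolmogorov-Smirnov statistic and the D > 0.10 pause rule can be sketched in plain Python. This is an illustration only: how hf_stats builds its baseline and recent samples is not stated here, and `ks_d` / `should_pause_sizing` are hypothetical names. An empty sample raises an error instead of returning 0.0, so an uncomputed statistic cannot pass for "no drift".

```python
from bisect import bisect_right


def ks_d(baseline: list[float], recent: list[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(baseline), sorted(recent)
    d = 0.0
    # The maximum gap between two step functions occurs at an observed point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def should_pause_sizing(baseline: list[float], recent: list[float], threshold: float = 0.10) -> bool:
    """Auto-pause rule from the plan: pause when D exceeds the threshold."""
    return ks_d(baseline, recent) > threshold
```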

P2 cluster (weeks 2-4, per-class rehab in parallel)

See per-class action lists above. Cross-cutting:

E-D3 — dry_run kwarg on smart_picks_engine + production_scanner + dashboard_generator (peer remaining backlog medium).
CPCV upgrade — swap walk-forward for CPCV in p3_backtest_runner.py (peer remaining backlog; closes project_cpcv_gap_2026_04_28).
Per-asset-class deep-dive swarm questions — 35 specific questions across 7 classes captured in DAILY_IDEAS.MD §B (peer remaining backlog).
Re-run cross-permutations after data-trust + walk-forward fixes; verdict on per-class edge existence (Claude Code).
Audit HC filter against confidence-inversion on ETF/CRYPTO per memory project_performance_reality.

P3 / P4 / P5 (longer horizon)

P3: Decay-replacement pipeline (peer remaining backlog low); HEAD-check rate-limiter; cross-link research_index.html ↔ edge_stability.html; NO_EDGE knowledge base; tests for tools/research/ (peer remaining backlog low). Riskfolio-Lib status: 9 files killed by peer Phase D (dep no-install on Py 3.14 + duplicates pypfopt fallback). Reconsider only if Py-version path opens.
P4: FRED macro-filter wire-up (FRED_API_KEY to secrets, gate strategies on regime); Kalshi pairwise consensus with Polymarket (pm_consensus_overlay.py sidecar).
P5: Pilot paper-trade real-time variance test only AFTER class state machine reaches SHADOW for all six major classes (Codex all-classes-first).

Open PR triage (24 open as of 2026-05-11 21:00 UTC)

PR	Title	Status	Master-plan action
#904	research orchestrator + edge-stability sidecar	MERGEABLE / CLEAN at `6d7ccd928fd` (per peer chatlog Phase I)	P0 MERGE NOW: unblocks the edge-stability verdicts this master plan rests on
#903	chore(loop): 2026-05-11 run findings	open	review after #904 merges (docs-only)
#902	feat(b13): per-class regime filter sidecar + quality_gates.py hook — COMPLETE (supersedes #868/#872/#889/#895/#900)	1 failed check	P1 investigate failed check; this is the regime-filter sidecar that gates strategies on regime — directly enables the Codex state-machine REHAB→OOS_READY transition
#901	audit hourly 05Z	open	auto-merge if hourly cron
#898	fix(B15): cross-asset correlation works without numpy	1 failed check	P2 investigate; needed for cross-class verdicts
#893	orphan_resolver_dryrun.py — 1,366 orphan closed_at preview	1 failed check	P1 dry-run preview only; aligns with E-D3 work; investigate failed check
#892	safe_db_archive.py — Hermes rule #1 gate	open	P1 needed for any DB write that touches blocklist updates
#891	mysql_sync entry_time/exit_time fallback — repairs 87% NULL closed_at orphans	2 failed checks	P0 directly addresses orphan rows — part of Codex truth-layer P0; investigate failed checks
#887	WIN_RATE_TRAP_BLACKLIST — 6 crypto traps + 2 equity bombs	1 failed check	P0 CRYPTO + EQUITY draggers — overlap with master-plan P0 #2 #3 #6; verify no duplicate blocks before merge
#885	risk_policy v2 — tighten crypto per-symbol cap 10→5, per-trade 5→3	2 failed checks	P1 CRYPTO de-concentration; investigate failed checks
#884	mysql_sync infer category for NULL/empty rows	open	P0 class-attribution backfill — part of Codex truth-layer
#883	quality_gates swarm-batch-1 source score retunings (5 sources)	open	P1 review
#881	tv-orchestrator LL1 fill-relative TP/SL	open	P2 TradingView paper-trade execution improvement
#879	audit-dashboard Hermes 5-phase enhancements + 5000-round audit corpus	open	P2 peer chatlog notes branch is 3152 commits behind main + 83 ahead, conflicting; rebase or close+cherry-pick selectively
#878	short_engine BULL-regime gate	open	P1 CRYPTO regime filter; complements #902
#877	mysql_sync elite_score backfill	open	P2
#876	mysql_sync pnl_pct anomaly clamp [-100, 200]% — kills forex unit corruption	open	P0 FOREX — see FOREX P0 #2 above; this directly fixes the unit-corruption side of FOREX PF degradation
#873	chore(loop): B13 status	open	docs-only
#862	DB query bank: forex pnl corruption + 50 untested live pairs + JPY-cross 100% losers	open	P1 FOREX investigation evidence; aligns with FOREX P1 deep-dive
#849	Edge action plan + swarm peer-review harness (draft)	draft	upgrade-or-close; this master plan supersedes the action-plan portion
#846	Shadow Probation panel on /audit Overview tab	open	P1 directly supports Codex state-machine SHADOW state visualization
#900 / #895	b13 regime filter earlier iterations	open	close in favor of #902 (the COMPLETE version that supersedes per its title)

Recommended PR merge order (master-plan P0 priority):

#904 (research orchestrator + edge stability) — unblocks verdicts
#876 (FOREX pnl_pct clamp) — fixes FOREX unit corruption before FOREX cap goes in
#891 + #884 (mysql_sync category + closed_at fixes) — truth-layer for Codex's data-trust gate
#887 (WIN_RATE_TRAP_BLACKLIST) — overlaps with P0 #2 + #3; verify no duplicate blocks
#902 (b13 regime filter) — enables Codex REHAB→OOS_READY transition; investigate the 1 failed check first
#892 (safe_db_archive Hermes gate) — gates any subsequent DB write
#893 (orphan_resolver_dryrun) — read-only preview, aligns with E-D3
#846 (Shadow Probation panel) — UI for state-machine visualization
#878 (short_engine BULL regime gate) — CRYPTO regime filter
#885 (risk_policy v2 crypto caps) — CRYPTO de-concentration

Real-money gate (master) — synthesis

Adopt Codex's class state machine as governance scaffold, with Kimi's 7-check Go/No-Go as per-class checklist, Cursor's Tier-2 measurable criteria as the numerical floor, and Copilot's 2-consecutive-weekly as the persistence test.

Class state machine

BLOCKED — any global gate red (stale payload, DB-health red, drift_alert, walk-forward missing)
REHAB — global gates green; class still under quarantine/feature-add
OOS_READY — n ≥ 100 AND Tier-2 AND positive OOS Sharpe AND OOS consistency ≥ 80% AND concentration caps OK
SHADOW — 14-30d forward shadow with realized-vs-expected variance inside tolerance band
LIVE_ELIGIBLE — all six major classes (CRYPTO/EQUITY/ETF/FOREX/COMMODITY/BOND) at SHADOW simultaneously; truth-layer green for 7 consecutive days; user personally approves
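
The state definitions above can be read as a pure function of a few inputs. The sketch below is one such reading and not the Codex payload contract itself. `ClassInputs` and its field names are hypothetical. It assumes "positive OOS Sharpe" means strictly greater than zero, and that a class counts as SHADOW once at least 14 shadow days have stayed inside the tolerance band.

```python
from dataclasses import dataclass
from enum import Enum


class ClassState(Enum):
    BLOCKED = "BLOCKED"
    REHAB = "REHAB"
    OOS_READY = "OOS_READY"
    SHADOW = "SHADOW"


MAJOR_CLASSES = ("CRYPTO", "EQUITY", "ETF", "FOREX", "COMMODITY", "BOND")


@dataclass(frozen=True)
class ClassInputs:
    global_gates_green: bool      # payload fresh, DB-health green, no drift_alert, walk-forward present
    n: int
    tier2: bool
    oos_sharpe: float
    oos_consistency_pct: float
    concentration_ok: bool
    shadow_days: int = 0
    shadow_variance_ok: bool = False


def class_state(x: ClassInputs) -> ClassState:
    if not x.global_gates_green:
        return ClassState.BLOCKED
    oos_ready = (x.n >= 100 and x.tier2 and x.oos_sharpe > 0
                 and x.oos_consistency_pct >= 80.0 and x.concentration_ok)
    if not oos_ready:
        return ClassState.REHAB
    if x.shadow_days >= 14 and x.shadow_variance_ok:
        return ClassState.SHADOW
    return ClassState.OOS_READY


def live_eligible(states: dict[str, ClassState], truth_green_days: int, user_approved: bool) -> bool:
    """Portfolio-level check: all six major classes at SHADOW, 7 green days, user sign-off."""
    all_shadow = all(states.get(c) is ClassState.SHADOW for c in MAJOR_CLASSES)
    return all_shadow and truth_green_days >= 7 and user_approved
```

LIVE_ELIGIBLE is modelled as a portfolio-level predicate and not as a per-class state, because the definition depends on all six classes at once.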

Per-class current state (master-plan declaration)

Class	State	Reason
COMMODITY	REHAB	edge-stability STABLE but walk-forward missing + concentration not disclosed
EQUITY	REHAB	edge-stability STABLE but capped-PnL not verified + claude_gainer_st contradiction
CRYPTO	BLOCKED	edge-stability DECAYING + 3 named draggers (kimi_signal_tracking, baby_strats:crypto_soc_*, quan_engine over-share)
FOREX	BLOCKED	edge-stability DECAYING + resolver bug + unit corruption (PR #876)
ETF	REHAB	n at floor; expand universe to clear sample threshold
BOND	BLOCKED	n=12 in verbatim chatlog read — way below n≥100 floor; cannot reach REHAB until sample expanded (multi-month effort)
FUTURES	BLOCKED	silent-dead per memory; need re-emission plan
INDEX	REHAB	insufficient data

Current LIVE_ELIGIBLE count: 0/6. Earliest LIVE_ELIGIBLE target: not before week 8, because even the fastest class needs REHAB→OOS_READY (2-4 weeks) followed by SHADOW (2-4 weeks).

Test plan (per-class + cross-cutting)

Cross-cutting

Re-run money-maker-ready skill on freshest deployed payload after every pipeline change; require class-labeled claims in every report.
Contract tests for readiness.by_class, leaders.by_class, draggers.by_class, concentration/caveat fields (Codex).
Regression test: fail CI if dashboard threshold text diverges from docs/PERFORMANCE_CHARTER.md.
Integration test: red DB-health or active drift forces BLOCKED in state machine.
Shadow-tracking acceptance test: variance ≤ agreed tolerance band before LIVE_ELIGIBLE.
Existing tests verified by peer Phase C: 94/94 unit tests for blocklist enforcement (tests/test_quarantine_unit_blocklist.py); 16 smoke tests for edge stability (tests/test_edge_stability_smoke.py) via peer Phase I commit a9e045a757f.

Per-class

COMMODITY: walk-forward.by_class[COMMODITY].folds > 0; concentration disclosure renders; multi_asset_cot DB-verified PF within 2pp of systems payload.
EQUITY: bottom-symbol pruning improves recent PF without OOS consistency dropping below 80%; capped MDD verified; claude_gainer_st blocklist enforcement test passes at both intake and exec gates.
CRYPTO: next fwd_vs_bt_divergence build shows ≤ 2 baby_strats flags (down from 12); kimi_signal_tracking generates 0 new picks in test env; quan_engine volume share ≤ 12% in next 7-day window.
FOREX: PR #876 clamp test passes; OOS Sharpe non-negative on post-fix 30d window before any sizing un-cap.
ETF: universe expansion adds ≥ 3 leveraged-ETF-free new symbols; n reaches 120 within 30d.
BOND: walk-forward.by_class[BOND] emits ≥ 5 folds; promotion only at n ≥ 100.
FUTURES: document why module is silent; either ship re-emission plan or formally retire from /audit.

Peer plan v2 (swarm-revised) — already shipped

Peer Claude-B session pushed plan v2 + production code mid-document. Tracked here so this master plan stays current.

Plan v2 swarm-revised changes (commit `57d267a28e6`)

Phase 4 ↔ Phase 2 reorder (unanimous swarm) — measurement infrastructure before per-class scaling.
NEW Phase 1.5 drift-clearance gate — must hold drift_alert=false for 7 consecutive days before Phase 2 advances.
Walk-forward by_class downgraded to ADVISORY while drift hot — its numbers are themselves drifting in real-time during regime collapse. Phase 1.5.3 wire-in tags cards advisory_only=true; demotions deferred.
3 baseline numbers corrected by red-team:
- EQUITY consistency: 87.5% (not 75% — Cursor's original cite was 12.5pp understated)
- CRYPTO consistency: 68% (not 84% — original was 16pp overstated)
- FOREX consistency: 48.1% (1.8pp correction, ≈ unchanged)
Red-team read at 2026-05-11T21Z snapshot vs claude-b's read at 2026-05-10T04Z snapshot — discrepancy implies walk-forward.by_class IS drifting; argues for snapshotting at known cutoffs not live-reading.
3 risks added to plan v2 risk register:
- Regime-overfit — Tier-1 promotion gate trained on a single high-VIX regime period may collapse when regime shifts.
- No rollback — Opt B Tier-1 demotion has no rollback ledger; if a wrong demotion happens, no easy undo.
- Plan cites non-existent funcs — walk_forward_by_strategy() is PROPOSED-NEW, does NOT exist yet. Master plan must not assume it.

Production code shipped this session (peer)

Commit	Scope	File(s)
`cf4e924744a`	Opt B walk-forward Tier-1 promotion gate (consistency≥60 + sharpe>0). FOREX blocked from T1 per current data	`audit_trail/dashboard_generator.py`
`cf229ea31ba`	W4 benchmark-relative trailing-30d return per system (primary_asset_class / pnl_30d_pct / trades_30d / benchmark_30d_pct / excess_return_30d_pct)	`audit_trail/dashboard_generator.py`
`4ea32d227cf`	Opt A TA-baseline panel on /audit (6-strategy benchmark cards per class) + `_load_latest_ta_baseline()` + `renderTaBaseline()` + nav hook	`audit_trail/dashboard_generator.py` + `audit_dashboard/template.html`
`82a34bc0fdb`	`tools/live_market_fetcher.py` foundation (yfinance VIX/DXY/BTC/ETH/SPY/QQQ/GLD/TLT/oil + regime classifier + 1h cache)	new file
`5e4bc1efe63`	Block A fix (freebuff INDEX collision: ASSET_CLASSES `"INDEX"` → `"INDEX_STOCK"` + defense-in-depth in `write_index()` + delete stale INDEX.json + 9-class regen) + Phase 1.5.3 drift-advisory wire-in (Opt B re-tier loop now reads `concept_drift.drift_alert`; advisory_only=true when hot, demotions deferred)	`tools/edge/edge_stability.py` + `audit_trail/dashboard_generator.py`
`f740ace5c34`	CLAUDE2.MD A9-A13 + concept_drift root cause report + 37h quarantine verification (0 of 60 active picks match 30 quarantined pairs)	docs + report

Concept-drift root cause (peer T4 verdict)

VIX -44.64% / 30d real regime collapse since 2026-04-22 — confirmed not pipeline noise. KS_D 0.31 vs 0.047 critical. Source: reports/concept_drift_root_cause_2026-05-11.md. This validates Codex's "fix truth layer first" position: drift is real and regime-driven, not a metrics artifact.

Master plan ripple-effects

Re-rank P1 #3 (drift detector fix): KS_D is computed; the drift is REAL. Action shifts from "fix KS_D uncomputed-zero" to "wire 7-consecutive-day drift_alert=false check before Phase 2 advances" (Phase 1.5.1).
De-rank walk-forward.by_class trust while drift hot. Per-class promotion gates above should use edge_stability sidecar as primary verdict, walk-forward.by_class as advisory.
Add Phase 1.5.1 + 4.1 + 4.2 to P1:
- 1.5.1: 7-consecutive-day drift_alert=false history check (needs daily snapshot scaffolding)
- 4.1: DRIFT_AUTO_PAUSE_DRY_RUN=1 env-gate for 7d "would-pause" logging
- 4.2: class_capital_gate(asset_class) → (allow_size, max_pct, reason) with capital_gate_log.jsonl trail
EQUITY edge stronger than thought: consistency 87.5% (not 75%). Strengthens EQUITY's case for first promotion.
CRYPTO edge weaker than thought: consistency 68% (not 84%). CRYPTO promotion gate now requires consistency lift of ≥ 12pp post-quarantine, not 0pp.
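
The Phase 1.5.1 check (drift_alert=false for 7 consecutive days) reduces to a lookup over daily snapshots. A minimal sketch, assuming one drift_alert boolean is stored per calendar day. The daily-snapshot scaffolding does not exist yet, so `drift_clear` and the dict layout are hypothetical.

```python
from datetime import date, timedelta


def drift_clear(daily_alerts: dict[date, bool], today: date, days: int = 7) -> bool:
    """True only if each of the last `days` days has a snapshot with drift_alert False."""
    for offset in range(days):
        day = today - timedelta(days=offset)
        # A missing snapshot is treated as "not clear": absence of data is not evidence of calm.
        if daily_alerts.get(day, True):
            return False
    return True
```

Treating gaps as failures keeps a broken cron from advancing Phase 2 by accident.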

Buffy enhancements (deepseek-v4-pro, PENDING) — queue into P1

Buffy session at docs/chatlogs/chatlog_2026-05-11_buffy_review.md + docs/chatlogs/progress_2026-05-11_buffy_enhancements.md. Code review of Opt-A / Opt-B / W4 found 3 issues + 4 enhancement opportunities. All status PENDING (not yet committed).

ID	Enhancement	Problem (verified by buffy code review)	Master-plan ranking
E1	Add FOREX benchmark (DXY) to `benchmark_return()`	`live_market_fetcher.py:155-162` `benchmark_map` omits FOREX. FOREX systems (n=1801, PF 0.27) get `benchmark_30d_pct=None` in dashboard despite DXY already fetched as `"DX-Y.NYB"`. One-line dict-add.	P1 FOREX — directly supports FOREX deep-dive (master plan FOREX P1 action #3)
E2	Cache drift-pause check with 60s TTL	`quality_gates.py:4143 _drift_pause_active()` reads `dashboard_data.json` from disk on every `passes_active_gate()` call. 60+ active picks = 60+ disk hits/cycle. Synchronous I/O in hot path.	P1 perf — gates the eventual drift-pause flip; performance matters when active list grows
E4	Excess return alert (< -5%)	W4 code at `dashboard_generator.py:12700-12747` computes `excess_return_30d_pct` per system. Data exists; no monitor. Addresses Step 9 of 10-investigations plan. New `w4_alerts` key in `dashboard_data.json`.	P1 monitoring — early warning system per class, complements edge-stability sidecar
E5	DRIFT staging dry-run mode	`DRIFT_AUTO_PAUSE_ENABLED` is binary (0=advisory, 1=hard pause). No staging. Per CLAUDE2.MD: "Don't flip without staging-first per swarm consensus." Add `DRIFT_STAGING_MODE=1` env var that logs would-block picks without actually blocking.	P1 safety — prerequisite for ever flipping `DRIFT_AUTO_PAUSE_ENABLED=1`; aligns with peer T1 contingent task

Buffy methodology note: each enhancement follows before/change/after/verify/evidence pattern with line-range citations + swarm cross-check round + typecheck + code review log. Three rows in tracking tables currently empty — work hasn't started yet.
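
For E2, the 60s TTL idea can be sketched independently of quality_gates.py. `TtlCachedFlag` is a hypothetical wrapper, not existing code: the `loader` callable stands in for whatever reads the drift flag from `dashboard_data.json`, and the injectable clock exists only to make the cache testable.

```python
import time
from typing import Callable, Optional


class TtlCachedFlag:
    """Cache a boolean loader result for `ttl_s` seconds to keep disk reads off the hot path."""

    def __init__(self, loader: Callable[[], bool], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Optional[bool] = None
        self._expires_at: Optional[float] = None

    def get(self) -> bool:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is cold or stale: reload once and restart the TTL window.
            self._value = bool(self._loader())
            self._expires_at = now + self._ttl_s
        return bool(self._value)
```

With 60+ active picks per cycle, this turns 60+ reads into at most one per minute. The cost is that a drift flip can take up to 60 seconds to be seen.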

Issues found but not yet enhanced (queue):

benchmark_return('FOREX') returns None → E1 fix
Per-pick disk I/O on hot path → E2 fix
No staging mode for drift pause → E5 fix

Action: Buffy to proceed with E1/E2/E4/E5. Each lands as separate small PR with co-author + commit message citing buffy-progress file. Expected to clear the buffy queue within 1-2 days.
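
The E4 alert could take the shape below. This is a sketch under stated assumptions: `excess_return_30d_pct` and `primary_asset_class` are the W4 fields named above, while the `name` key, the list-of-dicts layout, and `build_w4_alerts` itself are hypothetical.

```python
def build_w4_alerts(systems: list[dict], threshold_pct: float = -5.0) -> list[dict]:
    """Collect systems whose 30d excess return is strictly below the threshold, worst first."""
    alerts = []
    for s in systems:
        excess = s.get("excess_return_30d_pct")
        if excess is None:
            continue  # no benchmark mapped for this system, so nothing to compare
        if excess < threshold_pct:
            alerts.append({
                "system": s.get("name"),
                "primary_asset_class": s.get("primary_asset_class"),
                "excess_return_30d_pct": excess,
            })
    return sorted(alerts, key=lambda a: a["excess_return_30d_pct"])
```

Systems with a None excess return are skipped and not alerted. Until E1 lands, FOREX systems would therefore never appear in `w4_alerts`.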

Peer remaining backlog — incorporated into master plan

Per peer reports/session_summary_2026-05-11.md:

P0 (immediate, blocking)

Confirm Block A INDEX→INDEX_STOCK fix didn't break peer's PR #904 — verify before next force-push window.
Verify drift_alert precondition runs in production: on the next hourly .github/workflows/audit-dashboard.yml regen, check tier2_proven_strategies.cards[*].walkforward_gate.advisory_only.

P1 (this week)

Phase 1.5.1 — 7-consecutive-day drift_alert=false history check (needs daily snapshot scaffolding).
Phase 2.1 — extend alpha_engine/walkforward_validator.py to add COMMODITY + BOND + ETF to by_class output (currently 4 of 7 classes).
Phase 2.2 — implement walk_forward_by_strategy() (PROPOSED-NEW; does NOT exist yet). Pairs with peer's per-strategy edge_stability table.
Phase 4.1 — env-gate DRIFT_AUTO_PAUSE_DRY_RUN=1; 7d "would-pause" logging. Buffy E5 overlaps.
Phase 4.2 — class_capital_gate(asset_class) with capital_gate_log.jsonl. PROPOSED-NEW.
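
Because class_capital_gate is PROPOSED-NEW, the following is a hypothetical shape and not a description of existing code. It returns the (allow_size, max_pct, reason) triple named in Phase 4.2 and appends each decision to a JSONL trail. The extra `state`, `log_path`, and `live_max_pct` parameters are assumptions made so the sketch stays self-contained.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def class_capital_gate(asset_class: str, state: str, log_path: Path,
                       live_max_pct: float = 1.0) -> tuple[bool, float, str]:
    """Return (allow_size, max_pct, reason) and append the decision to a JSONL log."""
    if state == "LIVE_ELIGIBLE":
        decision = (True, live_max_pct, "class is LIVE_ELIGIBLE")
    else:
        # Any earlier state gets zero size with an explicit reason, never a silent kill.
        decision = (False, 0.0, f"class state {state} is below LIVE_ELIGIBLE")
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "asset_class": asset_class,
        "allow_size": decision[0],
        "max_pct": decision[1],
        "reason": decision[2],
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return decision
```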

P2 (next 2 weeks)

Phase 5 Wave 1 (#1 + #2) — rolling-window profiling at 7/30/90/365/1095d + edge-decay heatmap.
Phase 5 Wave 2 (#3 + #5) — cross-symbol std-dev block + regime tag persistence on closed picks (closes regime_validation.regime_wr_breakdown all-zero rows).
Phase 5 Wave 3 (#7) — top-N portfolio Monte Carlo simulator (settles concentration debate before any real-money sizing — directly mitigates multi_asset_cot PF 19.19 / n=130 outlier risk).

Contingent (user-gated)

T1 — Flip DRIFT_AUTO_PAUSE_ENABLED=1 (HIGH risk; Phase 4.1 must be clean for 7d first).
T2 — Mutation autopsy on 14 quarantined strategies (LOW risk, read-only; due 2026-05-17, 7d post-quarantine).
T3 — Verify multi_asset_cot PF 19.19 via MySQL (LOW risk; needs DB_STOCKS_PASSWORD). Master plan P0 #5.

Areas worth further investigation (from peer summary)

Walk-forward by_class drifting in real-time during regime collapse — argues for snapshotting at known cutoffs not live-reading.
Cerebras rolling KS-D 30d monitor + xai stress-test layer — both queued for Phase 5; needs swarm round to choose before drift-pause flip.
INDEX_STOCK class is empty (n=0): remove it, or keep it as scaffolding? Investigate whether any signal generator emits asset_class="INDEX_STOCK".
Shared .git peer-collision risk — 4+ silent branch auto-switches observed. Recommend isolated worktrees (git worktree add) protocol per agent.
Mixed commit ownership — peer's git add -A swept other agents' files. Add project rule: explicit git add <path> only.
dashboard_data.json local staleness — pre-Phase-2 protocol: git checkout origin/main -- audit_dashboard/data/dashboard_data.json to refresh just that file before any planning read.
No backtest-cutoff disclosure on /audit — some systems show last_signal_at stale 2.5mo (e.g. ml_crypto_pred_v12) but cumulative WR/PF still surfaced. Misleading. Phase 4.3 DB-lineage card needs to include per-system cutoff date.
Goal #2 (sports) + Goal #3 (events) untouched this session. Worth a parallel measurement-infra gap check before next session.

✓ Wave 1 SHIPPED — circuit-breaker HALT state removed

Commit 81bd0b86388 on main 2026-05-11 21:30Z. Removed alpha_engine/data/circuit_breaker_state.json (HALT, max_picks=0, 48d stale from 2026-03-24).

Verification PASSED: direct query against ejaguiar1_stocks::bt_backtest_trades shows MAX(imported_at) WHERE status IN ('WON','LOST') = 2026-05-11 20:00:59 with n=1,819,839. Forward validator unfrozen; WON/LOST writes resumed within 1 hour. Wave 1.5 independent fixes (lm_signals expire-cron, signal_tier writer, at_consensus_picks time-travel) still queued per kilo carveout.

Subsequent shipped commits

SHA	Change
`4a2d337a5dc`	P0 #2 + #3 + #4 — blacklist `kimi_signal_tracking` + 3 named `crypto_soc_*` draggers to BLOCKED_ASSET_STRATEGY_PAIRS + raise elite-score floors (FOREX 50→70, COMMODITY 50→65, EQUITY 50→60)

⚠ Kimi swarm audit (4-agent, 2026-05-11) — RAW-DB read contradicts dashboard verdicts

Archive: reports/kimi_edge_audit_2026-05-11/ (32 files — comprehensive_analysis_report.md, edge_audit_report.md, industry_standards_research.md, 7 rolling PNGs, 7 CSVs).

Kimi top-line claim: "4x inflated dashboard vs RAW DB"

Metric	Dashboard (post-filter)	Kimi RAW DB
Win Rate (all classes)	34-43%	11.13% (6,178 / 55,510)
Total PnL	+949% claimed	-3.56% avg/trade × 55,510 ≈ -197,487%
Sharpe Ratio	varies	-2.34 annualized
Profit Factor	1.49	0.46
ML accuracy	—	32.6% (worse than coin flip)
Calibration	—	96% conf → 0.9% actual WR (INVERTED)
Backtest vs Live (CRYPTO)	—	42.4% BT WR / 11.3% live → -31.3pp gap

Reconciliation: same DB, different filters

The gap is real but BOTH reads are technically correct:

Kimi RAW reads include all 55,510 closed picks in at_raw_picks / trading_picks — including ghost rows, pre-resolver-v2 (Apr-27 fix) noise, dead-strategy artifacts, post-Wave-1 phantom EXPIRED rows. This is the pessimistic floor.
Dashboard reads use asset_class_health (post-resolver-v2 filtered, CLAUDE.md verdict-grade): Cursor reads n=7,875 CRYPTO vs Kimi n=51,049. The roughly 6.5x difference is the resolver-v2 + ghost-quarantine filtering.
Peer edge-stability sidecar reads are filtered further (only currently-active sources, rolling window): n=1,521 CRYPTO, PF 1.39.
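The three reads differ only in which rows they admit, which a small filter comparison makes concrete. The following is a minimal sketch, not the production query path: the `ClosedPick` fields (`is_ghost`, `closed_on`, `source_active`) are hypothetical stand-ins for whatever the real tables expose, the cutoff year is assumed, and the sidecar's rolling window is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Optional, Tuple

# Resolver-v2 fix date (Apr-27) from the text; the year 2026 is an assumption.
RESOLVER_V2_CUTOFF = date(2026, 4, 27)


@dataclass(frozen=True)
class ClosedPick:
    won: bool
    is_ghost: bool        # hypothetical: row belongs to a constant-pnl ghost cohort
    closed_on: date       # hypothetical: date the pick was resolved
    source_active: bool   # hypothetical: the emitting source is still live


def raw_read(pick: ClosedPick) -> bool:
    """Pessimistic floor: every closed pick counts."""
    return True


def dashboard_read(pick: ClosedPick) -> bool:
    """Post-resolver-v2 rows with ghost cohorts quarantined."""
    return not pick.is_ghost and pick.closed_on >= RESOLVER_V2_CUTOFF


def sidecar_read(pick: ClosedPick) -> bool:
    """Dashboard filter narrowed to currently active sources."""
    return dashboard_read(pick) and pick.source_active


def win_rate(
    picks: Iterable[ClosedPick], keep: Callable[[ClosedPick], bool]
) -> Tuple[int, Optional[float]]:
    """Return (n, win rate) for the picks admitted by `keep`."""
    kept = [p for p in picks if keep(p)]
    if not kept:
        return 0, None
    return len(kept), sum(p.won for p in kept) / len(kept)
```

Running `win_rate` with each of the three predicates over the same rows yields three (n, WR) pairs, which is the side-by-side disclosure the reconciliation calls for.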

Master-plan adjustment: the dashboard "STABLE_EDGE" verdicts on COMMODITY + EQUITY still hold for the currently-active sleeve. But Kimi's WR 11.1% / Sharpe -2.34 RAW floor must be disclosed on /audit alongside the post-filter numbers — currently it isn't, which is the "4x inflation" Kimi flagged. Codex's readiness.by_class payload contract (P1) is the right place to surface both reads side-by-side.

Kimi findings that ARE net-new (not in current plan)

ML calibration system-wide inverted (not just ETF/CRYPTO per memory). 96% confidence → 0.9% actual WR. Promote calibration verification from P2 → P0 #9. Memory project_performance_reality confirmed it on ETF/CRYPTO subset — Kimi extends to all classes.
alpha_engine/anti_overfit_validator.py EXISTS but is orphaned (no production caller). 13,886 bytes, last modified 2026-05-02. Contains CPCV / PBO / DSR code per Kimi inspection. P1 wire-up per CLAUDE.md Wire-Up Rule: add a production caller in calculate_smart_score / passes_smart_gate / score_pick. Once wired, it automatically rejects strategies with PBO > 0.05, the most critical defense against the 26-baby_strats-overfit pattern.
Empty model files: random_forest.py, xgboost_model.py, lstm_model.py are 14-byte placeholders per Kimi. P2 verification — confirm + remove or implement.
paper_trades table is EMPTY. Codex's SHADOW state requires 14-30d forward shadow tracking. Without paper_trades populated, SHADOW cannot be measured. P1 add — wire paper-trade variance recorder before any class can reach SHADOW.
Concentration risk: USDCHF=X = 215% of total PnL impact; INJUSDT = 89% of alpha_engine PnL. USDCHF=X almost certainly the FOREX unit-corruption bug (PR #876 pnl_pct clamp [-100, 200]%). P0 dependency — merge #876 before any FOREX verdict.
Day-of-week pattern (CRYPTO): Tuesday -3.37%, Friday -9.42%, Wednesday -13.85%, Monday -16.92%. Wednesday + Monday are worst. Future enhancement: time-of-day / day-of-week gate.
EQUITY t-test p=0.115 is closest to significance (Sharpe +0.67), but RAW WR is 1.84% (15 wins / 814 picks). This means the EQUITY POST-FILTER numbers (Cursor 54%, peer 57.4%) come from a small sliver of "good" strategies; the other 799 RAW picks did not win and are effectively noise. A filter-criteria audit is needed before EQUITY promotion. P0 #10.
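The concentration finding depends on how a single corrupted row can dominate a total. Below is a minimal sketch of a [-100, 200]% clamp followed by a per-symbol share calculation. The function names and the `(symbol, pnl_pct)` row shape are assumptions for illustration; this is not the code merged in PR #876.

```python
from typing import Dict, Iterable, Tuple

PNL_PCT_MIN = -100.0  # clamp bounds quoted in the text
PNL_PCT_MAX = 200.0


def clamp_pnl_pct(pnl_pct: float) -> float:
    """Clamp a per-trade pnl percentage into [-100, 200]."""
    return max(PNL_PCT_MIN, min(PNL_PCT_MAX, pnl_pct))


def pnl_share_by_symbol(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return each symbol's share of total clamped PnL.

    Shares can exceed 1.0 (or go negative) when other symbols offset
    each other, which is how one symbol can show more than 100% impact.
    """
    totals: Dict[str, float] = {}
    for symbol, pnl_pct in rows:
        totals[symbol] = totals.get(symbol, 0.0) + clamp_pnl_pct(pnl_pct)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # shares are undefined when the total nets to zero
    return {symbol: total / grand_total for symbol, total in totals.items()}
```

Computing shares on clamped values rather than raw ones keeps a unit-corruption outlier from defining the concentration picture on its own.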

Kimi DSR / PBO / WFE framework — adopt for real-money gate

Metric	Minimum	Source
Deflated Sharpe Ratio (DSR)	> 0.95	Lopez de Prado AFML
Probability of Backtest Overfitting (PBO)	< 0.05	Bailey + Lopez de Prado
Walk-Forward Efficiency (WFE)	> 60%	Pardo
Min Track Record Length	> 2 years	AFML
Live Sharpe Ratio	> 0.5	Industry standard
Max Drawdown	< 20%	Charter Tier 2
Win Rate	> 50%	Codex T2 (or PF>1.5 substitute)

Adopt Kimi's 10-step validation pipeline as the real-money gate before LIVE_ELIGIBLE: pre-register hypothesis → in-sample → WFA → CPCV → DSR → structural break tests → sensitivity analysis → transaction cost analysis → 3-6mo paper trading → graduated deployment (5% → 25% → 100%).
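The threshold table can be expressed as a single check that reports which minimums a strategy misses. This is a sketch under assumptions: the `StrategyStats` container and its field names are hypothetical, ratios are expressed as fractions (0.60 for 60%), and computing DSR, PBO and WFE themselves is out of scope here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StrategyStats:
    dsr: float                 # Deflated Sharpe Ratio
    pbo: float                 # Probability of Backtest Overfitting
    wfe: float                 # Walk-Forward Efficiency as a fraction
    track_record_years: float
    live_sharpe: float
    max_drawdown: float        # positive fraction, 0.20 == 20%
    win_rate: float            # fraction
    profit_factor: float


def real_money_gate_failures(s: StrategyStats) -> List[str]:
    """Return the names of the minimums that `s` fails (empty list == pass)."""
    checks = [
        ("DSR > 0.95", s.dsr > 0.95),
        ("PBO < 0.05", s.pbo < 0.05),
        ("WFE > 60%", s.wfe > 0.60),
        ("track record > 2 years", s.track_record_years > 2.0),
        ("live Sharpe > 0.5", s.live_sharpe > 0.5),
        ("max drawdown < 20%", s.max_drawdown < 0.20),
        # The table allows PF > 1.5 as a substitute for WR > 50%.
        ("WR > 50% or PF > 1.5", s.win_rate > 0.50 or s.profit_factor > 1.5),
    ]
    return [name for name, ok in checks if not ok]
```

Returning the failed check names, rather than a bare boolean, lets a readiness payload show why a class is held back.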

Net: Kimi's verdict is harsher than the master plan's but doesn't invalidate the COMMODITY/EQUITY STABLE_EDGE finding for the FILTERED sleeve. It does add 3 new P0 actions (calibration verify, EQUITY filter-criteria audit, PR #876 dependency surfaced) + 2 P1 (anti_overfit_validator wire-up, paper_trades recorder) + a stricter real-money gate (10-step / DSR / PBO / WFE).

▶ /audit Decay Alerts — Action Required (2026-05-12)

14 HIGH rolling-7d WR drop alerts (>20pp baseline decay) + 3 MEDIUM staleness alerts surfaced on /audit. Per CLAUDE.md MUTATION_THREE_AXIS_PROTOCOL: mutation-before-kill applies; REDUCE-not-BLOCK is the soft action.

Sev	Strategy	7d WR vs baseline	Status this session	Master-plan action
HIGH	`myfxbook_retail_contrarian`	19% vs 46% (-27pp)	BLOCKED commit a64e80e70d1	Done
HIGH	`forex_rsi2_mean_reversion`	10% vs 44% (-34pp)	BLOCKED via fx_kill_switch (commit a64e80e70d1)	Done
HIGH	`cta_cross_asset_tsmom`	28% vs 46% (-18pp)	Open	P1 Add to BLOCKED_ASSET_STRATEGY_PAIRS (FUTURES) — alert is class-level drag
HIGH	`ig_contrarian_sentiment`	20% vs 45% (-25pp)	Open	P1 CRYPTO sentiment-contrarian; quarantine pending mutation-axis analysis
HIGH	`futures_momentum`	4% vs 42% (-38pp)	Open	P0 Largest drop. FUTURES is BLOCKED per master plan — add to BLOCKED_ASSET_STRATEGY_PAIRS (FUTURES) outright
HIGH	`st_multi_day_momentum`	47% vs 68% (-21pp)	Open	P1 Soft-demote — still positive Sharpe likely; reduce sizing 50%
HIGH	`macd_rsi_m048`	53% vs 73% (-20pp)	Open	P1 Boundary; monitor 7d before action
HIGH	`ema_momentum_m006`	36% vs 56% (-20pp)	Open	P1 Below 50% sub-floor; quarantine
HIGH	`hs_lb_None`	0% vs 34% (-34pp)	Open	P0 0% recent WR = total decay. `hs_lb_None` = head-shoulders + None-leverage suffix = likely a parsing bug emitting null variants. Investigate before block.
HIGH	`crypto_rsi_whaleconfirmed_v1`	18% vs 55% (-37pp)	Open	P1 CRYPTO momentum-with-whale-confirm; large drop suggests whale signal degraded
HIGH	`keltner_compression_expansion_eth_v1`	29% vs 51% (-22pp)	Open	P1 ETH-specific Keltner; consider symbol-axis mutation per MUTATION_THREE_AXIS_PROTOCOL
HIGH	`vwap_deviation_reversion_sol_v1`	27% vs 47% (-20pp)	Open	P1 SOL-specific VWAP; similar mutation analysis
HIGH	`MeanReversionBB`	25% vs 60% (-35pp)	Open	P0 Large baseline-to-recent drop; BB mean-reversion likely regime-broken (drift_alert TRUE confirms)
HIGH	`claude_ml_moderate_mut`	42% vs 68% (-26pp)	Open	P1 Mutation-axis name suggests genetic-evolved variant; check anti_overfit_audit.json for DSR before action
MED	`copy_trader_clones`	silent 93h	Open	P2 Likely Wave 1 will unblock — monitor 24h post-Wave-1 then escalate
MED	`stocksunify2`	silent 96h	Open	P2 Same — monitor
MED	`kimi_live_signals`	silent 98h	Open	P2 Distinct from blacklisted `kimi_signal_tracking`; verify if intentionally stopped (peer chatlog signal_tier writer pause) or accidental

Recommended next-session work

2 P0 hard-blocks: futures_momentum (FUTURES) + MeanReversionBB (regime-driven decay) into BLOCKED_ASSET_STRATEGY_PAIRS. hs_lb_None investigation precedes block.
1 P1 soft-demote framework: add DECAY_ALERT_REDUCE set to audit_trail/quality_gates.py; deduct 8 pts from calculate_smart_score for any strategy in the set. Cleaner than hard-block since these may rehab once drift clears.
3 MED monitor: revisit at 2026-05-13 12:00Z (at least 24h post-Wave-1) because the silence likely resolves via the outcome_resolver backlog flush.
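The soft-demote idea in the list above amounts to a fixed score deduction for strategies on a watch set. The sketch below assumes a standalone helper; the real signature of calculate_smart_score is not shown in this plan, the set membership is illustrative, and flooring the score at zero is an added assumption.

```python
from typing import AbstractSet

# Illustrative membership only: the strategy marked "soft-demote" in the alert table.
DECAY_ALERT_REDUCE = frozenset({"st_multi_day_momentum"})
DECAY_PENALTY_POINTS = 8.0  # deduction proposed in the text


def apply_decay_penalty(
    strategy: str,
    base_score: float,
    reduce_set: AbstractSet[str] = DECAY_ALERT_REDUCE,
) -> float:
    """Deduct the decay penalty for strategies in the reduce set.

    The score is floored at 0.0 (assumption) so a penalty never
    produces a negative score.
    """
    if strategy in reduce_set:
        return max(0.0, base_score - DECAY_PENALTY_POINTS)
    return base_score
```

Because the penalty is a set lookup, rehabilitating a strategy once drift clears is a one-line removal rather than an unblock.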

▶ HIGH CONVICTION filter audit — 2026-05-12 vs session data

Current /audit HC panel shows per-class thresholds from 2026-04-15 data. Re-verified against session work (Kimi RAW DB read, anti-overfit DSR sidecar, P0 #10 EQUITY filter trace).

Class	Current HC verdict	Session data check	Action required
CRYPTO	EDGE — FWD WR≥60% + Score≥55 + Trust≥4 → "WR 60.3% on N=562 (+9.7pp)"	Anti-overfit DSR sidecar (post-Wave-1): 4 EDGE_LIKELY_REAL ml_enhanced sleeves (INJUSDT/FETUSDT/DYDXUSDT/RENDERUSDT 1d+1h variants). 33 OVERFIT_LIKELY in same family (mostly _15m_*).	P1 Add anti-overfit gate: HC pass requires strategy IN `anti_overfit_audit.json::strategies WHERE verdict='EDGE_LIKELY_REAL'`. Auto-rejects the 33 OVERFIT_LIKELY sleeves even if FWD WR ≥ 60% passes.
EQUITY	EDGE — FWD WR≥55% + Score≥50 + Trust≥5 → "WR 68.1% on N=72 (+29pp)"	P0 #10 verify: dashboard 54% WR is honest for tagged-EQUITY subset. `stocks_rsi2_pullback` (n=70, WR 62.9%, avg +0.78%) is the real EQUITY edge sleeve. n=72 HC cohort likely overlaps this. Note: n=72 still below master-plan n≥100 charter floor.	P2 Verify HC cohort overlap with `stocks_rsi2_pullback`. Either rename verdict to "EDGE (thin n=72)" or widen FWD WR floor to admit more samples.
FOREX	EDGE — FWD WR≥45% + Score≥50 + Trust≥5 → "WR 65.8% on N=73"	RED FLAG. Anti-overfit DSR sidecar has ZERO FOREX EDGE_LIKELY_REAL. Kimi RAW FOREX WR 9.9% n=605, PF 0.28 (stat-significantly LOSING). Master plan FOREX state = `BLOCKED` per Codex state machine + elite_score floor raised to 70 (commit 4a2d337a5dc) + 3 toxic strategies re-blocked (commit a64e80e70d1). Showing "EDGE" on an N=73 sample is a small-sample artifact masking systematic decay.	P0 Downgrade FOREX HC verdict from EDGE to BLOCKED or DEAD. The "WR 65.8% on N=73" is statistically vulnerable: on N=73 a 95% binomial interval spans roughly 54-76%, and the cohort is a filtered sliver of a class whose RAW read is significantly losing. Cite master plan FOREX BLOCKED state. This is the most dangerous current verdict on the /audit page.
COMMODITY	WEAK — Trust≥5 → "PF 1.28 on n=273"	DSR sidecar: `cot_positioning` n=104, WR 86.5%, Sharpe +1.377, DSR=1.0000 (highest of any strategy). Antigravity audit confirmed `cot_positioning_CT_locked` LONG = 89.8% WR / PF 13.1 (n=49). COMMODITY aggregate is mediocre because it includes non-COT strategies dragging the average; the COT sleeve specifically has REAL edge.	P1 Add COMMODITY HC carve-out: strategy IN (`cot_positioning`, `cftc_cot_commercial_signal`, `cot_positioning_CT_locked`) AND elite_score ≥ 65. Mark as EDGE via that filter. Class aggregate stays WEAK without the carve-out — that's correct.
BOND	NO DATA — n=8	Kimi BOND: n=18 PF 1.72 WR 55.6%. Sub-floor (n<100 master plan charter).	P2 Update n from 8 → 18 (live count) + note "needs n≥100 multi-month accumulation per Wave 1 unfreeze." No filter change.
ETF	DEAD — PF 0.28 n=19	Kimi ETF: n=88-100 / WR 53.4% / PF 1.20. Master plan ETF state = REHAB (not DEAD). Class is at sample floor, not dead. Cleanest OOS profile per Codex.	P1 Reclassify ETF from DEAD → THIN_REHAB. Update text "PF 0.28 n=19" → "PF 1.20 n=88-100, needs n≥150 for OOS_READY promotion (Codex state machine)". The DEAD verdict is stale + miscalibrated.
FUTURES	DEAD — WR 5.9% n=17	Per memory `project_futures_kill_without_replacement`: silent-dead, 5.9% WR / -96% PnL post-2-kills. Master plan FUTURES state = BLOCKED. Verdict accurate.	P3 Keep DEAD verdict; flag for re-emission plan or formal retire from /audit per master plan FUTURES section.
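Several rows in this table turn on how much a win rate from a small cohort can be trusted. One standard way to quantify that is a Wilson score interval for a binomial proportion. The sketch below is a generic implementation, not something taken from the dashboard code; it treats picks as independent trials, which is itself an assumption.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate of `wins` out of `n` picks.

    z=1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= wins <= n:
        raise ValueError("wins must be between 0 and n")
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width
```

For example, `wilson_interval(48, 73)` approximates the FOREX HC cohort (48 wins is reconstructed from 65.8% of 73, so treat it as an estimate). Comparing the lower bound against the class floor is more informative than the point estimate alone, and the interval says nothing about selection effects from the filter itself.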

Summary — required HC filter updates

P0 most urgent: FOREX downgrade from "EDGE" to BLOCKED/DEAD. The current EDGE claim is the single most misleading verdict on /audit and could lead a user to size into a sub-floor (PF 0.28) class.
P1: ETF reclassify DEAD → THIN_REHAB (verdict was based on stale n=19 data).
P1: COMMODITY carve-out — admit cot_positioning family as EDGE while keeping class aggregate WEAK.
P1: CRYPTO + EQUITY + COMMODITY add anti-overfit DSR gate (require DSR ≥ 0.95 per anti_overfit_audit.json).
P2: Bump "2026-04-15" refresh date to "2026-05-12" once gates rewired.
P2: Refresh BOND n=8 → 18 (live).
P2: Display DSR verdict-counts inline (e.g. "CRYPTO: 4 EDGE_LIKELY_REAL / 33 OVERFIT_LIKELY") sourced from anti_overfit_audit.json::verdict_counts.

Implementation surface: audit_dashboard/hc_filter.js (per-asset-class HC gates per CLAUDE.md) + audit_dashboard/template.html:~1203-1300 (HC overlay text). Each verdict update should be a small JSON-config tweak in config/hc_thresholds.json, if that file exists.
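The anti-overfit gate and the COMMODITY carve-out combine into one membership check. The sketch below is written in Python for illustration even though the named implementation surface is JavaScript. It assumes `anti_overfit_audit.json` holds a `strategies` list of objects with `name` and `verdict` keys, which this plan does not confirm, and it leaves out the existing FWD WR / Score / Trust thresholds.

```python
from typing import AbstractSet, Set

COMMODITY_COT_SLEEVE = frozenset(
    {"cot_positioning", "cftc_cot_commercial_signal", "cot_positioning_CT_locked"}
)
COMMODITY_ELITE_FLOOR = 65
BLOCKED_CLASSES = frozenset({"FOREX", "FUTURES"})  # BLOCKED states per the table


def edge_likely_real(audit: dict) -> Set[str]:
    """Collect strategy names whose audit verdict is EDGE_LIKELY_REAL.

    The {"strategies": [{"name": ..., "verdict": ...}]} layout is assumed.
    """
    return {
        entry["name"]
        for entry in audit.get("strategies", [])
        if "name" in entry and entry.get("verdict") == "EDGE_LIKELY_REAL"
    }


def passes_anti_overfit_hc(
    asset_class: str, strategy: str, elite_score: float, allowed: AbstractSet[str]
) -> bool:
    """Anti-overfit portion of the HC gate only."""
    if asset_class in BLOCKED_CLASSES:
        return False
    if strategy not in allowed:
        return False  # rejects OVERFIT_LIKELY sleeves regardless of forward WR
    if asset_class == "COMMODITY":
        return strategy in COMMODITY_COT_SLEEVE and elite_score >= COMMODITY_ELITE_FLOOR
    return True
```

Loading the audit file once per generator run and passing the resulting set around keeps the gate itself free of file access.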

▶ DB Health red-tier — remediation status (refreshed 2026-05-12 03:30Z)

Per-metric progress on the 6 red-tier items in the DB Health — 2026-05-08T15:00Z panel. "Action required" commentary added inline. Updated 2026-05-12 03:30Z with sign-coherence guard, ghost-row triple-axis block, and CI commit-list fix.

Metric	Original value	Action shipped this session	Expected post-fix	Status
Forward Validator Freshness	840h since last WON/LOST (2026-04-02)	Wave 1 commit `81bd0b86388` — rm `alpha_engine/data/circuit_breaker_state.json` (HALT max_picks=0 from 2026-03-24)	VERIFIED: `bt_backtest_trades.MAX(imported_at) = 2026-05-11 20:00:59` (n=1.8M)	RESOLVED
WON-vs-PnL contradiction	YES (avg pnl per status — writer bug)	2026-05-12 03:00Z — direct writer fix: commit `22b677c1167` adds sign-coherence guard to both atomic status+pnl writers — `alpha_engine/outcome_resolver.py:1670` (resolver path) and `audit_trail/mysql_client.py:628` (`mysql_close_trade` canonical write). When source supplies `exit_reason=TP` + `pnl_pct < 0` (or SL + pnl > 0) the guard now trusts the pnl sign and logs `won_pnl_contradiction:` WARNING. Plus earlier confidence-normalizer migration (`613c65cb`, all 9/9 callsites).	Stops new contradiction rows; existing contradicted rows still in DB until backfill SQL pass.	PARTIAL (forward-fixed)
PnL Integrity (sampled)	42.0% (58k/100k mismatch >1pp)	PR #876 merged `818ff966222` — writer-side clamp [-100, 200]% in `mysql_trading_sync.py` (kills USDCHF=X -106,700% outlier); P0 #7 `1c535a19105` read-side clamp at `dashboard_generator.py:9309` max_dd cumulation	Future rows clamped; legacy poisoned rows still in DB until backfill	PARTIAL (FORWARD-FIXED)
Phantom EXPIRED rows	100.0% (1 class, worst-case)	Wave 1 unfreeze + PR #891 merged `486f7bf2989` — mysql_sync entry_time/exit_time fallback (repairs 87% NULL closed_at orphans on future syncs)	Resolver lag still present; expected reduction over next 3-7 cron cycles; Wave 1.5a/b/c (lm_signals + signal_tier + at_consensus_picks) still queued	PARTIAL
Raw-Pick Outcome Coverage	0.09% (121/136,374 resolved)	2026-05-12 03:00Z root cause confirmed: Resolver itself is correct — it correctly returns 0 because all 8,151 entries in `closed_picks.json` already have terminal status (per Investigator B). The real bug is upstream: NO writer reads ACTIVE rows from `at_raw_picks`, detects TP/SL/time-exit, and feeds new entries into `closed_picks.json`. Existing references in `alpha_engine/outcome_resolver.py:1931`, `:2317`, `crypto_risk_gates.py:179`, `scanner.py:4781` all RE-write the same 8,151 entries. `mysql_client.py:601 mysql_close_trade()` exists but has no caller from TP/SL detection. Independent of Wave 1 unfreeze.	Needs a new `sync_active_mysql_picks_to_json()` that reads ACTIVE at_raw_picks, computes per-class TP/SL hit logic, writes terminal entries to closed_picks.json + back to at_raw_picks. Queued as P0 follow-up.	DIAGNOSED — IMPLEMENTATION QUEUED
Ghost Rows (constant pnl_pct)	655,000 (18 cohorts, n>1000, distinct<5)	2026-05-12 03:15Z — symbol-axis quarantine shipped: commit `597819d79c7` introduces `BLOCKED_ASSET_STRATEGY_SYMBOL_TRIPLES` in `audit_trail/quality_gates.py` with 5 documented cohorts: (CRYPTO, quan_engine, MATICUSDT) 215k rows, (CRYPTO, KIMI_signal_tracker, ETHUSDT/BTCUSDT) multi-bucket, (CRYPTO, irb_hoffman, ADAUSDT), (CRYPTO, funding_rate_carry, ROBOUSDT). Enforced at `passes_active_gate` (kills new emissions) AND `dashboard_generator.py::_is_historical_blocked_pick` (excludes from historical aggregates). meta_strategy 1.6M-row template family deferred until `db_health.json::ghost_rows.top_cohorts` repopulates (currently `[]` on the 2026-05-08 snapshot).	Expected ~220k+ rows excluded from CRYPTO aggregates on next generator run; total 655k → ~440k. Remaining ~430k = meta_strategy family + small long-tail.	PARTIAL (5 of ~18 cohorts blocked)
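The sign-coherence guard described in the WON-vs-PnL row reduces to a small decision rule: when the exit reason and the pnl sign disagree, trust the sign and log. The following is a sketch of that rule only, not the code at the cited file:line locations. The function name is hypothetical, and the handling of an exactly zero pnl is an assumption.

```python
import logging
from typing import Optional

logger = logging.getLogger("outcome_guard")

_CLAIMED_BY_EXIT = {"TP": "WON", "SL": "LOST"}


def coherent_status(exit_reason: Optional[str], pnl_pct: float) -> str:
    """Return WON/LOST, preferring the pnl sign over a contradicting exit reason."""
    claimed = _CLAIMED_BY_EXIT.get(exit_reason or "")
    by_sign = "WON" if pnl_pct > 0 else "LOST"
    if pnl_pct == 0:
        # Assumption: a flat pnl carries no sign, so keep the claimed status
        # when there is one and otherwise count the pick as LOST.
        return claimed or by_sign
    if claimed is not None and claimed != by_sign:
        logger.warning(
            "won_pnl_contradiction: exit_reason=%s pnl_pct=%.4f resolved_as=%s",
            exit_reason, pnl_pct, by_sign,
        )
    return by_sign
```

Counting the `won_pnl_contradiction:` warnings per cycle gives a direct measure of how often upstream sources still send contradictory rows.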

Action required (master-plan commentary — refreshed 2026-05-12 03:30Z)

Resolved (1): Forward Validator Freshness — full fix via Wave 1 unfreeze.
Partial / forward-fixed (4): WON-vs-PnL (sign-coherence guard 22b677c1167), PnL Integrity (PR #876 + P0 #7), Phantom EXPIRED (PR #891), Ghost Rows (triple-axis quarantine 597819d79c7, 5 cohorts ~220k rows). Forward writes guarded; legacy rows still in DB until a backfill SQL pass.
Diagnosed / implementation queued (1): Raw-Pick Outcome Coverage — root cause confirmed (missing upstream sync_active_mysql_picks_to_json()). Needs a new bridge that reads ACTIVE at_raw_picks, detects TP/SL/time-exit per asset class, and writes terminal entries into closed_picks.json + back into at_raw_picks. Promoted to P0 follow-up — the resolver itself works; it just has an empty queue forever.
CI plumbing fix (also 2026-05-12 03:30Z): commit d317560ac9c adds audit_dashboard/data/db_health.json + audit_dashboard/data/cot_paper_pilot_status.json to the workflow commit-list (.github/workflows/audit-dashboard.yml:600). Both files were being FTP-deployed each cycle but never committed to git, so origin/main stayed 2026-05-08T15:00Z for 4 days. Next hourly cron will commit refreshed metrics — this DB Health panel will finally reflect post-fix state instead of the stale snapshot.
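The missing bridge for Raw-Pick Outcome Coverage needs, at its core, a per-pick exit detector. The sketch below illustrates only that detection step for one price bar. `ActivePick` and its fields are hypothetical, per-class TP/SL rules are not modeled, and resolving a bar that touches both levels as a stop-out is a conservative assumption rather than documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class ActivePick:
    symbol: str
    direction: str        # "LONG" or "SHORT"
    entry_price: float
    take_profit: float
    stop_loss: float
    opened_at: datetime
    max_hold: timedelta   # hypothetical time-exit horizon


def detect_exit(
    pick: ActivePick, high: float, low: float, last: float, now: datetime
) -> Optional[Tuple[str, float]]:
    """Return (exit_reason, exit_price), or None if the pick stays ACTIVE."""
    if pick.direction == "LONG":
        hit_stop = low <= pick.stop_loss
        hit_target = high >= pick.take_profit
    else:
        hit_stop = high >= pick.stop_loss
        hit_target = low <= pick.take_profit
    if hit_stop:
        # Conservative assumption: if both levels were touched, count the stop.
        return "SL", pick.stop_loss
    if hit_target:
        return "TP", pick.take_profit
    if now - pick.opened_at >= pick.max_hold:
        return "TIME", last
    return None
```

A bridge along the lines of the proposed sync function would call a detector like this for each ACTIVE row and write any non-None result back as a terminal entry.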

Recommended next checkpoint (post-fix re-snapshot)

Re-pull db_health.json at 2026-05-12 06:00Z after 2-3 hourly cron cycles. Expected deltas if fixes hold:

Generated_at timestamp: 2026-05-08T15:00:00Z → current hour (CI commit-list fix landed)
Forward Validator Freshness: 840h → <6h (RESOLVED)
WON-vs-PnL contradiction: YES → likely still YES on first snapshot (legacy rows) but new resolver logs will show won_pnl_contradiction: WARNING count = 0 after one cycle if guard is hot
Phantom EXPIRED: 100% → <80% (still high until Wave 1.5)
PnL Integrity: 42% → 45-50% (forward writes only; legacy unchanged)
Ghost Rows: 655k → ~440k (5 cohorts ~220k filtered from CRYPTO bucket)
Raw-Pick Outcome Coverage: 0.09% → 0.09% (UNCHANGED until sync_active_mysql_picks_to_json ships)

⚠ DB Health red-tier crisis — original P0 remediation plan (Wave 0.5 → 4)

Live /audit DB Health panel (snapshot 2026-05-08T15Z but values persisted) shows 6/6 red metrics. This is the truth layer Codex says must be fixed first. Remediation already documented at reports/db_evidence_graded_final_2026-05-08.md (222 lines).

Metric	Value	Severity	Root cause
PnL Integrity (sampled)	42.0% (58k / 100k mismatch >1pp)	RED	resolver stalled — F1 cascade
Ghost Rows (constant pnl_pct)	655,000 (18 cohorts n>1000, distinct_entries<5)	RED	F2 — synthetic stamping in writer
Forward Validator Freshness	840h since last WON/LOST (2026-04-02 12:00)	RED	F1 — circuit_breaker_state.json HALT persisted from 2026-03-24, MAX_ACTIVE_PICKS=0 chokes validator
Phantom EXPIRED rows	100.0% (1 class, worst-case)	RED	F3 race condition — resolver doesn't run before expire-cron
Raw-Pick Outcome Coverage	0.09% (121 / 136,374 resolved)	RED	downstream of F1 cascade
WON-vs-PnL contradiction	YES (avg pnl per status — writer bug)	RED	F5 confidence inversion — STRONGEST evidence (kilo + deepseek confirm)
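The ghost-row criterion in the table (large cohorts with almost no variation) can be checked with a simple grouping pass. The sketch below makes several assumptions: cohorts are keyed by (asset_class, strategy, symbol), "distinct" is read as the count of distinct pnl_pct values, and rows arrive as plain tuples rather than database records.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Cohort = Tuple[str, str, str]  # (asset_class, strategy, symbol), assumed key


def ghost_cohorts(
    rows: Iterable[Tuple[str, str, str, float]],
    min_rows: int = 1000,
    max_distinct: int = 5,
) -> List[Cohort]:
    """Return cohorts with more than `min_rows` rows and fewer than
    `max_distinct` distinct pnl_pct values, sorted for stable output."""
    counts: Dict[Cohort, int] = defaultdict(int)
    values: Dict[Cohort, Set[float]] = defaultdict(set)
    for asset_class, strategy, symbol, pnl_pct in rows:
        key = (asset_class, strategy, symbol)
        counts[key] += 1
        # Round so float noise does not hide an otherwise constant value.
        values[key].add(round(pnl_pct, 6))
    return sorted(
        key for key in counts
        if counts[key] > min_rows and len(values[key]) < max_distinct
    )
```

The output of a pass like this is the kind of list that could populate a triple-axis block set, one cohort per entry.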

Wave 0.5 — Pre-deploy verification (5 min, READ-ONLY)

Confirm alpha_engine/data/circuit_breaker_state.json is git-committed (last touched commit fa9b6b38109 2026-03-28).
Run staleness SQL: SELECT MAX(imported_at) FROM bt_backtest_trades WHERE status IN ('WON','LOST') — expect 2026-04-02.
Gating-check grep: grep -n "circuit_breaker\|is_locked" alpha_engine/{forward_validator,outcome_resolver,production_scanner}.py.
Survivor signal_tier check: SELECT signal_tier, MIN(ts), MAX(ts) FROM at_discord_notifications WHERE signal_tier IS NOT NULL GROUP BY signal_tier.
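The staleness SQL above can be wrapped so the result is expressed directly in hours. The sketch below runs the same query through Python's built-in `sqlite3` purely so it is self-contained; the production database is MySQL, and storing `imported_at` as a UTC `YYYY-MM-DD HH:MM:SS` string is an assumption.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

STALENESS_SQL = (
    "SELECT MAX(imported_at) FROM bt_backtest_trades "
    "WHERE status IN ('WON','LOST')"
)


def hours_since_last_resolution(
    conn: sqlite3.Connection, now: datetime
) -> Optional[float]:
    """Hours between `now` and the latest WON/LOST row, or None if there is none."""
    (latest,) = conn.execute(STALENESS_SQL).fetchone()
    if latest is None:
        return None
    # Assumption: timestamps are stored as UTC text in this exact format.
    last = datetime.strptime(latest, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return (now - last).total_seconds() / 3600.0
```

Running the same function before and after Wave 1 gives a single number to compare against the freshness panel.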

Wave 1 — Unfreeze (5 min)

rm alpha_engine/data/circuit_breaker_state.json
Commit: fix(circuit-breaker): rm 2026-03-24 stale HALT state — unfreezes forward_validator (35d freeze)
Push to main; watch one cycle of .github/workflows/audit-dashboard.yml (hourly cron).
Verify: MAX(imported_at) for WON/LOST advances past 2026-05-11.

Precedent: PR #497 fixed R3 stale-state on 2026-04-27 (same class of bug). Same 2026-03-24 leak still live — referenced in freeze_2026_04_02_root_cause_2026-05-08.md:115. Memory ref: feedback_circuit_breaker_stale_state_leak.

Wave 1.5 — Independent pipeline checks (~30 min each)

Kilo carveout: lm_signals + at_consensus_picks + at_discord_notifications fail independently — NOT auto-fixed by circuit-breaker deletion. Each has its own cron.

lm_signals expire-cron: exit_price=0 in 96.2% of expire-cron rows (F10) — patch cron to skip expire-write when exit_price unset.
at_discord_notifications.signal_tier: 99.99% NULL (F8) — locate writer; backfill schema.
at_consensus_picks time-travel: 57.3% rows with future-dated entries (F4) — writer guard against future timestamps.
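Two of the Wave 1.5 fixes are simple write-time guards. The sketch below shows one possible shape for each. Both function names are hypothetical, and the five-minute clock-skew tolerance is an assumption that the plan does not specify.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_write_expiry(exit_price: Optional[float]) -> bool:
    """Skip the expire-write when no usable exit price is available."""
    return exit_price is not None and exit_price > 0


def is_future_dated(
    entry_ts: datetime,
    now: datetime,
    tolerance: timedelta = timedelta(minutes=5),  # assumed clock-skew allowance
) -> bool:
    """True when an entry timestamp lies further ahead than the tolerance."""
    return entry_ts > now + tolerance
```

Both guards reject at write time, so they stop new bad rows but leave existing ones for a separate backfill.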

Wave 2-4 — Schema + route fixes

Add a STORED terminal_outcome column to eliminate WON-vs-PnL contradictions at read time.
EQUITY route rewiring → challenge_200_trades (per remediation doc).
Drop phantom row sets (after Wave 1+1.5 stable for 7d).
Ghost-row cohort (655k rows, 18 cohorts) audit: identify the constant-pnl_pct writer; either repair or hard-drop.

Tie-in to SUPREME EDGE ENHANCEMENT

Validates Codex truth-layer-first: all 6 red metrics are data-trust failures, not alpha failures. Cannot promote any class to OOS_READY until at least Wave 1 + 1.5 clear.
Corrupts walk-forward.by_class: resolver stall → stale closed_picks → frozen algorithm_rolling_perf → walk-forward sees pre-2026-04-02 data only. Master plan already de-ranked walk-forward to advisory; this explains why.
Existing PRs adjacent: #876 (pnl_pct clamp), #862 (DB query bank), #891 (mysql_sync closed_at fallback), #884 (category infer), #892 (safe_db_archive Hermes gate). Merge #876 + #891 + #884 + #892 before Wave 2 to lock in safety surface.
Promotes DB Health to a master-plan P0: insert as P0 #0 (highest priority — precedes the existing 8-item P0 cluster).

Verbatim chatlog cross-check (peer `docs/chatlog_verbatim_2026-05-11.md`, commit `77f42fa5c3e`)

Subagent scan of peer's 526-line verbatim chatlog identified 3 mandatory blockers + 1 class downgrade:

Phase reorder mandatory (swarm-unanimous): Phase 4 (measurement infrastructure) MUST precede Phase 2 (fast-track classes). Already captured in §"Peer plan v2" above; reaffirmed here. No per-class scaling until measurement gates exist.
Phase 1.5 drift-clearance gate: require drift_alert=false for 7 consecutive days before Phase 2 kickoff. Treat walk-forward as advisory-only while drift hot. Halt + recalibrate if drift persists > 30d.
BOND state downgrade: verbatim chatlog reads n=12 (not n=11 from Cursor or n=18 from Kimi). Either way well below n≥100 floor. Master plan BOND state changed from REHAB → BLOCKED above.
INDEX_STOCK class rename adopted: 8 classes now (CRYPTO/EQUITY/COMMODITY/FOREX/ETF/BOND/FUTURES/INDEX_STOCK). Peer Block A commit 5e4bc1efe63 ships the rename + defense-in-depth.
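The Phase 1.5 drift-clearance gate is a streak rule over daily drift flags. Below is a minimal sketch, assuming one boolean per day ordered oldest to newest; how those daily flags are recorded is not described in this plan.

```python
from typing import Sequence


def drift_cleared(daily_drift_alerts: Sequence[bool], required_days: int = 7) -> bool:
    """True when the most recent `required_days` flags are all False."""
    recent = list(daily_drift_alerts[-required_days:])
    return len(recent) == required_days and not any(recent)


def should_halt(daily_drift_alerts: Sequence[bool], max_hot_days: int = 30) -> bool:
    """True when drift has been continuously hot for more than `max_hot_days`."""
    streak = 0
    for flag in reversed(daily_drift_alerts):
        if not flag:
            break
        streak += 1
    return streak > max_hot_days
```

Requiring a full window of recorded days means a fresh deployment with fewer than seven observations cannot clear the gate by default.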

COMMODITY / EQUITY — STABLE_EDGE but NOT auto-promoted to OOS_READY

Verbatim chatlog edge-stability verdicts (COMMODITY STABLE_EDGE n=167, EQUITY STABLE_EDGE n=272) do NOT auto-promote these classes to OOS_READY. Per Codex truth-layer-first policy + master-plan promotion gates, these classes hold at REHAB until:

COMMODITY: multi_asset_cot PF=19.19 DB-verified + CT=F/KC=F concentration disclosed + walk-forward.by_class emits real folds
EQUITY: claude_gainer_st winner-vs-blacklist contradiction reconciled + capped-vs-raw PnL gap verified + bottom-symbol pruning lands without pushing OOS consistency below 80%

The edge-stability sidecar verdict is necessary but NOT sufficient for promotion.

User policy quotes (from verbatim chatlog)

Commit ownership protocol (line 211 of verbatim chatlog): "wait so daily_ideas.md is being written by the other agent, check in with them before changing that" → establishes coordinated commit ownership, not parallel overwrites. Adds to peer summary's recommendation: explicit git add <path> only; check with file's primary author before edits.
Cadence directive (line 18): "proceed. create a set of todos based on this new items, and fire whatever is needed and continue going! ensuring progress every 30 minutes... talk to peers also" → affirms multi-agent coordination pattern.

Buffy enhancements — review confirmation

Full-file re-read of both buffy artifacts (chatlog 43 lines + progress 134 lines). Confirmed:

4 enhancements (E1/E2/E4/E5) all PENDING — Swarm Cross-Check log, Typecheck Results, Code Review Results, and Git Commits tracking tables are all empty (one row each, all "—"). Work has not yet started post-design.
3 issues directly verified by buffy code review (review file lines 33-38):
- FOREX benchmark missing — benchmark_return('FOREX') returns None (no FOREX in benchmark_map at live_market_fetcher.py:155-162) despite DXY data fetched as "DX-Y.NYB"
- Per-pick disk I/O — _drift_pause_active() at quality_gates.py:4143 reads dashboard_data.json on every passes_active_gate() call (synchronous file read in hot path; 60+ active picks = 60+ unnecessary disk hits/cycle)
- No staging mode for drift pause — only on/off toggle; no dry-run to test before flipping DRIFT_AUTO_PAUSE_ENABLED=1
Master-plan ranking unchanged: all 4 buffy enhancements stay P1. E1+E2+E5 directly support the master-plan FOREX P0 cluster and Codex truth-layer; E4 (excess-return alert < -5%) complements edge-stability sidecar as an early-warning monitor.
No new buffy items beyond E1/E2/E4/E5. Tracking-table rows for E3 absent (skipped in numbering — likely an earlier idea dropped during scoping).
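The per-pick disk I/O issue is a natural fit for a short time-to-live cache around the file read. The sketch below is one way to do that, not the buffy design. `TtlCachedFlag` and `read_drift_flag` are hypothetical names, the `drift_alert` key inside dashboard_data.json is an assumed location, and returning False when the file is unreadable is an assumption about the desired failure mode.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


def read_drift_flag(path: Path) -> bool:
    """Read the drift flag from disk (key name is assumed)."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False  # assumption: an unreadable file does not pause picks
    if not isinstance(data, dict):
        return False
    return bool(data.get("drift_alert", False))


class TtlCachedFlag:
    """Cache a boolean loader result for `ttl_seconds`."""

    def __init__(
        self,
        loader: Callable[[], bool],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[bool] = None
        self._loaded_at = 0.0

    def get(self) -> bool:
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()  # the only place the loader runs
            self._loaded_at = now
        return self._value
```

A gate could build one instance at import time, for example `TtlCachedFlag(lambda: read_drift_flag(Path("audit_dashboard/data/dashboard_data.json")))`, so a cycle with 60+ picks triggers at most one read per TTL window.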

Why this plan is not generic (verification checklist)

✓ Each of 8 asset classes has its own action list with named strategies + specific file:line references + per-class promotion gate
✓ COMMODITY actions cite cot_positioning_CT_locked + multi_asset_cot + CT=F/KC=F concentration disclosure
✓ EQUITY actions cite rs-breakout-scout + aggregated_picks + claude_gainer_st contradiction + Breakout Momentum
✓ CRYPTO actions name kimi_signal_tracking + baby_strats:crypto_soc_* + quan_engine 18% drag + st_fear_greed_contrarian 94% WR + alpha_engine_fast drag #1
✓ FOREX actions cite PR #876 unit-corruption fix + memory feedback_noncrypto_resolver_live_close_bug root cause + CLAUDE.md mutate-before-kill directive
✓ ETF actions name specific symbols (XLF, XLE, XLK) + leveraged-ETF exclusion
✓ BOND + FUTURES + INDEX have explicit "keep paper / re-emission plan / defer" verdicts not just "expand sample"
✓ Cross-cutting P0 cluster names 8 specific actions with effort estimates + source-plan attribution
✓ Open PR triage assigns master-plan priority to all 24 open PRs, with recommended merge order anchored to per-class fixes
✓ Each promotion gate has measurable numerical thresholds (PF / WR / MDD / n / consistency / variance)
