Edge-stability sidecar (peer Phase G) verifies 2 stable classes + 2 decaying + 4 too-thin. COMMODITY and EQUITY are STABLE_EDGE with PF 3.61 / 2.04. CRYPTO and FOREX DECAYING_EDGE at PF 1.39 / 0.57. BOND, ETF, FUTURES, INDEX INSUFFICIENT_DATA.
Real-money posture (synthesis): Codex all-classes-first state machine + Kimi 7-check Go/No-Go gate + Cursor measurable Tier-2 criteria + Copilot 2-consecutive-weekly confirmation. No class trades live until all six major classes ≥ SHADOW.
Immediate (next 24h): Merge PR #904 (4 P1 swarm-fixes shipped, SSRF guard added, 16 smoke tests, 2 fabricated review claims rejected). Then execute 7-item P0 cluster.
| # | Source | Author / model | Key contribution |
|---|---|---|---|
| 1 | Claude Code plan | Opus 4.7 (this session) | Flagged claude_gainer_st winner-vs-blacklist contradiction; identified multi_asset_cot PF=19.19 needing DB verification |
| 2 | Cursor plan | Cursor Plan Mode | Canonical per-class baseline numbers (n=408/443/100/7875/1825/11); 4-phase fast-track with measurable n-targets |
| 3 | Copilot plan | GitHub Copilot Chat | 2-consecutive-weekly Tier-2 promotion gate; ETF+COMMODITY rollout order + CRYPTO curated sleeve parallel |
| 4 | Kimi plan | Moonshot Kimi (IDE) | 7-check Go/No-Go gate; named symbol-level edges (cot_positioning_CT_locked 89.8% WR, rs-breakout-scout 77.8% WR); 1-hour P0 fixes list |
| 5 | Codex plan | OpenAI ChatGPT Codex | Class state-machine BLOCKED→REHAB→OOS_READY→SHADOW→LIVE_ELIGIBLE; readiness.by_class payload contract; truth-layer-first policy |
| 6 | Peer Claude chatlog | Claude Opus 4.7 (1M context, peer session) | 9-phase work A-I; edge-stability sidecar with VERIFIED per-class verdicts; PR #904 swarm-reviewed + ready to merge; 14-item remaining backlog |
Edge-stability verdict: PF 3.61 / WR 55.7% / n=167 (peer Phase G). Cursor reported live asset_class_health PF 3.92 / WR 67.4% / n=408. The discrepancy is a sample-window difference: edge-stability uses a rolling window, while asset_class_health is cumulative.
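The rolling-versus-cumulative gap is what one would expect when the same trade history is summarized over two different windows. A minimal sketch of that effect, using made-up trades; the helper names `profit_factor` and `win_rate` and the window length are assumptions for illustration, not the sidecar's actual code:

```python
from typing import Sequence


def profit_factor(pnls: Sequence[float]) -> float:
    """Gross profit divided by gross loss; inf when there are wins but no losses."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def win_rate(pnls: Sequence[float]) -> float:
    """Share of trades with positive PnL, as a fraction in [0, 1]."""
    return sum(1 for p in pnls if p > 0) / len(pnls) if pnls else 0.0


# Illustrative per-trade PnL percentages, oldest first (not real data).
history = [2.0, 1.5, -1.0, 3.0, 2.5, -0.5, 1.0, -2.0, 0.5, -1.5]
window = 4  # hypothetical rolling-window length, in trades

cumulative = (profit_factor(history), win_rate(history))
rolling = (profit_factor(history[-window:]), win_rate(history[-window:]))
```

Both reads are computed from the same rows; only the slice differs, which is why a report should state the window next to every PF/WR figure.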
Specific edges named:
- cot_positioning_CT_locked LONG 89.8% WR, PF 13.1 (n=49)
- multi_asset_cot PF 19.19 / n=130 (Claude Code flagged for DB verification as implausibly high)

Actions:
- Verify multi_asset_cot PF=19.19 via DB query against ejaguiar1_stocks (data integrity smoke test). If real, name as Tier-1 seed.
- alpha_engine/walkforward_validator.py output path (Cursor: currently missing from walkforward.by_class). Verify surfacing in audit_trail/dashboard_generator.py.
- cot_positioning_CT_locked pocket (Kimi).

Promotion gate (master): walk-forward by_class block emits real folds + concentration disclosed + DB-verified PF + sustained Tier-2 (PF≥1.5 / WR≥50 / MDD≤20) for 2 consecutive weekly snapshots → OOS_READY. Then 14-30d SHADOW.
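The "sustained Tier-2 for 2 consecutive weekly snapshots" condition can be sketched as a small check. The `WeeklySnapshot` shape and function names are hypothetical; only the three thresholds come from the gate text:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WeeklySnapshot:
    """One weekly per-class reading (hypothetical shape)."""
    profit_factor: float
    win_rate_pct: float
    max_drawdown_pct: float


def meets_tier2(s: WeeklySnapshot) -> bool:
    """Tier-2 floor as stated in the gate: PF >= 1.5, WR >= 50, MDD <= 20."""
    return (
        s.profit_factor >= 1.5
        and s.win_rate_pct >= 50.0
        and s.max_drawdown_pct <= 20.0
    )


def sustained_tier2(snapshots: Sequence[WeeklySnapshot], required: int = 2) -> bool:
    """True when the most recent `required` weekly snapshots all meet Tier-2."""
    if len(snapshots) < required:
        return False
    return all(meets_tier2(s) for s in snapshots[-required:])
```

Checking only the trailing snapshots means one bad week resets the count, which matches the intent of a persistence test.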
Edge-stability verdict: PF 2.04 / WR 57.4% / n=272 (peer Phase G). Cursor cumulative: PF 1.60 / WR 54.0% / n=443. Convergent — EQUITY is the strongest broad candidate per all 5 plans + verified by peer sidecar.
Specific edges named:
- rs-breakout-scout LONG 77.8% WR, PF 6.7 (n=18)
- Breakout Momentum LONG 57.9% WR / PF 1.53 (n=38)
- aggregated_picks (77.3% WR / PF 6.42 / n=441): Claude Code flagged "aggregator artifact suspect"
- claude_gainer_st (78.5% WR / PF 6.12 / n=3472): Claude Code flagged a contradiction, since it is listed in BLACKLISTED_STRATEGIES at alpha_engine/config.py:216 yet tops the leaderboard

Actions:
- Reconcile claude_gainer_st winner-vs-blacklist contradiction (Claude Code). Confirm enforcement at exec gate, not just intake (memory feedback_gate_at_execution_not_generation).
- capped_vs_raw_pnl_gap).
- project_performance_reality.

Promotion gate (master): capped MDD verified + claude_gainer_st reconciled + bottom-symbol pruning improves recent PF without breaking OOS consistency ≥ 80% → OOS_READY. Then 14-30d SHADOW.
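To make the "capped MDD verified" condition concrete, here is a sketch of a drawdown calculation over clamped per-trade PnL. It assumes an additive cumulative curve in percentage points and reuses the [-100, 200]% clamp range quoted elsewhere in this plan; the function names are hypothetical and the real calculation in dashboard_generator.py may differ:

```python
from typing import Iterable


def clamp_pnl(pnl_pct: float, lo: float = -100.0, hi: float = 200.0) -> float:
    """Clamp one trade's PnL percentage into [lo, hi]."""
    return max(lo, min(hi, pnl_pct))


def max_drawdown_pts(pnls_pct: Iterable[float], capped: bool = True) -> float:
    """Largest peak-to-trough drop of the additive cumulative PnL curve, in points."""
    equity = peak = max_dd = 0.0
    for p in pnls_pct:
        equity += clamp_pnl(p) if capped else p
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)
    return max_dd


# Illustrative trades with one corrupted outlier (not real data).
trades = [5.0, -3.0, -450.0, 4.0]
capped_mdd = max_drawdown_pts(trades, capped=True)
raw_mdd = max_drawdown_pts(trades, capped=False)
```

Reporting both values side by side is one way to surface a capped-versus-raw gap instead of silently picking one.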
Edge-stability verdict: INSUFFICIENT_DATA (peer Phase G; n below floor). Cursor cumulative: PF 1.48 / WR 60.0% / n=100. Kimi: PF 1.20 / WR 53.4% / n=88. Class is at the n floor — promotion gate blocked on sample.
Actions:
Promotion gate (master): n ≥ 100 AND PF ≥ 1.5 AND consistency ≥ 80% on edge-stability sidecar → OOS_READY. ETF has the cleanest OOS profile per Codex, so this should be the fastest to promote once n threshold cleared.
Edge-stability verdict: INSUFFICIENT_DATA. Cursor cumulative: PF 0.66 / WR 54.5% / n=11. Kimi: PF 1.72 / WR 55.6% / n=18. All plans agree: n too thin for any promotion verdict.
Actions:
- alpha_engine/walkforward_validator.py output path (Cursor: currently missing from walkforward.by_class).

Promotion gate (master): n ≥ 100 (multi-month effort) → BOND can re-enter the rotation. Keep paper-only until then.
Edge-stability verdict: INSUFFICIENT_DATA. Memory project_futures_kill_without_replacement: Futures module silent-dead (5.9% WR, -96% PnL after 2 strategies killed + no replacements added).
Actions:
Peer edge-stability sidecar reports INDEX class with INSUFFICIENT_DATA. Not enumerated in any of the 5 agent plans. Treat as "not yet a class."
Action: Defer until edge-stability sidecar emits real metrics for INDEX category.
| # | Action | Effort | Source plan(s) | Memory ref |
|---|---|---|---|---|
| 1 | Merge PR #904 (research orchestrator + edge stability sidecar) — already MERGEABLE/CLEAN at 6d7ccd928fd | 0.1h | peer chatlog Phase I | — |
| 2 | Blacklist kimi_signal_tracking at alpha_engine/config.py:216 + verify exec-gate enforcement | 1h | Claude Code, Kimi, Codex | feedback_gate_at_execution_not_generation |
| 3 | Ship baby_strats:crypto_soc_* quarantine via per-strategy BLOCKED_ASSET_STRATEGY_PAIRS at audit_trail/quality_gates.py:1499 | 1h | 4/5 plans | reports/baby_strats_overfit_quarantine_proposal_2026_05_10.md |
| 4 | Hard-cap FOREX sizing at 0 until PF ≥ 0.8 — explicit per-class gate (NOT silent kill) | 1h | Kimi, Cursor | CLAUDE.md FOREX directive; docs/MUTATION_THREE_AXIS_PROTOCOL.md |
| 5 | Verify multi_asset_cot PF=19.19 via DB query against ejaguiar1_stocks | 1h | Claude Code | — |
| 6 | Reconcile claude_gainer_st winner-vs-blacklist contradiction | 1-2h | Claude Code | feedback_gate_at_execution_not_generation |
| 7 | Verify max-drawdown calc uses capped PnL (Kimi 680% MDD smell-test) | 1h | Kimi, Codex | — |
| 8 | Cap quan_engine to 12% CRYPTO volume share | 1h | Kimi | — |
- readiness.by_class payload contract (class state-machine fields + capped_vs_raw_pnl_gap + single_symbol_concentration + leaders.by_class + draggers.by_class).
- alpha_engine/walkforward_validator.py; surface in audit_trail/dashboard_generator.py.
- hf_stats.concept_drift.KS_D uncomputed-zero bug + refresh 19-day stale hf_stats. Wire drift → auto-pause sizing when D > 0.10 (peer remaining task "Drift-pause activation Phase 1").
- /audit threshold text with docs/PERFORMANCE_CHARTER.md v1.0.
- last_signal_date to systems payload (Claude Code: absent for all top-6 winners).
- mercury2_fast, ml_bg_system_b, copy_trader_highscore) per docs/MUTATION_THREE_AXIS_PROTOCOL.md.

See per-class action lists above. Cross-cutting:
- dry_run kwarg on smart_picks_engine + production_scanner + dashboard_generator (peer remaining backlog medium).
- p3_backtest_runner.py (peer remaining backlog; closes project_cpcv_gap_2026_04_28).
- project_performance_reality.
- FRED_API_KEY to secrets, gate strategies on regime); Kalshi pairwise consensus with Polymarket (pm_consensus_overlay.py sidecar).
- SHADOW for all six major classes (Codex all-classes-first).

| PR | Title | Status | Master-plan action |
|---|---|---|---|
| #904 | research orchestrator + edge-stability sidecar | MERGEABLE / CLEAN at 6d7ccd928fd (per peer chatlog Phase I) | P0 MERGE NOW — unblocks the verdict-grade verdicts this master plan rests on |
| #903 | chore(loop): 2026-05-11 run findings | open | review after #904 merges (docs-only) |
| #902 | feat(b13): per-class regime filter sidecar + quality_gates.py hook — COMPLETE (supersedes #868/#872/#889/#895/#900) | 1 failed check | P1 investigate failed check; this is the regime-filter sidecar that gates strategies on regime — directly enables the Codex state-machine REHAB→OOS_READY transition |
| #901 | audit hourly 05Z | open | auto-merge if hourly cron |
| #898 | fix(B15): cross-asset correlation works without numpy | 1 failed check | P2 investigate; needed for cross-class verdicts |
| #893 | orphan_resolver_dryrun.py — 1,366 orphan closed_at preview | 1 failed check | P1 dry-run preview only; aligns with E-D3 work; investigate failed check |
| #892 | safe_db_archive.py — Hermes rule #1 gate | open | P1 needed for any DB write that touches blocklist updates |
| #891 | mysql_sync entry_time/exit_time fallback — repairs 87% NULL closed_at orphans | 2 failed checks | P0 directly addresses orphan rows — part of Codex truth-layer P0; investigate failed checks |
| #887 | WIN_RATE_TRAP_BLACKLIST — 6 crypto traps + 2 equity bombs | 1 failed check | P0 CRYPTO + EQUITY draggers — overlap with master-plan P0 #2 #3 #6; verify no duplicate blocks before merge |
| #885 | risk_policy v2 — tighten crypto per-symbol cap 10→5, per-trade 5→3 | 2 failed checks | P1 CRYPTO de-concentration; investigate failed checks |
| #884 | mysql_sync infer category for NULL/empty rows | open | P0 class-attribution backfill — part of Codex truth-layer |
| #883 | quality_gates swarm-batch-1 source score retunings (5 sources) | open | P1 review |
| #881 | tv-orchestrator LL1 fill-relative TP/SL | open | P2 TradingView paper-trade execution improvement |
| #879 | audit-dashboard Hermes 5-phase enhancements + 5000-round audit corpus | open | P2 peer chatlog notes branch is 3152 commits behind main + 83 ahead, conflicting; rebase or close+cherry-pick selectively |
| #878 | short_engine BULL-regime gate | open | P1 CRYPTO regime filter; complements #902 |
| #877 | mysql_sync elite_score backfill | open | P2 |
| #876 | mysql_sync pnl_pct anomaly clamp [-100, 200]% — kills forex unit corruption | open | P0 FOREX — see FOREX P0 #2 above; this directly fixes the unit-corruption side of FOREX PF degradation |
| #873 | chore(loop): B13 status | open | docs-only |
| #862 | DB query bank: forex pnl corruption + 50 untested live pairs + JPY-cross 100% losers | open | P1 FOREX investigation evidence; aligns with FOREX P1 deep-dive |
| #849 | Edge action plan + swarm peer-review harness (draft) | draft | upgrade-or-close; this master plan supersedes the action-plan portion |
| #846 | Shadow Probation panel on /audit Overview tab | open | P1 directly supports Codex state-machine SHADOW state visualization |
| #900 / #895 | b13 regime filter earlier iterations | open | close in favor of #902 (the COMPLETE version that supersedes per its title) |
Recommended PR merge order (master-plan P0 priority):
Adopt Codex's class state machine as governance scaffold, with Kimi's 7-check Go/No-Go as per-class checklist, Cursor's Tier-2 measurable criteria as the numerical floor, and Copilot's 2-consecutive-weekly as the persistence test.
- SHADOW simultaneously; truth-layer green for 7 consecutive days; user personally approves

| Class | State | Reason |
|---|---|---|
| COMMODITY | REHAB | edge-stability STABLE but walk-forward missing + concentration not disclosed |
| EQUITY | REHAB | edge-stability STABLE but capped-PnL not verified + claude_gainer_st contradiction |
| CRYPTO | BLOCKED | edge-stability DECAYING + 3 named draggers (kimi_signal_tracking, baby_strats:crypto_soc_*, quan_engine over-share) |
| FOREX | BLOCKED | edge-stability DECAYING + resolver bug + unit corruption (PR #876) |
| ETF | REHAB | n at floor; expand universe to clear sample threshold |
| BOND | BLOCKED | n=12 in verbatim chatlog read — way below n≥100 floor; cannot reach REHAB until sample expanded (multi-month effort) |
| FUTURES | BLOCKED | silent-dead per memory; need re-emission plan |
| INDEX | REHAB | insufficient data |
Current LIVE_ELIGIBLE count: 0/6. Earliest LIVE_ELIGIBLE target: not before week 8 given REHAB→OOS_READY (2-4 weeks) + SHADOW (2-4 weeks) for the fastest class.
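The class state machine and the "all classes at SHADOW or better" rule can be sketched as follows. State names and their order come from the Codex plan and the per-class states are copied from the table above; the helper names and the single-step promotion rule are assumptions for illustration:

```python
from enum import Enum
from typing import Mapping


class ClassState(Enum):
    BLOCKED = "BLOCKED"
    REHAB = "REHAB"
    OOS_READY = "OOS_READY"
    SHADOW = "SHADOW"
    LIVE_ELIGIBLE = "LIVE_ELIGIBLE"


# Enum iteration follows definition order, which is the promotion order.
_ORDER = list(ClassState)


def can_promote(current: ClassState, target: ClassState) -> bool:
    """Allow only a single forward step; demotion rules are out of scope here."""
    return _ORDER.index(target) - _ORDER.index(current) == 1


def all_at_least(states: Mapping[str, ClassState], floor: ClassState) -> bool:
    """True when every class has reached `floor` or a later state."""
    return all(_ORDER.index(s) >= _ORDER.index(floor) for s in states.values())


current_states = {
    "COMMODITY": ClassState.REHAB,
    "EQUITY": ClassState.REHAB,
    "CRYPTO": ClassState.BLOCKED,
    "FOREX": ClassState.BLOCKED,
    "ETF": ClassState.REHAB,
    "BOND": ClassState.BLOCKED,
    "FUTURES": ClassState.BLOCKED,
    "INDEX": ClassState.REHAB,
}
live_ready = all_at_least(current_states, ClassState.SHADOW)
live_eligible_count = sum(
    1 for s in current_states.values() if s is ClassState.LIVE_ELIGIBLE
)
```

Restricting promotion to one step at a time keeps a class from skipping the SHADOW observation period.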
- money-maker-ready skill on freshest deployed payload after every pipeline change; require class-labeled claims in every report.
- readiness.by_class, leaders.by_class, draggers.by_class, concentration/caveat fields (Codex).
- docs/PERFORMANCE_CHARTER.md.
- BLOCKED in state machine.
- LIVE_ELIGIBLE.
- tests/test_quarantine_unit_blocklist.py); 16 smoke tests for edge stability (tests/test_edge_stability_smoke.py) via peer Phase I commit a9e045a757f.
- multi_asset_cot DB-verified PF within 2pp of systems payload.
- claude_gainer_st blocklist enforcement test (intake + exec gates).
- fwd_vs_bt_divergence build shows ≤ 2 baby_strats flags (down from 12); kimi_signal_tracking generates 0 new picks in test env; quan_engine volume share ≤ 12% in next 7-day window.
- /audit.

Peer Claude-B session pushed plan v2 + production code mid-document. Tracked here so this master plan stays current.
- 57d267a28e6)
- advisory_only=true; demotions deferred.
- walk_forward_by_strategy() is PROPOSED-NEW, does NOT exist yet. Master plan must not assume it.

| Commit | Scope | File(s) |
|---|---|---|
| cf4e924744a | Opt B walk-forward Tier-1 promotion gate (consistency≥60 + sharpe>0). FOREX blocked from T1 per current data | audit_trail/dashboard_generator.py |
| cf229ea31ba | W4 benchmark-relative trailing-30d return per system (primary_asset_class / pnl_30d_pct / trades_30d / benchmark_30d_pct / excess_return_30d_pct) | audit_trail/dashboard_generator.py |
| 4ea32d227cf | Opt A TA-baseline panel on /audit (6-strategy benchmark cards per class) + _load_latest_ta_baseline() + renderTaBaseline() + nav hook | audit_trail/dashboard_generator.py + audit_dashboard/template.html |
| 82a34bc0fdb | tools/live_market_fetcher.py foundation (yfinance VIX/DXY/BTC/ETH/SPY/QQQ/GLD/TLT/oil + regime classifier + 1h cache) | new file |
| 5e4bc1efe63 | Block A fix (freebuff INDEX collision: ASSET_CLASSES "INDEX" → "INDEX_STOCK" + defense-in-depth in write_index() + delete stale INDEX.json + 9-class regen) + Phase 1.5.3 drift-advisory wire-in (Opt B re-tier loop now reads concept_drift.drift_alert; advisory_only=true when hot, demotions deferred) | tools/edge/edge_stability.py + audit_trail/dashboard_generator.py |
| f740ace5c34 | CLAUDE2.MD A9-A13 + concept_drift root cause report + 37h quarantine verification (0 of 60 active picks match 30 quarantined pairs) | docs + report |
VIX -44.64% / 30d real regime collapse since 2026-04-22 — confirmed not pipeline noise. KS_D 0.31 vs 0.047 critical. Source: reports/concept_drift_root_cause_2026-05-11.md. This validates Codex's "fix truth layer first" position: drift is real and regime-driven, not a metrics artifact.
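For readers unfamiliar with the KS_D figure, the two-sample Kolmogorov-Smirnov statistic is the largest gap between two empirical CDFs. A pure-Python sketch, with hypothetical function names and made-up samples; the 0.31 and 0.047 values in the usage are the ones quoted from the report:

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS D: maximum absolute gap between the two empirical CDFs."""
    if not a or not b:
        # Raise instead of returning 0.0 so "not computed" never reads as "no drift".
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def drift_alert(d: float, critical: float) -> bool:
    """Flag drift when the statistic exceeds the critical value."""
    return d > critical


baseline = [1.0, 2.0, 3.0, 4.0]   # illustrative reference window
recent = [3.0, 4.0, 5.0, 6.0]     # illustrative recent window
d_value = ks_statistic(baseline, recent)
reported_alert = drift_alert(0.31, critical=0.047)
```

Refusing to compute on an empty sample is a deliberate choice here: a silent zero would look identical to a healthy reading.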
- DRIFT_AUTO_PAUSE_DRY_RUN=1 env-gate for 7d "would-pause" logging
- class_capital_gate(asset_class) → (allow_size, max_pct, reason) with capital_gate_log.jsonl trail

Buffy session at docs/chatlogs/chatlog_2026-05-11_buffy_review.md + docs/chatlogs/progress_2026-05-11_buffy_enhancements.md. Code review of Opt-A / Opt-B / W4 found 3 issues + 4 enhancement opportunities. All status PENDING (not yet committed).
| ID | Enhancement | Problem (verified by buffy code review) | Master-plan ranking |
|---|---|---|---|
| E1 | Add FOREX benchmark (DXY) to benchmark_return() | live_market_fetcher.py:155-162 benchmark_map omits FOREX. FOREX systems (n=1801, PF 0.27) get benchmark_30d_pct=None in dashboard despite DXY already fetched as "DX-Y.NYB". One-line dict-add. | P1 FOREX: directly supports FOREX deep-dive (master plan FOREX P1 action #3) |
| E2 | Cache drift-pause check with 60s TTL | quality_gates.py:4143 _drift_pause_active() reads dashboard_data.json from disk on every passes_active_gate() call. 60+ active picks = 60+ disk hits/cycle. Synchronous I/O in hot path. | P1 perf: gates the eventual drift-pause flip; performance matters when active list grows |
| E4 | Excess return alert (< -5%) | W4 code at dashboard_generator.py:12700-12747 computes excess_return_30d_pct per system. Data exists; no monitor. Addresses Step 9 of 10-investigations plan. New w4_alerts key in dashboard_data.json. | P1 monitoring: early warning system per class, complements edge-stability sidecar |
| E5 | DRIFT staging dry-run mode | DRIFT_AUTO_PAUSE_ENABLED is binary (0=advisory, 1=hard pause). No staging. Per CLAUDE2.MD: "Don't flip without staging-first per swarm consensus." Add DRIFT_STAGING_MODE=1 env var that logs would-block picks without actually blocking. | P1 safety: prerequisite for ever flipping DRIFT_AUTO_PAUSE_ENABLED=1; aligns with peer T1 contingent task |
Buffy methodology note: each enhancement follows before/change/after/verify/evidence pattern with line-range citations + swarm cross-check round + typecheck + code review log. Three rows in tracking tables currently empty — work hasn't started yet.
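Enhancement E2 amounts to putting a time-to-live cache in front of a file read. A minimal sketch under stated assumptions: the class name `TTLCachedFlag`, the flat `drift_alert` key, and the injectable clock are all hypothetical, and the real `_drift_pause_active()` may read a nested field instead:

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


class TTLCachedFlag:
    """Re-read a boolean flag from a JSON file at most once per `ttl_s` seconds."""

    def __init__(self, path: Path, key: str, ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._path = path
        self._key = key
        self._ttl = ttl_s
        self._clock = clock
        self._value: Optional[bool] = None
        self._loaded_at = 0.0
        self.disk_reads = 0  # exposed only so the caching behaviour can be observed

    def get(self) -> bool:
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            data = json.loads(self._path.read_text(encoding="utf-8"))
            self._value = bool(data.get(self._key, False))
            self._loaded_at = now
            self.disk_reads += 1
        return self._value
```

With a 60-second TTL, a cycle that evaluates 60 picks would hit the disk once rather than 60 times, at the cost of reacting to a flag change up to one TTL late.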
Issues found but not yet enhanced (queue):
- benchmark_return('FOREX') returns None → E1 fix

Action: Buffy to proceed with E1/E2/E4/E5. Each lands as separate small PR with co-author + commit message citing buffy-progress file. Expected to clear the buffy queue within 1-2 days.
Per peer reports/session_summary_2026-05-11.md:
- .github/workflows/audit-dashboard.yml regen check tier2_proven_strategies.cards[*].walkforward_gate.advisory_only.
- alpha_engine/walkforward_validator.py to add COMMODITY + BOND + ETF to by_class output (currently 4 of 7 classes).
- walk_forward_by_strategy() (PROPOSED-NEW; does NOT exist yet). Pairs with peer's per-strategy edge_stability table.
- DRIFT_AUTO_PAUSE_DRY_RUN=1; 7d "would-pause" logging. Buffy E5 overlaps.
- class_capital_gate(asset_class) with capital_gate_log.jsonl. PROPOSED-NEW.
- regime_validation.regime_wr_breakdown all-zero rows).
- multi_asset_cot PF 19.19 / n=130 outlier risk).
- DRIFT_AUTO_PAUSE_ENABLED=1 (HIGH risk; Phase 4.1 must be clean for 7d first).
- multi_asset_cot PF 19.19 via MySQL (LOW risk; needs DB_STOCKS_PASSWORD). Master plan P0 #5.
- asset_class="INDEX_STOCK".
- .git peer-collision risk: 4+ silent branch auto-switches observed. Recommend isolated worktrees (git worktree add) protocol per agent.
- git add -A swept other agents' files. Add project rule: explicit git add <path> only.
- git checkout origin/main -- audit_dashboard/data/dashboard_data.json to refresh just that file before any planning read.
- last_signal_at stale 2.5mo (e.g. ml_crypto_pred_v12) but cumulative WR/PF still surfaced. Misleading. Phase 4.3 DB-lineage card needs to include per-system cutoff date.

Commit 81bd0b86388 on main 2026-05-11 21:30Z. Removed alpha_engine/data/circuit_breaker_state.json (HALT, max_picks=0, 48d stale from 2026-03-24).
Verification PASSED: direct query against ejaguiar1_stocks::bt_backtest_trades shows MAX(imported_at) WHERE status IN ('WON','LOST') = 2026-05-11 20:00:59 with n=1,819,839. Forward validator unfrozen; WON/LOST writes resumed within 1 hour. Wave 1.5 independent fixes (lm_signals expire-cron, signal_tier writer, at_consensus_picks time-travel) still queued per kilo carveout.
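The freshness check described above can be reproduced in miniature. This sketch uses an in-memory SQLite database as a stand-in for the MySQL `ejaguiar1_stocks` schema; the table and column names come from the verification note, while the sample rows, the `OPEN` status, and the helper names are made up for illustration:

```python
import sqlite3
from datetime import datetime
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bt_backtest_trades (status TEXT, imported_at TEXT)")
conn.executemany(
    "INSERT INTO bt_backtest_trades VALUES (?, ?)",
    [
        ("WON", "2026-04-02 09:00:00"),
        ("OPEN", "2026-05-11 21:00:00"),   # unresolved row, must be ignored
        ("LOST", "2026-05-11 20:00:59"),
    ],
)


def last_resolved_at(db: sqlite3.Connection) -> Optional[datetime]:
    """Most recent import time among resolved (WON/LOST) rows, or None."""
    row = db.execute(
        "SELECT MAX(imported_at) FROM bt_backtest_trades "
        "WHERE status IN ('WON', 'LOST')"
    ).fetchone()
    return datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S") if row[0] else None


def hours_stale(db: sqlite3.Connection, now: datetime) -> Optional[float]:
    """Hours elapsed since the last resolved row, or None when there is none."""
    last = last_resolved_at(db)
    return None if last is None else (now - last).total_seconds() / 3600.0
```

Filtering on resolved statuses matters: counting unresolved rows would make a frozen validator look fresh.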
| SHA | Change |
|---|---|
| 4a2d337a5dc | P0 #2 + #3 + #4: blacklist kimi_signal_tracking + 3 named crypto_soc_* draggers to BLOCKED_ASSET_STRATEGY_PAIRS + raise elite-score floors (FOREX 50→70, COMMODITY 50→65, EQUITY 50→60) |
Archive: reports/kimi_edge_audit_2026-05-11/ (32 files — comprehensive_analysis_report.md, edge_audit_report.md, industry_standards_research.md, 7 rolling PNGs, 7 CSVs).
| Metric | Dashboard (post-filter) | Kimi RAW DB |
|---|---|---|
| Win Rate (all classes) | 34-43% | 11.13% (6,178 / 55,510) |
| Total PnL | +949% claimed | -3.56% avg/trade × 55,510 = -197,487% |
| Sharpe Ratio | varies | -2.34 annualized |
| Profit Factor | 1.49 | 0.46 |
| ML accuracy | — | 32.6% (worse than coin flip) |
| Calibration | — | 96% conf → 0.9% actual WR (INVERTED) |
| Backtest vs Live (CRYPTO) | — | 42.4% BT WR / 11.3% live → -31.3pp gap |
The gap is real but BOTH reads are technically correct:
- at_raw_picks / trading_picks: including ghost rows, pre-resolver-v2 (Apr-27 fix) noise, dead-strategy artifacts, post-Wave-1 phantom EXPIRED rows. This is the pessimistic floor.
- asset_class_health (post-resolver-v2 filtered, CLAUDE.md verdict-grade): Cursor reads n=7,875 CRYPTO vs Kimi n=51,049. The roughly 6.5x difference is the resolver-v2 + ghost-quarantine filtering.

Master-plan adjustment: the dashboard "STABLE_EDGE" verdicts on COMMODITY + EQUITY still hold for the currently-active sleeve. But Kimi's WR 11.1% / Sharpe -2.34 RAW floor must be disclosed on /audit alongside the post-filter numbers. Currently it isn't, which is the "4x inflation" Kimi flagged. Codex's readiness.by_class payload contract (P1) is the right place to surface both reads side-by-side.
- project_performance_reality confirmed it on ETF/CRYPTO subset; Kimi extends to all classes.
- alpha_engine/anti_overfit_validator.py EXISTS but orphan. 13,886 bytes, last modified 2026-05-02. Contains CPCV / PBO / DSR code per Kimi inspection. P1 wire-up per CLAUDE.md Wire-Up Rule: production caller in calculate_smart_score / passes_smart_gate / score_pick. Once wired, automatically rejects strategies with PBO > 0.05, the most critical defense against the 26-baby_strats-overfit pattern.
- random_forest.py, xgboost_model.py, lstm_model.py are 14-byte placeholders per Kimi. P2 verification: confirm + remove or implement.
- paper_trades table is EMPTY. Codex's SHADOW state requires 14-30d forward shadow tracking. Without paper_trades populated, SHADOW cannot be measured. P1 add: wire paper-trade variance recorder before any class can reach SHADOW.
- alpha_engine PnL. USDCHF=X almost certainly the FOREX unit-corruption bug (PR #876 pnl_pct clamp [-100, 200]%). P0 dependency: merge #876 before any FOREX verdict.

| Metric | Minimum | Source |
|---|---|---|
| Deflated Sharpe Ratio (DSR) | > 0.95 | Lopez de Prado AFML |
| Probability of Backtest Overfitting (PBO) | < 0.05 | Bailey + Lopez de Prado |
| Walk-Forward Efficiency (WFE) | > 60% | Pardo |
| Min Track Record Length | > 2 years | AFML |
| Live Sharpe Ratio | > 0.5 | Industry standard |
| Max Drawdown | < 20% | Charter Tier 2 |
| Win Rate | > 50% | Codex T2 (or PF>1.5 substitute) |
Adopt Kimi's 10-step validation pipeline as the real-money gate before LIVE_ELIGIBLE: pre-register hypothesis → in-sample → WFA → CPCV → DSR → structural break tests → sensitivity analysis → transaction cost analysis → 3-6mo paper trading → graduated deployment (5% → 25% → 100%).
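The minimums in the table above can be expressed as a single checklist that reports which items fail. The `StrategyStats` shape and `failed_checks` name are hypothetical; the thresholds are the ones tabulated, with PF > 1.5 accepted as the stated substitute for the win-rate floor:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StrategyStats:
    """Inputs to the real-money gate (hypothetical shape)."""
    dsr: float
    pbo: float
    wfe_pct: float
    track_record_years: float
    live_sharpe: float
    max_drawdown_pct: float
    win_rate_pct: float
    profit_factor: float


def failed_checks(s: StrategyStats) -> List[str]:
    """Return the names of every minimum the strategy does not meet."""
    checks = {
        "DSR > 0.95": s.dsr > 0.95,
        "PBO < 0.05": s.pbo < 0.05,
        "WFE > 60%": s.wfe_pct > 60.0,
        "track record > 2y": s.track_record_years > 2.0,
        "live Sharpe > 0.5": s.live_sharpe > 0.5,
        "max drawdown < 20%": s.max_drawdown_pct < 20.0,
        "WR > 50% or PF > 1.5": s.win_rate_pct > 50.0 or s.profit_factor > 1.5,
    }
    return [name for name, ok in checks.items() if not ok]
```

Returning the failing names rather than a bare boolean makes the gate's verdict auditable in a report.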
Net: Kimi's verdict is harsher than master plan but doesn't invalidate the COMMODITY/EQUITY STABLE_EDGE finding for the FILTERED sleeve. It does add 3 new P0 actions (calibration verify, EQUITY filter-criteria audit, PR #876 dependency surfaced) + 1 P1 (anti_overfit_validator wire-up) + a stricter real-money gate (10-step / DSR / PBO / WFE).
14 HIGH rolling-7d WR drop alerts (>20pp baseline decay) + 3 MEDIUM staleness alerts surfaced on /audit. Per CLAUDE.md MUTATION_THREE_AXIS_PROTOCOL: mutation-before-kill applies; REDUCE-not-BLOCK is the soft action.
| Sev | Strategy | 7d WR vs baseline | Status this session | Master-plan action |
|---|---|---|---|---|
| HIGH | myfxbook_retail_contrarian | 19% vs 46% (-27pp) | BLOCKED commit a64e80e70d1 | Done |
| HIGH | forex_rsi2_mean_reversion | 10% vs 44% (-34pp) | BLOCKED via fx_kill_switch (commit a64e80e70d1) | Done |
| HIGH | cta_cross_asset_tsmom | 28% vs 46% (-18pp) | Open | P1 Add to BLOCKED_ASSET_STRATEGY_PAIRS (FUTURES) — alert is class-level drag |
| HIGH | ig_contrarian_sentiment | 20% vs 45% (-25pp) | Open | P1 CRYPTO sentiment-contrarian; quarantine pending mutation-axis analysis |
| HIGH | futures_momentum | 4% vs 42% (-38pp) | Open | P0 Largest drop. FUTURES is BLOCKED per master plan — add to BLOCKED_ASSET_STRATEGY_PAIRS (FUTURES) outright |
| HIGH | st_multi_day_momentum | 47% vs 68% (-21pp) | Open | P1 Soft-demote — still positive Sharpe likely; reduce sizing 50% |
| HIGH | macd_rsi_m048 | 53% vs 73% (-20pp) | Open | P1 Boundary; monitor 7d before action |
| HIGH | ema_momentum_m006 | 36% vs 56% (-20pp) | Open | P1 Below 50% sub-floor; quarantine |
| HIGH | hs_lb_None | 0% vs 34% (-34pp) | Open | P0 0% recent WR = total decay. hs_lb_None = head-shoulders + None-leverage suffix = likely a parsing bug emitting null variants. Investigate before block. |
| HIGH | crypto_rsi_whaleconfirmed_v1 | 18% vs 55% (-37pp) | Open | P1 CRYPTO momentum-with-whale-confirm; large drop suggests whale signal degraded |
| HIGH | keltner_compression_expansion_eth_v1 | 29% vs 51% (-22pp) | Open | P1 ETH-specific Keltner; consider symbol-axis mutation per MUTATION_THREE_AXIS_PROTOCOL |
| HIGH | vwap_deviation_reversion_sol_v1 | 27% vs 47% (-20pp) | Open | P1 SOL-specific VWAP; similar mutation analysis |
| HIGH | MeanReversionBB | 25% vs 60% (-35pp) | Open | P0 Large baseline-to-recent drop; BB mean-reversion likely regime-broken (drift_alert TRUE confirms) |
| HIGH | claude_ml_moderate_mut | 42% vs 68% (-26pp) | Open | P1 Mutation-axis name suggests genetic-evolved variant; check anti_overfit_audit.json for DSR before action |
| MED | copy_trader_clones | silent 93h | Open | P2 Likely Wave 1 will unblock — monitor 24h post-Wave-1 then escalate |
| MED | stocksunify2 | silent 96h | Open | P2 Same — monitor |
| MED | kimi_live_signals | silent 98h | Open | P2 Distinct from blacklisted kimi_signal_tracking; verify if intentionally stopped (peer chatlog signal_tier writer pause) or accidental |
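The REDUCE-not-BLOCK soft action, with the 8-point score deduction this plan proposes for a DECAY_ALERT_REDUCE set, could look like the sketch below. The set membership shown is illustrative only, the function name is hypothetical, and flooring the score at zero is an assumption not stated in the plan:

```python
from typing import FrozenSet

DECAY_PENALTY_PTS = 8.0  # deduction named in the plan

# Illustrative membership only; the real set would be curated from the alert table.
DECAY_ALERT_REDUCE: FrozenSet[str] = frozenset({"st_multi_day_momentum"})


def apply_decay_penalty(strategy: str, base_score: float,
                        reduce_set: FrozenSet[str] = DECAY_ALERT_REDUCE) -> float:
    """Subtract the penalty for flagged strategies; assumed floor at zero."""
    if strategy in reduce_set:
        return max(0.0, base_score - DECAY_PENALTY_PTS)
    return base_score
```

Because the strategy keeps emitting with a lower score, it can recover its standing once it leaves the set, which a hard block would not allow without a manual unblock.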
- futures_momentum (FUTURES) + MeanReversionBB (regime-driven decay) into BLOCKED_ASSET_STRATEGY_PAIRS. hs_lb_None investigation precedes block.
- DECAY_ALERT_REDUCE set to audit_trail/quality_gates.py; deduct 8 pts from calculate_smart_score for any strategy in the set. Cleaner than hard-block since these may rehab once drift clears.

Current /audit HC panel shows per-class thresholds from 2026-04-15 data. Re-verified against session work (Kimi RAW DB read, anti-overfit DSR sidecar, P0 #10 EQUITY filter trace).
| Class | Current HC verdict | Session data check | Action required |
|---|---|---|---|
| CRYPTO | EDGE — FWD WR≥60% + Score≥55 + Trust≥4 → "WR 60.3% on N=562 (+9.7pp)" | Anti-overfit DSR sidecar (post-Wave-1): 4 EDGE_LIKELY_REAL ml_enhanced sleeves (INJUSDT/FETUSDT/DYDXUSDT/RENDERUSDT 1d+1h variants). 33 OVERFIT_LIKELY in same family (mostly _15m_*). | P1 Add anti-overfit gate: HC pass requires strategy IN anti_overfit_audit.json::strategies WHERE verdict='EDGE_LIKELY_REAL'. Auto-rejects the 33 OVERFIT_LIKELY sleeves even if FWD WR ≥ 60% passes. |
| EQUITY | EDGE: FWD WR≥55% + Score≥50 + Trust≥5 → "WR 68.1% on N=72 (+29pp)" | P0 #10 verify: dashboard 54% WR is honest for tagged-EQUITY subset. stocks_rsi2_pullback (n=70, WR 62.9%, avg +0.78%) is the real EQUITY edge sleeve. n=72 HC cohort likely overlaps this. Note: n=72 still below master-plan n≥100 charter floor. | P2 Verify HC cohort overlap with stocks_rsi2_pullback. Either rename verdict to "EDGE (thin n=72)" or widen FWD WR floor to admit more samples. |
| FOREX | EDGE: FWD WR≥45% + Score≥50 + Trust≥5 → "WR 65.8% on N=73" | RED FLAG. Anti-overfit DSR sidecar has ZERO FOREX EDGE_LIKELY_REAL. Kimi RAW FOREX WR 9.9% n=605, PF 0.28 (stat-significantly LOSING). Master plan FOREX state = BLOCKED per Codex state machine + elite_score floor raised to 70 (commit 4a2d337a5dc) + 3 toxic strategies re-blocked (commit a64e80e70d1). Showing "EDGE" with N=73 sample is small-sample artifact masking systematic decay. | P0 Downgrade FOREX HC verdict from EDGE to BLOCKED or DEAD. The "WR 65.8% on N=73" is statistically vulnerable; bootstrap CI almost certainly straddles 0.50. Cite master plan FOREX BLOCKED state. This is the most dangerous current verdict on the /audit page. |
| COMMODITY | WEAK: Trust≥5 → "PF 1.28 on n=273" | DSR sidecar: cot_positioning n=104, WR 86.5%, Sharpe +1.377, DSR=1.0000 (highest of any strategy). Antigravity audit confirmed cot_positioning_CT_locked LONG = 89.8% WR / PF 13.1 (n=49). COMMODITY aggregate is mediocre because it includes non-COT strategies dragging the average; the COT sleeve specifically has REAL edge. | P1 Add COMMODITY HC carve-out: strategy IN (cot_positioning, cftc_cot_commercial_signal, cot_positioning_CT_locked) AND elite_score ≥ 65. Mark as EDGE via that filter. Class aggregate stays WEAK without the carve-out, which is correct. |
| BOND | NO DATA — n=8 | Kimi BOND: n=18 PF 1.72 WR 55.6%. Sub-floor (n<100 master plan charter). | P2 Update n from 8 → 18 (live count) + note "needs n≥100 multi-month accumulation per Wave 1 unfreeze." No filter change. |
| ETF | DEAD — PF 0.28 n=19 | Kimi ETF: n=88-100 / WR 53.4% / PF 1.20. Master plan ETF state = REHAB (not DEAD). Class is at sample floor, not dead. Cleanest OOS profile per Codex. | P1 Reclassify ETF from DEAD → THIN_REHAB. Update text "PF 0.28 n=19" → "PF 1.20 n=88-100, needs n≥150 for OOS_READY promotion (Codex state machine)". The DEAD verdict is stale + miscalibrated. |
| FUTURES | DEAD: WR 5.9% n=17 | Per memory project_futures_kill_without_replacement: silent-dead, 5.9% WR / -96% PnL post-2-kills. Master plan FUTURES state = BLOCKED. Verdict accurate. | P3 Keep DEAD verdict; flag for re-emission plan or formal retire from /audit per master plan FUTURES section. |
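The anti-overfit gate proposed in the CRYPTO row combines the existing forward win-rate floor with a verdict lookup. A sketch under stated assumptions: the layout of `anti_overfit_audit.json` (a `strategies` mapping keyed by strategy name with a `verdict` field) and the sleeve names are hypothetical; only the verdict labels come from the table:

```python
from typing import Any, Mapping


def hc_pass(strategy: str, fwd_wr_pct: float, wr_floor_pct: float,
            audit: Mapping[str, Any]) -> bool:
    """Pass only when the WR floor is met AND the audit verdict is EDGE_LIKELY_REAL."""
    verdict = audit.get("strategies", {}).get(strategy, {}).get("verdict")
    return fwd_wr_pct >= wr_floor_pct and verdict == "EDGE_LIKELY_REAL"


# Illustrative audit payload (not real data).
sample_audit = {
    "strategies": {
        "sleeve_a": {"verdict": "EDGE_LIKELY_REAL"},
        "sleeve_b": {"verdict": "OVERFIT_LIKELY"},
    }
}
```

A strategy missing from the audit file fails closed in this sketch, so an unaudited sleeve cannot pass on win rate alone.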
- cot_positioning family as EDGE while keeping class aggregate WEAK.
- anti_overfit_audit.json).
- anti_overfit_audit.json::verdict_counts.

Implementation surface: audit_dashboard/hc_filter.js (per-asset-class HC gates per CLAUDE.md) + audit_dashboard/template.html:~1203-1300 (HC overlay text). Each verdict update is a small JSON-config tweak per config/hc_thresholds.json if it exists.
Per-metric progress on the 6 red-tier items in the DB Health — 2026-05-08T15:00Z panel. "Action required" commentary added inline. Updated 2026-05-12 03:30Z with sign-coherence guard, ghost-row triple-axis block, and CI commit-list fix.
| Metric | Original value | Action shipped this session | Expected post-fix | Status |
|---|---|---|---|---|
| Forward Validator Freshness | 840h since last WON/LOST (2026-04-02) | Wave 1 commit 81bd0b86388 — rm alpha_engine/data/circuit_breaker_state.json (HALT max_picks=0 from 2026-03-24) |
VERIFIED: bt_backtest_trades.MAX(imported_at) = 2026-05-11 20:00:59 (n=1.8M) |
RESOLVED |
| WON-vs-PnL contradiction | YES (avg pnl per status — writer bug) | 2026-05-12 03:00Z — direct writer fix: commit 22b677c1167 adds sign-coherence guard to both atomic status+pnl writers — alpha_engine/outcome_resolver.py:1670 (resolver path) and audit_trail/mysql_client.py:628 (mysql_close_trade canonical write). When source supplies exit_reason=TP + pnl_pct < 0 (or SL + pnl > 0) the guard now trusts the pnl sign and logs won_pnl_contradiction: WARNING. Plus earlier confidence-normalizer migration (613c65cb, all 9/9 callsites). |
Stops new contradiction rows; existing contradicted rows still in DB until backfill SQL pass. | PARTIAL (forward-fixed) |
| PnL Integrity (sampled) | 42.0% (58k/100k mismatch >1pp) | PR #876 merged 818ff966222 — writer-side clamp [-100, 200]% in mysql_trading_sync.py (kills USDCHF=X -106,700% outlier); P0 #7 1c535a19105 read-side clamp at dashboard_generator.py:9309 max_dd cumulation |
Future rows clamped; legacy poisoned rows still in DB until backfill | PARTIAL (FORWARD-FIXED) |
| Phantom EXPIRED rows | 100.0% (1 class, worst-case) | Wave 1 unfreeze + PR #891 merged 486f7bf2989 — mysql_sync entry_time/exit_time fallback (repairs 87% NULL closed_at orphans on future syncs) |
Resolver lag still present; expected reduction over next 3-7 cron cycles; Wave 1.5a/b/c (lm_signals + signal_tier + at_consensus_picks) still queued | PARTIAL |
| Raw-Pick Outcome Coverage | 0.09% (121/136,374 resolved) | 2026-05-12 03:00Z root cause confirmed: the resolver itself is correct. It returns 0 because all 8,151 entries in closed_picks.json already have a terminal status (per Investigator B). The real bug is upstream: no writer reads ACTIVE rows from at_raw_picks, detects TP/SL/time-exit, and feeds new entries into closed_picks.json. The existing references in alpha_engine/outcome_resolver.py:1931, :2317, crypto_risk_gates.py:179, and scanner.py:4781 all rewrite the same 8,151 entries. mysql_client.py:601 mysql_close_trade() exists but has no caller from TP/SL detection. Independent of the Wave 1 unfreeze. | Needs a new sync_active_mysql_picks_to_json() that reads ACTIVE at_raw_picks, computes per-class TP/SL hit logic, and writes terminal entries to closed_picks.json and back to at_raw_picks. Queued as P0 follow-up. | DIAGNOSED, IMPLEMENTATION QUEUED |
| Ghost Rows (constant pnl_pct) | 655,000 (18 cohorts, n>1000, distinct<5) | 2026-05-12 03:15Z symbol-axis quarantine shipped: commit 597819d79c7 introduces BLOCKED_ASSET_STRATEGY_SYMBOL_TRIPLES in audit_trail/quality_gates.py with 5 documented cohorts: (CRYPTO, quan_engine, MATICUSDT) 215k rows, (CRYPTO, KIMI_signal_tracker, ETHUSDT/BTCUSDT) multi-bucket, (CRYPTO, irb_hoffman, ADAUSDT), (CRYPTO, funding_rate_carry, ROBOUSDT). Enforced at passes_active_gate (kills new emissions) AND dashboard_generator.py::_is_historical_blocked_pick (excludes from historical aggregates). The meta_strategy 1.6M-row template family is deferred until db_health.json::ghost_rows.top_cohorts repopulates (currently [] on the 2026-05-08 snapshot). | Expected ~220k+ rows excluded from CRYPTO aggregates on the next generator run; total 655k → ~440k. Remaining ~430k = meta_strategy family + small long-tail. | PARTIAL (5 of ~18 cohorts blocked) |
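
The missing bridge described in the Raw-Pick Outcome Coverage row (read ACTIVE picks, detect TP/SL/time-exit, emit terminal entries) can be sketched as below. This is a minimal illustration of the detection step only, not the project's implementation: the `ActivePick` fields, the `detect_exit` and `resolve_active_picks` names, the single last-price check, and the rule that status follows the PnL sign are all assumptions. Reading at_raw_picks and writing closed_picks.json are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ActivePick:
    # All field names are hypothetical stand-ins for at_raw_picks columns.
    symbol: str
    direction: str          # "LONG" or "SHORT"
    entry_price: float
    tp_price: float
    sl_price: float
    entry_ts: float         # epoch seconds
    max_hold_seconds: float


def detect_exit(pick: ActivePick, last_price: float, now_ts: float) -> Optional[dict]:
    """Return a terminal entry if the pick hit SL, TP, or its time limit; else None."""
    is_long = pick.direction == "LONG"
    hit_tp = last_price >= pick.tp_price if is_long else last_price <= pick.tp_price
    hit_sl = last_price <= pick.sl_price if is_long else last_price >= pick.sl_price
    timed_out = now_ts - pick.entry_ts >= pick.max_hold_seconds

    if hit_sl:
        reason = "SL"
    elif hit_tp:
        reason = "TP"
    elif timed_out:
        reason = "TIME"
    else:
        return None  # still ACTIVE

    move_pct = 100.0 * (last_price - pick.entry_price) / pick.entry_price
    pnl_pct = move_pct if is_long else -move_pct
    # Assumption: status follows the PnL sign, so status and pnl cannot disagree.
    status = "WON" if pnl_pct > 0 else "LOST"
    return {
        "symbol": pick.symbol,
        "exit_reason": reason,
        "exit_price": last_price,
        "pnl_pct": pnl_pct,
        "status": status,
    }


def resolve_active_picks(
    picks: List[ActivePick], prices: Dict[str, float], now_ts: float
) -> Tuple[List[dict], List[ActivePick]]:
    """Split picks into newly closed terminal entries and picks that stay ACTIVE."""
    closed, still_active = [], []
    for pick in picks:
        price = prices.get(pick.symbol)
        entry = detect_exit(pick, price, now_ts) if price is not None else None
        if entry is None:
            still_active.append(pick)  # no price or no exit condition yet
        else:
            closed.append(entry)
    return closed, still_active
```

In a real bridge the `closed` list would be appended to closed_picks.json and written back to at_raw_picks, and the thresholds would vary per asset class as the table row requires.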
- 22b677c1167), PnL Integrity (PR #876 + P0 #7), Phantom EXPIRED (PR #891), Ghost Rows (triple-axis quarantine 597819d79c7, 5 cohorts ~220k rows). Forward writes guarded; legacy rows still in DB until a backfill SQL pass.
- sync_active_mysql_picks_to_json()). Needs a new bridge that reads ACTIVE at_raw_picks, detects TP/SL/time-exit per asset class, and writes terminal entries into closed_picks.json + back into at_raw_picks. Promoted to P0 follow-up: the resolver itself works; it just has an empty queue forever.
- d317560ac9c adds audit_dashboard/data/db_health.json + audit_dashboard/data/cot_paper_pilot_status.json to the workflow commit-list (.github/workflows/audit-dashboard.yml:600). Both files were being FTP-deployed each cycle but never committed to git, so origin/main stayed at 2026-05-08T15:00Z for 4 days. The next hourly cron will commit refreshed metrics, so this DB Health panel will finally reflect post-fix state instead of the stale snapshot.

Re-pull db_health.json at 2026-05-12 06:00Z after 2-3 hourly cron cycles. Expected deltas if fixes hold:
- won_pnl_contradiction: WARNING count = 0 after one cycle if guard is hot

Live /audit DB Health panel (snapshot 2026-05-08T15Z but values persisted) shows 6/6 red metrics. This is the truth layer Codex says must be fixed first. Remediation already documented at reports/db_evidence_graded_final_2026-05-08.md (222 lines).
| Metric | Value | Severity | Root cause |
|---|---|---|---|
| PnL Integrity (sampled) | 42.0% (58k / 100k mismatch >1pp) | RED | resolver stalled — F1 cascade |
| Ghost Rows (constant pnl_pct) | 655,000 (18 cohorts n>1000, distinct_entries<5) | RED | F2 — synthetic stamping in writer |
| Forward Validator Freshness | 840h since last WON/LOST (2026-04-02 12:00) | RED | F1 — circuit_breaker_state.json HALT persisted from 2026-03-24, MAX_ACTIVE_PICKS=0 chokes validator |
| Phantom EXPIRED rows | 100.0% (1 class, worst-case) | RED | F3 race condition — resolver doesn't run before expire-cron |
| Raw-Pick Outcome Coverage | 0.09% (121 / 136,374 resolved) | RED | downstream of F1 cascade |
| WON-vs-PnL contradiction | YES (avg pnl per status — writer bug) | RED | F5 confidence inversion — STRONGEST evidence (kilo + deepseek confirm) |
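
The Ghost Rows criterion in the table above (cohorts with n>1000 and fewer than 5 distinct values) can be illustrated with a small in-memory scan. This is a sketch under stated assumptions: the cohort key mirrors the (asset class, strategy, symbol) triple used by the quarantine list, "distinct" is taken to mean distinct pnl_pct values (the panel's distinct_entries may count something else, such as entry prices), and `find_ghost_cohorts` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[str, str, str, float]  # (asset_class, strategy, symbol, pnl_pct)


def find_ghost_cohorts(
    rows: Iterable[Row], min_rows: int = 1000, max_distinct: int = 5
) -> List[dict]:
    """Flag cohorts with more than min_rows rows but fewer than max_distinct pnl values."""
    counts: Dict[Tuple[str, str, str], int] = defaultdict(int)
    distinct: Dict[Tuple[str, str, str], Set[float]] = defaultdict(set)

    for asset_class, strategy, symbol, pnl_pct in rows:
        key = (asset_class, strategy, symbol)
        counts[key] += 1
        # Cap the set size: past max_distinct values the cohort cannot be a ghost.
        if len(distinct[key]) < max_distinct:
            distinct[key].add(pnl_pct)

    ghosts = [
        {"cohort": key, "rows": n, "distinct_pnl": len(distinct[key])}
        for key, n in counts.items()
        if n > min_rows and len(distinct[key]) < max_distinct
    ]
    # Largest cohorts first, so the worst offenders surface at the top.
    ghosts.sort(key=lambda g: g["rows"], reverse=True)
    return ghosts
```

Capping each set at `max_distinct` entries keeps memory bounded on large tables, since a cohort that reaches the cap is already disqualified.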
- alpha_engine/data/circuit_breaker_state.json is git-committed (last touched commit fa9b6b38109 2026-03-28).
- SELECT MAX(imported_at) FROM bt_backtest_trades WHERE status IN ('WON','LOST') (expect 2026-04-02).
- grep -n "circuit_breaker\|is_locked" alpha_engine/{forward_validator,outcome_resolver,production_scanner}.py.
- SELECT signal_tier, MIN(ts), MAX(ts) FROM at_discord_notifications WHERE signal_tier IS NOT NULL GROUP BY signal_tier.
- rm alpha_engine/data/circuit_breaker_state.json
- fix(circuit-breaker): rm 2026-03-24 stale HALT state, unfreezes forward_validator (35d freeze)
- .github/workflows/audit-dashboard.yml (hourly cron).
- MAX(imported_at) for WON/LOST advances past 2026-05-11.

Precedent: PR #497 fixed R3 stale-state on 2026-04-27 (same class of bug). The same 2026-03-24 leak is still live (referenced in freeze_2026_04_02_root_cause_2026-05-08.md:115). Memory ref: feedback_circuit_breaker_stale_state_leak.
Kilo carveout: lm_signals + at_consensus_picks + at_discord_notifications fail independently — NOT auto-fixed by circuit-breaker deletion. Each has its own cron.
- lm_signals expire-cron: exit_price=0 in 96.2% of expire-cron rows (F10). Patch the cron to skip the expire-write when exit_price is unset.
- at_discord_notifications.signal_tier: 99.99% NULL (F8). Locate the writer; backfill the schema.
- at_consensus_picks time-travel: 57.3% of rows have future-dated entries (F4). Add a writer guard against future timestamps.
- terminal_outcome column to eliminate WON-vs-PnL contradictions at read time.
- challenge_200_trades (per remediation doc).
- OOS_READY until at least Wave 1 + 1.5 clear.
- algorithm_rolling_perf → walk-forward sees pre-2026-04-02 data only. Master plan already de-ranked walk-forward to advisory; this explains why.

docs/chatlog_verbatim_2026-05-11.md, commit 77f42fa5c3e)

Subagent scan of peer's 526-line verbatim chatlog identified 3 mandatory blockers + 1 class downgrade:
- drift_alert=false for 7 consecutive days before Phase 2 kickoff. Treat walk-forward as advisory-only while drift is hot. Halt + recalibrate if drift persists > 30d.
- BLOCKED above.
- 5e4bc1efe63 ships the rename + defense-in-depth.

Verbatim chatlog edge-stability verdicts (COMMODITY STABLE_EDGE n=167, EQUITY STABLE_EDGE n=272) do NOT auto-promote these classes to OOS_READY. Per Codex truth-layer-first policy + master-plan promotion gates, these classes hold at REHAB until:
- multi_asset_cot PF=19.19 DB-verified + CT=F/KC=F concentration disclosed + walk-forward.by_class emits real folds
- claude_gainer_st winner-vs-blacklist contradiction reconciled + capped-vs-raw PnL gap verified + bottom-symbol pruning lands without OOS consistency falling below 80%

Edge-stability sidecar verdict alone is necessary but NOT sufficient for promotion.
- git add <path> only; check with file's primary author before edits.

Full-file re-read of both buffy artifacts (chatlog 43 lines + progress 134 lines). Confirmed:
- benchmark_return('FOREX') returns None (no FOREX in benchmark_map at live_market_fetcher.py:155-162) despite DXY data being fetched as "DX-Y.NYB"
- _drift_pause_active() at quality_gates.py:4143 reads dashboard_data.json on every passes_active_gate() call (synchronous file read in hot path; 60+ active picks = 60+ unnecessary disk hits/cycle)
- DRIFT_AUTO_PAUSE_ENABLED=1
- cot_positioning_CT_locked + multi_asset_cot + CT=F/KC=F concentration disclosure
- rs-breakout-scout + aggregated_picks + claude_gainer_st contradiction + Breakout Momentum
- kimi_signal_tracking + baby_strats:crypto_soc_* + quan_engine 18% drag + st_fear_greed_contrarian 94% WR + alpha_engine_fast drag #1
- feedback_noncrypto_resolver_live_close_bug root cause + CLAUDE.md mutate-before-kill directive
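
The hot-path file read noted above for _drift_pause_active() could be avoided with a short-lived cache, so that 60+ gate checks in one cycle share a single disk read. The following is a minimal sketch, not the code at quality_gates.py:4143: the `CachedDriftFlag` class, the 60-second TTL, the assumption that the flag lives under a top-level `drift_alert` key in dashboard_data.json, and the fail-open behaviour on read errors are all hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


class CachedDriftFlag:
    """Caches one boolean read from a JSON file for ttl_seconds."""

    def __init__(
        self,
        path: str,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._path = Path(path)
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Optional[float] = None
        self._value = False
        self.disk_reads = 0  # exposed so callers can verify the cache is working

    def is_active(self) -> bool:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._read()
            self._expires_at = now + self._ttl
        return self._value

    def _read(self) -> bool:
        self.disk_reads += 1
        try:
            data = json.loads(self._path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Assumption: fail open (no pause) when the file is missing or malformed.
            return False
        if not isinstance(data, dict):
            return False
        # Hypothetical key name; the real payload layout is not shown in the source.
        return bool(data.get("drift_alert", False))
```

A TTL trades freshness for I/O: a flag flipped on disk is picked up at most `ttl_seconds` later, which is acceptable only if the pause does not need to take effect within a single scan cycle.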