EAGLE2 Swarm Session Summary

One-line verdict: Production /audit has zero money-ready asset classes today. Real edge lives in AI tournament (paper) and verified lab sleeves (shadow) — not in the main policy-clean book. We shipped measurement honesty, a daily operator bundle, walk-forward gates, and H-102 pre-registration for ETF dual momentum.

1. Key findings

Research edge ≠ production edge

Lab and tournament books can show PF 2–6 while the live policy-clean aggregate loses money (CRYPTO PF 0.92, EQUITY PF 0.33). Hundreds of weak emitters dilute any sub-cohort that works in isolation.

ELI5: We have a great recipe in the test kitchen, but the restaurant is still serving yesterday's burnt batch to paying customers.
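To make the dilution effect concrete, here is a minimal sketch of profit factor (gross wins divided by gross losses) computed for a strong sub-cohort, a pool of weak emitters, and their aggregate. The function name `profit_factor` and all trade values are illustrative assumptions, not EAGLE code or data.

```python
def profit_factor(pnls):
    """Gross profit divided by gross loss; inf when there are wins but no losses."""
    gross_win = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_win > 0 else 0.0
    return gross_win / gross_loss

# Hypothetical per-trade PnL: one sub-cohort that works, many emitters that do not.
strong_cohort = [3.0, -1.0, 3.0, -1.0]   # 6 won / 2 lost
weak_emitters = [1.0, -2.0] * 10         # 10 won / 20 lost

pf_strong = profit_factor(strong_cohort)
pf_weak = profit_factor(weak_emitters)
pf_aggregate = profit_factor(strong_cohort + weak_emitters)  # 16 won / 22 lost
```

In this toy data the strong cohort has PF 3.0 in isolation, yet the combined book sits below 1.0 because the weak emitters contribute most of the volume.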

0/9 asset classes are `MONEY_READY`

Authoritative layer: audit_dashboard/data/money_ready_verdict.json (policy-clean, post-resolver). CRYPTO and EQUITY are NOT_READY; ETF/FOREX/etc. lack sample size or fail PF gates.

ELI5: Our report card says "not ready for real money" on every subject, so we shouldn't increase bet size yet.
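One way to enforce the capital lock is to make sizing a function of the verdict, so that anything other than `MONEY_READY` yields zero. The sketch below assumes a simple per-class `{"verdict": ...}` payload; the real schema of money_ready_verdict.json is not shown in this summary and may differ, and `allowed_size` is a hypothetical helper.

```python
import json

# Hypothetical payload: the real money_ready_verdict.json schema may differ.
SAMPLE_VERDICTS = json.loads("""
{
  "CRYPTO": {"verdict": "NOT_READY"},
  "EQUITY": {"verdict": "NOT_READY"},
  "ETF": {"verdict": "INSUFFICIENT_DATA"}
}
""")

def allowed_size(asset_class, verdicts, requested_pct):
    """Return the requested size only for MONEY_READY classes, else 0."""
    entry = verdicts.get(asset_class, {})
    if entry.get("verdict") == "MONEY_READY":
        return requested_pct
    return 0.0  # capital lock: unknown or not-ready classes get no size

sizes = {name: allowed_size(name, SAMPLE_VERDICTS, 0.5) for name in SAMPLE_VERDICTS}
```

Defaulting to zero for missing classes means a stale or partial verdict file fails closed rather than open.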

Three different "books" must not be mixed

Book	URL / source	Use for sizing?
Production	/audit + `money_ready_verdict`	Only this one (today: none)
Tournament paper	ai-tournament	Paper watch
Lab + forward pilot	`verified_strategies/`	Shadow ≤0.5%

ELI5: Three scoreboards measure different games. Only the main audit scoreboard counts for real money, and it's empty right now.

Concentration & resolver pollution inflate headline stats

Single sources (e.g. battleground_luxalgo, mega_mutation) dominate CRYPTO volume. EXPIRED→WON labels and duplicate signal timestamps distort WR/PF — especially FOREX (high WR, PF<1).

ELI5: One loud friend is skewing the class average, and some grades were marked wrong. Fix the grading before trusting the GPA.
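Source concentration can be summarized with a Herfindahl-Hirschman index (HHI), the sum of squared volume shares. The sketch below is a minimal version; the source names come from this summary, but the counts are made up for illustration and `hhi` is a hypothetical helper.

```python
from collections import Counter

def hhi(labels):
    """Herfindahl-Hirschman index of a list of source labels (0 = empty, 1 = one source)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) ** 2 for c in counts.values())

# Hypothetical signal log: 60% / 30% / 10% split across three sources.
signals = (["battleground_luxalgo"] * 6
           + ["mega_mutation"] * 3
           + ["other_source"] * 1)

concentration = hhi(signals)  # 0.36 + 0.09 + 0.01
```

With this split the index is about 0.46, well above a 0.20 ceiling, so a single dominant emitter would be flagged before its stats are read as class-level evidence.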

Bonferroni / DSR: ~80+ hypotheses tested historically

Family-wise α=0.05 over ~80 tests ⇒ per-test α_adj = 0.05/80 ≈ 0.000625. Raw green funnel cells often fail after SPA/DSR; that's expected, not a bug.

ELI5: If you try 80 lottery numbers, one will look lucky by accident, so we need stricter math before calling it skill.
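The Bonferroni arithmetic above can be sketched in a few lines. This covers only the per-test threshold; SPA and DSR are separate procedures and are not implemented here. The function names and the example p-values are illustrative assumptions.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at family_alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests

def survives(p_value, family_alpha=0.05, n_tests=80):
    """True when a raw p-value still clears the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_tests)

alpha_adj = bonferroni_alpha(0.05, 80)

# Hypothetical p-values: one "green" at the usual 5% level, one genuinely small.
looks_green_but_fails = survives(0.01)
clears_the_bar = survives(0.0001)
```

A cell with p = 0.01 looks significant on its own but does not survive once roughly 80 hypotheses share the same error budget.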

2. Achievements shipped (live site)

Public documentation on /updates

This session summary — findings, rationale, plans, ELI5 on every block.
Best picks guide — production vs tournament tables + EAGLE file index.
Updates index — top card links here (newest-first).

ELI5: We posted a report card and watch list on the website so nobody has to read 60 markdown files to understand what is safe to trade.

Audit honesty banners (deploy via FTP after merge)

AI Tournament — capital-lock strip: 0/9 money-ready.
pf.html — research-tier strip on portfolio pages.
verified_edge_status.json — includes best_picks_guide URL.

ELI5: Warning labels on practice-game pages so practice wins are not mistaken for permission to bet real money.

Production gates (code on main)

EAGLE-4 — CRYPTO LONG→SHORT flip, persona kills, direction kills in production_scanner.py.
EAGLE-5 — tournament-validated symbol/persona confidence boosts via eagle_gates.py.
Swarm evidence — reports/best_picks_swarm_review_2026-06-02.json (LiteLLM second check).

ELI5: The pick robot flips bad crypto directions and ignores noisy strategies before picks fight for slots.

3. Tasks accomplished in repo

4. Where profit lives today (ranked)

Rank	Surface	Edge?	Evidence
1	AI tournament + pf.html	Best paper	deepseek_v4 WR 57.7% PF 3.46 n=208; 66/81 portfolios have opens
2	ETF dual momentum (lab)	Shadow pilot	Lab PF 1.60 n=104; OOS WF PF 1.21 n=32; live ETF n=3
3	Crypto VWAP / Bollinger (lab)	Shadow	WF PASS; production CRYPTO NOT_READY
4	pick_funnel	Discovery only	Cells often fail concentration / SPA
5	ai_leaderboard	Stale / thin	Swarm book ≠ production; index often dated
6	/audit policy-clean	No deployable edge	0 money-ready classes

4. Why the PF page did not mean "empty portfolio"

Live PF books existed; the empty-state message was the misleading part

The cited example book was deepseek_v4__aggressive. Live investigation found the portfolio system was populated:

Check	Result	Implication
PF roster export	81 portfolios	The PF universe was present on the live server.
Books with open positions	66	The UI was not blank because the system had no active portfolios.
`deepseek_v4__aggressive`	11 open names	The user's example book was alive and non-empty.

The real failure mode was ambiguity. When detail JSON lookup failed, the older page implied "first daily run pending" instead of distinguishing a bad ?key=, invisible pasted Unicode, stale deploy, or roster/detail mismatch.

ELI5: The bookshelf had books on it. The sign on the shelf was wrong and made it look empty.
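The ambiguity described above can be reduced by returning a distinct diagnosis for each failure mode instead of one generic empty state. The sketch below is an assumption about how such a check might look: the status strings, `clean_key`, and `diagnose` are hypothetical, and detecting a stale deploy would additionally need a deploy timestamp that is not modeled here.

```python
import unicodedata

def clean_key(raw):
    """Drop invisible format characters (Unicode category Cf) and outer whitespace."""
    visible = "".join(ch for ch in raw if unicodedata.category(ch) != "Cf")
    return visible.strip()

def diagnose(raw_key, roster, details):
    """Return the first problem found with a ?key= lookup, or "OK"."""
    key = clean_key(raw_key)
    if not key:
        return "EMPTY_KEY"
    if key != raw_key:
        return "KEY_HAS_INVISIBLE_CHARS"
    if key not in roster:
        return "UNKNOWN_KEY"
    if key not in details:
        return "ROSTER_DETAIL_MISMATCH"
    return "OK"

# Hypothetical roster and detail store for illustration.
roster = {"deepseek_v4__aggressive", "gpt4o__balanced"}
details = {"deepseek_v4__aggressive": {"open_positions": 11}}

pasted = diagnose("deepseek_v4__aggressive\u200b", roster, details)
missing_detail = diagnose("gpt4o__balanced", roster, details)
```

A page built on this kind of check can show "your key contains an invisible character" or "roster and detail files disagree" rather than implying that the first daily run is still pending.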

5. Best picks — detailed rationale

Each pick below is tagged PAPER (tournament), SHADOW (lab forward pilot), or MONITOR (interesting symbol, production book weak). None are approved for full capital sizing today.

PAPER WATCH deepseek_v4 tournament book

Why cite it: #1 on ai_tournament_leaderboard.json with n=208 resolved (passes min_n≥30). Wilson-interval WR 57.7%, bootstrap PF 3.46 — strongest stable paper track among models.
Why not size production: Tournament picks come from a different universe than the at_raw_picks policy-clean book, and there is no walk-forward + money_ready convergence yet. Portfolio deepseek_v4__aggressive has 11 opens, so the book is alive, not empty.
What would promote it: Virtual forward book fed only from tournament picks → n≥100 → same admissibility pipeline as lab → live PF within ±10% of backtest for 8 weeks shadow.

ELI5: This AI's practice trades win a lot. We watch it like a sports draft prospect, but don't pay it like a pro until it wins on the same field as everyone else.
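For readers unfamiliar with the Wilson interval, here is a minimal sketch of the standard Wilson score interval for a win rate. The win count of 120 out of 208 is an assumption chosen to match the reported 57.7%; the actual count is not stated in this summary.

```python
import math

def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Assumed counts: 120 wins of 208 resolved picks is about 57.7%.
low, high = wilson_interval(120, 208)
```

Under these assumed counts the lower bound sits just above 50%, which shows why n=208 is enough for a paper watch but leaves little margin.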

PAPER WATCH gpt4o / grok3 tournament (secondary)

Why cite it: gpt4o: WR 59.7%, PF 3.14, n=134. grok3: n=303, PF 2.29 — depth for cross-model agreement.
Why secondary: Same tournament≠production gap. Use for triangulation, not sole signal.

ELI5: Two more students did well on the practice test, which is useful as a second opinion, not as the only teacher.

PAPER WATCH CRYPTO SHORT (BTC / ETH direction)

Why cite it: EAGLE3 + swarm evidence: tournament SHORT ~67% WR vs LONG ~33% on the same n≈216 cohort. The production scanner has historically emitted mostly LONG picks, a structural direction mismatch.
What we did: CRYPTO SHORT flip logic wired in production_scanner.py (EAGLE-4) — still shadow until forward proof; production CRYPTO aggregate PF 0.92 NOT_READY.
Risk: Regime flip in bull markets; must pass regime-robustness (edge in ≥3/4 vol/trend cells).

ELI5: Our crypto picks were betting "up" while the winning practice bets were often "down". We're flipping the bet direction in trials, not betting the farm yet.
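The regime-robustness requirement (edge in at least 3 of 4 vol/trend cells) can be sketched as a simple count. Treating "edge" as PF above 1.0 is an assumption made for this example, as are the cell labels and PF values; the actual EAGLE definition may be stricter.

```python
def regime_robust(pf_by_cell, min_pf=1.0, min_cells=3):
    """Return (passes, passing_cells) given a profit factor per regime cell."""
    passing = sorted(cell for cell, pf in pf_by_cell.items() if pf > min_pf)
    return len(passing) >= min_cells, passing

# Hypothetical PF per (volatility, trend) cell for a SHORT-biased sleeve.
short_sleeve = {
    ("high_vol", "down"): 1.8,
    ("high_vol", "up"): 0.7,
    ("low_vol", "down"): 1.3,
    ("low_vol", "up"): 0.9,
}

passes, cells = regime_robust(short_sleeve)
```

In this made-up case the sleeve only works in downtrends, so it clears 2 of 4 cells and would stay in shadow, which is exactly the bull-market regime risk noted above.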

PAPER WATCH EQUITY: NVDA, BAC, JPM, MSFT

Why cite NVDA: ~64% tournament WR on resolved picks; high liquidity; appears in confluence / tournament LONG cohorts.
Why MONITOR only: Production EQUITY policy-clean: WR 26.9%, PF 0.33, n=52 — aggregate book fails. Symbol-level tournament win ≠ production emitter quality.
Safe names caveat: Large-cap quality names (MSFT, JPM) align with the EAGLE3 LONG-only bias in the tournament, but they are still not Tier-2 without class-level money_ready.

ELI5: Big famous stocks look good on the practice sheet, but our real stock picks as a group are still losing. Don't buy NVDA just because the practice league liked it.

PAPER WATCH ETF symbols: EEM, IWM, GLD

Why cite them: EAGLE3 symbol table: high tournament WR on these ETFs in LONG cohort (EEM ~93%, IWM ~75%, GLD ~68% in research digest — tournament sample).
Why not production: Live policy-clean ETF class: n=3 only — statistically meaningless. Promotion path is sleeve-level dual momentum, not single-symbol cherry-pick.

ELI5: These three funds did well in practice, but we only have three real grades in class, so we can't declare the whole ETF subject an A student yet.

SHADOW PILOT etf_verified_dual_momentum

Why this is the #1 promotion candidate: Only multi-class lab Tier-2 pass (PF 1.60, WR 53.8%, n=104). Walk-forward OOS PASS (PF 1.21, n=32). Low concentration vs CRYPTO battleground. H-102 pre-registered.
Admit status: FORWARD_PILOT_ONLY — WF pass ✓, money_ready class ✗ (INSUFFICIENT_DATA n=3 live).
Gate to live merge: Forward paper n≥30 (then 100), MDD<15%, live PF ≥0.9× backtest, etf_forward_stats.promotion_ready, then ≤0.5% shadow → scale.

ELI5: This is our best homework project. It passed the hard test in class, and now it has to prove itself on a month of real homework before it gets a report card for real money.
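The numeric parts of the gate to live merge (forward n, max drawdown, live PF relative to backtest) can be expressed as a list of blockers. This is a sketch under assumptions: `ForwardStats` and `promotion_blockers` are hypothetical names, the example values are invented, and the real etf_forward_stats.promotion_ready flag may encode additional checks.

```python
from dataclasses import dataclass

@dataclass
class ForwardStats:
    n: int               # resolved forward paper trades
    max_drawdown: float  # as a fraction, e.g. 0.08 means 8%
    live_pf: float
    backtest_pf: float

def promotion_blockers(stats, min_n=30, max_mdd=0.15, pf_ratio=0.9):
    """List every gate that still fails; an empty list means the numeric gates pass."""
    blockers = []
    if stats.n < min_n:
        blockers.append(f"n={stats.n} below {min_n}")
    if stats.max_drawdown >= max_mdd:
        blockers.append(f"MDD {stats.max_drawdown:.0%} not under {max_mdd:.0%}")
    if stats.live_pf < pf_ratio * stats.backtest_pf:
        blockers.append("live PF below 0.9x backtest")
    return blockers

# Hypothetical early pilot: healthy PF and drawdown, but too few trades.
early = promotion_blockers(ForwardStats(n=12, max_drawdown=0.08, live_pf=1.50, backtest_pf=1.60))
```

Returning the full list of failing gates, rather than a single boolean, makes the daily operator bundle easier to read: the sleeve above is blocked only by sample size.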

SHADOW crypto VWAP reversion + Bollinger MR

Why cite: WF report: vwap_reversion PASS (PF 1.32, n=516 OOS trades); bollinger_mr PASS (PF 1.67, n=38). Hyro pilot sleeves — gated by CRYPTO_VERIFIED_* env flags.
Blockers: Resolver hygiene on CRYPTO; production class NOT_READY. Donchian explicitly FAIL — do not enable.

ELI5: Two crypto strategies passed the science fair, but the school's main crypto team is still messy, so they only get to play in the side league until the grades are fixed.

REJECT Raw DB rows with PF>10 (e.g. drawdown_recovery_rsi_xrp)

Why reject: 99% WR / PF 98 on tiny or mis-resolved cohorts — classic resolver/concentration artifacts. Not in policy-clean money_ready layer.

ELI5: If someone claims they win 99% of the time, it's probably a scoring error. Ignore the trophy case until auditors sign off.

6. Short-term plan (weeks 1–8)

Weeks 1–2 — Measurement honesty

Daily run_eagle_suite.py; zero sizing on NOT_READY classes (capital lock).
Resolver audit: CRYPTO/FOREX EXPIRED+PnL>0; duplicate signal_ts; weekly SQL in Part E of EAGLE doc.
FTP-deploy strategy_admissibility.json + pick-funnel discovery labels.
Emitter census → depromote top battleground / regime_terminal concentration.

ELI5: Fix the scoreboard and stop letting broken grades count. Run the checklist every morning.
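The two resolver checks named above (EXPIRED rows with positive PnL, and duplicate signal_ts) can be sketched over in-memory rows. The weekly SQL in the EAGLE doc is the authoritative version; here the row layout, the `status`, `pnl`, and `symbol` field names, and the sample rows are assumptions for illustration.

```python
from collections import Counter

def resolver_audit(rows):
    """Return (expired_with_profit, duplicate_keys) for a list of pick rows."""
    expired_with_profit = [
        r for r in rows if r["status"] == "EXPIRED" and r["pnl"] > 0
    ]
    seen = Counter((r["symbol"], r["signal_ts"]) for r in rows)
    duplicates = {key: count for key, count in seen.items() if count > 1}
    return expired_with_profit, duplicates

# Hypothetical rows: one EXPIRED pick booked with profit, one duplicated timestamp.
rows = [
    {"symbol": "EURUSD", "signal_ts": 1000, "status": "EXPIRED", "pnl": 0.4},
    {"symbol": "EURUSD", "signal_ts": 1000, "status": "WON", "pnl": 0.4},
    {"symbol": "BTCUSD", "signal_ts": 2000, "status": "LOST", "pnl": -1.0},
]

suspect, dupes = resolver_audit(rows)
```

Rows flagged by either check are candidates for exclusion or re-resolution before WR and PF are recomputed.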

Weeks 3–4 — Admit pipeline

Re-run real walkforward_suite.py on 50webs when source returns; replace restored WF JSON.
strategy_admit.py for every promotion candidate; Bonferroni count from hypothesis registry.
Pick_funnel: all green cells labeled "Discovery — not capital ready."

ELI5: One set of exam rules for every strategy, with no shortcuts to the honor roll.

Weeks 5–8 — Forward pilots

ETF dual momentum: forward n→30 then 100; shadow ≤0.5% capital.
Crypto VWAP/Bollinger: shadow after resolver clean; mutation on failed lab sleeves per 3-axis protocol.
Tournament bridge: virtual book for deepseek_v4 only (paper, n≥100 target).

ELI5: Let the best homework projects play on a small real stage for two months before the big concert.

7. Long-term plan (90 days, roughly 12 weeks)

North star (quarterly)

≥2 asset classes with live policy-clean PF≥1.5, WR≥50%, n≥100, MONEY_READY verdict.
First expected win: ETF dual momentum (lowest concentration, lab Tier-2).
Resolver dispute rate <1%; portfolio HHI <0.20.
End-to-end admit pipeline ≤5 min per sleeve.

ELI5: By summer we want two subjects on the honor roll with a clean permanent record. ETFs will likely graduate first.

Weeks 8–12 — Scale discipline

Promote only after two consecutive 4-week windows: live PF within ±10% of backtest.
Gradual scale: shadow → 0.5% → increase only if regime robustness holds (≥3/4 regimes).
CRYPTO aggregate stays frozen until policy-clean PF≥1.0 or structural sleeve replaces bulk emitters.
Grafana-style quant ops dashboard: PF, WR, MDD, HHI, resolver dispute rate (alerts per EAGLE2 §4.6).

ELI5: Increase allowance slowly like parents do, and only if grades stay good two report cards in a row.
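The promotion rule (two consecutive 4-week windows with live PF within ±10% of backtest) can be sketched as a check on the most recent windows. The function names and the example PF series are hypothetical; 1.60 is used as the backtest PF only because it is the lab figure quoted for ETF dual momentum.

```python
def within_band(live_pf, backtest_pf, tol=0.10):
    """True when live PF is within +/- tol (as a fraction) of the backtest PF."""
    return abs(live_pf - backtest_pf) <= tol * backtest_pf

def ready_to_promote(window_pfs, backtest_pf, tol=0.10, needed=2):
    """True when the last `needed` windows all sit inside the tolerance band."""
    if len(window_pfs) < needed:
        return False
    return all(within_band(pf, backtest_pf, tol) for pf in window_pfs[-needed:])

# Hypothetical 4-week window PFs, oldest first.
steady = ready_to_promote([1.20, 1.55, 1.70], backtest_pf=1.60)
slipping = ready_to_promote([1.55, 1.20], backtest_pf=1.60)
```

Only the trailing windows count, so one good window after a bad one is not enough, and a recent miss resets the clock even after earlier passes.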

8. Reproduce & daily operations

9. Source documents

After merge: run python3 tools/deploy_audit_files.py --only updates to publish on findtorontoevents.ca.

NFA — engineering and research summary, not investment advice. Stats from local JSON as of 2026-06-02; verify live URLs before trading.
