EAGLE2 Swarm Session — Findings, Best Picks & Roadmap

Date: 2026-06-02 · Goal #1 (/audit phenomenal per-class edge)

One-line verdict: Production /audit has zero money-ready asset classes today. Real edge lives in AI tournament (paper) and verified lab sleeves (shadow) — not in the main policy-clean book. We shipped measurement honesty, a daily operator bundle, walk-forward gates, and H-102 pre-registration for ETF dual momentum.
Do not size live capital from tournament rank, pick-funnel green cells, or raw DB strategy rows with PF>10 until they pass policy-clean + Bonferroni/DSR + forward n≥100.

1. Key findings

Research edge ≠ production edge

Lab and tournament books can show PF 2–6 while the live policy-clean aggregate loses money (CRYPTO PF 0.92, EQUITY PF 0.33). Hundreds of weak emitters dilute any sub-cohort that works in isolation.

ELI5: We have a great recipe in the test kitchen, but the restaurant is still serving yesterday's burnt batch to paying customers.

0/9 asset classes are MONEY_READY

Authoritative layer: audit_dashboard/data/money_ready_verdict.json (policy-clean, post-resolver). CRYPTO and EQUITY are NOT_READY; ETF/FOREX/etc. lack sample size or fail PF gates.

ELI5: Our report card says "not ready for real money" on every subject, so we shouldn't increase bet size yet.

Three different "books" must not be mixed

Book | URL / source | Use for sizing?
Production | /audit + money_ready_verdict | Only this one (today: none)
Tournament paper | ai-tournament | Paper watch
Lab + forward pilot | verified_strategies/ | Shadow ≤0.5%

ELI5: Three scoreboards measure different games. Only the main audit scoreboard counts for real money, and it's empty right now.

Concentration & resolver pollution inflate headline stats

Single sources (e.g. battleground_luxalgo, mega_mutation) dominate CRYPTO volume. EXPIRED→WON labels and duplicate signal timestamps distort WR/PF — especially FOREX (high WR, PF<1).

ELI5: One loud friend is skewing the class average, and some grades were marked wrong. Fix the grading before trusting the GPA.
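
As an illustration of how a book can show a high win rate while PF stays below 1, the sketch below computes both from a list of trades after dropping duplicate signal timestamps. It assumes PF means gross profit divided by gross loss, which is the usual definition but is not spelled out here. The Trade fields and the sample numbers are hypothetical, not the project's schema or data.

```python
from typing import Iterable, NamedTuple


class Trade(NamedTuple):
    source: str      # emitter name (hypothetical field)
    timestamp: str   # signal timestamp (hypothetical field)
    pnl: float       # realized profit or loss


def dedupe(trades: Iterable[Trade]) -> list[Trade]:
    """Keep the first trade seen for each (source, timestamp) pair."""
    seen = set()
    unique = []
    for t in trades:
        key = (t.source, t.timestamp)
        if key not in seen:
            seen.add(key)
            unique.append(t)
    return unique


def win_rate(trades: list[Trade]) -> float:
    """Fraction of trades with positive PnL."""
    return sum(t.pnl > 0 for t in trades) / len(trades)


def profit_factor(trades: list[Trade]) -> float:
    """Gross profit divided by gross loss; inf when there are no losses."""
    gross_profit = sum(t.pnl for t in trades if t.pnl > 0)
    gross_loss = -sum(t.pnl for t in trades if t.pnl < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


# Made-up book: four small wins and one large loss.
book = [Trade("a", f"t{i}", 1.0) for i in range(4)] + [Trade("a", "t4", -5.0)]
print(win_rate(book), profit_factor(book))
```

In this made-up book most trades win, yet the single large loss outweighs them, which is the same shape as the FOREX pattern described above.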

Bonferroni / DSR: ~80+ hypotheses tested historically

Family-wise α=0.05 across ~80 tests ⇒ per-test α_adj = 0.05/80 ≈ 0.000625. Raw green funnel cells often fail after SPA/DSR; that's expected, not a bug.

ELI5: If you try 80 lottery numbers, one will look lucky by accident, so we need stricter math before calling it skill.
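
The Bonferroni arithmetic above can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, the count of 80 is the approximate figure quoted above, and the DSR and SPA checks are not modelled.

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests


def survives(p_value: float, family_alpha: float = 0.05, n_tests: int = 80) -> bool:
    """True when a raw p-value still clears the corrected threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_tests)


print(bonferroni_alpha(0.05, 80))
print(survives(0.01))     # conventionally "significant", but not after correction
print(survives(0.0005))   # clears the stricter per-test bar
```

A cell with p=0.01 looks green on its own but does not survive once the ~80 historical hypotheses are accounted for.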

2. Achievements shipped (live site)

Public documentation on /updates

ELI5: We posted a report card and watch list on the website so nobody has to read 60 markdown files to understand what is safe to trade.

Audit honesty banners (deploy via FTP after merge)

ELI5: Warning labels on practice-game pages so practice wins are not mistaken for permission to bet real money.

Production gates (code on main)

ELI5: The pick robot flips bad crypto directions and ignores noisy strategies before picks fight for slots.

3. Tasks accomplished in repo

ELI5: We built homework, grading rules, a daily checklist, and warning signs, but graduation is still ahead.

4. Where profit lives today (ranked)

Rank | Surface | Edge? | Evidence
1 | AI tournament + pf.html | Best paper | deepseek_v4 WR 57.7% PF 3.46 n=208; 66/81 portfolios have opens
2 | ETF dual momentum (lab) | Shadow pilot | Lab PF 1.60 n=104; OOS WF PF 1.21 n=32; live ETF n=3
3 | Crypto VWAP / Bollinger (lab) | Shadow | WF PASS; production CRYPTO NOT_READY
4 | pick_funnel | Discovery only | Cells often fail concentration / SPA
5 | ai_leaderboard | Stale / thin | Swarm book ≠ production; index often dated
6 | /audit policy-clean | No deployable edge | 0 money-ready classes

ELI5: The practice games look great; the real game scoreboard is still losing. We're practicing the right plays before putting them in the real game.

4. Why the PF page did not mean "empty portfolio"

Live PF books existed; the empty-state message was the misleading part

The cited example book was deepseek_v4__aggressive. Live investigation found the portfolio system was populated:

Check | Result | Implication
PF roster export | 81 portfolios | The PF universe was present on the live server.
Books with open positions | 66 | The blank UI was not caused by a lack of active portfolios.
deepseek_v4__aggressive | 11 open names | The user's example book was alive and non-empty.

The real failure mode was ambiguity. When detail JSON lookup failed, the older page implied "first daily run pending" instead of distinguishing a bad ?key=, invisible pasted Unicode, stale deploy, or roster/detail mismatch.

ELI5: The bookshelf had books on it. The sign on the shelf was wrong and made it look empty.
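
One way to separate some of these failure modes is sketched below. It is an assumption-laden illustration, not the page's actual logic: the function names and return labels are hypothetical, the roster and detail stores are modelled as plain sets of keys, and a stale deploy cannot be detected from the key alone, so it is out of scope here.

```python
import unicodedata


def clean_key(raw: str) -> str:
    """Normalize a pasted key and drop invisible format/control characters."""
    normalized = unicodedata.normalize("NFKC", raw)
    visible = "".join(
        ch for ch in normalized
        if unicodedata.category(ch) not in ("Cf", "Cc")
    )
    return visible.strip()


def diagnose(raw_key: str, roster_keys: set[str], detail_keys: set[str]) -> str:
    """Return a coarse, hypothetical label explaining a detail lookup result."""
    if raw_key in roster_keys:
        # The key is valid; the only question is whether detail JSON exists.
        return "ok" if raw_key in detail_keys else "roster_detail_mismatch"
    if clean_key(raw_key) in roster_keys:
        # The key only matches after stripping invisible or stray characters.
        return "invisible_or_stray_characters"
    return "unknown_key"


roster = {"deepseek_v4__aggressive"}
details = {"deepseek_v4__aggressive"}
print(diagnose("deepseek_v4__aggressive\u200b", roster, details))
print(diagnose("deepseek_v4__agressive", roster, details))
print(diagnose("deepseek_v4__aggressive", roster, set()))
```

Each label maps to a different empty-state message, so a pasted zero-width space no longer reads as "first daily run pending".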

5. Best picks — detailed rationale

Each pick below is tagged PAPER (tournament), SHADOW (lab forward pilot), or MONITOR (interesting symbol, production book weak). None are approved for full capital sizing today.

PAPER WATCH deepseek_v4 tournament book

Why cite it
#1 on ai_tournament_leaderboard.json with n=208 resolved (passes min_n≥30). Wilson-interval WR 57.7%, bootstrap PF 3.46 — strongest stable paper track among models.
Why not size production
Tournament picks are a different universe from the at_raw_picks policy-clean book. No walk-forward + money_ready convergence yet. Portfolio deepseek_v4__aggressive has 11 opens: the book is alive, not empty.
What would promote it
Virtual forward book fed only from tournament picks → n≥100 → same admissibility pipeline as lab → live PF within ±10% of backtest for 8 weeks shadow.
ELI5: This AI's practice trades win a lot. We watch it like a sports draft prospect, but don't pay it like a pro until it wins on the same field as everyone else.
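
For readers who want to see how much uncertainty sits around a 57.7% win rate at n=208, here is a minimal sketch of the Wilson score interval. It assumes 120 wins out of 208 resolved picks, which is what 57.7% of 208 implies; the exact win count is not stated in the figures quoted here, and z=1.96 (a 95% interval) is also an assumption.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed win count: 120 of 208 is roughly the quoted 57.7%.
low, high = wilson_interval(120, 208)
print(f"WR interval: {low:.3f} to {high:.3f}")
```

The width of the interval, not the point estimate, is what matters when comparing tournament books against the n≥100 forward requirement.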

PAPER WATCH gpt4o / grok3 tournament (secondary)

Why cite it
gpt4o: WR 59.7%, PF 3.14, n=134. grok3: n=303, PF 2.29 — depth for cross-model agreement.
Why secondary
Same tournament≠production gap. Use for triangulation, not sole signal.
ELI5: Two more students did well on the practice test. They are useful as a second opinion, not as the only teacher.

PAPER WATCH CRYPTO SHORT (BTC / ETH direction)

Why cite it
EAGLE3 + swarm evidence: tournament SHORT ~67% WR vs LONG ~33% on same n≈216 cohort. Production scanner historically emitted LONG-heavy — structural direction mismatch.
What we did
CRYPTO SHORT flip logic wired in production_scanner.py (EAGLE-4) — still shadow until forward proof; production CRYPTO aggregate PF 0.92 NOT_READY.
Risk
Regime flip in bull markets; must pass regime-robustness (edge in ≥3/4 vol/trend cells).
ELI5: Our crypto picks were betting "up" while the winning practice bets were often "down". We're flipping the bet direction in trials, not betting the farm yet.
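
The regime-robustness requirement (edge in at least 3 of 4 vol/trend cells) could be checked along these lines. This is a sketch under stated assumptions: "edge" is taken to mean PF above 1.0 in a cell, the cell labels are hypothetical, and the PF values are made up for illustration.

```python
def regime_robust(
    cell_pf: dict[tuple[str, str], float],
    min_cells: int = 3,
    pf_floor: float = 1.0,
) -> bool:
    """True when enough (volatility, trend) cells show PF above the floor."""
    passing = sum(pf > pf_floor for pf in cell_pf.values())
    return passing >= min_cells


# Made-up PF per (volatility, trend) cell for a SHORT-biased sleeve.
cells = {
    ("high_vol", "down"): 1.8,
    ("high_vol", "up"): 0.7,
    ("low_vol", "down"): 1.3,
    ("low_vol", "up"): 1.1,
}
print(regime_robust(cells))
```

A sleeve that only wins in downtrends would pass 2 of 4 cells and fail this check, which is the bull-market risk called out above.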

PAPER WATCH EQUITY: NVDA, BAC, JPM, MSFT

Why cite NVDA
~64% tournament WR on resolved picks; high liquidity; appears in confluence / tournament LONG cohorts.
Why MONITOR only
Production EQUITY policy-clean: WR 26.9%, PF 0.33, n=52 — aggregate book fails. Symbol-level tournament win ≠ production emitter quality.
Safe names caveat
Large-cap quality names (MSFT, JPM) align with the EAGLE3 LONG-only bias in the tournament, but they are still not Tier-2 without class-level money_ready.
ELI5: Big famous stocks look good on the practice sheet, but our real stock picks as a group are still losing. Don't buy NVDA just because the practice league liked it.

PAPER WATCH ETF symbols: EEM, IWM, GLD

Why cite them
EAGLE3 symbol table: high tournament WR on these ETFs in LONG cohort (EEM ~93%, IWM ~75%, GLD ~68% in research digest — tournament sample).
Why not production
Live policy-clean ETF class: n=3 only — statistically meaningless. Promotion path is sleeve-level dual momentum, not single-symbol cherry-pick.
ELI5: These three funds did well in practice, but we only have three real grades in class, so we can't declare the whole ETF subject an A student yet.

SHADOW PILOT etf_verified_dual_momentum

Why this is the #1 promotion candidate
Only multi-class lab Tier-2 pass (PF 1.60, WR 53.8%, n=104). Walk-forward OOS PASS (PF 1.21, n=32). Low concentration vs CRYPTO battleground. H-102 pre-registered.
Admit status
FORWARD_PILOT_ONLY — WF pass ✓, money_ready class ✗ (INSUFFICIENT_DATA n=3 live).
Gate to live merge
Forward paper n≥30 (then 100), MDD<15%, live PF ≥0.9× backtest, etf_forward_stats.promotion_ready, then ≤0.5% shadow → scale.
ELI5: This is our best homework project. It passed the hard test in class, and now it has to prove itself on a month of real homework before it gets a report card for real money.
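
The numeric part of the gate above (forward n≥30, MDD<15%, live PF ≥0.9× backtest) can be expressed as a small check that lists what is still blocking. This is only a sketch: the ForwardStats class and its field names are hypothetical and do not describe the real etf_forward_stats structure, and the live figures in the example are made up apart from the lab PF of 1.60 and live n=3 quoted above.

```python
from dataclasses import dataclass


@dataclass
class ForwardStats:
    n: int               # resolved forward paper trades
    max_drawdown: float  # fraction, e.g. 0.12 for 12%
    live_pf: float       # forward profit factor
    backtest_pf: float   # lab or backtest profit factor


def promotion_blockers(
    stats: ForwardStats,
    min_n: int = 30,
    max_mdd: float = 0.15,
    pf_ratio: float = 0.9,
) -> list[str]:
    """Return the unmet gate conditions; an empty list means none remain."""
    blockers = []
    if stats.n < min_n:
        blockers.append(f"forward n={stats.n} < {min_n}")
    if stats.max_drawdown >= max_mdd:
        blockers.append(f"MDD {stats.max_drawdown:.0%} not below {max_mdd:.0%}")
    if stats.live_pf < pf_ratio * stats.backtest_pf:
        blockers.append("live PF below required fraction of backtest PF")
    return blockers


# Live ETF n=3 as quoted above; drawdown and live PF are made-up placeholders.
today = ForwardStats(n=3, max_drawdown=0.02, live_pf=1.50, backtest_pf=1.60)
print(promotion_blockers(today))
```

Raising min_n to 100 covers the second stage of the gate. Returning the list of blockers rather than a bare boolean keeps the daily report explicit about why a sleeve is still in pilot.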

SHADOW crypto VWAP reversion + Bollinger MR

Why cite
WF report: vwap_reversion PASS (PF 1.32, n=516 OOS trades); bollinger_mr PASS (PF 1.67, n=38). Hyro pilot sleeves — gated by CRYPTO_VERIFIED_* env flags.
Blockers
Resolver hygiene on CRYPTO; production class NOT_READY. Donchian explicitly FAIL — do not enable.
ELI5: Two crypto strategies passed the science fair, but the school's main crypto team is still messy, so they only get to play in the side league until the grades are fixed.

REJECT Raw DB rows with PF>10 (e.g. drawdown_recovery_rsi_xrp)

Why reject
99% WR / PF 98 on tiny or mis-resolved cohorts — classic resolver/concentration artifacts. Not in policy-clean money_ready layer.
ELI5: If someone claims they win 99% of the time, it's probably a scoring error. Ignore the trophy case until auditors sign off.
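
A simple pre-filter in this spirit might flag rows for audit instead of ranking them. The sketch below is hypothetical: the row keys (wr, pf, n) are an assumed shape rather than the real DB schema, the n=12 in the example is invented, and the min_n of 30 is borrowed from the tournament minimum mentioned earlier rather than from any stated rule for raw rows.

```python
def is_suspect(
    row: dict,
    max_pf: float = 10.0,
    max_wr: float = 0.99,
    min_n: int = 30,
) -> bool:
    """Flag rows whose stats look like resolver or small-sample artifacts."""
    return row["pf"] > max_pf or row["wr"] >= max_wr or row["n"] < min_n


# Hypothetical rows; only the 99% WR / PF 98 figures come from the text above.
rows = [
    {"strategy": "drawdown_recovery_rsi_xrp", "wr": 0.99, "pf": 98.0, "n": 12},
    {"strategy": "example_plain_row", "wr": 0.54, "pf": 1.6, "n": 104},
]
for_audit = [r["strategy"] for r in rows if is_suspect(r)]
print(for_audit)
```

Flagged rows go to resolver review; they are not evidence either way until the policy-clean layer has processed them.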

6. Short-term plan (weeks 1–8)

Weeks 1–2 — Measurement honesty

ELI5: Fix the scoreboard and stop letting broken grades count. Run the checklist every morning.

Weeks 3–4 — Admit pipeline

ELI5: One set of exam rules for every strategy, with no shortcuts to the honor roll.

Weeks 5–8 — Forward pilots

ELI5: Let the best homework projects play on a small real stage for two months before the big concert.

7. Long-term plan (90 days / 12 weeks)

North star (quarterly)

ELI5: By summer we want two subjects on the honor roll with a clean permanent record. ETFs will likely graduate first.

Weeks 8–12 — Scale discipline

ELI5: Increase allowance slowly like parents do, and only if grades stay good two report cards in a row.
Success definition (EAGLE2): Deployable edge = live PF≥0.5 with controlled WR, forward n≥30–50 per class, DSR/SPA pass, shadow evidence — then scale. Tournament rank alone never qualifies.

8. Reproduce & daily operations

# Daily operator bundle
python3 tools/run_eagle_suite.py

# Best-picks evidence + LiteLLM review
python3 tools/verify_best_picks_swarm.py

# Per-strategy gates
python3 tools/strategy_admit.py --strategy etf_dual_momentum --asset-class ETF
python3 tools/strategy_admit.py --strategy vwap_reversion --asset-class CRYPTO

# Deploy live audit JSON (requires FTP env)
python3 tools/deploy_audit_files.py --only updates

9. Source documents

After merge: run python3 tools/deploy_audit_files.py --only updates to publish on findtorontoevents.ca.

NFA — engineering and research summary, not investment advice. Stats from local JSON as of 2026-06-02; verify live URLs before trading.