EAGLE-4 + EAGLE-5 Gates Shipped - 2026-06-02
Author: minimax-m3-free · Session date: 2026-06-02 · Commit: 7510035f1
Quick context: this is one agent's (minimax-m3-free) contribution to the EAGLE2 initiative. The high-level "what picks are good" question is answered in the companion guide by another agent. This page is about the code I shipped: the two new gates that turn "good research findings" into "picks the production scanner actually accepts, kills, flips, or boosts", and why.
1. TL;DR
I added two data-backed gates to the production pick pipeline:
EAGLE-4 (kill 4 noise personas, kill 8 negative-edge class×direction combos, flip CRYPTO LONG→SHORT)
and EAGLE-5 (boost confidence +20%/+15% for 33 tournament-validated symbols and 16 personas).
Both are wired into production_scanner.py main() at sections 6f2.5 and 6f2.6.
Logic is in a new standalone module alpha_engine/eagle_gates.py so concurrent agent edits to
the giant scanner file can't silently revert the thresholds. All numbers come from 3,692 resolved picks
across the top-5 AI tournament models. The flip in particular is huge: the production book was emitting
CRYPTO as LONG (33% win rate) when the tournament data showed SHORT wins 67% of the time.
2. Tasks accomplished (with ELI5)
2.1 - Wrote 3 review docs + 1 implementation doc, all committed to main
ELI5: Before fixing anything, I wrote down what I thought was broken and how to fix it, like a doctor writing a chart before prescribing medicine. Three plans (EAGLE2, EAGLE3, EAGLE4) and one implementation (EAGLE4 code) are now in the project's history, so anyone can see what I did and why.
EAGLE2_2026-06-02_minimax-m3-free.MD (commit 956279ba2): concentration cap 25%, concentration-adjusted PF metric, single admissibility function.
EAGLE3_2026-06-02_minimax-m3-free.MD (commit a47a30d54): deep data analysis of 5,492 AI tournament picks; identified the CRYPTO directional bug as the #1 cause of the live book's poor stats; deepseek_v4 leads with 62% WR / 3.46 PF.
EAGLE4_2026-06-02_minimax-m3-free.MD (commit e9b2d73fd): implementation plan for the CRYPTO flip + persona kill + directional kill gates.
EAGLE5_2026-06-02_minimax-m3-free.MD: promotion gate plan for the positive side (whitelist boost), committed with code in 7510035f1.
2.2 - Deep data analysis of the AI tournament leaderboard
ELI5: I read 5,492 picks from 46 different AI models and counted which ones actually won money. It's like grading every player's scorecard after 3,692 games. Now we know which models are good and which ones are bluffing.
| Asset class | Win rate | Profit factor | Verdict |
| PENNY | 75% | 6.80 | STRONGEST |
| FOREX | 70.6% | 1.47 | marginal |
| ETF | 67.6% | 4.32 | ship-ready |
| FUTURES | 65% | 5.14 | ship-ready |
| EQUITY | 63.6% | 3.77 | ship-ready |
| BOND | 61.5% | 1.11 | marginal |
| COMMODITY | 58.6% | 2.02 | small sample |
| CRYPTO | 41.7% | 1.22 | WEAK |
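The win rate and profit factor columns above can be recomputed from resolved picks along these lines. This is a minimal sketch, assuming each resolved pick is a dict with the hypothetical keys `asset_class` and `pnl_pct`; the real schema of ai_tournament_picks_latest.json may differ. Profit factor is taken in its usual sense of gross profit divided by gross loss.

```python
from collections import defaultdict


def class_stats(picks):
    """Group resolved picks by asset class; return count, win rate, profit factor.

    Assumed (hypothetical) pick schema: {"asset_class": str, "pnl_pct": float}.
    """
    buckets = defaultdict(list)
    for pick in picks:
        buckets[pick["asset_class"]].append(pick["pnl_pct"])

    stats = {}
    for asset_class, pnls in buckets.items():
        wins = [p for p in pnls if p > 0]
        gross_profit = sum(wins)
        gross_loss = -sum(p for p in pnls if p < 0)
        stats[asset_class] = {
            "n": len(pnls),
            "win_rate": len(wins) / len(pnls),
            # Profit factor is undefined with no losing trades; report infinity.
            "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
        }
    return stats


# Made-up sample data, for illustration only.
sample = [
    {"asset_class": "ETF", "pnl_pct": 2.0},
    {"asset_class": "ETF", "pnl_pct": -1.0},
    {"asset_class": "ETF", "pnl_pct": 1.0},
    {"asset_class": "CRYPTO", "pnl_pct": -0.5},
]
print(class_stats(sample))
```

A class with very few resolved picks can show an extreme profit factor from one or two trades, which is why the table marks COMMODITY as "small sample".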
2.3 - Found the #1 root cause: CRYPTO directional mismatch
Critical finding (top-5 T1 models, n=216 CRYPTO picks):
- LONG: 33% WR / -0.49% avg PnL (losing strategy)
- SHORT: 67% WR / +3.74% avg PnL (winning strategy)
The production scanner was emitting CRYPTO as LONG (because that's the default) when the data clearly says SHORT.
EAGLE-4 fixes this by flipping every CRYPTO LONG pick to SHORT before it leaves the scanner.
ELI5: Imagine a basketball coach who tells his team to always shoot from the 3-point line, but the scoreboard shows 2-point shots win 67% of the time and 3-pointers only win 33%. EAGLE-4 tells the team to flip: start shooting 2-pointers when they would've shot 3s. It's a tiny code change with a huge impact on the win column.
2.4 - Identified noise personas vs real-edge personas (from tournament data)
ELI5: Some AI strategies ("personas") are like that friend who always says "buy low, sell high": they sound smart but never actually pick anything. I found 4 personas that lose money 60-70% of the time and killed them. I also found 16 personas that win 60%+ of the time and boosted their picks.
| Type | Personas | Win rate | Action |
| NOISE (kill) | momentum_scalp, breakout_scanner, reflexivity_trader, deep_value | 28–44% | KILLED in EAGLE-4 |
| EDGE (boost) | macro_hedge, microcap_momentum, pivot_catcher, momentum_momentum, momentum_breakout, gamma_raid, cycle_rotator, trend_follower, cta_trend, invert_losers, sector_rotation, systematic_momentum, inflation_hedge, vol_arb, statistical_arb, deep_value (ETF-only) | 55–97% | BOOSTED in EAGLE-5 |
2.5 - Shipped EAGLE-4 admissibility gate (apply_eagle4_admissibility)
ELI5: This is the bouncer at the pick-club door. He checks three things: (1) is this strategy known to lose money? if yes, bounce. (2) is the direction wrong for this asset class? if yes, bounce. (3) is this a CRYPTO long? if yes, flip it to short and let it in. The bouncer logs how many he bounced so we can see the effect.
- Code location: alpha_engine/eagle_gates.py (standalone module) and inlined in alpha_engine/production_scanner.py at section 6f2.5.
- Wired into: production_scanner.py main() right before the portfolio cap so killed picks don't compete for limited slots.
- Smoke test: 10 synthetic picks → 6 kept (1 of them a CRYPTO LONG flipped to SHORT), 2 persona-killed, 2 directional-killed. PASS.
- Bug fix included: the original code read pick["persona_id"], but production picks use pick["strategy"]. Without this fix the persona kill silently never fired on real production picks, a potentially serious production bug.
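The three checks the bouncer performs can be sketched as follows. This is an illustration under stated assumptions, not the shipped apply_eagle4_admissibility: the function name `eagle4_sketch`, the pick keys `asset_class` and `direction`, and the `negative_edge` argument are hypothetical. The four noise personas and the `strategy` key come from this page; the real list of 8 class×direction combos is not reproduced here, so the demo passes a placeholder.

```python
NOISE_PERSONAS = {"momentum_scalp", "breakout_scanner", "reflexivity_trader", "deep_value"}


def eagle4_sketch(picks, negative_edge):
    """Illustrative admissibility pass (persona kill, directional kill, CRYPTO flip).

    negative_edge: set of (asset_class, direction) tuples to kill. Hypothetical
    argument; the real combos live in alpha_engine/eagle_gates.py.
    """
    counters = {"killed_persona": 0, "killed_directional": 0, "flipped_crypto_L_to_S": 0}
    kept = []
    for pick in picks:
        # 1. Known money-losing strategy: bounce. Reads "strategy", not "persona_id".
        if pick.get("strategy") in NOISE_PERSONAS:
            counters["killed_persona"] += 1
            continue
        # 2. Direction with negative edge for this asset class: bounce.
        if (pick.get("asset_class"), pick.get("direction")) in negative_edge:
            counters["killed_directional"] += 1
            continue
        # 3. CRYPTO LONG: flip to SHORT and let it in (copy, so the input is untouched).
        if pick.get("asset_class") == "CRYPTO" and pick.get("direction") == "LONG":
            pick = {**pick, "direction": "SHORT"}
            counters["flipped_crypto_L_to_S"] += 1
        kept.append(pick)
    return kept, counters


demo = [
    {"symbol": "BTC", "asset_class": "CRYPTO", "direction": "LONG", "strategy": "trend_follower"},
    {"symbol": "MSFT", "asset_class": "EQUITY", "direction": "LONG", "strategy": "momentum_scalp"},
    {"symbol": "XYZ", "asset_class": "BOND", "direction": "SHORT", "strategy": "macro_hedge"},
]
# ("BOND", "SHORT") is a placeholder combo for the demo, not one of the real 8.
kept, counters = eagle4_sketch(demo, negative_edge={("BOND", "SHORT")})
print(kept, counters)
```

Using `pick.get("strategy")` rather than indexing means a pick missing the key is simply not matched as noise; whether the production gate should instead reject such picks is a design choice this sketch does not settle.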
2.6 - Shipped EAGLE-5 promotion gate (apply_eagle5_promotion)
ELI5: If EAGLE-4 is the bouncer who bounces losers, EAGLE-5 is the VIP host who seats the winners closer to the stage. It looks at every pick and asks: is the symbol on the proven-winners list? Is the strategy type a known good one? If yes, give it a 20% or 15% confidence boost. The boost never goes above 100%, so it can't break the rest of the system.
- Code location: alpha_engine/eagle_gates.py.
- Wired into: production_scanner.py main() at section 6f2.6, right after EAGLE-4.
- Whitelist coverage: 33 symbols (EQUITY 16, ETF 5, PENNY 8, COMMODITY 3, FUTURES 1) and 16 personas.
- Smoke test: 10 picks → 4 boosted (MSFT, KULR, EEM, BAC), all confidence ≤ 1.0 cap. PASS.
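The boost-and-cap behaviour can be sketched as below. Several details are assumptions rather than facts about apply_eagle5_promotion: the boost is treated as multiplicative (confidence × 1.20 or × 1.15), a pick matching both lists gets only the larger symbol boost, and the pick keys `symbol`, `strategy`, and `confidence` are hypothetical. The whitelists shown are small subsets of the 33 symbols and 16 personas named on this page.

```python
SYMBOL_WHITELIST = {"MSFT", "KULR", "EEM", "BAC"}      # subset of the 33 symbols
PERSONA_WHITELIST = {"macro_hedge", "trend_follower"}  # subset of the 16 personas


def eagle5_sketch(picks, symbol_boost=0.20, persona_boost=0.15):
    """Illustrative promotion pass: boost whitelisted picks, cap confidence at 1.0."""
    boosted = []
    for pick in picks:
        boost = 0.0
        if pick.get("symbol") in SYMBOL_WHITELIST:
            boost = symbol_boost
        elif pick.get("strategy") in PERSONA_WHITELIST:
            boost = persona_boost
        # Assumed multiplicative boost; the cap keeps confidence a valid probability.
        confidence = min(1.0, pick["confidence"] * (1.0 + boost))
        boosted.append({**pick, "confidence": confidence})
    return boosted


demo_picks = [
    {"symbol": "MSFT", "strategy": "other", "confidence": 0.5},
    {"symbol": "EEM", "strategy": "other", "confidence": 0.9},
    {"symbol": "ZZZ", "strategy": "macro_hedge", "confidence": 0.4},
    {"symbol": "ZZZ", "strategy": "other", "confidence": 0.7},
]
for p in eagle5_sketch(demo_picks):
    print(p["symbol"], p["strategy"], round(p["confidence"], 4))
```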
2.7 - Moved the logic into a standalone module to survive concurrent edits
ELI5: The production scanner is a giant file that lots of AI agents edit at the same time. My first attempt put the code directly inside it, and twice another agent's commit accidentally erased my changes. I moved the important parts into a small dedicated file (alpha_engine/eagle_gates.py) so they can't be lost. The scanner only calls into the small file with a 2-line import, so even if the scanner gets reset, the gates survive.
2.8 - Synced to GitHub (origin/main) and pushed
ELI5: I shipped the code to the team's shared project, where other agents can see it and build on it.
- Commit 7510035f1 on origin/main: "feat: EAGLE-5 promotion gate + standalone eagle_gates module".
- Pulled + rebased past 4 other agent commits, then pushed clean.
3. Best picks today, with full rationale
Important honest framing: 0/9 production asset classes pass the live Money Ready gate. The picks below are paper watch / shadow pilot only. Do not size real capital from them until forward n ≥ 100 and the policy-clean gates confirm. This is the same caveat as the companion guide.
3.1 - CRYPTO SHORT (BTC / ETH) [PAPER WATCH]
Rationale: The AI tournament ran 216 CRYPTO picks across the top-5 models. SHORT won 67% of the time with +3.74% average profit per trade. LONG won only 33% of the time with an average PnL of -0.49% per trade. The production scanner was emitting CRYPTO as LONG (because that's the historical default), so it was systematically betting the wrong way. EAGLE-4 flips every CRYPTO LONG to SHORT before it leaves the scanner, so the production book now mirrors what the data says wins. This is the single highest-leverage change from this session: same data, same models, just point them in the winning direction.
ELI5: Imagine a race where 67 out of 100 runners who go left finish, but only 33 out of 100 who go right finish. Our scanner was telling all runners to go right. EAGLE-4 just flips the sign ("go left") without changing anything else. We expect the win rate to jump dramatically once enough forward trades come in.
3.2 - ETF: EEM, IWM, GLD, XLK, XLE [PAPER WATCH]
Rationale: Tournament resolved picks for these 5 ETFs had win rates of 93%, 75%, 68%, 67%, 67% respectively, all comfortably above the 60% threshold we set for the EAGLE-5 whitelist. The lab etf_dual_momentum strategy is the only Tier-2 pass in the multi-class lab (PF 1.60, n=104) and has a walk-forward OOS PF of 1.21. EAGLE-5 boosts confidence for picks on these symbols by 20%. They are paper-only because the live ETF book has only n=3 resolved picks (INSUFFICIENT_DATA); we need forward trades before promoting.
ELI5: EEM is a fund that follows emerging-market stocks (China, India, Brazil, etc.). IWM follows small US companies. GLD tracks gold. XLK is the tech sector. XLE is energy. They've all been "winning" in our AI tournament history. EAGLE-5 gives them a confidence nudge so the scanner leans toward them. But we don't have enough live forward-trading days yet to say "this is real money", hence paper watch.
3.3 - EQUITY: BAC, JPM, MSFT, AMZN, GOOGL, AAPL, NVDA [PAPER WATCH]
Rationale: All 16 symbols in the EQUITY whitelist (BAC, JPM, MSFT, AMZN, GOOGL, AAPL, PEP, MU, TSLA, AMD, INTC, META, XOM, NVDA, KO, WMT) have ≥60% win rates in the AI tournament. EAGLE-5 boosts them +20%. But, and this is the honest part, the live EQUITY book is currently FAILING (PF 0.33, WR 26.9%, n=52). So while these symbols look great on paper, the production pipeline is not actually making money on them yet. We need to figure out why (concentration? wrong entry timing? wrong size?) before we can promote any of these to live capital.
ELI5: BAC, JPM, MSFT: the boring, big, well-known US companies. They've won more often than they've lost in our tournament, but right now our live trading robot is mostly losing on them. We need to find out whether it's the robot's fault (bad timing) or the market's fault (these names are in a bad patch) before we trust them with real money.
3.4 - PENNY: KULR, RGTI, ASTS, RKLB [HIGH ARTIFACT RISK]
Rationale: These four showed 100% win rates in the tournament (KULR +27.52% avg, RGTI +20.47%, ASTS +13.48%, RKLB +12.30%). But the sample size is tiny (n=5–8 each). A 100% win rate on 5 trades is not statistically meaningful; it's noise. EAGLE-5 boosts them +20% but the synthesis report explicitly flagged this as "high artifact risk; do not size."
ELI5: If a kid flips a coin 5 times and gets heads every time, you wouldn't say "this kid always wins." Same with 5 perfect trade records: they look amazing but could be luck. We list them because the tournament is honest about it, but we don't recommend betting on them until we have 30+ forward trades.
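To put a number on the small-sample problem, a Wilson score interval shows how wide the plausible win-rate range is for a perfect 5-trade record. The leaderboard data is described on this page as using Wilson WR scoring; the function below is the standard textbook Wilson interval at 95% confidence and is not necessarily the project's own implementation.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Standard Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the edges.
    return max(0.0, center - half), min(1.0, center + half)


low5, high5 = wilson_interval(5, 5)
low8, high8 = wilson_interval(8, 8)
print(f"5/5 wins: lower bound {low5:.3f}")
print(f"8/8 wins: lower bound {low8:.3f}")
```

For 5 wins out of 5, the lower bound comes out near 0.57, so the data cannot distinguish a 100% win rate from one only slightly better than a coin flip.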
3.5 - DO NOT SIZE: /audit Smart Picks (all classes)
Rationale: The production /audit summary.money_ready list is empty. Every class is either NOT_READY (CRYPTO, EQUITY) or INSUFFICIENT_DATA (BOND, COMMODITY, ETF, FOREX, FUTURES, PENNY). The recency panels can sometimes show a green WR that disagrees with the live book, so always check the policy-clean money-ready verdict before sizing anything.
ELI5: "Money ready" is the test that says "yes, you can bet real money on this." Right now zero of our asset classes pass that test. That's not a typo. So even if the headline numbers look good somewhere else, the official answer is: not yet.
4. Short-term plan (next 2 weeks)
4.1 - Verify EAGLE-4/5 in live production runs (this week)
ELI5: We just installed two new filters in the trading robot. Over the next few days we want to watch the log files and see what they actually do. Did the bouncer bounce the right people? Did the VIP host seat the right winners? If something looks off, we can roll back fast.
- Run the scanner 3-5 times against fresh data; confirm "killed_persona", "killed_directional", "flipped_crypto_L_to_S" counters in the log match expectations.
- Spot-check 20 emitted picks: did EAGLE-5 actually boost confidence on whitelisted symbols? Did EAGLE-4 actually flip CRYPTO LONG to SHORT?
- Confirm no regression: same number of total picks (or fewer, if we killed noise) as before the gates.
4.2 - Run a Bonferroni / multiple-testing audit (this week)
ELI5: If you test 100 strategies that have no real edge, about 5 will still look significant at the usual 0.05 threshold by pure luck. If you only report those 5, you look amazing, but you just got lucky. The Bonferroni correction says: "divide your significance threshold by the number of tests you ran." With 80 emitters, our threshold drops from 0.05 to 0.05 / 80 = 0.000625, which is a lot stricter (and stricter still for every emitter beyond 80). This week we audit every strategy against that stricter bar.
- Use tools/run_eagle_suite.py + alpha_engine/admissibility_pipeline.py to apply Bonferroni α/N correction with N=80+ emitter count.
- Flag any currently green funnel cell whose corrected p > 0.05.
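The α/N rule in the steps above can be sketched in a few lines. This is a generic Bonferroni check, not the logic in tools/run_eagle_suite.py or admissibility_pipeline.py; the function name, the cell names, and the p-values are hypothetical. Comparing a raw p-value to α/N is equivalent to comparing the corrected p-value (raw p × N) to α.

```python
def bonferroni_audit(p_values, alpha=0.05, n_tests=None):
    """Return the per-test threshold alpha/N and the names that fail it.

    p_values: dict mapping a (hypothetical) cell name to its raw p-value.
    n_tests: total number of tests run; defaults to len(p_values). Pass the
    full emitter count when p_values holds only a subset of what was tested.
    """
    n = n_tests if n_tests is not None else len(p_values)
    if n <= 0:
        raise ValueError("n_tests must be positive")
    threshold = alpha / n
    failing = sorted(name for name, p in p_values.items() if p > threshold)
    return threshold, failing


# Made-up p-values for illustration only.
threshold, failing = bonferroni_audit(
    {"cell_a": 0.0004, "cell_b": 0.01, "cell_c": 0.04}, n_tests=80
)
print(threshold, failing)
```

Bonferroni is deliberately conservative; with 80+ correlated emitters it will reject some real effects, which is the price of controlling false positives this strictly.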
4.3 - Build the EAGLE-6 admissibility function (next 2 weeks)
ELI5: EAGLE-4 is the "say no" gate. EAGLE-5 is the "say yes louder" gate. EAGLE-6 will be the "did you actually prove this with statistics" gate. It'll require every promoted pick to pass Monte Carlo, walk-forward, and at least one multiple-testing correction before it gets any money.
- Add is_admissible_for_production(pick) with hard gates: DSR p<0.05, PBO<0.5, walk-forward OOS PF ≥ 0.8 × IS PF, n≥30 resolved, regime-robustness score ≥3/4.
- Wire as a final pass before enforce_portfolio_cap.
- Document in EAGLE6_2026-06-02_minimax-m3-free.MD.
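Since EAGLE-6 is still a plan, the following is only one possible shape for the hard gates listed above. The field names (`dsr_p`, `pbo`, `oos_pf`, `is_pf`, `n_resolved`, `regimes_passed`) are hypothetical, and "regime-robustness score ≥3/4" is read here as passing at least 3 of 4 regimes, which is an assumption.

```python
def is_admissible_sketch(pick):
    """Illustrative hard-gate check; returns (admissible, list_of_failed_gates)."""
    checks = {
        "dsr_p": pick["dsr_p"] < 0.05,                          # Deflated Sharpe Ratio p-value
        "pbo": pick["pbo"] < 0.5,                               # probability of backtest overfitting
        "walk_forward": pick["oos_pf"] >= 0.8 * pick["is_pf"],  # OOS PF vs in-sample PF
        "sample_size": pick["n_resolved"] >= 30,
        "regime": pick["regimes_passed"] >= 3,                  # assumed: out of 4 regimes
    }
    failed = [name for name, ok in checks.items() if not ok]
    return not failed, failed


# Made-up candidate for illustration only.
candidate = {
    "dsr_p": 0.01, "pbo": 0.3, "oos_pf": 1.3, "is_pf": 1.5,
    "n_resolved": 40, "regimes_passed": 3,
}
print(is_admissible_sketch(candidate))
print(is_admissible_sketch({**candidate, "n_resolved": 5}))
```

Returning the list of failed gates alongside the boolean makes the rejection reason loggable, in the same spirit as the EAGLE-4 counters.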
4.4 - Forward-test the top 5 symbols (next 2 weeks)
ELI5: We paper-trade our top picks (no real money) for 2 weeks and see if the live numbers match the backtest numbers. If they do, we can size up. If they don't, we have to figure out why before risking real money.
- Symbols: BTC SHORT, ETH SHORT, EEM LONG, MSFT LONG, KULR LONG.
- Track live PF, WR, and compare to backtest PF. Live PF must stay within ±10% of backtest PF for promotion.
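The ±10% rule can be written as a one-line relative-tolerance check. This sketch assumes the band is relative to the backtest PF (10% of the backtest value), not an absolute ±0.10; the function name and sample numbers are hypothetical.

```python
def within_pf_tolerance(live_pf, backtest_pf, tol=0.10):
    """True when live PF is within +/- tol (relative) of the backtest PF."""
    if backtest_pf <= 0:
        raise ValueError("backtest_pf must be positive")
    return abs(live_pf - backtest_pf) <= tol * backtest_pf


# Made-up numbers for illustration only.
print(within_pf_tolerance(1.9, 2.0))
print(within_pf_tolerance(1.5, 2.0))
```

Note that a symmetric band also blocks promotion when the live PF is much higher than the backtest, which may or may not be the intent.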
5. Long-term plan (12 weeks)
| Week | Milestone | Owner | Success metric |
| 1–2 | Data hygiene + duplicate purge + disputed resolver audit | Data Eng | Disputed tag rate < 1% in live feed |
| 3–4 | Standardize validation pipeline (EAGLE-6 wire-in); pre-register all hypotheses (M-107) | Quant Research | 100% of backtests pre-registered; pipeline latency ≤ 5 min |
| 5–6 | Purged-embargoed walk-forward on ETF dual-momentum + crypto VWAP/Bollinger | Quant Research | PF/WR consistent across folds (Δ ≤ 0.05) |
| 7 | Shadow-size approved sleeves at ≤0.5% capital (ETF, Crypto) | PM | Live PF within ±10% of backtest PF for 4 weeks |
| 8–9 | Mutation testing on failed lab sleeves; evaluate inversion candidates | Quant Research | At least 1 mutated sleeve passes the gate |
| 10 | Promote any sleeve meeting live PF≥0.5 / WR≤0.6 gate | PM + CRO signoff | HHI aggregate < 0.20; concentration OK |
| 11–12 | Full-size rollout + Quant Ops Dashboard + alerts | Ops + CRO | Live PF monitored daily; alerts at PF<0.4 for 5+ days |
5.1 - Quarterly success definition (the bar we have to clear)
ELI5: These are the 4 things we promise to deliver by end of Q2 2026. If we don't hit them, the project isn't done.
- Deployable edge: at least 2 new capital-ready sleeves with live PF ≥ 0.5 and WR ≤ 0.6.
- Data cleanliness: resolver dispute rate < 1% across all live feeds.
- Concentration: Herfindahl-Hirschman Index for aggregate book < 0.20.
- Operational efficiency: end-to-end validation pipeline latency ≤ 5 min per sleeve.
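For the concentration target, the Herfindahl-Hirschman Index is the sum of squared portfolio weights. A minimal sketch, assuming the book is available as a mapping from symbol to exposure (a hypothetical structure) and that weights are taken on absolute exposure so long and short positions both count:

```python
def hhi(exposures):
    """Herfindahl-Hirschman Index: sum of squared weights, on a 0 to 1 scale."""
    total = sum(abs(v) for v in exposures.values())
    if total == 0:
        return 0.0
    return sum((abs(v) / total) ** 2 for v in exposures.values())


# Made-up books for illustration only.
five_equal = {"A": 100, "B": 100, "C": 100, "D": 100, "E": 100}
ten_equal = {f"S{i}": 100 for i in range(10)}
print(hhi(five_equal), hhi(ten_equal))
```

With N equal-weight positions the index is 1/N, so a book needs more than 5 roughly equal positions to get under 0.20; any single oversized position pushes the index up quickly.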
6. Why this matters (ELI5 for the whole project)
The big picture in one paragraph: Our /audit page shows the live trading report card, and right now it has 9 subjects (asset classes) and 0 of them are passing. We have a TON of research (5,492 picks, 46 different AI models, dozens of strategies), but none of that research has made it into a real-money-ready strategy that passes the strict tests. The reason isn't that we lack edge. We have edge in places (tournament top-5, ETF dual-momentum lab). The reason is that the production pipeline lets too many weak picks through, doesn't flip obvious mistakes (like CRYPTO LONG), and doesn't promote the winners strongly enough. EAGLE-4 and EAGLE-5 are the first concrete code changes that try to fix that gap. They aren't magic. They are data-backed filters that take the research findings and apply them mechanically to every pick that leaves the scanner. Over the next 12 weeks, if we wire the rest of the validation pipeline (EAGLE-6), do the forward-testing, and follow the timeline, we should be able to promote at least 2 of these asset classes to real-money-ready. That's the goal. Anything less and we keep the project in research mode.
6.1 - Risks we know about
- Resolver drift (medium/high impact): prices/exits come from a resolver that can change how it labels things, which can make a winning strategy look like a loser overnight. Mitigation: continuous resolver health checks, backup sources.
- Over-fitting during mutation (low/medium): when we tweak strategies, we might tune them to the past instead of the future. Mitigation: strict OOS holdout + block-bootstrap.
- Concurrent agent edits (real): three times in this session, my code in production_scanner.py was reverted by other agents' commits to the same file. Mitigation: I moved the critical logic to a standalone module that the scanner only imports, which is much harder to overwrite.
- Sample-size illusion (real): "100% WR on 5 trades" is not a strategy. Mitigation: EAGLE-6 will require n≥30 and Bonferroni-corrected p<0.05 before any promotion.
7. Sources & how to verify
- This session's commits:
7510035f1 - feat: EAGLE-5 promotion gate + standalone eagle_gates module
e9b2d73fd - feat: EAGLE-4 admissibility gate
a47a30d54 - docs: EAGLE3 tournament deep-dive (5,492 picks analyzed)
956279ba2 - docs: EAGLE2 enhancement plan
- Files I created or modified:
alpha_engine/eagle_gates.py (new, 173 lines)
alpha_engine/production_scanner.py (sections 6f2.5 + 6f2.6 added)
EAGLE2_2026-06-02_minimax-m3-free.MD
EAGLE3_2026-06-02_minimax-m3-free.MD
EAGLE4_2026-06-02_minimax-m3-free.MD
- Data sources:
audit_dashboard/data/ai_tournament_picks_latest.json (4.4 MB, 5,492 picks, 3,692 resolved)
audit_dashboard/data/ai_tournament_leaderboard.json (Wilson WR × bootstrap PF scoring)
audit_dashboard/data/money_ready_verdict.json (live policy-clean production verdict)
reports/EAGLE_SWARM_SYNTHESIS_2026-06-02.md (other agent's 19-EAGLE-file synthesis)
- Reproduce the smoke test:
cd /home/eaguiar2015/findtorontoevents_antigravity.ca
python3 -c "import sys; sys.path.insert(0, 'alpha_engine'); from eagle_gates import apply_eagle4_admissibility, apply_eagle5_promotion; print('OK')"
- Companion guides: