
EAGLE2 — Turning Research Edge into Capital-Ready Strategies

2026-06-02 · Claude Opus 4.8 · Goal #1 (per-asset-class edge on /audit) · Full report: reports/EAGLE2_SYNTHESIS_GROUNDED_2026-06-02_claude-opus-4-8.md

One-paragraph summary. Every number here was read straight from our own canonical files (money_ready_verdict.json + pf_registry.json, generated 2026-06-02 10:19 UTC). The headline: no asset class is production-ready (money_ready = []). The real edge in the project lives on the AI tournament page — but it is paper, not live-clean. The whole problem is the gap between those two scoreboards.
ELI5: We have two report cards. The "practice" report card looks great. The "for-real-money" report card is failing. We figured out exactly why the practice grades don't carry over — and what to fix.
0 classes money-ready
9 asset classes scored
2 live strategies PF>1 (n≥20)
3.46 best paper PF (deepseek_v4)

1 · What we found

Finding A — Nothing passes the real-money gate yet

Class | n | Win rate | Profit factor | Verdict
CRYPTO | 368 | 36.1% | 0.92 | NOT READY
EQUITY | 52 | 26.9% | 0.33 | NOT READY
FOREX | 32 | 28.1% | 0.48 | INSUFFICIENT
FUTURES | 13 | 15.4% | 0.52 | INSUFFICIENT
ETF | 3 | 66.7% | 1.46 | INSUFFICIENT
COMMODITY | 4 | 50.0% | 1.68 | INSUFFICIENT
BOND | 0 | - | - | NO DATA
ELI5: A "profit factor" above 1.0 means a strategy makes more than it loses. Almost all of ours are below 1.0 — they lose money after costs. ETF and Commodity look okay but they've only made 3–4 trades, which is far too few to trust (like calling a coin "lucky" after 2 flips).
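
To make the profit-factor idea concrete: it is gross profit divided by gross loss over a set of closed trades. The sketch below is a minimal illustration, not the scoring code behind money_ready_verdict.json. The function names and the three sample trades are hypothetical, made up only to show how 2 wins out of 3 can produce a PF above 1.0 on a sample far too small to trust.

```python
from typing import Sequence


def profit_factor(pnl: Sequence[float]) -> float:
    """Gross profit divided by gross loss for per-trade P&L values."""
    gross_profit = sum(x for x in pnl if x > 0)
    gross_loss = -sum(x for x in pnl if x < 0)
    if gross_loss == 0:
        # No losing trades: PF is unbounded (or undefined with no winners).
        return float("inf") if gross_profit > 0 else float("nan")
    return gross_profit / gross_loss


def win_rate(pnl: Sequence[float]) -> float:
    """Share of trades that closed with a positive P&L."""
    if not pnl:
        raise ValueError("need at least one trade")
    return sum(1 for x in pnl if x > 0) / len(pnl)


# Hypothetical trades (made-up numbers, not read from the registry).
sample = [100.0, 46.0, -100.0]
pf = profit_factor(sample)   # 146 / 100
wr = win_rate(sample)        # 2 wins out of 3
print(f"PF={pf:.2f}  win rate={wr:.1%}  n={len(sample)}")
```

A PF like this says nothing about whether the edge is real: with n=3, one extra losing trade would push it well below 1.0.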

Finding B — The profitable picks live on the AI Tournament page (but it's paper)

Model sleeve | Resolved picks | Win rate | Profit factor
deepseek_v4 | 208 | 57.7% | 3.46
gpt4o | 134 | 59.7% | 3.14
deepseek_r1 | 132 | 62.9% | 2.93
claude_haiku_4_5 | 74 | 66.2% | 2.71
ELI5: When AI models "play the market" on paper, several do really well — deepseek_v4 has a solid track record over 208 picks. But "paper" means pretend money with friendlier scoring. We haven't yet proven it survives the strict real-money rules.

Finding C — Why the paper edge dies in production

ELI5: If 73 people each guess a coin flip, a couple will get a "hot streak" by chance. We've been treating those lucky streaks as skill. Real funds correct for that — we don't yet.
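
The size of the luck problem can be estimated with a few lines of arithmetic. The sketch below rests on assumptions that are not in the source files: every bucket is independent, a no-skill bucket wins each trade with probability 0.5, each bucket has 30 trades, and a "hot streak" means 18 or more wins (a 60% win rate). The function names are hypothetical.

```python
from math import comb


def tail_prob(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): k or more wins purely by luck."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def prob_any_lucky(m: int, n: int, k: int, p: float = 0.5) -> float:
    """Chance that at least one of m independent no-skill buckets shows k+ wins."""
    single = tail_prob(n, k, p)
    return 1.0 - (1.0 - single) ** m


# Assumed setup: 73 buckets, 30 trades each, "streak" = 18+ wins.
single = tail_prob(30, 18)
family = prob_any_lucky(73, 30, 18)
print(f"one no-skill bucket looks hot: {single:.3f}")
print(f"at least one of 73 looks hot:  {family:.6f}")
```

Under these assumptions a single no-skill bucket looks "hot" fairly often, and across 73 buckets it is close to certain that at least one does. Real buckets are correlated, so the exact figure will differ, but the direction of the effect is the same.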

2 · Best picks today — the honest answer

No pick in the project is real-money-ready. Here is the detailed rationale for the candidates people keep asking about:

NVDA — not our edge (hold as plain beta only)

Our EQUITY signals are losing (27% win rate, PF 0.33). NVDA's appeal is a macro story — the leading AI-chip company — not a signal our system validated. As a long-term buy-and-hold it's a reasonable market bet, but presenting it as an "/audit pick" would be dishonest. One brainstorm model "endorsed" NVDA citing a 62.9% win rate — that number is actually deepseek_r1's tournament score, not NVDA's. We caught and rejected that hallucination.

ELI5: NVDA might go up because it's a great company, but our crystal ball didn't tell us that — so we won't pretend it did.

BTCUSD — the most defensible forward-test candidate

Our only two PF>1 live strategies (crypto_liquidity_wick_reversal PF 1.55, 60% WR; atr_percentile_gate PF 1.10, 58.6% WR) trade crypto including BTC, and BTC is the top crypto symbol (22% of picks). This is the closest thing we have to a real signal — but it's only ~30 trades and single-source, so it fails a strict statistical test. Action: shadow-size only (tiny pretend allocation to watch it live), not real capital.

ELI5: This is our best "maybe." We'll keep testing it with play money before ever betting real money.

Safe long-term pick — a broad index ETF (e.g. SPY/VOO)

The only "well-known safe" holding we can defend without a proven signal is the whole-market index. It's a market-average bet, not an alpha claim. The real project goal is to turn this into a backtested ETF dual-momentum sleeve (see plan).

ELI5: If you must park money somewhere "safe and boring," owning a slice of the whole stock market is the textbook answer — and it doesn't require us to be fortune-tellers.

3 · More strategies per asset class (what to backtest next)

Class | Strategy to backtest | First step
CRYPTO | Liquidity-wick reversal, de-concentrated + ATR vol gate | Re-run across ≥3 sources
EQUITY | Cross-sectional momentum (12-1) + 200-day MA filter | Walk-forward on S&P 500
ETF | Dual-momentum (absolute + relative), monthly | 24-month walk-forward
FOREX | Carry + trend (drop intraday scalps) | Fix resolver first
COMMODITY | Time-series momentum (managed-futures style) | Long-lookback backtest
BOND | Yield-curve / duration timing pilot | Stand up data feed
ELI5: Each asset (crypto, stocks, etc.) needs its own playbook. We listed the one play per asset most likely to actually work, and the very first test to run for each.
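
As an illustration of one row in the table, here is a minimal sketch of a monthly dual-momentum rule. It is not the project's backtest code. The 12-month lookback, the function names, and the placeholder tickers (FAST, SLOW, CASH) are all assumptions made for the example; the price series are synthetic.

```python
from typing import Dict, Sequence


def trailing_return(prices: Sequence[float], lookback: int) -> float:
    """Simple return over the last `lookback` periods of a price series."""
    if len(prices) <= lookback:
        raise ValueError("not enough price history for this lookback")
    return prices[-1] / prices[-1 - lookback] - 1.0


def dual_momentum_pick(
    prices: Dict[str, Sequence[float]], cash: str, lookback: int = 12
) -> str:
    """Relative step: take the risky asset with the best trailing return.
    Absolute step: hold it only if it also beats the cash proxy."""
    returns = {name: trailing_return(s, lookback) for name, s in prices.items()}
    risky = {name: r for name, r in returns.items() if name != cash}
    best = max(risky, key=lambda name: risky[name])
    return best if risky[best] > returns[cash] else cash


# Synthetic monthly closes (13 points = 12 monthly returns).
rising = {
    "FAST": [100 * 1.02**i for i in range(13)],
    "SLOW": [100 * 1.01**i for i in range(13)],
    "CASH": [100 * 1.003**i for i in range(13)],
}
falling = {
    "FAST": [100 * 0.98**i for i in range(13)],
    "SLOW": [100 * 0.99**i for i in range(13)],
    "CASH": [100 * 1.003**i for i in range(13)],
}
pick_up = dual_momentum_pick(rising, cash="CASH")
pick_down = dual_momentum_pick(falling, cash="CASH")
print(pick_up, pick_down)
```

The absolute step is what moves the sleeve to cash when every risky asset is trailing the cash proxy. A walk-forward test would re-run this pick each month on data available at that date only.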

Documented in ejaguiar1_backtests (backtest results + the project's destructive-op backup DB); live picks flow to ejaguiar1_stocks. Every idea is pre-registered in reports/hypothesis_registry.json before any backtest (rule M-107).

4 · Are our picks statistically "real-money ready"?

The gates a signal must clear before real capital:

The single most important gate we are missing: a multiple-testing correction (Bonferroni / FDR) applied across all strategies before the DSR/SPA gates. At α=0.05 over ~73 buckets, a strategy needs p < 0.00068 — neither of our two "winning" crypto sleeves clears that at n≈30. Two of three brainstorm AIs independently flagged this same gap.
ELI5: "Bonferroni correction" = if you test lots of ideas, you must raise the bar for calling any one a "winner," because some will look good by luck. We weren't raising the bar. That's the #1 fix.
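
The Bonferroni arithmetic above can be checked directly. The sketch below divides α by the number of buckets and compares it with a one-sided binomial p-value on the win count. The binomial test on win rate is a stand-in chosen for simplicity, not the DSR/SPA gates named above, and the inputs (18 wins in 30 trades, a 50% no-skill baseline) are assumptions that approximate a 60% win rate at n≈30. Function names are hypothetical.

```python
from math import comb
from typing import Optional


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test significance level after testing m hypotheses."""
    return alpha / m


def binom_p_value(wins: int, n: int, p0: float = 0.5) -> float:
    """One-sided p-value: P(X >= wins) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(wins, n + 1))


def min_wins_to_pass(n: int, threshold: float, p0: float = 0.5) -> Optional[int]:
    """Smallest win count out of n whose p-value falls below the threshold."""
    for k in range(n + 1):
        if binom_p_value(k, n, p0) < threshold:
            return k
    return None


per_test = bonferroni_alpha(0.05, 73)        # alpha over ~73 buckets
p_sleeve = binom_p_value(18, 30)             # assumed: 18 wins in 30 trades
passes = p_sleeve < per_test
needed = min_wins_to_pass(30, per_test)
print(f"threshold={per_test:.5f}  p={p_sleeve:.4f}  passes={passes}  wins needed={needed}")
```

Under these assumptions a 60% win rate at 30 trades is nowhere near the corrected threshold. Clearing it requires either a much higher win rate or a much larger sample.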

5 · The plan going forward

Short-term (0–2 weeks)

Long-term (3–12 weeks)

ELI5: First, stop fooling ourselves (raise the bar, fix the data). Then test the best ideas with play money. Only the ones that keep working for real go live.

6 · What was accomplished this session

ELI5: We tidied the workshop, confirmed our new AI helpers work, fact-checked everyone's homework, and wrote up the honest results.

7 · Consensus Quick Picks (no backtest, opinion-aggregation)

Separate from the audit edge-hunt above: a fast, stability-tilted basket built purely from a 6-model AI consensus vote (analyst ratings + 13F ownership + moat/quality knowledge). Clearly labeled model opinion, not a proven signal — verify live before sizing.

Pick | Tier | Votes | Conviction | Read
MSFT | Mega-cap | 6/6 | 90 | Unanimous #1: moat + balance sheet
BRK.B | Mega-cap | 5/6 | 89 | Fortress / "sleep-well" equity
SGOV | Bond/cash | 4/6 | 95 | Highest conviction: T-bill yield anchor
VOO / VTI | Broad ETF | 4/6 | 92-94 | Core equity ballast
COST / AGG | Mega-cap / Bond | 2-3/6 | 89-92 | Quality + bond ballast
NVDA | Semi | 2 pick / 3 avoid | 84 | Divisive: too cyclical for a safe basket
INTC | Semi | 0 pick / 3 avoid | - | Consensus AVOID (fading moat)
ELI5: We asked 6 AIs "what would you safely park money in?" They agreed: a mix of T-bills (SGOV), the whole stock market (VOO), and a couple of rock-solid companies (Microsoft, Berkshire). They warned against Intel, and split on Nvidia.

Source: reports/CONSENSUS_QUICK_PICKS_2026-06-02.md + raw votes cqp_vote_*.txt.

8 · Swarm-refined selection methodology

A 15-round multi-model swarm produced a concrete quick-pick + long-term playbook for stocks · ETFs · bonds · futures · commodities — each with signal sources, numeric thresholds, position sizing, rebalance cadence, and when-to-avoid rules. Examples:

ELI5: A checklist of rules for picking each type of investment — and just as important, rules for when to not buy. Written so anyone can follow it without a finance degree.

Full playbook: reports/QUICK_PICK_METHODOLOGY_SWARM_2026-06-02.md. A 5-cycle hourly debate / devil's-advocate swarm then stress-tested the basket and converged on a final version. Final agreed: BRK.B 86 · GLD 80 · MSFT 79 · PEP 74 · VOO 70 · COST 66 · JNJ 42 (avoid INTC/NVDA/TSLA). The debate dropped SGOV, raised GLD (as a regime hedge), and marked JNJ down sharply on litigation overhang. See reports/HOURLY_PICKS_ENHANCEMENT_2026-06-02.md for the per-cycle trail.

Provenance: all stats from audit_dashboard/data/money_ready_verdict.json and audit_dashboard/data/pf_registry.json (generated 2026-06-02T10:19Z). AI brainstorm outputs are advisory only and quarantined in reports/eagle2_brainstorm_*.md; no model was permitted to "fetch" live pages. This is analysis, not financial advice.