| System | Slash commands | Code root | Purpose | Writes code? |
|---|---|---|---|---|
| /swarm | /swarm run, followup, pr-review, second-opinion, invent | tools/swarm/ | General multi-engine fan-out: consult N LLM vendors on one prompt, consensus, red-team | Analysis by default; only the code_implementer persona writes |
| /swarm-ruflo | audit, github, strategy, bugs, wizard | .ruflo/ | Hermes orchestrator: audit / research / hygiene swarms | No (JSON findings only) |
| /swarmv2-* | /swarmv2-coding, -pr-review, -actions, -research, -ensemble, -hierarchical | tools/swarm_v2/ | Enhanced typed swarms: coding pipeline, PR review, CI audit, research, ensemble/hierarchical decision | Yes (real LLM source + tests) |

/swarm second-opinion, /swarmv2-coding, /swarm-ruflo audit|bugs|github, /swarm pr-review, /swarmv2-research

Code (tools/swarm/): swarm_run.py (fan-out),
worker_runner.py (single-engine worker), swarm_followup.py (multi-turn chain),
api_consult.py (HTTP API caller), safety.py (read-only allowlist).
| Command | Usage |
|---|---|
| /swarm run | /swarm run <prompt-file> [engine,engine,…]: fan one prompt to N engines. Default deepseek,xai,kilo. |
| /swarm followup | /swarm followup <yaml-config>: multi-turn chain, one engine. |
| /swarm second-opinion | /swarm second-opinion <question>: quick 3-engine consensus. |
| /swarm pr-review | /swarm pr-review [PR#\|all\|open]: 3 specialists per PR + consensus. |
| /swarm invent | /swarm invent <problem-file>: bootstrap custom personas. |
| /swarmwithprework | /swarmwithprework <task>: 4-phase pre-work→brainstorm→synthesis→QA. |

API: deepseek, xai, cerebras, inception, openrouter, groq, gemini_api, pollinations (keyless), nous, ollama_*.

CLI: claude, gemini, kilo, opencode, copilot, agent (Cursor).

Presets: consensus-3, fast-cheap, deep-strict, non-opus-4.

name: my_run_${TS}
prompt_file: swarm_runs/briefing_my_task.md
out_dir: swarm_runs/run_${TS}
max_parallel: 4
preset: consensus-3 # OR an explicit engines: list
red_team: true # adds a claude-opus red-team pass
cost_cap_usd: 5.0
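
The `max_parallel` setting above suggests a bounded worker pool behind the fan-out. The following is only a minimal sketch of that idea: `call_engine` is a hypothetical stand-in that returns a canned reply, and `fan_out` is an illustrative name. The real interfaces of swarm_run.py and worker_runner.py are not shown in this guide.

```python
from concurrent.futures import ThreadPoolExecutor


def call_engine(engine: str, prompt: str) -> str:
    """Hypothetical stand-in for a single-engine worker; returns a canned reply."""
    return f"[{engine}] reply to: {prompt[:40]}"


def fan_out(prompt: str, engines: list[str], max_parallel: int = 4) -> dict[str, str]:
    """Send one prompt to every engine, at most `max_parallel` at a time.

    A failing engine is recorded as an error string instead of aborting the run.
    """
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {engine: pool.submit(call_engine, engine, prompt) for engine in engines}
        for engine, future in futures.items():
            try:
                results[engine] = future.result()
            except Exception as exc:  # keep the other engines' answers
                results[engine] = f"ERROR: {exc}"
    return results


if __name__ == "__main__":
    answers = fan_out("Does COMMODITY clear the Tier-2 bar?", ["deepseek", "xai", "kilo"])
    for engine, answer in answers.items():
        print(engine, "->", answer)
```

Collecting per-engine errors rather than raising keeps a partial result usable when one vendor is down, which matters when the goal is cross-vendor consensus.
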
Code: .ruflo/orchestrator.py, wizard.py, agents/*.yaml.
/swarm-ruflo [audit|github|strategy|bugs|wizard] [--tier free|paid|hybrid]
| Subcommand | Agents | Action |
|---|---|---|
| audit | audit-researcher + audit-quant | Dashboard: forward-WR, stale strategies, leakage |
| bugs | bug-hunter | Hardcoded paths, SQL injection, races, key leaks |
| github | github-hygiene | Stale PRs, failing Actions, commits without tests |
| strategy | strategist | Propose 3 new trading strategies |

Tiers: free (OpenRouter free models via Hermes in WSL), paid (direct API),
hybrid. New agents: copy .ruflo/agents/TEMPLATE_agent.yaml, fill
role/model/goal/metadata.dataSources. Ruflo agents
never write code — JSON findings only.
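
Before registering a new agent it can help to check that the four fields named above are actually filled in. A minimal sketch, assuming the agent YAML has already been parsed into a plain dict; `missing_agent_fields` is a hypothetical helper and is not part of .ruflo/.

```python
# Dotted paths taken from the field list above (role/model/goal/metadata.dataSources).
REQUIRED_PATHS = ("role", "model", "goal", "metadata.dataSources")


def missing_agent_fields(agent: dict) -> list[str]:
    """Return the dotted paths from REQUIRED_PATHS that are absent or empty."""
    missing = []
    for path in REQUIRED_PATHS:
        node = agent
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        # Treat None, empty strings and empty containers as "not filled in".
        if node in (None, "", [], {}):
            missing.append(path)
    return missing
```

An empty return value means every required field is present; anything else lists what still needs to be filled in after copying the template.
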
Code: tools/swarm_v2/swarms/. Install: cd tools/swarm_v2 && pip install -e ".[dev]".
CLI: python -m swarms.cli.main <command>.
Common confusion: "didn't the swarm always call AI models?" For swarm_v2, no.
- tools/swarm (System 1): UNCHANGED. It always called real models. Nothing changed today.
- .ruflo/ (System 2): UNCHANGED. Same Hermes audit/research orchestrator. Untouched.
- tools/swarm_v2/ (System 3): NEW directory, then LLM-wired. Did not exist before today.

A "template stub" is not an AI call. As delivered (the Kimi scaffold), swarm_v2's
engine literally returned a hardcoded string — def foo(): pass — and the workers
filled deterministic string templates. Zero LLM calls, zero network. Running
/swarmv2-coding on the old code returned placeholder code regardless of the prompt.
Today's work added llm_client.py and wired every worker to call a real LLM
(deepseek/groq/cerebras/openrouter). The template is now only the offline fallback —
used when no API key is present, so tests stay hermetic and the CLI never hard-crashes.
| | swarm_v2 before today (Kimi scaffold) | swarm_v2 after today |
|---|---|---|
| Code generation | hardcoded def foo(): pass | real LLM writes source + tests |
| Review / research | deterministic string templates | real LLM review / findings |
| LLM calls | none | deepseek / groq / cerebras / openrouter |
| Template | the only path | fallback only (offline / no key) |
| Swarm | CLI | Parameters | Pipeline |
|---|---|---|---|
| Coding | swarm coding <task-file> | --agents 3, --strict, --models | decompose → parallel generate → write tests → review → revise (≤3) → verify |
| PR Review | swarm pr-review <repo> | --pr N, --all-open | fetch → impact + review + risk → aggregate |
| GitHub Actions | swarm actions <repo> | --since 30d, --notify | fetch runs → failed/flaky/cancelled/stale → blast radius |
| Research | swarm research "<topic>" | --depth 3-5, --route A\|B\|C\|D | decompose → parallel research → cross-verify → synthesize |
| Ensemble | swarm ensemble "<task>" | --agents 5, --confidence-threshold 0.8 | register → predict → weighted vote → expand |
| Hierarchical | swarm hierarchical "<task>" | --strategists 2, --tacticians 3 | strategic → tactical → execution → risk veto |
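
The "review → revise (≤3)" step of the coding pipeline amounts to a bounded loop. A sketch under stated assumptions: `review` and `revise` are hypothetical callables supplied by the caller, and the worker interfaces in tools/swarm_v2/ may look quite different.

```python
from typing import Callable

MAX_REVISIONS = 3  # mirrors the "revise (≤3)" cap in the pipeline table


def review_loop(
    draft: str,
    review: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_revisions: int = MAX_REVISIONS,
) -> tuple[str, bool, int]:
    """Revise `draft` until `review` returns no issues or the cap is hit.

    Returns (final_draft, approved, revisions_used).
    """
    for used in range(max_revisions + 1):
        issues = review(draft)
        if not issues:
            return draft, True, used
        if used == max_revisions:
            break  # cap reached: stop revising and report the draft as unapproved
        draft = revise(draft, issues)
    return draft, False, max_revisions
```

Returning the approval flag alongside the draft lets the caller decide what to do with an unapproved result, instead of silently accepting the last revision.
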
swarms/core/llm_client.py auto-detects
a provider from env keys. Validated working: deepseek, groq, cerebras, openrouter.
Every worker degrades to a deterministic template offline — the swarm never hard-crashes.
Re-probe with python -m swarms.core.llm_client --validate.
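
Provider auto-detection with an offline fallback can be pictured roughly as follows. This is an illustrative sketch, not the contents of llm_client.py: the environment variable names, the priority order, and the function names `detect_provider` and `generate` are all assumptions.

```python
import os
from typing import Mapping, Optional

# Assumed variable names; the ones llm_client.py actually reads are not listed in this guide.
PROVIDER_ENV_KEYS = {
    "deepseek": "DEEPSEEK_API_KEY",
    "groq": "GROQ_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def detect_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first provider whose key is set and non-empty, else None."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_ENV_KEYS.items():
        if env.get(var, "").strip():
            return provider
    return None


def generate(task: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Use a provider when one is configured; otherwise fall back to a template."""
    provider = detect_provider(env)
    if provider is None:
        # Offline path: deterministic output, no network, safe for hermetic tests.
        return f"# TEMPLATE (offline): {task}"
    # A real client would issue the API request here; omitted in this sketch.
    return f"# would call {provider} for: {task}"
```

Passing `env` explicitly keeps the detection logic testable without touching the process environment.
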
Memory: outputs are stored in a ChromaDB vector store with hybrid BM25 + vector search.
swarm memory search "<query>" finds prior results; swarm memory export-skill
turns a result into a reusable Claude skill.
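
Hybrid BM25 + vector search needs some rule for merging two rankings. One common option is a weighted sum of min-max-normalized scores, sketched below. How swarm_v2 actually fuses the two is not stated here, so treat the weighted sum, the `alpha` parameter, and both function names as assumptions. The inputs are assumed to be similarity-style scores (higher is better); a vector store that reports distances would need them converted first.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to all 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: dict[str, float],
    vector: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Rank documents by alpha * lexical + (1 - alpha) * vector, both normalized.

    A document missing from one retriever contributes 0 for that side.
    """
    lex, vec = min_max(bm25), min_max(vector)
    docs = set(lex) | set(vec)
    fused = {d: alpha * lex.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0) for d in docs}
    # Highest fused score first; ties broken by document id for stable output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Normalizing first matters because raw BM25 scores are unbounded while similarity scores are usually confined to a narrow range.
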
/swarm second-opinion "Should we size up COMMODITY given PF 2.49 / n=322?"
Code a batch of todos:
cd tools/swarm_v2
python -m swarms.cli.main coding task.md --agents 3 --strict
Audit the trading system:
/swarm-ruflo audit --tier hybrid
- tools/swarm_v2/task.md: a task or short spec, stating what to implement and what the tests must cover.
- python -m swarms.cli.main coding task.md --agents 3 --strict.
- cavecrew-builder subagent: the swarm does not auto-commit.

/swarm pr-review and /swarmv2-pr-review are per-PR; /swarm actions-audit covers workflow YAML only.

/swarm run swarm_runs/briefing_50_files.md deepseek,xai,cerebras (bound spend with --cost-cap-usd). A dedicated swarm bulk-review <glob> mode is a noted future addition.

One concrete, project-specific scenario for every swarm type.
| Swarm | Concrete use case in this repo | Command |
|---|---|---|
| Coding | Implement a queued backlog of TESTING_PROTOCOL.MD todos — e.g. wire the kill_gate min-n floor (M-055) into the commodity/fx kill switches. 3 agents draft + test in parallel; reviewers gate; you apply the winner. | swarm coding m055_task.md --agents 3 --strict |
| PR Review | Triage the open-PR backlog before merge — impact score + risk level + breaking-change list per PR, so safe PRs merge fast and risky ones get flagged. | swarm pr-review <repo> --all-open |
| GitHub Actions | Find chronically cancelled / flaky jobs in audit-dashboard.yml + sports-smoke-and-e2e.yml with no subsequent successful run — the recurring CI-drift problem. | swarm actions <repo> --since 30d |
| Research | Scope a hard feature before building — e.g. the López de Prado PBO/CPCV harness (M-052): decompose, research in parallel, cross-verify, surface disputed claims. | swarm research "PBO/CPCV overfitting harness" --depth 4 |
| Ensemble | Aggregate a directional call — BTC 4h LONG vs SHORT from N model votes, weighted by confidence; surfaces dissent instead of one model's guess. | swarm ensemble "BTC 4h direction" --agents 5 |
| Hierarchical | Mirror the trading desk: macro regime (VIX / BTC-dominance) → per-asset-class tactician signal → risk-controller veto on sizing. Produces a structured signal, not a trade. | swarm hierarchical "size COMMODITY exposure" --strategists 2 --tacticians 3 |
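
The confidence-weighted vote in the Ensemble row can be sketched as below. This assumes the threshold is compared against the winner's share of total confidence, which is a guess at what --confidence-threshold means; the "expand" step is not modelled, and `weighted_vote` is a hypothetical name.

```python
from collections import defaultdict


def weighted_vote(votes: list[tuple[str, float]], confidence_threshold: float = 0.8):
    """Aggregate (label, confidence) votes.

    Returns (winner, winner_share, decided, dissent), where winner_share is the
    winner's fraction of total confidence and dissent maps every other label
    to its share.
    """
    if not votes:
        raise ValueError("no votes to aggregate")
    totals: dict[str, float] = defaultdict(float)
    for label, confidence in votes:
        totals[label] += confidence
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("all confidences are zero")
    shares = {label: weight / grand_total for label, weight in totals.items()}
    # Sorting first makes ties resolve alphabetically instead of by insertion order.
    winner = max(sorted(shares), key=shares.get)
    dissent = {label: share for label, share in shares.items() if label != winner}
    return winner, shares[winner], shares[winner] >= confidence_threshold, dissent
```

Returning the dissenting shares alongside the winner is what lets a caller surface disagreement rather than report a single label.
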
| Mode | Concrete use case |
|---|---|
| /swarm run | Fan an asset-class audit briefing to deepseek,xai,cerebras for cross-vendor consensus on whether COMMODITY clears the Tier-2 bar. |
| /swarm followup | Single-strategy deep dive: prime with the FOREX briefing, then analysis → self-critique → final JSON verdict, one engine, 4 turns. |
| /swarm second-opinion | Quick 3-engine gut-check: "kill or mutate the FOREX class given PF 0.27 / n=1249?" |
| /swarm pr-review | Multi-specialist (architecture / cost-risk / data-flow) review of one PR before merge. |
| /swarm invent | A new problem domain with no persona: bootstrap a custom persona split + test blueprint. |
| /swarmwithprework | A large fuzzy task: 4-phase pre-work → brainstorm → synthesis → QA. |

| Subcommand | Concrete use case |
|---|---|
| audit | Scan /audit dashboard data for strategies with forward_wr<0.55, stale strategies, elite-score starvation, anti-predictive leakage. |
| bugs | Hunt the codebase for hardcoded paths, SQL injection, race conditions, unclosed DB connections, leaked API keys. |
| github | GitHub hygiene: stale PRs (>7d), failing Actions, commits without tests, workflow-file mismatches. |
| strategy | Ideation: propose 3 new trading strategies (name, asset class, edge, implementation sketch, risk controls). |

- /swarm run with asset-class specialist personas, or /swarm-ruflo audit. /swarmv2-ensemble does weighted signal voting; /swarmv2-hierarchical mirrors macro→tactician→risk-veto. These produce signals/analysis; they do not place trades.
- /swarmv2-coding: multiple agents attempt each task, tests are mandatory, reviewers gate quality. Divergent attempts surface edge cases a single pass misses, and the revise loop is enforced rather than optional.
- /swarm pr-review. CI health: /swarmv2-actions or /swarm-ruflo github.

Full markdown reference: docs/SWARM_SYSTEMS_GUIDE.md. Per-system docs:
tools/swarm/README.md, .ruflo/agents/TEMPLATE_agent.yaml,
tools/swarm_v2/README.md.