🐝 Swarm Systems Guide

How to run our three multi-agent swarm systems — coding, review, research and trading-signal swarms. Last updated 2026-05-15.
← Back to Updates

1. Three systems at a glance

SystemSlash commandsCode rootPurposeWrites code?
/swarm/swarm run, followup, pr-review, second-opinion, inventtools/swarm/ General multi-engine fan-out — consult N LLM vendors on one prompt, consensus, red-team Analysis by default; only the code_implementer persona writes
/swarm-rufloaudit, github, strategy, bugs, wizard.ruflo/ Hermes orchestrator — audit / research / hygiene swarmsNo — JSON findings only
/swarmv2-*/swarmv2-coding, -pr-review, -actions, -research, -ensemble, -hierarchical tools/swarm_v2/Enhanced typed swarms — coding pipeline, PR review, CI audit, research, ensemble/hierarchical decisionYes — real LLM source + tests
Which to use:

2. System 1 — /swarm (general multi-engine)

Code: tools/swarm/swarm_run.py (fan-out), worker_runner.py (single-engine worker), swarm_followup.py (multi-turn chain), api_consult.py (HTTP API caller), safety.py (read-only allowlist).

Slash commands

CommandUsage
/swarm run/swarm run <prompt-file> [engine,engine,…] — fan one prompt to N engines. Default deepseek,xai,kilo.
/swarm followup/swarm followup <yaml-config> — multi-turn chain, one engine.
/swarm second-opinion/swarm second-opinion <question> — quick 3-engine consensus.
/swarm pr-review/swarm pr-review [PR#|all|open] — 3 specialists per PR + consensus.
/swarm invent/swarm invent <problem-file> — bootstrap custom personas.
/swarmwithprework/swarmwithprework <task> — 4-phase pre-work→brainstorm→synthesis→QA.

Engines

API: deepseek, xai, cerebras, inception, openrouter, groq, gemini_api, pollinations (keyless), nous, ollama_*. CLI: claude, gemini, kilo, opencode, copilot, agent (Cursor). Presets: consensus-3, fast-cheap, deep-strict, non-opus-4.

Fan-out YAML template

name: my_run_${TS}
prompt_file: swarm_runs/briefing_my_task.md
out_dir: swarm_runs/run_${TS}
max_parallel: 4
preset: consensus-3        # OR an explicit engines: list
red_team: true             # adds a claude-opus red-team pass
cost_cap_usd: 5.0

Multi-turn followup — prompts per round

  1. priming — feed a briefing file as warm-up context.
  2. analysis — "narrow to the problem, cite specific numbers."
  3. critique — "which claim is weakest, what unstated assumption, what would you retract."
  4. final — "emit valid JSON only, per this schema."

3. System 2 — /swarm-ruflo (Hermes orchestrator)

Code: .ruflo/orchestrator.py, wizard.py, agents/*.yaml.

/swarm-ruflo [audit|github|strategy|bugs|wizard] [--tier free|paid|hybrid]

SubcommandAgentsAction
auditaudit-researcher + audit-quantDashboard — forward-WR, stale strategies, leakage
bugsbug-hunterHardcoded paths, SQL injection, races, key leaks
githubgithub-hygieneStale PRs, failing Actions, commits without tests
strategystrategistPropose 3 new trading strategies

Tiers: free (OpenRouter free models via Hermes in WSL), paid (direct API), hybrid. New agents: copy .ruflo/agents/TEMPLATE_agent.yaml, fill role/model/goal/metadata.dataSources. Ruflo agents never write code — JSON findings only.

4. System 3 — /swarmv2-* (enhanced typed swarm) — NEW

Code: tools/swarm_v2/swarms/. Install: cd tools/swarm_v2 && pip install -e ".[dev]". CLI: python -m swarms.cli.main <command>.

4.0 What changed today — and what did NOT

Common confusion: "didn't the swarm always call AI models?" For swarm_v2, no.

A "template stub" is not an AI call. As delivered (the Kimi scaffold), swarm_v2's engine literally returned a hardcoded string — def foo(): pass — and the workers filled deterministic string templates. Zero LLM calls, zero network. Running /swarmv2-coding on the old code returned placeholder code regardless of the prompt.

Today's work added llm_client.py and wired every worker to call a real LLM (deepseek/groq/cerebras/openrouter). The template is now only the offline fallback — used when no API key is present, so tests stay hermetic and the CLI never hard-crashes.

swarm_v2 before today (Kimi scaffold)swarm_v2 after today
Code generationhardcoded def foo(): passreal LLM writes source + tests
Review / researchdeterministic string templatesreal LLM review / findings
LLM callsnonedeepseek / groq / cerebras / openrouter
Templatethe only pathfallback only (offline / no key)

4.1 The six swarm types

SwarmCLIParametersPipeline
Codingswarm coding <task-file>--agents 3, --strict, --modelsdecompose → parallel generate → write tests → review → revise (≤3) → verify
PR Reviewswarm pr-review <repo>--pr N, --all-openfetch → impact + review + risk → aggregate
GitHub Actionsswarm actions <repo>--since 30d, --notifyfetch runs → failed/flaky/cancelled/stale → blast radius
Researchswarm research "<topic>"--depth 3-5, --route A|B|C|Ddecompose → parallel research → cross-verify → synthesize
Ensembleswarm ensemble "<task>"--agents 5, --confidence-threshold 0.8register → predict → weighted vote → expand
Hierarchicalswarm hierarchical "<task>"--strategists 2, --tacticians 3strategic → tactical → execution → risk veto
Real LLM providers (wired 2026-05-15): swarms/core/llm_client.py auto-detects a provider from env keys. Validated working: deepseek, groq, cerebras, openrouter. Every worker degrades to a deterministic template offline — the swarm never hard-crashes. Re-probe with python -m swarms.core.llm_client --validate.

Memory: outputs are stored in a ChromaDB vector store with hybrid BM25 + vector search. swarm memory search "<query>" finds prior results; swarm memory export-skill turns a result into a reusable Claude skill.

5. Quick-start

Fast multi-vendor opinion:
/swarm second-opinion "Should we size up COMMODITY given PF 2.49 / n=322?"
Code a batch of todos:
cd tools/swarm_v2
python -m swarms.cli.main coding task.md --agents 3 --strict
Audit the trading system:
/swarm-ruflo audit --tier hybrid

6. Sample end-to-end flow — code todos with the coding swarm

  1. Write the task file tools/swarm_v2/task.md — a task or short spec, stating what to implement and what the tests must cover.
  2. Run python -m swarms.cli.main coding task.md --agents 3 --strict.
  3. Pipeline: decompose → 3 generators write code+tests in parallel (each may use a different provider) → test_writer enriches tests + re-runs pytest → 2 reviewers score each artifact → generator revises on flagged issues (≤3 rounds) → artifacts with failing tests dropped.
  4. Collect surviving artifacts (source + tests + review comments + results).
  5. Apply the winning diff yourself or via a cavecrew-builder subagent — the swarm does not auto-commit.

7. Bulk-reviewing many files (e.g. 50 files)

No purpose-built 50-file batch reviewer exists. /swarm pr-review and /swarmv2-pr-review are per-PR; /swarm actions-audit covers workflow YAML only.
Workaround: build one briefing file with the 50 files (or excerpts) and fan it: /swarm run swarm_runs/briefing_50_files.md deepseek,xai,cerebras (bound spend with --cost-cap-usd). A dedicated swarm bulk-review <glob> mode is a noted future addition.

8. Use case per swarm type

One concrete, project-specific scenario for every swarm type.

swarm_v2 (/swarmv2-*)

SwarmConcrete use case in this repoCommand
CodingImplement a queued backlog of TESTING_PROTOCOL.MD todos — e.g. wire the kill_gate min-n floor (M-055) into the commodity/fx kill switches. 3 agents draft + test in parallel; reviewers gate; you apply the winner.swarm coding m055_task.md --agents 3 --strict
PR ReviewTriage the open-PR backlog before merge — impact score + risk level + breaking-change list per PR, so safe PRs merge fast and risky ones get flagged.swarm pr-review <repo> --all-open
GitHub ActionsFind chronically cancelled / flaky jobs in audit-dashboard.yml + sports-smoke-and-e2e.yml with no subsequent successful run — the recurring CI-drift problem.swarm actions <repo> --since 30d
ResearchScope a hard feature before building — e.g. the López de Prado PBO/CPCV harness (M-052): decompose, research in parallel, cross-verify, surface disputed claims.swarm research "PBO/CPCV overfitting harness" --depth 4
EnsembleAggregate a directional call — BTC 4h LONG vs SHORT from N model votes, weighted by confidence; surfaces dissent instead of one model's guess.swarm ensemble "BTC 4h direction" --agents 5
HierarchicalMirror the trading desk: macro regime (VIX / BTC-dominance) → per-asset-class tactician signal → risk-controller veto on sizing. Produces a structured signal, not a trade.swarm hierarchical "size COMMODITY exposure" --strategists 2 --tacticians 3

/swarm (System 1)

ModeConcrete use case
/swarm runFan an asset-class audit briefing to deepseek,xai,cerebras — cross-vendor consensus on whether COMMODITY clears the Tier-2 bar.
/swarm followupSingle-strategy deep dive — prime with the FOREX briefing, then analysis → self-critique → final JSON verdict, one engine, 4 turns.
/swarm second-opinionQuick 3-engine gut-check: "kill or mutate the FOREX class given PF 0.27 / n=1249?"
/swarm pr-reviewMulti-specialist (architecture / cost-risk / data-flow) review of one PR before merge.
/swarm inventA new problem domain with no persona — bootstrap a custom persona split + test blueprint.
/swarmwithpreworkA large fuzzy task — 4-phase pre-work → brainstorm → synthesis → QA.

/swarm-ruflo (System 2)

SubcommandConcrete use case
auditScan /audit dashboard data for strategies with forward_wr<0.55, stale strategies, elite-score starvation, anti-predictive leakage.
bugsHunt the codebase for hardcoded paths, SQL injection, race conditions, unclosed DB connections, leaked API keys.
githubGitHub hygiene — stale PRs (>7d), failing Actions, commits without tests, workflow-file mismatches.
strategyIdeation — propose 3 new trading strategies (name, asset class, edge, implementation sketch, risk controls).

9. Recommended usage in this project

Full markdown reference: docs/SWARM_SYSTEMS_GUIDE.md. Per-system docs: tools/swarm/README.md, .ruflo/agents/TEMPLATE_agent.yaml, tools/swarm_v2/README.md.

← Back to Updates