🐝 Swarm Systems Guide

How to run our three multi-agent swarm systems — coding, review, research and trading-signal swarms. Last updated 2026-05-15.

← Back to Updates

1. Three systems at a glance

System	Slash commands	Code root	Purpose	Writes code?
`/swarm`	`/swarm run`, `followup`, `pr-review`, `second-opinion`, `invent`	`tools/swarm/`	General multi-engine fan-out — consult N LLM vendors on one prompt, consensus, red-team	Analysis by default; only the `code_implementer` persona writes
`/swarm-ruflo`	`audit`, `github`, `strategy`, `bugs`, `wizard`	`.ruflo/`	Hermes orchestrator — audit / research / hygiene swarms	No — JSON findings only
`/swarmv2-*`	`/swarmv2-coding`, `-pr-review`, `-actions`, `-research`, `-ensemble`, `-hierarchical`	`tools/swarm_v2/`	Enhanced typed swarms — coding pipeline, PR review, CI audit, research, ensemble/hierarchical decision	Yes — real LLM source + tests

Which to use:

Fast second/third opinion on a decision → /swarm second-opinion
A batch of todos coded, tested, reviewed → /swarmv2-coding
Audit dashboard / strategies / GitHub hygiene scanned → /swarm-ruflo audit|bugs|github
A PR reviewed by multiple specialists → /swarm pr-review
Deep research with cross-verification → /swarmv2-research

2. System 1 — /swarm (general multi-engine)

Code: tools/swarm/ — swarm_run.py (fan-out), worker_runner.py (single-engine worker), swarm_followup.py (multi-turn chain), api_consult.py (HTTP API caller), safety.py (read-only allowlist).

Slash commands

Command	Usage
`/swarm run`	`/swarm run <prompt-file> [engine,engine,…]` — fan one prompt to N engines. Default `deepseek,xai,kilo`.
`/swarm followup`	`/swarm followup <yaml-config>` — multi-turn chain, one engine.
`/swarm second-opinion`	`/swarm second-opinion <question>` — quick 3-engine consensus.
`/swarm pr-review`	`/swarm pr-review [PR#\|all\|open]` — 3 specialists per PR + consensus.
`/swarm invent`	`/swarm invent <problem-file>` — bootstrap custom personas.
`/swarmwithprework`	`/swarmwithprework <task>` — 4-phase pre-work→brainstorm→synthesis→QA.

Engines

API: deepseek, xai, cerebras, inception, openrouter, groq, gemini_api, pollinations (keyless), nous, ollama_*. CLI: claude, gemini, kilo, opencode, copilot, agent (Cursor). Presets: consensus-3, fast-cheap, deep-strict, non-opus-4.

Fan-out YAML template

name: my_run_${TS}
prompt_file: swarm_runs/briefing_my_task.md
out_dir: swarm_runs/run_${TS}
max_parallel: 4
preset: consensus-3        # OR an explicit engines: list
red_team: true             # adds a claude-opus red-team pass
cost_cap_usd: 5.0

Multi-turn followup — prompts per round

priming — feed a briefing file as warm-up context.
analysis — "narrow to the problem, cite specific numbers."
critique — "which claim is weakest, what unstated assumption, what would you retract."
final — "emit valid JSON only, per this schema."

3. System 2 — /swarm-ruflo (Hermes orchestrator)

Code: .ruflo/orchestrator.py, wizard.py, agents/*.yaml.

Subcommand	Agents	Action
`audit`	audit-researcher + audit-quant	Dashboard — forward-WR, stale strategies, leakage
`bugs`	bug-hunter	Hardcoded paths, SQL injection, races, key leaks
`github`	github-hygiene	Stale PRs, failing Actions, commits without tests
`strategy`	strategist	Propose 3 new trading strategies

Tiers: free (OpenRouter free models via Hermes in WSL), paid (direct API), hybrid. New agents: copy .ruflo/agents/TEMPLATE_agent.yaml, fill role/model/goal/metadata.dataSources. Ruflo agents never write code — JSON findings only.

4. System 3 — /swarmv2-* (enhanced typed swarm) — NEW

Code: tools/swarm_v2/swarms/. Install: cd tools/swarm_v2 && pip install -e ".[dev]". CLI: python -m swarms.cli.main <command>.

4.0 What changed today — and what did NOT

Common confusion: "didn't the swarm always call AI models?" For swarm_v2, no.

tools/swarm (System 1) — UNCHANGED. It always called real models. Nothing changed today.
.ruflo/ (System 2) — UNCHANGED. Same Hermes audit/research orchestrator. Untouched.
tools/swarm_v2/ (System 3) — NEW directory, then LLM-wired. Did not exist before today.

A "template stub" is not an AI call. As delivered (the Kimi scaffold), swarm_v2's engine literally returned a hardcoded string — def foo(): pass — and the workers filled deterministic string templates. Zero LLM calls, zero network. Running /swarmv2-coding on the old code returned placeholder code regardless of the prompt.

Today's work added llm_client.py and wired every worker to call a real LLM (deepseek/groq/cerebras/openrouter). The template is now only the offline fallback — used when no API key is present, so tests stay hermetic and the CLI never hard-crashes.

	swarm_v2 before today (Kimi scaffold)	swarm_v2 after today
Code generation	hardcoded `def foo(): pass`	real LLM writes source + tests
Review / research	deterministic string templates	real LLM review / findings
LLM calls	none	deepseek / groq / cerebras / openrouter
Template	the only path	fallback only (offline / no key)

4.1 The six swarm types

Swarm	CLI	Parameters	Pipeline
Coding	`swarm coding <task-file>`	`--agents 3`, `--strict`, `--models`	decompose → parallel generate → write tests → review → revise (≤3) → verify
PR Review	`swarm pr-review <repo>`	`--pr N`, `--all-open`	fetch → impact + review + risk → aggregate
GitHub Actions	`swarm actions <repo>`	`--since 30d`, `--notify`	fetch runs → failed/flaky/cancelled/stale → blast radius
Research	`swarm research "<topic>"`	`--depth 3-5`, `--route A\|B\|C\|D`	decompose → parallel research → cross-verify → synthesize
Ensemble	`swarm ensemble "<task>"`	`--agents 5`, `--confidence-threshold 0.8`	register → predict → weighted vote → expand
Hierarchical	`swarm hierarchical "<task>"`	`--strategists 2`, `--tacticians 3`	strategic → tactical → execution → risk veto

Real LLM providers (wired 2026-05-15): swarms/core/llm_client.py auto-detects a provider from env keys. Validated working: deepseek, groq, cerebras, openrouter. Every worker degrades to a deterministic template offline — the swarm never hard-crashes. Re-probe with python -m swarms.core.llm_client --validate.

Memory: outputs are stored in a ChromaDB vector store with hybrid BM25 + vector search. swarm memory search "<query>" finds prior results; swarm memory export-skill turns a result into a reusable Claude skill.

5. Quick-start

Fast multi-vendor opinion:

/swarm second-opinion "Should we size up COMMODITY given PF 2.49 / n=322?"

Code a batch of todos:

cd tools/swarm_v2
python -m swarms.cli.main coding task.md --agents 3 --strict

Audit the trading system:

/swarm-ruflo audit --tier hybrid

6. Sample end-to-end flow — code todos with the coding swarm

Write the task file tools/swarm_v2/task.md — a task or short spec, stating what to implement and what the tests must cover.
Run python -m swarms.cli.main coding task.md --agents 3 --strict.
Pipeline: decompose → 3 generators write code+tests in parallel (each may use a different provider) → test_writer enriches tests + re-runs pytest → 2 reviewers score each artifact → generator revises on flagged issues (≤3 rounds) → artifacts with failing tests dropped.
Collect surviving artifacts (source + tests + review comments + results).
Apply the winning diff yourself or via a cavecrew-builder subagent — the swarm does not auto-commit.

7. Bulk-reviewing many files (e.g. 50 files)

No purpose-built 50-file batch reviewer exists. /swarm pr-review and /swarmv2-pr-review are per-PR; /swarm actions-audit covers workflow YAML only.
Workaround: build one briefing file with the 50 files (or excerpts) and fan it: /swarm run swarm_runs/briefing_50_files.md deepseek,xai,cerebras (bound spend with --cost-cap-usd). A dedicated swarm bulk-review <glob> mode is a noted future addition.

8. Use case per swarm type

One concrete, project-specific scenario for every swarm type.

swarm_v2 (/swarmv2-*)

Swarm	Concrete use case in this repo	Command
Coding	Implement a queued backlog of `TESTING_PROTOCOL.MD` todos — e.g. wire the `kill_gate` min-n floor (M-055) into the commodity/fx kill switches. 3 agents draft + test in parallel; reviewers gate; you apply the winner.	`swarm coding m055_task.md --agents 3 --strict`
PR Review	Triage the open-PR backlog before merge — impact score + risk level + breaking-change list per PR, so safe PRs merge fast and risky ones get flagged.	`swarm pr-review <repo> --all-open`
GitHub Actions	Find chronically cancelled / flaky jobs in `audit-dashboard.yml` + `sports-smoke-and-e2e.yml` with no subsequent successful run — the recurring CI-drift problem.	`swarm actions <repo> --since 30d`
Research	Scope a hard feature before building — e.g. the López de Prado PBO/CPCV harness (M-052): decompose, research in parallel, cross-verify, surface disputed claims.	`swarm research "PBO/CPCV overfitting harness" --depth 4`
Ensemble	Aggregate a directional call — BTC 4h LONG vs SHORT from N model votes, weighted by confidence; surfaces dissent instead of one model's guess.	`swarm ensemble "BTC 4h direction" --agents 5`
Hierarchical	Mirror the trading desk: macro regime (VIX / BTC-dominance) → per-asset-class tactician signal → risk-controller veto on sizing. Produces a structured signal, not a trade.	`swarm hierarchical "size COMMODITY exposure" --strategists 2 --tacticians 3`

/swarm (System 1)

Mode	Concrete use case
`/swarm run`	Fan an asset-class audit briefing to `deepseek,xai,cerebras` — cross-vendor consensus on whether COMMODITY clears the Tier-2 bar.
`/swarm followup`	Single-strategy deep dive — prime with the FOREX briefing, then analysis → self-critique → final JSON verdict, one engine, 4 turns.
`/swarm second-opinion`	Quick 3-engine gut-check: "kill or mutate the FOREX class given PF 0.27 / n=1249?"
`/swarm pr-review`	Multi-specialist (architecture / cost-risk / data-flow) review of one PR before merge.
`/swarm invent`	A new problem domain with no persona — bootstrap a custom persona split + test blueprint.
`/swarmwithprework`	A large fuzzy task — 4-phase pre-work → brainstorm → synthesis → QA.

/swarm-ruflo (System 2)

Subcommand	Concrete use case
`audit`	Scan `/audit` dashboard data for strategies with `forward_wr<0.55`, stale strategies, elite-score starvation, anti-predictive leakage.
`bugs`	Hunt the codebase for hardcoded paths, SQL injection, race conditions, unclosed DB connections, leaked API keys.
`github`	GitHub hygiene — stale PRs (>7d), failing Actions, commits without tests, workflow-file mismatches.
`strategy`	Ideation — propose 3 new trading strategies (name, asset class, edge, implementation sketch, risk controls).

9. Recommended usage in this project

Stock/crypto prediction — /swarm run with asset-class specialist personas, or /swarm-ruflo audit. /swarmv2-ensemble does weighted signal voting; /swarmv2-hierarchical mirrors macro→tactician→risk-veto. These produce signals/analysis — they do not place trades.
Coding a backlog faster — /swarmv2-coding: multiple agents attempt each task, tests are mandatory, reviewers gate quality. Divergent attempts surface edge-cases a single pass misses, and the revise loop is enforced rather than optional.
PR triage — /swarm pr-review. CI health — /swarmv2-actions or /swarm-ruflo github.

Full markdown reference: docs/SWARM_SYSTEMS_GUIDE.md. Per-system docs: tools/swarm/README.md, .ruflo/agents/TEMPLATE_agent.yaml, tools/swarm_v2/README.md.

← Back to Updates