Brigade Master Class

Brigade, end to end

A local operator-system CLI for AI agent memory, handoffs, and guardrails across 18 writer harnesses. Not a runner, not a daemon, not a hosted service. This is how every piece works, from the decide() guardrail to the cross-model run engine.

brigade-cli v0.12.0 · Python 3.10+ · MIT · zero runtime deps


What Brigade Is — and Is Not

Brigade’s own one-liner: "Your agents run loops. Brigade keeps the receipts." It is AI agent memory, handoffs, and local guardrails for Codex, Claude Code, OpenCode, and over a dozen other harnesses. It runs on the machine you control: local by default, loud about the exceptions. The whole mental model is holding both halves of the list below in your head at once: what Brigade is, and what it is not.

What it is

Shared memory across harnesses: One handoff note format every tool writes; one canonical owner files the good notes into durable memory.
Local guardrails: Lints handoffs, scrubs content, and security-scans agent workspaces before anything leaves the machine.
A receipt for everything: Every consequential action lands a plain-file receipt you can grep, diff, and prune.
File-first: Your memory is markdown in your repo, reviewable in git, readable without Brigade installed.
Deliberately bounded: Safe, targeted notes auto-file; ambiguous or risky ones wait for your review.

What it is not

A hosted service or daemon: Does not run in the background or install schedulers.
A publisher: Never pushes to GitHub or publishes packages.
Noisy: Sends no notifications by default.
Blindly automatic: Does not save every note automatically.
A silent ingester: Never turns ingest into a background process, and never skips review for risky or failed notes.

Why It Exists: Two Incidents

Brigade was hand-rolled one incident at a time around an always-on OpenClaw agent plus daily Codex and Claude Code sessions. Two failures shaped the design more than anything planned. First, a nightly "dreaming" job promoted raw session fragments straight into memory and bloated MEMORY.md past the bootstrap budget; every session then started with truncated memory and nobody noticed for weeks. Blind auto-promotion died that day. Second, 195 handoff notes sat unread across 35 repos because the ingester had a hardcoded three-repo allowlist and nothing warned about the gap. Silence is the failure mode. Every part of Brigade that lints, warns, or writes a receipt exists because something once failed in silence.

41KB · MEMORY.md bloat

Blew past the 12KB bootstrap budget. Truncated memory, unnoticed for weeks. The end of blind auto-promotion.

195 · Handoffs unread

Across 35 repos, behind a hardcoded 3-repo allowlist. No warning fired. Silence is the failure mode.

12KB · Bootstrap budget

The slim-index ceiling Brigade now guards on every route into memory.

482 · Cards in production

The system Brigade packages, surviving daily multi-agent work. The cookbook is where it came from.

The Loop in Five Steps

The memory loop is the core; everything else orbits it. Writer harnesses leave handoff notes as they work. Brigade lints, guards, and classifies each one, then files the safe, targeted notes into durable memory on its own. A memory owner (OpenClaw, Hermes, or just you) only steps in for the ambiguous few.

Agents write handoffs: Each writer harness drops a handoff note into its own local inbox (e.g. .claude/memory-handoffs/) as it works.
Brigade lints & classifies: Before a note can become memory, it is parsed, guarded, and classified. Nothing reaches memory unlinted.
Safe notes auto-file: Notes that name a valid target and clear every guard file themselves into durable memory automatically.
Risky notes wait: Anything ambiguous, unsafe, or failed is kicked to a review inbox for the memory owner to decide.
Receipts + better context: Future sessions start with better context, and a plain-file receipt records exactly what happened.
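The first step of the loop, gathering notes from every writer inbox, can be sketched as a scan over a repo. This is a minimal sketch, not Brigade's ingester. WRITER_INBOXES here is a hypothetical mapping: only the .claude/memory-handoffs path comes from the text, and the second entry is a placeholder. collect_handoffs is likewise an illustrative name. Skipping TEMPLATE.md and .gitkeep mirrors the files Brigade keeps in each inbox.

```python
from pathlib import Path

# Hypothetical harness -> inbox mapping. Only ".claude/memory-handoffs" comes from
# the text; the second entry is a placeholder for any other writer harness.
WRITER_INBOXES = {
    "claude": ".claude/memory-handoffs",
    "example-harness": ".example-harness/memory-handoffs",  # hypothetical
}

# Files that live in every inbox but are never handoff notes.
NOT_HANDOFFS = {"TEMPLATE.md", ".gitkeep"}


def collect_handoffs(repo: Path, inboxes: dict[str, str] = WRITER_INBOXES) -> list[tuple[str, Path]]:
    """Return (harness, note) pairs for every pending handoff note in one repo."""
    found: list[tuple[str, Path]] = []
    for harness, rel in sorted(inboxes.items()):
        inbox = repo / rel
        if not inbox.is_dir():
            continue  # this harness is not set up in this repo
        for note in sorted(inbox.glob("*.md")):
            if note.name not in NOT_HANDOFFS:
                found.append((harness, note))
    return found
```

The sketch walks every configured inbox rather than an allowlist of repos. A fixed allowlist is exactly what let 195 notes go unread.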

The Loop, Drawn

The same five steps as a diagram: writer harnesses hand off, Brigade lints and classifies, safe notes reach durable memory through the owner, the risky few branch to review, and memory feeds the next session.

Writer harnesses (Codex · Claude Code · OpenCode · …) → Brigade (lint · guard · classify) → Memory owner (OpenClaw / Hermes / you) → Durable memory (MEMORY.md index + cards)

↓ ambiguous or risky: Review inbox (waits for your yes)

↻ Durable memory feeds the next session with better context, and a receipt records exactly what happened.

The Kitchen Glossary

Brigade borrows the vocabulary of a professional kitchen brigade (a brigade de cuisine). The terms are load-bearing, not decoration: internalize them and the rest of the system reads cleanly. Definitions are verbatim from docs/technical-guide.md.

harness (the line cook): An AI agent program: Claude Code, Codex, OpenCode, Antigravity, Pi, Cursor, OpenClaw, Hermes, and more.
operator (the chef): You, the human running the agents and making the explicit decisions about what gets saved, run, or published.
handoff (the ticket): A memory note an agent writes to be saved long-term, held in its own inbox until linted and classified.
ingest / ingester (expediting to the pass): Reading handoff notes from every inbox and filing them into permanent memory. Conservative by design.
station (a kitchen station): A subsystem of Brigade (memory, security, tokens, pantry, ...) with its own commands and a health check.
mise en place (everything in its place): Rules, memory, tools, inboxes, and guards laid out before the session gets expensive. The core idea.
aboyeur (the expediter): The kitchen expediter who calls out orders. In Brigade, the orchestrator behind a cross-model `brigade run`.
receipt (the docket): A local file logging that something happened, kept for audit and proof. Brigade writes them but never acts on them.
gate (the chef’s yes): A manual approval checkpoint; nothing risky happens without your yes.
dogfood (staff meal): Brigade used on itself or another trusted repo.
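A receipt is only useful if it stays easy to grep, diff, and prune. Below is a minimal sketch of that property, assuming one JSON file per action in a receipts directory. The filename scheme, the fields, and both function names (write_receipt, prune_receipts) are hypothetical. They are not Brigade's actual receipt format.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path


def write_receipt(receipts_dir: Path, action: str, **details: object) -> Path:
    """Record one consequential action as its own small JSON file (hypothetical format)."""
    receipts_dir.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    # Fixed-width UTC timestamp first, so filenames sort chronologically.
    safe_action = re.sub(r"[^A-Za-z0-9._-]", "-", action)
    path = receipts_dir / f"{now.strftime('%Y%m%dT%H%M%S%fZ')}-{safe_action}.json"
    record = {**details, "action": action, "at": now.isoformat()}
    # Mode "x" refuses to overwrite: a receipt is written once and never edited.
    with path.open("x", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, sort_keys=True, default=str)
        fh.write("\n")
    return path


def prune_receipts(receipts_dir: Path, keep: int) -> list[Path]:
    """Delete all but the newest `keep` receipts and return what was removed."""
    receipts = sorted(receipts_dir.glob("*.json"))
    doomed = receipts[:-keep] if keep > 0 else receipts
    for p in doomed:
        p.unlink()
    return doomed
```

Sorted keys and one record per file keep diffs small. The code only writes receipts and never reads them back to decide anything.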

Mise en Place: Two-Layer Memory

The 41KB incident produced the central design decision: memory has two layers. Knowledge cards under memory/cards/ hold the detail; MEMORY.md stays a slim, one-line-per-card index that loads every session and is guarded against the 12KB budget. Brigade never edits canonical memory itself — the owner does the writing.

Bootstrap layer (loads every session, budget-guarded)

MEMORY.md

Slim one-line-per-card index. Guarded against the 12KB bootstrap budget on every route.

AGENTS.md / rules

Operator rules and conventions projected into each harness’s native format.

TOOLS.md

The reviewed tool + endpoint catalog. A safe special target for routed notes.

↓

Detail layer (loaded on demand)

memory/cards/*.md

Knowledge cards with YAML frontmatter. Where promoted handoffs land. 482 in the production system.

.learnings/*

LEARNINGS.md, ERRORS.md, FEATURE_REQUESTS.md — routed append targets.

memory care scan

Flags stale, contradictory, or undersourced cards for review instead of trusting them forever.
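The bootstrap-layer guard reduces to a size check performed before any append to MEMORY.md. Here is a minimal sketch, assuming "12KB" means 12 KiB of UTF-8 bytes. Brigade's exact threshold, unit, and index-line format may differ, and index_line, fits_budget, and append_if_fits are hypothetical names.

```python
# Assumption: "12KB" is read as 12 KiB of UTF-8 bytes; Brigade's exact threshold may differ.
BOOTSTRAP_BUDGET_BYTES = 12 * 1024


def index_line(card: str, summary: str) -> str:
    """Render one hypothetical MEMORY.md index entry, collapsed onto a single line."""
    return f"- [{card}](memory/cards/{card}): {' '.join(summary.split())}\n"


def fits_budget(index_text: str, new_text: str, budget: int = BOOTSTRAP_BUDGET_BYTES) -> bool:
    """True if MEMORY.md would stay within the bootstrap budget after appending new_text."""
    return len((index_text + new_text).encode("utf-8")) <= budget


def append_if_fits(index_text: str, new_text: str,
                   budget: int = BOOTSTRAP_BUDGET_BYTES) -> tuple[str, bool]:
    """Append only if it fits; on False the caller should route the note to review, not truncate."""
    if fits_budget(index_text, new_text, budget):
        return index_text + new_text, True
    return index_text, False
```

Measuring encoded bytes rather than characters matters because non-ASCII text costs more than one byte per character.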

The Nine Stations

Brigade is partitioned into nine builtin stations (the _BUILTIN tuple in registry.py). Each is a frozen Station dataclass with a name, a summary, kitchen aliases, an optional doctor() health check, and the managed tools attached to it. The station is the unit of the system.

core · mise

Workspace bootstrap and harness adapters: the setup station.

memory · garde

Handoff inbox, ingest, and memory-care.

Tools: memory-doctor · bootstrap-doctor

guard · pass

Publish safety and content scrub.

Tools: content-guard

tokens

Output compaction.

Tools: tokenjuice

search · code-search

Local semantic code search.

Tools: code-search-api · code-search-mcp

security · sec

Agent workspace security scanning.

pantry · larder

Agent session auth sync.

Tools: agentpantry

notifications · notify

Operator notification wiring.

Tools: agent-notify

evidence · ledger

Local-first evidence ledger and source exporters.

Tools: miseledger · stationtrail · sourceharvest
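A frozen Station record with alias lookup and an optional health check might look like the sketch below. The field names and the find_station and run_doctors helpers are illustrative assumptions modeled on the description, not copied from registry.py. Only three of the nine stations are included.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Station:
    # Field names are illustrative, modeled on the description above, not copied from registry.py.
    name: str
    summary: str
    aliases: tuple[str, ...] = ()
    doctor: Callable[[], str] | None = None  # optional health check
    tools: tuple[str, ...] = ()


def _memory_doctor() -> str:
    return "OK"  # placeholder; a real check would inspect inboxes and MEMORY.md


# Three of the nine builtin stations, kept in an immutable tuple like _BUILTIN.
BUILTIN = (
    Station("memory", "Handoff inbox, ingest, and memory-care.", ("garde",),
            _memory_doctor, ("memory-doctor", "bootstrap-doctor")),
    Station("guard", "Publish safety and content scrub.", ("pass",), tools=("content-guard",)),
    Station("tokens", "Output compaction.", tools=("tokenjuice",)),
)


def find_station(key: str, stations: tuple[Station, ...] = BUILTIN) -> Station | None:
    """Resolve a station by its name or any of its kitchen aliases."""
    for station in stations:
        if key == station.name or key in station.aliases:
            return station
    return None


def run_doctors(stations: tuple[Station, ...] = BUILTIN) -> dict[str, str]:
    """Call doctor() only on stations that define one."""
    return {s.name: s.doctor() for s in stations if s.doctor is not None}
```

Freezing the dataclass and holding stations in a tuple means nothing can mutate the registry at runtime.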

The decide() Guardrail

This is the single most important piece of Brigade. decide() (ingest.py) is the pure function that turns a parsed handoff into one of four outcomes: promoted, routed, inboxed, or skipped. Every branch that cannot prove a note is safe routes it to the review inbox. Conservative by default — that pause is the point.

Parsed handoff note: Sections parsed from the markdown by SECTION_RE (^##\s+name).

any unknown ## section?
→ inboxed: A section key not in KNOWN_SECTIONS means the parser may have split on internal text. Bail to review.

action = create/update-card AND promote_cards
▸ Card safety checks: SAFE_CARD_NAME_RE filename allowlist → YAML frontmatter present → scan_untrusted for injection.
- any check fails
  → inboxed: Unsafe name, missing frontmatter, or injection signal.
- all pass
  → promoted: Written to memory/cards/<card>.md.

action names a target document AND route_documents
▸ Document safety checks: Target in SAFE_SPECIAL_TARGETS or matching SAFE_RULE_PATH_RE → non-empty → no ## headings → dedupe → budget guard → injection scan.
- any check fails
  → inboxed: Unsafe target, dup, over-budget, or injection signal.
- all pass
  → routed: Appended to the target document (TOOLS.md, rules/*, .learnings/*).

no recommended action
→ inboxed: Missing "Recommended memory action".

anything else
→ skipped: Action not auto-handled; left for a human.
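The first branch of the tree makes more sense once you see how splitting on level-2 headings can go wrong. The sketch below uses a variant of the SECTION_RE pattern quoted above: it uses [ \t] instead of \s so that a match cannot run across lines. KNOWN_SECTIONS is an assumed subset containing only the keys decide() reads, and parse_sections and stray_sections are illustrative names, not ingest.py's parser.

```python
import re

# Variant of the SECTION_RE quoted above ("^##\s+name"); [ \t] keeps a match on one line.
SECTION_RE = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)

# Assumed subset of KNOWN_SECTIONS, using the keys decide() reads.
KNOWN_SECTIONS = {"recommended memory action", "target card", "suggested card content"}


def parse_sections(markdown: str) -> dict[str, str]:
    """Map each lowercased '## heading' to the text up to the next level-2 heading."""
    matches = list(SECTION_RE.finditer(markdown))
    sections: dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections[m.group(1).lower()] = markdown[m.end():end].strip("\n")
    return sections


def stray_sections(sections: dict[str, str]) -> list[str]:
    return [s for s in sections if s not in KNOWN_SECTIONS]


# A card body that uses its own "## Details" heading gets split off as a stray section.
NOTE = """## Recommended memory action
create-card
## Target card
deploy-notes.md
## Suggested card content
---
title: Deploy notes
---
## Details
Body text that happens to use a level-2 heading.
"""
```

In this example, "## Details" truncates the card content and surfaces as a stray section named "details". That is exactly the signal that sends a note to review instead of promoting half a card.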

decide(), in the Source

The same guardrail in code. Notice that every failure path constructs an Outcome("inboxed", ...) instead of writing — the safe default is always review.

src/brigade/ingest.py:209 · python

The card-promotion branch of decide(). The document-routing branch that follows it in ingest.py (not shown) has the same shape.

def decide(sections, target, promote_cards, route_documents) -> Outcome:
    action = sections.get("recommended memory action", "").strip().lower()

    stray = [s for s in sections if s not in KNOWN_SECTIONS]
    if stray:
        return Outcome("inboxed",
            reason=f"unknown sections present (parser may have split content): {stray}")

    if action in ("create-card", "update-card") and promote_cards:
        card = sections.get("target card", "").strip()
        content = sections.get("suggested card content", "")
        if not SAFE_CARD_NAME_RE.match(card):
            return Outcome("inboxed", reason=f"target card name unsafe: {card!r}")
        if not content.lstrip().startswith("---"):
            return Outcome("inboxed", reason="card content missing YAML frontmatter")
        if scan_untrusted(content).flagged:
            return Outcome("inboxed", reason="injection signal in card content ...")
        return Outcome("promoted", dest=target / "memory" / "cards" / card)

stray = [s for s in sections if s not in KNOWN_SECTIONS] → An unrecognized ## heading means the parser likely split the note wrong. Refuse to guess; inbox it.
and promote_cards → Auto-promotion is opt-in: it is off by default, and you enable it per ingest.
SAFE_CARD_NAME_RE.match(card) → A strict filename allowlist (^[A-Za-z0-9._-]+\.md$) rejects any "/", which blocks path traversal into other directories.
scan_untrusted(content).flagged → A prompt-injection scan runs before anything is allowed to become durable memory.
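The card branch above can be exercised end to end once its dependencies are stubbed. In this runnable sketch, Outcome is a stand-in dataclass and the injection check is a toy phrase list that replaces scan_untrusted(). Only the regex, the section keys, and the order of checks come from the excerpt, and decide_card is a hypothetical name.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path

# The allowlist quoted above: one path component, letters/digits/._- only, ending in .md.
SAFE_CARD_NAME_RE = re.compile(r"^[A-Za-z0-9._-]+\.md$")

# Toy stand-in for scan_untrusted(); the real scanner is not shown in this walkthrough.
_INJECTION_HINTS = ("ignore previous instructions", "disregard the system prompt")


@dataclass(frozen=True)
class Outcome:
    kind: str                  # "promoted", "routed", "inboxed", or "skipped"
    reason: str = ""
    dest: Path | None = None


def decide_card(sections: dict[str, str], target: Path, promote_cards: bool) -> Outcome | None:
    """Card-promotion branch only; returns None when the note is not on this branch."""
    action = sections.get("recommended memory action", "").strip().lower()
    if action not in ("create-card", "update-card") or not promote_cards:
        return None
    card = sections.get("target card", "").strip()
    content = sections.get("suggested card content", "")
    if not SAFE_CARD_NAME_RE.match(card):
        return Outcome("inboxed", reason=f"target card name unsafe: {card!r}")
    if not content.lstrip().startswith("---"):
        return Outcome("inboxed", reason="card content missing YAML frontmatter")
    if any(hint in content.lower() for hint in _INJECTION_HINTS):
        return Outcome("inboxed", reason="injection signal in card content")
    return Outcome("promoted", dest=target / "memory" / "cards" / card)
```

Feeding it a target card of "../../etc/evil.md" returns an inboxed outcome. The slash fails the allowlist before any path is built.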

Execution: brigade run & the Aboyeur

brigade run "<task>" is a bounded cross-model orchestration. The aboyeur (the kitchen expediter) plans the task into staged JSON assignments; workers run in parallel within a stage through their own CLIs; later stages receive earlier results; the orchestrator synthesizes. It is intentionally bounded: two orchestrator calls plus the planned worker calls. No infinite loop.

Phase 1 · PLAN (orchestrator / aboyeur)

build_plan_prompt (orchestrator): Emit one strict JSON object of staged assignments, capped at roster.max_workers per stage.

↓

Phase 2 · DISPATCH: workers run their own CLIs (parallel)

Stage 1 workers (cli=codex): Run concurrently via ThreadPoolExecutor. A crash becomes ok=False, never a hard failure.

Stage 2 workers (cli=claude): Receive all prior-stage results as context, then also run concurrently.

...later stages (cli=...): Serial across stages, parallel within. The pool never over-provisions.

↓

Phase 3 · SYNTHESIS (orchestrator / aboyeur)

Synthesize final answer (orchestrator): Fold worker results into one answer, then write a run handoff into the ingester inbox.
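Because the plan is a single strict JSON object, the orchestrator's reply can be validated before any worker is dispatched. This is a minimal sketch under stated assumptions. The plan is assumed to be an object with an "assignments" list whose items carry the worker, task, and stage fields that the dispatch loop reads. Rejecting an over-cap stage is an assumed policy, since the text only says plans are capped. parse_plan is a hypothetical name.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    worker: str   # which roster worker (and therefore which CLI) runs it
    task: str
    stage: int


def parse_plan(raw: str, max_workers: int) -> list[Assignment]:
    """Strictly parse a plan reply and enforce the per-stage cap (hypothetical validator)."""
    plan = json.loads(raw)  # anything that is not JSON raises ValueError
    items = plan.get("assignments") if isinstance(plan, dict) else None
    if not isinstance(items, list) or not items:
        raise ValueError("plan must be an object with a non-empty 'assignments' list")
    try:
        assignments = [Assignment(str(a["worker"]), str(a["task"]), int(a["stage"])) for a in items]
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"malformed assignment: {exc}") from exc
    for stage, n in sorted(Counter(a.stage for a in assignments).items()):
        if n > max_workers:
            raise ValueError(f"stage {stage} has {n} assignments; max_workers is {max_workers}")
    return assignments
```

Failing fast on a malformed plan keeps the run bounded, because no worker calls are spent on a plan that cannot be executed as written.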

The Dispatch Loop

How "serial across stages, parallel within a stage, later stages see earlier results" is actually implemented.

src/brigade/aboyeur.py:456 · python

from concurrent.futures import ThreadPoolExecutor, as_completed

all_results: list[WorkerResult] = []
stages = sorted({a.stage for a in assignments})
for stage in stages:
    stage_assignments = [a for a in assignments if a.stage == stage]
    prior_results = list(all_results)
    max_workers = min(roster.max_workers, len(stage_assignments))
    stage_results_by_index: dict[int, WorkerResult] = {}  # fresh for every stage
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_index = {
            executor.submit(run_one, a, prior_results): i
            for i, a in enumerate(stage_assignments)
        }
        for future in as_completed(future_to_index):
            i = future_to_index[future]
            try:
                stage_results_by_index[i] = future.result()
            except Exception as exc:
                a = stage_assignments[i]
                stage_results_by_index[i] = WorkerResult(
                    worker=a.worker, task=a.task, text="", ok=False, detail=str(exc)[:200])
    # Re-assemble in plan order, not completion order.
    all_results.extend(stage_results_by_index[i] for i in range(len(stage_assignments)))

for stage in stages: → Stages are serial; ordering is preserved so dependent work runs after its inputs.
prior_results = list(all_results) → Each stage receives a snapshot of every earlier worker’s output as context.
min(roster.max_workers, len(stage_assignments)) → The pool never over-provisions: the cap is the smaller of the roster limit and the stage size.
except Exception ... ok=False → A worker that crashes becomes a failed result, not an exception that kills the run.
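To watch the stage semantics run, the loop above can be wrapped in a function with minimal dataclasses and a fake worker. Assignment, WorkerResult, dispatch, and fake_run_one are stand-ins written for this sketch. A real run_one would shell out to the assigned CLI, and max_workers plays the role of roster.max_workers.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Assignment:
    worker: str
    task: str
    stage: int


@dataclass(frozen=True)
class WorkerResult:
    worker: str
    task: str
    text: str
    ok: bool = True
    detail: str = ""


def dispatch(assignments: list[Assignment],
             run_one: Callable[[Assignment, list[WorkerResult]], WorkerResult],
             max_workers: int) -> list[WorkerResult]:
    """Serial across stages, parallel within a stage; results come back in plan order."""
    all_results: list[WorkerResult] = []
    for stage in sorted({a.stage for a in assignments}):
        stage_assignments = [a for a in assignments if a.stage == stage]
        prior_results = list(all_results)  # snapshot handed to every worker in this stage
        by_index: dict[int, WorkerResult] = {}
        with ThreadPoolExecutor(max_workers=min(max_workers, len(stage_assignments))) as pool:
            futures = {pool.submit(run_one, a, prior_results): i
                       for i, a in enumerate(stage_assignments)}
            for future in as_completed(futures):
                i = futures[future]
                try:
                    by_index[i] = future.result()
                except Exception as exc:  # a crashed worker becomes ok=False
                    a = stage_assignments[i]
                    by_index[i] = WorkerResult(a.worker, a.task, "", ok=False, detail=str(exc)[:200])
        all_results.extend(by_index[i] for i in range(len(stage_assignments)))
    return all_results


def fake_run_one(a: Assignment, prior: list[WorkerResult]) -> WorkerResult:
    """Stand-in worker: a real one would shell out to the assigned CLI."""
    if a.worker == "flaky":
        raise RuntimeError("boom")
    return WorkerResult(a.worker, a.task, f"{a.task} saw {len(prior)} prior results")
```

Take a plan with two stage-1 workers, one of which raises, and one stage-2 worker. The stage-2 worker still runs and sees both stage-1 results, including the failed one.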

The CLI Adapter Ring & Sandbox Enforcement

Brigade reaches models with no SDKs and no API keys: it builds subprocess argv for the user’s own authenticated CLIs. Because not every CLI can truly enforce read-only, Brigade is honest about it. hard = a native sandbox the model cannot escape; soft = read-only is only a prompt instruction; none = read_only is not applied at all. A writable --sandbox override downgrades even a hard CLI to prompt-only, with a loud advisory.

	hard	soft	none
codex
antigravity
pi / cursor / aider
continue / qwen / kimi
goose / copilot / adal
openhands / grok
amp / crush
claude (argv ignores read_only entirely)
opencode

8 hard · 7 soft · 2 none. From READ_ONLY_ENFORCEMENT in agents.py. ollama:<model> refs always resolve to "none".

Enforcement at a Glance

The 17 run-CLI adapters split across three enforcement strengths. Knowing which bucket your CLI falls in tells you how much to trust a read-only run.

hard (real sandbox): 8 CLIs

soft (prompt only): 7 CLIs

none (not applied): 2 CLIs
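The rules above combine into one small decision: look up the CLI's base strength, force ollama:<model> refs to "none", and downgrade a hard sandbox when a writable override is passed. This is a hypothetical helper, not Brigade's API. KNOWN_ENFORCEMENT holds only the two entries the text states (codex is hard, claude is none), while the real table is READ_ONLY_ENFORCEMENT in agents.py. Three choices are assumptions: treating unknown CLIs as "none", counting any sandbox value other than "read-only" as writable, and the wording of the advisory.

```python
from __future__ import annotations

# Only these two entries are stated in the text; the full table is READ_ONLY_ENFORCEMENT in agents.py.
KNOWN_ENFORCEMENT = {"codex": "hard", "claude": "none"}


def effective_enforcement(cli_ref: str, sandbox: str | None = None) -> tuple[str, str | None]:
    """Return (strength, advisory) for a read-only run; a hypothetical helper, not Brigade's API."""
    if cli_ref.startswith("ollama:"):
        return "none", None                            # ollama:<model> refs always resolve to none
    strength = KNOWN_ENFORCEMENT.get(cli_ref, "none")  # unknown CLI: claim no enforcement
    if strength == "hard" and sandbox and sandbox != "read-only":
        # Assumption: any sandbox value other than "read-only" counts as a writable override.
        advisory = (f"WARNING: {cli_ref} --sandbox {sandbox} overrides read-only; "
                    "read-only is now only a prompt instruction")
        return "soft", advisory
    return strength, None
```

Defaulting unknown CLIs to "none" errs on the side of under-claiming. That matches the section's point about being honest on enforcement.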

How read-only Is Enforced

The codex adapter is the cleanest example of the "build argv, no shell" pattern. Every external command is an argv list run through proc.run, which captures output and normalizes failure to exit codes.

src/brigade/agents.py:22 · src/brigade/proc.py:30 · python

def _codex_argv(prompt, read_only, sandbox):
    if sandbox:
        return ["codex", "exec", "--sandbox", sandbox, prompt]   # writable override
    if read_only:
        return ["codex", "exec", "--sandbox", "read-only", prompt]  # hard sandbox
    return ["codex", "exec", prompt]

# proc.run: no shell, bounded, failures become exit codes
cp = subprocess.run(args, capture_output=True, text=True,
                    timeout=timeout, check=False, stdin=subprocess.DEVNULL)
# FileNotFoundError -> Result(code=127, ...)   # missing tool, not a raise
# TimeoutExpired    -> Result(code=124, ...)   # hung tool, bounded

["codex", "exec", "--sandbox", "read-only", prompt] → codex gets a real OS sandbox flag, which is why its enforcement is "hard".
sandbox: ... return [..., "--sandbox", sandbox, ...] → A writable --sandbox override is honored, but it downgrades enforcement to prompt-only with a loud advisory.
subprocess.run(args, ... ) # no shell=True → argv list, never a shell string: there is no shell-injection surface.
stdin=subprocess.DEVNULL → Tools can never block waiting on input; a hung tool is killed by the timeout (code 124).
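The proc.run fragment can be completed into a self-contained wrapper. In this sketch, Result's fields beyond code are illustrative, because the excerpt elides them. The 127 and 124 mappings follow the comments in the excerpt.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    # Illustrative fields: the excerpt above only shows that a Result carries a code.
    code: int
    stdout: str = ""
    stderr: str = ""


def run(args: list[str], timeout: float = 60.0) -> Result:
    """Run an argv list with no shell; a missing or hung tool becomes an exit code, not a raise."""
    try:
        cp = subprocess.run(args, capture_output=True, text=True, timeout=timeout,
                            check=False, stdin=subprocess.DEVNULL)
    except FileNotFoundError as exc:
        return Result(code=127, stderr=str(exc))  # 127: shell convention for "command not found"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising; 124 matches coreutils timeout(1).
        return Result(code=124, stderr=f"timed out after {timeout}s")
    return Result(code=cp.returncode, stdout=cp.stdout, stderr=cp.stderr)
```

For example, run(_codex_argv(prompt, read_only=True, sandbox=None)) would invoke codex under its read-only sandbox. If codex is not installed, the call returns code 127 instead of raising.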

Installation as Declarative Composition

brigade init does not run a script of imperative steps. It composes manifests (a depth manifest + one per selected harness + includes), dedupes them by destination so a harness can override a baseline file, renders the templates, and atomically rewrites a marker-delimited block in .gitignore.

src/brigade/install.py:257 · python

# Dedupe files by dst (last-wins): a harness manifest can override a baseline file.
seen: dict[str, dict] = {}
for entry in files:
    seen[entry["dst"]] = entry
deduped_files = list(seen.values())

# Per selected harness, write a managed, marker-delimited gitignore block:
for h in selection.harnesses:
    inbox = WRITER_INBOXES.get(h)
    if inbox:
        lines += [f"{inbox}/*",            # ignore session-local handoffs
                  f"!{inbox}/TEMPLATE.md",  # but keep the template
                  f"!{inbox}/.gitkeep"]

seen[entry["dst"]] = entry → Last-wins by destination path: later (harness-specific) manifests override earlier (baseline) files.
GITIGNORE_BEGIN ... markers → Re-running brigade init only rewrites the content between the markers, never your edits outside them.
f"{inbox}/*" + f"!{inbox}/TEMPLATE.md" → Handoffs are private session context, so they are gitignored, but the shared template is kept.
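The marker-delimited rewrite is what makes re-running brigade init safe. Here is a minimal sketch of an idempotent, atomic version. The marker strings and upsert_managed_block are hypothetical, since the text names a GITIGNORE_BEGIN constant but not its value. Atomicity comes from writing a temp file in the same directory and then calling os.replace().

```python
import os
import tempfile
from pathlib import Path

# Hypothetical marker lines; the text names a GITIGNORE_BEGIN constant but not its value.
BEGIN = "# >>> brigade managed >>>"
END = "# <<< brigade managed <<<"


def upsert_managed_block(path: Path, lines: list[str]) -> None:
    """Rewrite only the text between the markers (or append a new block); keep the rest verbatim."""
    old = path.read_text(encoding="utf-8") if path.exists() else ""
    block = "\n".join([BEGIN, *lines, END]) + "\n"
    start = old.find(BEGIN)
    end = old.find(END, start) if start != -1 else -1
    if start != -1 and end != -1:
        rest = old[end + len(END):]
        if rest.startswith("\n"):
            rest = rest[1:]  # drop only the newline that belonged to the old END line
        new = old[:start] + block + rest
    else:
        new = old + ("" if not old or old.endswith("\n") else "\n") + block
    # Atomic replace: write a temp file beside the target, then os.replace() it into place.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".gitignore.", text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as fh:
            fh.write(new)
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; a .gitignore is normally world-readable
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Running it twice with the same lines produces a byte-identical file. Lines above and below the block, such as a user's own "dist/" entry, survive every rewrite.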

Read-only MCP & Managed Tools

Brigade exposes its skills and memory cards to agents over a zero-dependency, read-only MCP server: a single line-oriented JSON-RPC loop on stdin/stdout that only ever initializes, lists resources, and calls read tools. Managed tools follow the same humility: an absent or unconfigured fleet tool is advisory (WARN/MANUAL), never a hard failure of the workspace doctor.

src/brigade/mcp_server.py:50 · python

for line in sys.stdin:
    request = json.loads(line)               # newline-delimited JSON-RPC 2.0
    method = str(request.get("method") or "")
    if method == "initialize":
        _emit(response(id, result={"protocolVersion": "2024-11-05",
            "capabilities": {"resources": {}, "tools": {}}}))
    elif method == "resources/list":
        _emit(response(id, result={"resources": list_resources()}))
    elif method == "tools/call":
        payload, failed = call_tool(name, arguments)   # station-supplied callback
        _emit(response(id, result={"content": [{"type": "text", "text": text}],
                                   "isError": failed}))

for line in sys.stdin: → No framework: a plain newline-delimited JSON-RPC loop, which is why it has zero dependencies.
initialize / resources/list / tools/call → The only methods it answers. There is no mutating method, so the server is read-only by construction.
call_tool(name, arguments) → Each station supplies a read callback; the loop owns the JSON-RPC envelope.
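A fuller, self-contained version of the loop makes the read-only property explicit. Any method outside the three answered ones gets JSON-RPC error -32601, and notifications (messages without an id) get no reply. This is a sketch, not mcp_server.py. The tool registry shape and the helper names handle, serve, and _reply are assumptions. The error codes (-32700, -32600, -32601) are the standard JSON-RPC 2.0 ones.

```python
from __future__ import annotations

import json
from typing import Callable, Iterable, Iterator

# Hypothetical registry shape: tool name -> read-only callback over the call's arguments.
ReadTool = Callable[[dict], str]


def _reply(req_id, *, result=None, error=None) -> dict:
    msg: dict = {"jsonrpc": "2.0", "id": req_id}
    if error is not None:
        msg["error"] = error
    else:
        msg["result"] = result
    return msg


def handle(request: dict, tools: dict[str, ReadTool], resources: list[dict]) -> dict | None:
    if "id" not in request:
        return None  # notification (e.g. notifications/initialized): never answered
    req_id, method = request["id"], str(request.get("method") or "")
    params = request.get("params")
    if not isinstance(params, dict):
        params = {}
    if method == "initialize":
        return _reply(req_id, result={"protocolVersion": "2024-11-05",
                                      "capabilities": {"resources": {}, "tools": {}}})
    if method == "resources/list":
        return _reply(req_id, result={"resources": resources})
    if method == "tools/call":
        name = str(params.get("name") or "")
        tool = tools.get(name)
        if tool is None:
            text, failed = f"unknown tool: {name!r}", True
        else:
            try:
                text, failed = tool(params.get("arguments") or {}), False
            except Exception as exc:  # a failing tool becomes isError, not a crash
                text, failed = str(exc), True
        return _reply(req_id, result={"content": [{"type": "text", "text": text}],
                                      "isError": failed})
    # Every other method, including any would-be write, is refused.
    return _reply(req_id, error={"code": -32601, "message": f"method not found: {method}"})


def serve(lines: Iterable[str], tools: dict[str, ReadTool], resources: list[dict]) -> Iterator[str]:
    """One JSON-RPC request per input line in, at most one response line out."""
    for line in lines:
        if not line.strip():
            continue
        try:
            request = json.loads(line)
        except json.JSONDecodeError:
            yield json.dumps(_reply(None, error={"code": -32700, "message": "parse error"}))
            continue
        if not isinstance(request, dict):
            yield json.dumps(_reply(None, error={"code": -32600, "message": "invalid request"}))
            continue
        response = handle(request, tools, resources)
        if response is not None:
            yield json.dumps(response)
```

To serve over stdio, iterate serve(sys.stdin, tools, resources) and print each yielded line with flush=True. Taking any iterable of lines also makes the loop testable without a subprocess.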

The CLI Surface: 38 Commands, 5 Families

Everything above is reachable through one CLI. COMMAND_GROUPS (cli/_common.py) organizes all 38 top-level commands into five families; a test enforces that every command appears in exactly one group, so nothing can be silently un-grouped. This is the map of where each capability lives.

Core memory loop

init
handoff
handoff-template
ingest
memory
doctor
status

Daily operator loop

operator
daily
work
friction
center
runbook
budgets
notifications

Stations and tools

add
skills
tools
pantry
roster
run
runs
dogfood

Review, security, research

security
scrub
untrusted
research
learn
chat
context
projects

Wiring and advanced

release
roadmap
repos
reconfigure
completions
openclaw-fragments
hermes-fragments
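The "every command in exactly one group" rule is easy to state as a check. In this sketch, COMMAND_GROUPS is assumed to map a group label to a tuple of command names. The names are taken from the five lists above, but the real structure in cli/_common.py may differ, and grouping_problems is a hypothetical helper.

```python
from collections import Counter

# Assumed shape: group label -> tuple of command names (names taken from the lists above).
COMMAND_GROUPS: dict[str, tuple[str, ...]] = {
    "Core memory loop": ("init", "handoff", "handoff-template", "ingest", "memory", "doctor", "status"),
    "Daily operator loop": ("operator", "daily", "work", "friction", "center", "runbook",
                            "budgets", "notifications"),
    "Stations and tools": ("add", "skills", "tools", "pantry", "roster", "run", "runs", "dogfood"),
    "Review, security, research": ("security", "scrub", "untrusted", "research", "learn",
                                   "chat", "context", "projects"),
    "Wiring and advanced": ("release", "roadmap", "repos", "reconfigure", "completions",
                            "openclaw-fragments", "hermes-fragments"),
}


def grouping_problems(registered: set[str],
                      groups: dict[str, tuple[str, ...]] = COMMAND_GROUPS) -> list[str]:
    """Return problems; an empty list means every registered command is in exactly one group."""
    counts = Counter(cmd for cmds in groups.values() for cmd in cmds)
    grouped = set(counts)
    problems = [f"{c}: in {n} groups" for c, n in sorted(counts.items()) if n > 1]
    problems += [f"{c}: ungrouped" for c in sorted(registered - grouped)]
    problems += [f"{c}: grouped but not registered" for c in sorted(grouped - registered)]
    return problems
```

Checking in both directions catches a new command that nobody grouped and a stale group entry for a removed command. Either one would otherwise drift silently.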

From the Brigade to the Whole Kitchen

Brigade is the brigade de cuisine itself — the executive chef and the line — but it does not work alone. The stations you saw attach to managed tools (content-guard, agentpantry, code-search, miseledger, and more), each owning one job and handing off cleanly to the next.

Escoffier Labs Academy. Generated from the deep-dive source of record.