This is the abridged developer documentation for SOMA # SOMA > A persistent agent operating system. Claude Code sessions running 24/7, coordinating through a SQLite-backed priority queue with crash-resumable LLM loops. ## Read next - **[Quickstart](/project-soma/quickstart/)** — cold-start path from clone to a running daemon and your first job, in about ten minutes. - **[What is SOMA](/project-soma/what-is-soma/)** — the mental model: substrate → minions → engines → tools, plus the project's vocabulary glossary. - **[Architecture](/project-soma/architecture/)** — component map, three data-flow walkthroughs, full key-file inventory. - **[Donor lineage](/project-soma/donor-lineage/)** — what was inherited from cortextOS, ported from gbrain and gstack, and what's deferred. - **[Agent bootstrap](/project-soma/agent-bootstrap/)** — for an LLM agent (or human dev) opening this repo cold. Read order, edit boundaries, registration patterns. ## For AI agents The full corpus is published as [`/llms.txt`](/project-soma/llms.txt), [`/llms-full.txt`](/project-soma/llms-full.txt), and [`/llms-small.txt`](/project-soma/llms-small.txt) per the [llms.txt standard](https://llmstxt.org). Append `.md` to any page URL to fetch the source. ## Status Phase 1 (Minions queue + multi-provider API engine + dashboard submit UI) is essentially complete. Phase 2 (worktree isolation per gstack) is next. Full roadmap and architecture-decision record in [PROJECT_SOMA.md](https://github.com/NulightJens/project-soma/blob/soma/phase-1-minions/PROJECT_SOMA.md) on GitHub. # Agent bootstrap > For an LLM agent (or human dev) opening this repo cold. Read order, edit boundaries, registration patterns, write-up protocol. # Agent bootstrap You're an LLM agent (or a human dev who works like one) opening this repo cold. This page is your shortest path to "I know enough to be useful here." 
It complements rather than replaces [CLAUDE.md](../../CLAUDE.md) (the operational harness) and [HANDOFF.md](../../HANDOFF.md) (the live state snapshot).

## Read order on first contact

1. [HANDOFF.md](../../HANDOFF.md) — 30-second resume snapshot. Where the codebase is right now, what works, what's broken, what the immediate next moves are.
2. [CLAUDE.md](../../CLAUDE.md) — operating harness. Line limits, ownership zones, verification discipline, when to delegate, when to check in. **Memorise §3 (hard limits) and §6 (security boundaries).**
3. [what-is-soma.md](./what-is-soma.md) — concept layer. Substrate / minions / engines / tools.
4. [PROJECT_SOMA.md §13 chronicle's last 3 entries](../../PROJECT_SOMA.md) — what the previous sessions did and why.
5. `git log --oneline -10` on the active branch — last 10 commits in raw form.
6. The verify block from [HANDOFF.md §2](../../HANDOFF.md) — confirm tests pass and the daemon is up before you write code.

That gets you from "fresh boot" to "ready to act" in under two minutes.

## Mental model

The single most important fact about this codebase: **state lives outside any one process**. When you're tempted to add an in-memory cache or a "session" or a "context object," check first whether the durable queue (`minion_jobs`), file bus (`~/.soma/<profile>/inbox/`), or persisted message log (`minion_subagent_messages`) already gives you what you need. They almost always do.

The second most important fact: **the queue is the integration point**. New features add either (a) a new handler name registered with `MinionWorker.register`, or (b) a new engine registered with `registerEngine`, or (c) a new provider registered with `registerProvider`, or (d) a new tool factory registered with `registerToolFactory`. The framework around them is already built — extend by registering, not by editing the loop.
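The "extend by registering" seam can be pictured as a name-keyed registry plus a dispatch function that never changes. A minimal generic sketch (names are illustrative, not SOMA's actual API):

```typescript
// Generic sketch of a registration seam: handlers register by name,
// dispatch looks them up, and extension never edits the dispatch loop.
type Handler = (data: unknown) => Promise<unknown>;

const handlers = new Map<string, Handler>();

function register(name: string, fn: Handler): void {
  handlers.set(name, fn);
}

async function dispatch(name: string, data: unknown): Promise<unknown> {
  const fn = handlers.get(name);
  if (!fn) throw new Error(`no handler registered for '${name}'`);
  return fn(data);
}

// Extending means adding a registration, not editing dispatch:
register('echo', async (data) => data);
```

The same shape underlies all four seams; only the registered value's interface differs.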
## Vocabulary you must internalise If you can't paraphrase these in one sentence each without reading them, slow down and re-read [what-is-soma.md](./what-is-soma.md) before writing code: - **handler** vs. **engine** vs. **provider** vs. **tool** — four distinct registries, four distinct extension seams. - **trusted** vs. **untrusted** submitter — the dashboard, the model's `submit_minion` tool, and any external HTTP/MCP bridge are all untrusted. The CLI with `--trusted` and in-process tests are trusted. - **subscription engine** vs. **api engine** — different cost surfaces (Claude subscription quota vs. pay-per-token API credits), different env gates. - **`ctx.signal`** vs. **`ctx.shutdownSignal`** — the first fires on timeout/cancel/lock-loss; the second only on worker SIGTERM/SIGINT. Shell + subscription engines subscribe to both; most other handlers only need `ctx.signal`. - **port-exempt** commits — when porting from gbrain or gstack, the 300-LOC ceiling is waived but every deviation gets a `// SOMA:` annotation. ## Where to make changes Use this table when you're about to edit something. If your change crosses two zones, the rule is to **pause and ask** before making the change rather than batching everything in one commit. 
| Zone | Edit when | Verification before commit |
|---|---|---|
| `src/minions/**` | Adding queue features, handlers, engines, providers, tools | `npx vitest run tests/minions-*.test.ts` |
| `src/cli/**` | Adding a `soma <command>` surface | `node dist/cli.js --help`; relevant `tests/cli-*.test.ts` |
| `src/daemon/**` | Daemon supervision, agent lifecycle | `npx tsc --noEmit`; manual `soma start` smoke |
| `src/pty/**` | `claude` subprocess spawning, env allowlist | **HITL required** — security surface |
| `dashboard/**` | Web UI | `cd dashboard && npx tsc --noEmit`; eyeball the route in dev server |
| `src/brain/**` (Phase 6+) | Memory layer | (future — see ADR-004) |
| `tests/**` | Tests for whichever zone | `npm test` |
| `templates/**` | Per-agent scaffolds | Manual: `soma add-agent --template <name>` |
| `PROJECT_SOMA.md` | New ADR or chronicle entry | Append-only — never edit a prior ADR |
| `HANDOFF.md` | After every non-trivial commit | Per its §10 update checklist |

## Where NOT to make changes (without HITL)

These surfaces are security-critical. Any change here pauses for explicit operator review:

- `src/pty/agent-pty.ts` env allowlist — anything in there can leak into the spawned `claude` process.
- `src/cli/install.ts` token plumbing — touches OAuth and Keychain.
- `src/minions/handlers/shell.ts` — RCE-adjacent.
- `src/minions/protected-names.ts` — the gate that blocks untrusted submission of high-stakes handlers.
- The Telegram `ALLOWED_USER` check in agent .env files — single-user authentication.
- Anything in `~/.soma/default/orgs/**/.env` or `secrets.env` — gitignored, never echo, never `git add`.

## Verification discipline

> **Every commit ends with output proof.** Not "tests should pass" — show the green count. Not "type-checks ok" — show the silent `npx tsc --noEmit`.

The harness is set up so that "I claim X works" without showing the verification is treated as untrusted.
Before any commit:

```bash
npx tsc --noEmit   # silent = pass
npx vitest run tests/minions-*.test.ts tests/cli-*.test.ts \
  dashboard/src/app/api/intents/parse/__tests__/pattern-parser.test.ts
# → Tests N passed (N) — N must equal the discipline-suite count from HANDOFF
```

If you touched the dashboard:

```bash
(cd dashboard && npx tsc --noEmit)   # silent = pass
```

If you touched a UI route, also exercise it in a browser and report the result. Type checks are not feature checks.

## Commit hygiene

- **Conventional commit prefix.** `soma:` for feature work, `docs:` for docs, `fix:` for bug fixes, `refactor:` for non-functional cleanup.
- **HEREDOC commit messages.** Always pass the message via `git commit -m "$(cat <<'EOF' ... EOF)"` — preserves blank lines + bullet structure.
- **One feature per commit, ≤ 300 LOC** (port commits exempt, must annotate `// SOMA:`).
- **Commit then push to `soma/phase-N-*`.** Never force-push shared branches.
- **After every non-trivial commit:** update HANDOFF.md per its §10. Snapshot to `docs/handoffs/YYYY-MM-DD-NN-topic.md` at phase milestones. Append to PROJECT_SOMA.md §13 chronicle.

## Hub-and-spoke delegation

The session you're in is the **hub** — strategic decisions, architecture, memory consolidation, integration. When you need to read 3+ files or search across multiple directories to answer a question, **spawn a spoke agent** (Explore for fast searches, general-purpose for multi-file research). Don't burn the hub's context on bulk reading. See [CLAUDE.md §7](../../CLAUDE.md).

## Patterns you'll see often

### Registering a new engine

```ts
// In src/minions/handlers/engines/<name>.ts
import { registerEngine, type RunnerEngine } from '../registry.js';

export const myEngine: RunnerEngine = {
  name: 'my-engine',
  async run(ctx, params) {
    // ... implementation ...
    return { engine: 'my-engine', result: '...', /* ... */ };
  },
};

registerEngine(myEngine);
```

Then add a side-effect import in `runner.ts` so it loads.
### Registering a new provider (under the api engine)

```ts
// In src/minions/handlers/engines/api/providers/<name>.ts
import { registerProvider } from './registry-leaf.js';

registerProvider({
  name: 'my-provider',
  rateKey: () => 'my-provider:api',
  async runTurn(req) { /* ... */ },
});
```

Then side-effect import in `providers/index.ts`.

### Registering a new tool

```ts
// In src/minions/handlers/engines/api/tools/<name>.ts
import { registerToolFactory } from './registry-leaf.js';

registerToolFactory('my_tool', (queue) => ({
  name: 'my_tool',
  description: '...',
  input_schema: { type: 'object', properties: { /* ... */ } },
  idempotent: false,
  async execute(input, ctx) { /* ... */ },
}));
```

Then side-effect import in `tools/registry.ts`.

### Adding a new built-in handler

```ts
// In src/cli/job-handlers.ts — add to BUILTIN_HANDLERS, or gate it
// behind an env flag in resolveBuiltinHandlers
const myHandler: MinionHandler = async (ctx) => { /* ... */ };

export function resolveBuiltinHandlers(): Record<string, MinionHandler> {
  const handlers: Record<string, MinionHandler> = { ...BUILTIN_HANDLERS };
  if (process.env.SOMA_ALLOW_MY_HANDLER === '1') {
    handlers.my_handler = myHandler;
  }
  return handlers;
}
```

If the handler is high-stakes (RCE, network, file I/O), **gate it** behind an env flag AND consider adding the name to `protected-names.ts`.

### TDZ-safe registry pattern

Anywhere a module self-registers at load time, the storage map must live in a leaf module with zero further imports. We've hit this exact issue three times (engines, providers, tools) — the leaf-orchestrator split is in `src/minions/handlers/registry.ts` + `runner.ts` (engines), `providers/registry-leaf.ts` + `providers/index.ts` (providers), `tools/registry-leaf.ts` + `tools/registry.ts` (tools). Use the same pattern for any future registry.
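The leaf-orchestrator split can be sketched in one file, with module boundaries shown as comments (file names here are illustrative, not SOMA's real layout):

```typescript
// --- registry-leaf.ts: holds ONLY the storage map and its accessors, and
// imports nothing else. With zero imports it is always fully evaluated
// before any registrant module, so the Map exists when registration runs.
const registry = new Map<string, { name: string; run: () => string }>();

function registerImpl(impl: { name: string; run: () => string }): void {
  registry.set(impl.name, impl);
}

function getImpl(name: string): { name: string; run: () => string } | undefined {
  return registry.get(name);
}

// --- my-impl.ts: self-registers at module load time. It imports only the
// leaf, so there is no cycle back through the orchestrator.
registerImpl({ name: 'my-impl', run: () => 'ok' });

// --- index.ts (orchestrator): would contain only
//   import './my-impl.js';                      // side-effect: runs registration
//   export { getImpl } from './registry-leaf.js';
// If the Map lived in index.ts instead, my-impl.ts → index.ts → my-impl.ts
// is a cycle, and the Map's `const` can be read before it initialises (TDZ).
```

The key property is that the leaf's module graph is a strict subset of every registrant's, so ESM evaluation order guarantees the storage is initialised first.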
## When you're stuck

| Symptom | Action |
|---|---|
| You're not sure if a behaviour is intentional | `git log -p -- <file>` and read the commit message that introduced it |
| You don't know what donor a piece came from | Search [donor-lineage.md](./donor-lineage.md) |
| The test discipline-suite has fewer cases than HANDOFF claims | Tests added since the last HANDOFF update — check `git log` and update HANDOFF |
| `tsc --noEmit` fails after your change | Read the error from the bottom up — TS error chains often have the actionable diagnostic at the end |
| ESM TDZ error during test load | You hit the registry-storage / self-register cycle — extract to a leaf module |
| A change spans pty / shell / protected-names | **Stop.** Pause and ask the operator before continuing. |

## Writing up your work

When you finish a non-trivial slot:

1. Make the commit (≤300 LOC; port-exempt commits annotated).
2. Push to the active branch.
3. **Update HANDOFF.md** per its §10:
   - §1 Resume in 30s — new branch tip + green signals
   - §3 file map — new files
   - §5 commit timeline — prepend the commit
   - §8 next moves — remove what you did, add what you discovered
4. **Append to PROJECT_SOMA.md §13 chronicle** with: date, what changed, what's next.
5. **At a phase milestone**, snapshot HANDOFF.md to `docs/handoffs/YYYY-MM-DD-NN-topic.md`.
6. **At ~500K context used**, prepare a resume-prompt bundle per [docs/handoffs/TEMPLATE-resume-prompt.md](../handoffs/TEMPLATE-resume-prompt.md) before the window exhausts.

## What good work looks like here

- A focused commit that makes one thing work, with tests that prove it.
- An honest verification report — actual command output, not paraphrase.
- An updated HANDOFF that the next session can read cold.
- A chronicle entry that captures the "why," not just the "what."
- Zero unprompted force-pushes, zero merged-but-failing tests, zero `--trusted` submissions from untrusted surfaces.
If you do all of those, the operator's trust budget grows and you get more autonomy on the next slot. If you skip verification or claim "done" without proof, the budget shrinks and the next slot turns into a HITL checkpoint. --- Welcome to SOMA. Now go read [HANDOFF.md](../../HANDOFF.md). # Architecture > Component map, data flow, key file paths. Each section links to the ADR that introduced the design. # Architecture Component map, data flow, key file paths. Each section links to the ADR that introduced the design when relevant — the ADR log lives in [PROJECT_SOMA.md §10](../../PROJECT_SOMA.md). ## Component map ``` ┌──────────────────────────────────┐ Telegram ─────► │ soma-daemon (PM2) │ (operator) │ │ │ ┌────────────┐ ┌────────────┐ │ │ │ agent PTY │ │ agent PTY │ │ ← `claude` subprocesses │ │ (system) │ │ (analyst) │ │ via node-pty │ └────────────┘ └────────────┘ │ │ ▲ ▲ │ │ │ │ │ Phone ────────► │ ┌────┴───────┴────┐ │ (Telegram) │ │ file bus │ │ ← atomic-write │ │ (events, │ │ filesystem messages │ │ approvals) │ │ │ └─────────────────┘ │ │ ▲ │ └───────────┼───────────────────────┘ │ │ IPC (Unix socket) │ ┌───────────┴──────────────────────┐ Browser ────► │ SOMA-dashboard (PM2) │ │ Next.js 16 + Tailwind v4 │ │ - /jobs (list) │ │ - /jobs/submit (Freeform/Adv) │ │ - /agents, /experiments, ... │ └──────────────────────────────────┘ │ │ shell-out: `soma jobs ...` ▼ ┌─────────────────────────────────────────────────────────────────────┐ │ soma-jobs-worker (PM2) │ │ │ │ poll loop ─► claim ─► handler dispatch │ │ │ │ │ ▼ │ │ ┌───────────────────────────────────────────────────────┐ │ │ │ Handlers (job.name → fn) │ │ │ │ echo / noop / sleep (always on) │ │ │ │ shell (SOMA_ALLOW_SHELL_JOBS) │ │ │ │ subagent / subagent_aggreg. 
(SOMA_ALLOW_SUBAGENT) │ │ │ └───────────────────────────────────────────────────────┘ │ │ │ │ │ ▼ (subagent only) │ │ ┌───────────────────────────────────────────────────────┐ │ │ │ runnerHandler — dispatches by data.engine │ │ │ │ │ │ │ │ subscription engine api engine │ │ │ │ │ │ │ │ │ │ ▼ ▼ │ │ │ │ spawn `claude -p` Provider seam │ │ │ │ parse NDJSON ├── anthropic (SDK) │ │ │ │ ├── openai (fetch) │ │ │ │ └── custom (env config) │ │ │ └───────────────────────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────┐ │ Minions queue (SQLite) │ │ │ │ minion_jobs ← the queue itself │ │ minion_inbox ← per-job message inbox │ │ minion_attachments ← BLOB storage per job │ │ minion_rate_leases ← engine-owned advisory locks │ │ minion_subagent_messages ← API engine: replay log │ │ minion_subagent_tool_executions ← API engine: two-phase ledger │ └─────────────────────────────────────────────────────────────────────┘ ``` ## Process supervision PM2 supervises three Node processes. None of them shares an event loop — work flows between them through the SQLite queue or Unix sockets, never through shared memory. | App | Script | Role | |---|---|---| | `soma-daemon` | `dist/daemon.js` | Owns agent registry, spawns/restarts agent PTYs, polls Telegram, runs cron, serves IPC over `~/.soma/default/daemon.sock` | | `SOMA-dashboard` | `npm run dev` (in `dashboard/`) | Next.js dev server. Reads the queue DB read-only; writes flow through `soma jobs ...` shell-outs | | `soma-jobs-worker` | `dist/cli.js jobs work` | Polls the queue, claims one job at a time (configurable concurrency), dispatches to a handler, persists the result | ADRs: [ADR-001](../../PROJECT_SOMA.md) (fork in place), [ADR-015](../../PROJECT_SOMA.md) (PM2 app naming + state-dir layout). 
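The worker's "claims one job at a time" step follows a lease discipline: a job is claimable when it is `waiting`, or when it is `active` but its lock lease has expired (a crashed worker's job becomes claimable again). A minimal in-memory sketch of those semantics — illustrative only, not the actual `MinionQueue` code, which does this atomically in SQLite:

```typescript
// Sketch of lease-based claiming. Field names mirror the minion_jobs columns
// described in this doc; the token format here is a stand-in for a uuid.
interface JobRow {
  id: number;
  status: 'waiting' | 'active' | 'completed' | 'failed';
  lock_token: string | null;
  lock_until: number | null; // Unix ms, matching the queue's integer timestamps
}

function claimNext(rows: JobRow[], now: number, leaseMs = 30_000): JobRow | undefined {
  // Claimable: never claimed, or the previous worker's lease has expired.
  const row = rows.find(
    (r) => r.status === 'waiting' || (r.status === 'active' && (r.lock_until ?? 0) < now),
  );
  if (!row) return undefined;
  row.status = 'active';
  row.lock_token = `tok-${now}-${row.id}`; // real code uses a uuid
  row.lock_until = now + leaseMs;
  return row;
}
```

Completion must then present the matching `lock_token`, which is what makes a stale worker's late `completeJob` call harmless after the job has been reclaimed.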
## State directories

```
~/.soma/<profile>/              # canonical (was ~/.cortextos before ADR-015)
├── minions.db                  # SQLite queue + inbox + attachments + subagent state
├── daemon.sock                 # Unix socket: dashboard ↔ daemon IPC
├── dashboard.env               # auto-generated NextAuth credentials
├── config/
│   └── enabled-agents.json     # which agents the daemon should keep alive
├── orgs/
│   └── <org>/
│       ├── secrets.env
│       └── agents/
│           └── <agent>/
│               ├── .env        # per-agent env (BOT_TOKEN, ALLOWED_USER, etc.)
│               ├── IDENTITY.md
│               ├── SOUL.md
│               ├── GOALS.md
│               └── MEMORY.md
├── state/
│   └── <agent>/
│       ├── heartbeat.json
│       └── (agent-specific transient state)
├── inbox/<agent>/              # per-agent file-bus messages
└── logs/<agent>/               # rolling logs
```

`~/.cortextos` symlinks to `~/.soma` for backward compat with any external script that hadn't migrated.

## Data flow walkthroughs

### 1. Operator submits a job from the dashboard

```
1. User opens /jobs/submit, types "sleep 5 seconds" in Freeform tab.
2. POST /api/intents/parse → pattern matcher → {name: 'sleep', data: {ms: 5000}}.
3. UI renders confirmation card. User clicks "Confirm and submit".
4. POST /api/jobs/submit → validates input (no protected names), spawns
   `soma jobs submit sleep --data '{"ms":5000}' --json`.
5. CLI: openSqliteEngine → MinionQueue.add(...) (untrusted; protected-name gate
   runs but the name isn't protected so it passes).
6. Row inserted into minion_jobs with status='waiting', priority=0.
7. CLI emits the new job's JSON; Next.js route forwards it to the UI; UI
   redirects to /jobs?focus=<id>.
8. soma-jobs-worker poll loop sees the new row on next tick, claims it
   (status → 'active', lock_token = uuid, lock_until = now + 30s).
9. Handler dispatch: data.name === 'sleep', so sleepHandler runs; awaits 5000ms
   with cooperative abort wiring.
10. On return: queue.completeJob(id, lockToken, {slept_ms: 5000}).
11. /jobs page auto-refreshes (5s interval) and shows status='completed'.
```

ADR: [ADR-014](../../PROJECT_SOMA.md) (user-facing-edge filter — Freeform parser + structured Advanced fallback).

### 2. Subagent calls the api engine with the OpenAI provider

```
1. Operator submits via CLI:
   soma jobs submit subagent --trusted --data '{
     "engine": "api", "provider": "openai",
     "model": "gpt-4o-mini", "prompt": "Hello"
   }'
2. Worker claims; handler = runnerHandler (registered under 'subagent' when
   SOMA_ALLOW_SUBAGENT_JOBS=1).
3. runnerHandler reads data.engine='api' → getEngine('api') → api engine.
4. api engine checks SOMA_ALLOW_API_ENGINE=1 (cost-surface gate).
5. runApiLoop() — checks ctx.subagent (worker wired it), loads any prior
   messages (none on first run), persists seed user message.
6. Provider lookup: getProvider('openai') → makeOpenAiProvider() instance.
7. engine.acquireLock('api:openai:chat', 30000) — rate-lease around the outbound call.
8. provider.runTurn() — fetches OPENAI_API_KEY from env, builds
   /v1/chat/completions request body, fetches, parses choices[0].message.
9. Token usage extracted; ctx.updateTokens(...) writes to minion_jobs.
10. Assistant message persisted to minion_subagent_messages.
11. No tool_use blocks → loop exits with stop_reason='end_turn', final_text
    from content_blocks.
12. queue.completeJob(...) with the RunnerResult shape; row → 'completed'.
```

ADR: [ADR-008](../../PROJECT_SOMA.md) (subscription-first, api opt-in), [ADR-012](../../PROJECT_SOMA.md) (Provider seam).

### 3. A subagent submits a child job mid-loop using the `submit_minion` tool

```
1. Subagent is in mid-conversation; the model emits a tool_use block:
   {tool: "submit_minion", input: {name: "echo", data: {msg: "hello"}}}.
2. Loop intercepts, looks up the tool factory in the registry → bound to the
   live MinionQueue at worker construction.
3. submitMinion executor calls queue.add('echo', {msg:'hello'}, {parent_job_id: ctx.jobId})
   — UNTRUSTED (no allowProtectedSubmit), so 'shell'/'subagent'/'subagent_aggregator'
   would bounce. 'echo' is fine.
4.
Two-phase ledger: minion_subagent_tool_executions row inserted with status='pending', then updated to 'complete' with {job_id, status: 'waiting'}. 5. Result is wrapped as a tool_result content block, fed into the next provider turn. 6. (Independently) The worker eventually claims the new echo job, runs it, posts a child_done message into the parent's minion_inbox. 7. The parent subagent can read it via the `read_own_inbox` tool. ``` ADR: [ADR-014](../../PROJECT_SOMA.md) (untrusted submitter invariant), tools detail in [src/minions/handlers/engines/api/tools/builtin.ts](../../src/minions/handlers/engines/api/tools/builtin.ts). ## Key file paths ### Substrate (cortextOS upstream — still active) | File | Purpose | |---|---| | `src/daemon/index.ts` | Daemon entry point; spawns agent supervisors, telegram poller, IPC server | | `src/daemon/agent-manager.ts` | Per-agent lifecycle: spawn PTY, watch heartbeat, restart on death | | `src/pty/agent-pty.ts` | `claude` subprocess via `node-pty`; reads OAuth from Keychain or env | | `src/bus/` | File-bus message types + atomic-write helpers | | `src/cli/index.ts` | Commander root; registers all `soma ` | | `src/cli/ecosystem.ts` | Generates `ecosystem.config.js` from current org/agent state | ### Minions queue (Phase 1 ports) | File | LOC | Purpose | |---|---|---| | `src/minions/types.ts` | ~400 | All job/inbox/attachment/subagent types + row mappers | | `src/minions/schema.sql` | ~230 | DDL: 6 tables + indexes + update trigger | | `src/minions/engine.ts` | ~75 | `QueueEngine` interface (sqlite/pglite/postgres/d1) | | `src/minions/engine-sqlite.ts` | ~235 | better-sqlite3 implementation; advisory locks via `BEGIN IMMEDIATE` | | `src/minions/queue.ts` | ~1500 | `MinionQueue` class — state machine + helpers + subagent persistence | | `src/minions/worker.ts` | ~440 | `MinionWorker` — claim/run/complete loop + ctx wiring | | `src/minions/attachments.ts` | ~110 | Pure validation; CRUD lives in queue.ts | | 
`src/minions/protected-names.ts` | ~40 | Constant + helper; gate enforced in queue.add | | `src/minions/handlers/shell.ts` | ~310 | Shell handler (env-allowlisted, kill-laddered) | | `src/minions/handlers/registry.ts` | ~80 | Engine registry (leaf module) | | `src/minions/handlers/runner.ts` | ~95 | Unified handler — dispatches by data.engine | ### LLM-loop engines | File | Purpose | |---|---| | `src/minions/handlers/engines/subscription.ts` | claude CLI subprocess + NDJSON parser; default engine (ADR-008) | | `src/minions/handlers/engines/api.ts` | API engine factory + queue binding + cost-surface gate | | `src/minions/handlers/engines/api/loop.ts` | Provider-neutral multi-turn loop with crash-resumable replay | | `src/minions/handlers/engines/api/types.ts` | `Provider`, `ApiToolDef`, `ProviderHttpError` | | `src/minions/handlers/engines/api/providers/registry-leaf.ts` | Provider registry storage (TDZ-safe leaf) | | `src/minions/handlers/engines/api/providers/anthropic.ts` | Anthropic SDK provider (lazy-imported) | | `src/minions/handlers/engines/api/providers/openai.ts` | OpenAI-compatible provider (native fetch) | | `src/minions/handlers/engines/api/providers/custom.ts` | `SOMA_API_CUSTOM_PROVIDERS` env loader | | `src/minions/handlers/engines/api/tools/registry-leaf.ts` | Tool factory registry (TDZ-safe leaf) | | `src/minions/handlers/engines/api/tools/builtin.ts` | `submit_minion`, `send_message`, `read_own_inbox` | ### Dashboard | File | Purpose | |---|---| | `dashboard/src/app/(dashboard)/jobs/page.tsx` | List + auto-refresh + status filters + detail sheet (ADR-014) | | `dashboard/src/app/(dashboard)/jobs/submit/page.tsx` | Freeform + Advanced submit UI | | `dashboard/src/app/api/jobs/route.ts` | GET /api/jobs (list + stats) | | `dashboard/src/app/api/jobs/[id]/route.ts` | GET + POST per-job (action: cancel \| retry) | | `dashboard/src/app/api/jobs/submit/route.ts` | POST untrusted submit; shells out to CLI | | `dashboard/src/app/api/intents/parse/route.ts` 
| POST freeform-text → structured intent | | `dashboard/src/app/api/intents/parse/pattern-parser.ts` | Deterministic pattern matcher | | `dashboard/src/components/ui/soma-mark.tsx` | Brand mark SVG (black circle + triangle) | | `dashboard/src/lib/data/minions.ts` | Read-only better-sqlite3 access to the queue DB | | `dashboard/src/lib/data/cortextos-cli.ts` | CLI resolver for shell-outs | ## Test surfaces | Suite | Coverage | |---|---| | `tests/minions-engine.test.ts` | SQLite engine: schema, CRUD, idempotency, locks, tx | | `tests/minions-queue.test.ts` | All MinionQueue state transitions + DAG + stall + cancel | | `tests/minions-worker.test.ts` | Worker registry, claim/run/complete, retry, SIGKILL rescue | | `tests/minions-attachments.test.ts` | Pure validation + queue CRUD round-trip | | `tests/minions-protected-names.test.ts` | Membership + queue gate + trim-evasion | | `tests/minions-shell-handler.test.ts` | Shell handler validation + execution + abort | | `tests/minions-runner.test.ts` | Engine registry + dispatch + subscription engine integration | | `tests/minions-api-engine.test.ts` | API loop + Anthropic provider + replay reconciliation | | `tests/minions-api-openai.test.ts` | OpenAI translators + custom-endpoint loader | | `tests/minions-api-tools.test.ts` | Tool registry + 3 builtin tools against real queue | | `tests/cli-job-handlers.test.ts` | Built-in handler behaviour | | `tests/cli-jobs-sigkill-rescue.test.ts` | Real subprocess SIGKILL → stall sweep regression | | `dashboard/.../pattern-parser.test.ts` | Deterministic intent parser | Discipline: 202/202 pass after Phase 1 closeout. Run with `npx vitest run tests/minions-*.test.ts tests/cli-*.test.ts dashboard/src/app/api/intents/parse/__tests__/pattern-parser.test.ts`. ## Where decisions live | You want to know... 
| Read | |---|---| | Why a thing was built this way | [PROJECT_SOMA.md §10 ADR log](../../PROJECT_SOMA.md) | | What changed yesterday | [PROJECT_SOMA.md §13 chronicle](../../PROJECT_SOMA.md) | | What state we're in right now | [HANDOFF.md](../../HANDOFF.md) | | How to write code on this repo | [CLAUDE.md](../../CLAUDE.md) | | What the donor codebases gave us | [donor-lineage.md](./donor-lineage.md) | Next reading: [agent-bootstrap.md](./agent-bootstrap.md) if you're about to make changes. # Donor lineage > What was inherited from cortextOS, ported from gbrain and gstack, and what's deferred to later phases. # Donor lineage SOMA is a fork-and-evolve project. Three open-source codebases provided the load-bearing patterns; one provided the harness template. All MIT-licensed. This page is the source of truth for who contributed what — useful when reviewing a port commit, when checking whether a behaviour is original-to-cortextOS or new-to-SOMA, or when deciding which donor to consult for a future feature. 
## At a glance | Donor | Author | Role | License | Status in SOMA | |---|---|---|---|---| | **cortextOS** | grandamenium | Substrate (daemon, PTY, file bus, dashboard) | MIT | Forked; still upstream-trackable via `git fetch upstream` | | **gbrain** | Garry Tan | Minions queue + tool runtime + memory primitives | MIT | Queue + handlers ported in Phase 1; memory deferred to Phase 6 | | **gstack** | Garry Tan | Subprocess pattern + worktree isolation + skill format | MIT | Subprocess pattern ported in Phase 1; worktrees deferred to Phase 2; skill format adopted Phase 5 | | **graphify** | Safi Shamsi | Codebase enrichment (tree-sitter AST + clustering) | MIT | Adopted as enrichment pipeline; lands Phase 6 | | **claudecode-harness** | anothervibecoder-s | CLAUDE.md template | MIT | Adopted verbatim as the SOMA `CLAUDE.md` (ADR-013) | Donor repos: - cortextOS: https://github.com/grandamenium/cortextos - gbrain: https://github.com/garrytan/gbrain (cloned at `/tmp/gbrain` for Phase-1 porting) - gstack: https://github.com/garrytan/gstack (cloned at `/tmp/gstack`) - graphify: https://github.com/safishamsi/graphify (cloned at `/tmp/graphify`) - claudecode-harness: https://github.com/anothervibecoder-s/claudecode-harness (cloned at `/tmp/claudecode-harness`) --- ## cortextOS — substrate (still active) What we got from cortextOS and kept verbatim. Most of `src/` outside `src/minions/` is upstream-original. | Subsystem | Path | Purpose | |---|---|---| | Daemon | `src/daemon/` | Long-running supervisor process. Holds the agent registry, restarts dead PTYs, polls Telegram, runs cron, handles IPC over a Unix socket. | | Agent PTY | `src/pty/agent-pty.ts` | Spawns `claude` CLI through `node-pty`. Reads OAuth from macOS Keychain (or `CLAUDE_CODE_OAUTH_TOKEN` for headless). Env-allowlisted to prevent secret leakage into the subprocess. | | File bus | `src/bus/` | Atomic-write filesystem-backed message bus. Carries events, heartbeats, telegram messages, and approvals between agents. 
| | CLI | `src/cli/` | `cortextos init / add-agent / start / status / dashboard / ecosystem / install / doctor`. Adopted; `soma jobs ...` (added in Phase 1) follows the same Commander pattern. | | Dashboard runtime | `dashboard/` | Next.js 16 + React 19 + Tailwind v4 + shadcn + `@base-ui/react`. Auth gate, sidebar, monochrome theme (post ADR-010). | | PM2 ecosystem generator | `src/cli/ecosystem.ts` | Auto-generates `ecosystem.config.js` from current org + agent set. | | Templates | `templates/` | Per-agent markdown scaffolds (IDENTITY / SOUL / GOALS / SKILLS / GUARDRAILS). Used by `soma add-agent`. | What we changed in upstream code: - Display rebrand: `cortextOS` → `SOMA` across user-visible prose, dashboard UI, docs, package metadata. (ADR-015 Tier A.) - State-dir rename: `~/.cortextos/` → `~/.soma/` with backward-compat symlink. (ADR-015 Tier B.) - PM2 app names: `cortextos-daemon` → `soma-daemon`, etc. (ADR-015 Tier C.) - `soma` bin alias added alongside `cortextos`. What we did NOT change (and why): - The daemon itself, agent PTY, and file bus internals — they work, they're upstream-mergeable, and the queue layers on top rather than replacing them. - Org template schema — Phase 5 will revisit when we unify with gstack's `SKILL.md` format. --- ## gbrain — Minions queue + tool runtime (Phase 1 ported) The big port. gbrain's `src/core/minions/` is what makes SOMA durable. 
| File in donor | LOC | Status | SOMA destination |
|---|---|---|---|
| `minions/queue.ts` | 1152 | Ported | `src/minions/queue.ts` (~1380 LOC after attachment + protected-name + subagent-persistence helpers) |
| `minions/worker.ts` | 415 | Ported | `src/minions/worker.ts` (~440 LOC after subagent ctx wiring) |
| `minions/types.ts` | 287 | Adapted | `src/minions/types.ts` (Postgres types → SQLite affinities; subagent persistence types added) |
| `minions/backoff.ts` | 34 | Verbatim | `src/minions/backoff.ts` |
| `minions/stagger.ts` | 33 | Verbatim | `src/minions/stagger.ts` |
| `minions/quiet-hours.ts` | 86 | Verbatim | `src/minions/quiet-hours.ts` |
| `minions/attachments.ts` | 110 | Verbatim | `src/minions/attachments.ts` |
| `minions/protected-names.ts` | 28 | Verbatim | `src/minions/protected-names.ts` |
| `minions/handlers/shell.ts` | 311 | Ported | `src/minions/handlers/shell.ts` (env-gate renamed; otherwise unchanged) |
| `minions/handlers/subagent.ts` | 710 | Ported + abstracted | `src/minions/handlers/engines/api/loop.ts` + `providers/anthropic.ts` (Provider seam added; subagent_rate_leases table replaced by SOMA's `engine.acquireLock`) |
| `minions/tools/brain-allowlist.ts` | — | Deferred | Phase 6 (brain-derived tool registry); SOMA Phase 1 ships 3 minimal queue-internal tools instead |
| `core/cycle.ts` | — | Deferred | Phase 6 (`runCycle` maintenance primitive) |
| `core/fail-improve.ts` | — | Deferred | Phase 7 (self-improvement retros) |
| `core/memory.ts` | — | Deferred | Phase 6 (markdown-first memory) |

### Adaptations from gbrain Postgres → SOMA SQLite

| gbrain (Postgres) | SOMA (SQLite) | Why |
|---|---|---|
| `BIGSERIAL` PK | `INTEGER PRIMARY KEY` | SQLite rowid alias; same auto-increment semantics |
| `TIMESTAMPTZ` | `INTEGER` (Unix ms) | SQLite has no native timezone-aware time; centralise on Unix-ms-as-number throughout |
| `JSONB` | `TEXT` (JSON-encoded) | App-side `JSON.stringify` on write, `JSON.parse` on read at the row-mapper boundary |
| `now()` SQL | `engine.now()` JS | Centralised clock so tests can inject |
| `FOR UPDATE SKIP LOCKED` | dropped — `BEGIN IMMEDIATE` serialises writers | Single-writer SOMA preserves correctness; Postgres engine (Phase 7) restores SKIP LOCKED |
| `pg_advisory_xact_lock` | sentinel-row pattern in `minion_rate_leases` | Postgres engine (Phase 7) restores `pg_advisory_xact_lock` |
| `count(*) FILTER (WHERE cond)` | `SUM(CASE WHEN cond THEN 1 ELSE 0 END)` | Same semantics, broader compatibility |
| `to_jsonb($x::text) \|\| stacktrace` | JS-side read-parse-push-write inside `BEGIN IMMEDIATE` tx | SQLite has no JSONB append operator |
| Anthropic-specific subagent rate-leases table | reused `engine.acquireLock(...)` over `minion_rate_leases` | Avoid two parallel rate-lease implementations (ADR-012) |

Every deviation is annotated `// SOMA:` inline at the call site.

---

## gstack — subprocess pattern (Phase 1 partial; worktrees Phase 2)

| File in donor | Status | SOMA destination |
|---|---|---|
| `test/helpers/session-runner.ts` (NDJSON spawn) | Ported (adapted) | `src/minions/handlers/engines/subscription.ts` — `spawn('claude', ['-p', '--output-format', 'stream-json', ...])`, pure `ingestNDJSONLine` parser, kill ladder on both `ctx.signal` and `ctx.shutdownSignal` |
| `lib/worktree.ts` (`WorktreeManager`) | Pending Phase 2 | Will live at `src/minions/worktrees/` — per-job git worktree create/cleanup hooks around handler invoke |
| `autoplan/SKILL.md` (skill format) | Pending Phase 5 | Will become the canonical SOMA skill format (gstack/gbrain converge on the same shape) |
| `lib/gen-skill-docs.ts` | Pending Phase 5 | Skill catalog generation |
| `ETHOS.md` | Reference only | Informed CLAUDE.md and ADR-011 wording |

What gstack contributed conceptually:

- The "claude as subprocess" pattern. Rather than embedding the model in your process, you spawn the CLI, hand it a prompt over stdin, and parse NDJSON from stdout.
  Easier to kill, easier to compose, easier to switch to a different binary.
- The kill ladder discipline (SIGTERM → grace period → SIGKILL) wired to both timeout-style and shutdown-style abort signals. Lifted into the shell handler and subscription engine.
- The worktree-per-job invariant for parallel safety. Not yet shipped — Phase 2.

---

## graphify — enrichment pipeline (Phase 6 pending)

What we'll adopt from graphify when the brain layer lands:

| Capability | Source path | Use in SOMA |
|---|---|---|
| Tree-sitter AST extraction (25 languages) | `graphify/extract/` | Brain-enricher: index codebases into typed-edge graph |
| Leiden clustering on the graph | `graphify/build/` | Cluster related concepts for retrieval |
| Multimodal ingest (PDF/audio/video transcription) | `graphify/ingest/` | Optional skill-driven ingestion paths |
| God-node analytics | `graphify/serve/` | Identify central concepts during summarisation |

What we will NOT adopt:

- Graphify as the memory backend itself. SOMA stores memory in markdown + the brain graph (typed edges over rows in `minion_jobs.result` + a future `learnings` table). Graphify enriches; gbrain-style storage is canonical.

---

## claudecode-harness — CLAUDE.md template (adopted)

`anothervibecoder-s/claudecode-harness` provides the 10-section CLAUDE.md template structure (Platform & Mission / Ownership / Hard Limits / Local-First / Data Discipline / Env & Security / Hub-Spoke / Memory & Retros / DB Rules / ADR Habit). SOMA adopted it verbatim as `CLAUDE.md` and filled it in for our stack. See ADR-013 for context.

---

## What Phase 1 effectively accomplished from this lineage

Reading the rows above as a delta:

- **From cortextOS:** the daemon + PTY + file bus + dashboard remain intact and upstream-mergeable. We rebranded display surfaces and migrated state dirs but did not rewrite the runtime.
- **From gbrain:** the entire Minions queue + worker + attachment + protected-names runtime + the full subagent loop with crash-resumable replay. The 710-LOC subagent handler was abstracted behind a Provider seam so OpenAI / OpenAI-compat / custom HTTP endpoints drop in without code changes.
- **From gstack:** the `claude -p` NDJSON subprocess pattern with kill ladders. The `WorktreeManager` is staged for Phase 2.
- **From graphify:** nothing yet — staged for Phase 6.
- **From claudecode-harness:** the CLAUDE.md template that runs the dev loop in this very repo.

The mechanical sum: ~3000 LOC ported (gbrain queue + handlers + subagent), ~500 LOC adapted (gstack subprocess pattern), ~150 LOC of new abstraction (Provider seam + tool registry + ctx.subagent), ~430 LOC of UI (submit page + intent parser + API routes). All under MIT, all runnable today.

Next reading: [architecture.md](./architecture.md) for how these pieces fit together at runtime.

# Quickstart

> Cold-start path from clone to a running daemon and your first job, in about ten minutes.

# Quickstart

Cold-start path from "I just cloned this" to "I have an agent running and can submit a job from the dashboard." ~10 minutes on a clean macOS machine; Linux is similar with package-manager substitutions.
## Prerequisites

| Dependency | Why | Install |
|---|---|---|
| Node.js 20+ | Runtime | https://nodejs.org or `brew install node` |
| Git | Source control | `xcode-select --install` (macOS) or your distro's package manager |
| `claude` CLI + OAuth | Subscription engine spawns this; agent PTYs use it | `npm install -g @anthropic-ai/claude-code` then `claude login` |
| PM2 (optional) | Process supervisor for the daemon + dashboard | `npm install -g pm2` |
| Telegram bot (optional) | Phone control surface | Create one through [@BotFather](https://t.me/BotFather) |
| Anthropic / OpenAI API key (optional) | Only if using the `api` engine instead of `subscription` | https://console.anthropic.com (or your provider) |

You can run SOMA without PM2, Telegram, or API keys — the queue + dashboard work standalone. The full setup below shows the fleet path.

## 1. Clone + install

```bash
git clone https://github.com/NulightJens/project-soma.git ~/cortextos
cd ~/cortextos
npm install
npm run build
npm link        # exposes `soma` and `cortextos` globally

# Verify
which soma      # /Users//.nvm/.../bin/soma (or similar)
soma --version  # 0.1.x
```

If `npm link` fails with permission errors, run `npm config get prefix` and ensure you own that directory. (Or use `nvm` so the prefix lives under `~`.)

## 2. Sanity-check the queue without a daemon

The queue + worker work standalone — no daemon, no Telegram, no dashboard, no LLM. This is the fastest "is anything broken" test.

```bash
TMPDB=$(mktemp -d)/smoke.db

# Submit a no-op job
soma jobs submit echo --data '{"msg":"hello"}' --db "$TMPDB" --json

# Run a worker that drains the queue
soma jobs work --db "$TMPDB" --handlers echo,noop,sleep --poll-interval 500 &
WORKER_PID=$!
sleep 2
kill $WORKER_PID

# Inspect the result
soma jobs get 1 --db "$TMPDB"
# Expect: status=completed, result={"echoed":{"msg":"hello"},"attempt":1}
```

If that round-trips, the queue + worker + handler dispatch are healthy.

## 3. Initialise SOMA on this machine

```bash
soma install    # creates ~/.soma// state dirs
```

State now lives at `~/.soma/default/` (with a `~/.cortextos` symlink for backward compat). Inside:

```
~/.soma/default/
├── config/
│   └── enabled-agents.json   # which agents are active
├── orgs/    # per-org agent dirs (created by soma init )
├── state/   # per-agent heartbeat + transient state
├── inbox/   # per-agent message inbox
└── logs/    # per-agent rolling logs
```

## 4. Bring up your first org + agent

An org is a namespace. An agent is a per-org persistent Claude session. The `system` template is the simplest — one orchestrator, no specialists.

```bash
soma init myorg
soma add-agent boss --template orchestrator --org myorg

# Optional: Telegram credentials
cat > ~/cortextos/orgs/myorg/agents/boss/.env <<'EOF'
BOT_TOKEN=
CHAT_ID=
ALLOWED_USER=
EOF
chmod 600 ~/cortextos/orgs/myorg/agents/boss/.env

soma enable boss
```

## 5. Generate the PM2 ecosystem and start the fleet

```bash
soma ecosystem   # generates ~/cortextos/ecosystem.config.js
pm2 start ecosystem.config.js
pm2 save         # persist across reboots
pm2 startup      # if you want it to survive a reboot — follow the printed sudo command
```

You should see three apps come online:

- `soma-daemon` — supervises agent PTYs, polls Telegram, runs cron, serves the dashboard's IPC socket
- `SOMA-dashboard` — Next.js dev server on port 3000
- `soma-jobs-worker` — drains the Minions queue (defaults to handlers `echo,noop,sleep` — add more via `SOMA_WORKER_HANDLERS=echo,noop,sleep,shell` etc.)

```bash
pm2 list      # all three should be 'online'
soma status   # human-readable agent fleet state
```

## 6. Open the dashboard

```bash
open http://localhost:3000
```

The default admin password is generated on first run and written to `~/.soma/default/dashboard.env` — open that file or `pm2 logs SOMA-dashboard` and look for the seeded credential.
Or set your own:

```bash
echo 'ADMIN_USERNAME=admin' >> ~/cortextos/dashboard/.env.local
echo 'ADMIN_PASSWORD=' >> ~/cortextos/dashboard/.env.local
echo 'SYNC_ADMIN_PASSWORD=true' >> ~/cortextos/dashboard/.env.local
pm2 restart SOMA-dashboard
```

(The `SYNC_ADMIN_PASSWORD=true` flag forces a hash refresh on next sign-in; remove it or set it to `false` afterwards so subsequent restarts don't keep overwriting.)

## 7. Submit a job from the dashboard

Sign in, navigate to **Jobs** in the sidebar, click **New job**.

- **Freeform tab** — type `sleep 5 seconds` → Parse intent → Confirm and submit. Watch the row appear on `/jobs` and transition `waiting → active → completed` over ~5s.
- **Advanced tab** — handler `echo`, data `{"msg": "hi"}`, click Submit.

Protected handler names (`shell`, `subagent`, `subagent_aggregator`) won't go through the dashboard — they require the operator CLI with `--trusted`.

## 8. (Optional) Try the API engine

The `api` engine costs pay-per-token credits, so it's gated behind a separate flag. Setup:

```bash
# Add the gate to the worker process's env (in ecosystem.config.js or your shell)
export SOMA_ALLOW_SUBAGENT_JOBS=1   # registers the runner handler
export SOMA_ALLOW_API_ENGINE=1      # cost-surface gate
export ANTHROPIC_API_KEY=sk-ant-... # default provider key

# Run a worker that includes the subagent handler
soma jobs work --handlers echo,noop,sleep,subagent --poll-interval 500 &

# Submit a subagent job that uses the api engine + Anthropic provider
soma jobs submit subagent --trusted --data '{
  "engine": "api",
  "provider": "anthropic",
  "model": "claude-sonnet-4-6",
  "prompt": "Reply with one short sentence.",
  "max_turns": 1
}' --json
```

For OpenAI-compat providers and custom endpoints, set `SOMA_API_CUSTOM_PROVIDERS` to a JSON array. Example:

```bash
export OPENROUTER_API_KEY=sk-or-...
export SOMA_API_CUSTOM_PROVIDERS='[
  {
    "name": "openrouter",
    "base_url": "https://openrouter.ai/api/v1",
    "auth_env_var": "OPENROUTER_API_KEY"
  }
]'
```

Then submit with `"provider": "openrouter"` in the job data.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `soma: command not found` | `npm link` didn't resolve into PATH | Check `npm config get prefix` — that bin dir must be on PATH. |
| Daemon spawns but agent never comes up | OAuth token expired / Keychain locked | `claude login` (or set `CLAUDE_CODE_OAUTH_TOKEN` env). Watch `pm2 logs soma-daemon`. |
| Dashboard returns 401 on `/api/...` | Session cookie missing or expired | Sign in again; clear browser cookies for `localhost:3000` if stale state. |
| `Sign-in failed: MissingCSRF` | Stale `authjs.csrf-token` cookie | Clear browser cookies for the dashboard origin. |
| `Sign-in failed: CredentialsSignin` | Wrong password (often browser autofill capturing an error message) | Manually type the password from `~/.soma/default/dashboard.env`. Delete the saved password in your browser if autofill keeps re-injecting the wrong value. |
| Submit returns 422 + `protected_job_name` | You tried to submit `shell` / `subagent` / `subagent_aggregator` from the dashboard | Use the operator CLI with `--trusted`. The dashboard pre-renders the equivalent command in the error card. |
| API engine throws `engine is gated` | `SOMA_ALLOW_API_ENGINE=1` not set on the worker process | Set the env in the worker's `ecosystem.config.js` `env` block, or in your shell before `soma jobs work`. |

## What you have now

- A persistent agent (`boss`) running under PM2, surviving reboots and crashes
- A SQLite queue at `~/.soma/default/minions.db` you can submit work to from the CLI or the dashboard
- A worker draining the queue with a configurable handler set
- (Optional) Telegram control surface for the agent
- (Optional) API-engine path for pay-per-token providers

Next reading: [architecture.md](./architecture.md) to understand how the pieces fit, or [agent-bootstrap.md](./agent-bootstrap.md) if you're planning to develop in this repo (human or AI).

# What is SOMA

> One-page mental model — substrate, minions, engines, tools — plus the project's vocabulary glossary.

# What is SOMA

SOMA is a **persistent agent operating system**. The shortest possible definition: it lets a Claude Code session keep working — through crashes, OAuth refresh cycles, context-window resets, machine reboots, and operator absence — by externalising state into a queue of durable rows that any worker can resume.

## The mental model in 30 seconds

Three layers, each independently durable.
Reading bottom-up:

```
┌────────────────────────────────────────────────────────────────┐
│ Tools          submit_minion / send_message / read_own_inbox   │ ← Phase 1 (shipped); brain-derived tools land Phase 6
├────────────────────────────────────────────────────────────────┤
│ Engines        subscription (claude -p) + api (HTTP)           │ ← shared loop, swappable Provider seam
│                      ↓                  ↓                      │
│                Anthropic SDK / OpenAI / custom endpoints       │ ← `SOMA_API_CUSTOM_PROVIDERS` registers more
├────────────────────────────────────────────────────────────────┤
│ Minions queue  durable priority queue (SQLite)                 │ ← every "thought" is a row; survives crashes
│                minion_jobs / minion_inbox / minion_attachments │
│                minion_subagent_messages / *_tool_executions    │
├────────────────────────────────────────────────────────────────┤
│ Substrate      PM2 daemon · node-pty `claude` spawn · file bus │ ← inherited from cortextOS upstream
│                Telegram poller · Next.js dashboard             │
└────────────────────────────────────────────────────────────────┘
```

The runtime keeps running because each layer's state is observable from outside the process holding it:

- A worker dies mid-LLM-loop → the `minion_subagent_messages` rows are still there → a new worker reads them and continues from the last persisted turn.
- The daemon dies → PM2 restarts it → it re-reads the file bus and picks up where the previous instance left off.
- The Claude OAuth token expires after 71 hours → the daemon's PTY supervisor catches the failure, refreshes from Keychain, respawns.

## Why this shape

The cortextOS upstream gives you "one persistent Claude session." That's nice but limited: you can't parallelise, you can't rescue work after a `kill -9`, and you can't compose multiple agents because they share one terminal.

The Minions queue (from gbrain) gives you "many parallel Claude sessions whose work survives any failure." That's the leap. The cost is operational complexity — but the queue absorbs that complexity into a small pile of well-tested SQL.
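The "new worker continues from the last persisted turn" behaviour can be sketched in a few lines of TypeScript. This is a hypothetical illustration, not SOMA's actual code: `SubagentMessage`, `resumeConversation`, and the row shape are invented here for clarity; the real persistence lives in the `minion_subagent_messages` table and the loop in `src/minions/handlers/engines/api/loop.ts`.

```typescript
// Hypothetical sketch: a replacement worker rebuilds an in-flight LLM
// conversation purely from persisted rows. Only the table name comes
// from the docs; the column names below are invented for illustration.
type SubagentMessage = {
  job_id: number;
  turn: number; // monotonically increasing per job
  role: 'user' | 'assistant' | 'tool';
  content: string;
};

// Given every row persisted for a job, reconstruct the conversation
// in order and report which turn the new worker should continue from.
function resumeConversation(rows: SubagentMessage[]) {
  const ordered = [...rows].sort((a, b) => a.turn - b.turn);
  const lastTurn = ordered.length ? ordered[ordered.length - 1].turn : 0;
  return {
    messages: ordered.map(({ role, content }) => ({ role, content })),
    nextTurn: lastTurn + 1, // replacement worker picks up here
  };
}

// A worker died after persisting two turns; the new worker resumes at 3.
const persisted: SubagentMessage[] = [
  { job_id: 42, turn: 2, role: 'assistant', content: 'Thinking…' },
  { job_id: 42, turn: 1, role: 'user', content: 'Summarise the repo.' },
];
console.log(resumeConversation(persisted).nextTurn); // 3
```

The point is that the conversation is a pure function of the persisted rows, so any process that can read the database can take over.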
Most of the rest of SOMA is just plumbing into that queue.

The dual-engine seam (subscription vs. api) gives you "use the operator's Claude subscription quota for default work; switch to pay-per-token API credits when you need fan-out parallelism without quota limits." Each engine spends a different bucket of capacity.

The protected-names gate gives you "agents can submit work to themselves without being able to escalate privileges." A model that wants to call `shell` has to ask the operator via a dashboard CLI prompt — no path through model output to RCE.

## Vocabulary

These terms are used throughout the codebase and docs. Internalising them speeds up everything else.

| Term | Means |
|---|---|
| **orchestrator** / **twin** | Top-level AI layer. "Orchestrator" is internal / dev-facing; "twin" is conceptual / business-facing. Same runtime entity. (ADR-007) |
| **brain** | The persistent files representing an agent — markdown identity + soul + goals + skills. Survives process death. |
| **body** | A transient Claude Code (`claude`) subprocess instantiating a brain. |
| **minion** / **job** | A durable row in `minion_jobs`. Priority-ordered, DAG-aware, stall-rescued, idempotent. |
| **worker** | An ephemeral process claiming minions and running them. Phase 2 will run each inside a worktree. |
| **handler** | The function bound to a job's `name`. Built-ins: `echo`, `noop`, `sleep`, `shell` (gated), `subagent` (gated), `subagent_aggregator` (gated). |
| **engine** | Under the `subagent` handler, the LLM-loop implementation. Two ship: `subscription` (claude CLI subprocess) and `api` (HTTP, Anthropic SDK + OpenAI-compat + custom). |
| **provider** | Under the `api` engine, the HTTP shape. `anthropic` (SDK), `openai` (covers OpenAI / OpenRouter / Together / Groq / Anyscale / Mistral / Ollama / vLLM / LM Studio), plus anything in `SOMA_API_CUSTOM_PROVIDERS`. |
| **tool** | Under the `api` engine, a function the model can call. Phase 1 ships 3 queue-internal tools; brain-derived tools come Phase 6. |
| **worktree** | Per-job git worktree for filesystem isolation. Phase 2 — not shipped yet. |
| **pillar** | Routing dimension on every job: Memory / Action / Automation / Self-Learning. |
| **department** | Routing dimension on every job: Marketing / Sales / Operations / Content / Finance / Product. |
| **skill** | Fat-markdown file an agent invokes (gstack/gbrain `SKILL.md` format). Lives under `templates/` or per-agent `.skills/`. |
| **harness** | This repo's operating rules for Claude Code. See [CLAUDE.md](../../CLAUDE.md). |

## Routing principle

> **Deterministic work → Minion handler. Judgment → subagent.**

A handler is the right answer when the action is mechanical: send an HTTP request, write a file, run a query, post a message. A subagent is the right answer when the action requires reading context and choosing among options.

## Engine selection (ADR-008)

> **Subscription-first. API opt-in only.**

The subscription engine ships as the default because it spends the operator's existing Claude subscription quota and uses the OAuth credential already in Keychain. The API engine is opt-in (`SOMA_ALLOW_API_ENGINE=1`) because it spends pay-per-token API credits — a different cost surface that warrants its own gate.

For a per-job override, set `data.engine: 'api'` on the job. For a process-wide default, set `SOMA_DEFAULT_ENGINE=api`.

## Capability ceiling (ADR-011)

> **Don't dumb down.**

Every donor system's capabilities port in full. The narrative ("personal agent OS") is the organising story, not a cap on features. If gbrain shipped a 710-LOC subagent handler with two-phase tool ledgers and crash-resumable replay, SOMA's API engine ships the same — even when the surface explanation is "it's just a chat loop."

## Synergy principle (ADR-012)

> **Synergy not silos.**

When two donors ship overlapping concepts, integrate into a single coherent implementation.
The unified runner handler with the engine seam is the canonical example: gbrain had a subagent handler, gstack had a `claude -p` subprocess pattern, and rather than ship both as parallel handlers we built one runner that selects by `data.engine`. Same goes for memory (Phase 6 — typed edges in the brain graph, not a sidecar JSONL), scheduled work (Minion jobs, not a parallel cron), and the file bus vs queue (different purposes, cleanly separated).

## User-facing edge (ADR-014)

> **Filter both directions at the human boundary.**

Internals stay full-fidelity (per ADR-011). The dashboard, Telegram bot, and any future CLI-chat surface translate simple human input → structured backend calls, and structured backend output → plain-language summaries with progressive disclosure. A model on the inside speaks structured JSON to the queue; a human on the outside types "sleep 5 seconds" and sees "submitted job #42 — sleeping 5000ms."

---

Next reading: [donor-lineage.md](./donor-lineage.md) for the per-donor port table, or [architecture.md](./architecture.md) for the component map.
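To make the ADR-014 boundary above concrete, here is a toy sketch of both translation directions. It is illustrative only: `parseIntent` and `summarise` are names invented here and are not SOMA's actual intent parser, which is richer than a single regex.

```typescript
// Illustrative only: a toy version of the "freeform → structured job"
// translation described under ADR-014. One pattern, to show the shape.
type JobRequest = { name: string; data: { ms: number } };

// Inbound direction: plain human text → structured backend call.
function parseIntent(input: string): JobRequest | null {
  // "sleep 5 seconds" → { name: 'sleep', data: { ms: 5000 } }
  const m = input.trim().match(/^sleep (\d+) seconds?$/i);
  if (m) return { name: 'sleep', data: { ms: Number(m[1]) * 1000 } };
  return null; // unrecognised → fall back to the Advanced tab
}

// Outbound direction: structured result → plain-language summary.
function summarise(jobId: number, req: JobRequest): string {
  return `submitted job #${jobId} — sleeping ${req.data.ms}ms`;
}

const req = parseIntent('sleep 5 seconds');
if (req) console.log(summarise(42, req)); // "submitted job #42 — sleeping 5000ms"
```

The human never sees the JSON; the queue never sees prose. That is the whole filter.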