Workers and the Intelligence Layer

Arc OS uses a worker system to distribute tasks across specialized AI agents, while the Intelligence Layer ensures the quality of their responses through four modules: Binary Evals, Context Router, Learnings, and the Karpathy Loop.

Worker System

Each worker is a separate AI agent with a defined role, model, and set of tools. Workers operate within a project and are accessible through the Workspace UI or Telegram commands (/c, /d, /w:worker_id).
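The `/w:worker_id` command addresses one specific worker. The sketch below shows how such a command could be parsed. It assumes the ID is followed by an optional free-text message and that IDs use lowercase letters, digits, hyphens and underscores. `parseWorkerCommand` is a hypothetical helper, not the Arc OS bot code, and the `/c` and `/d` commands are not modeled.

```typescript
interface WorkerCommand {
  workerId: string;
  message: string;
}

// Hypothetical parser: "/w:<worker_id>" optionally followed by the message for that worker.
function parseWorkerCommand(text: string): WorkerCommand | null {
  // Assumed ID alphabet: lowercase letters, digits, hyphens and underscores (e.g. "pitch-coach").
  const match = /^\/w:([a-z0-9_-]+)(?:\s+([\s\S]*))?$/.exec(text.trim());
  if (match === null) return null; // not a worker command
  return { workerId: match[1], message: (match[2] || "").trim() };
}
```

For example, `/w:sentinel audit the login flow` yields the worker ID `sentinel` and the message `audit the login flow`.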

Canonical preset library (12)

All presets live in config/workers_registry.json and are available via GET /api/crm/workers/presets. Each one is a generic template for any project: no brand references, no character names, no references to our infrastructure.

Engineering / Core (6):

Worker	ID	Model	Type	Tools	Purpose
Consultant	`consultant`	Sonnet	chat	Read, Glob, Grep, WebSearch, WebFetch	Read-only research, advisory
Developer	`developer`	Opus	terminal	All	Ship code that meets DoD
UI/UX Designer	`ui-designer`	Sonnet	chat	Read, Glob, Grep, WebFetch	UI layouts, design tokens
Knowledge Archivist	`archivist`	Sonnet	terminal	Read, Write, Glob, Grep	Knowledge base curator
Sentinel	`sentinel`	Sonnet	chat	Read, Glob, Grep, WebSearch	Security audits, pentests
Product Owner	`product-owner`	Sonnet	chat	Read, Edit, Grep, Glob	Roadmap, scoping, user-first decisions

Startup operations (6, added in Phase 66):

Worker	ID	Purpose
Market Analyst	`analyst`	TAM/SAM/SOM, SWOT, Porter's Five Forces, PEST
Growth Strategist	`growth`	AARRR funnel, ICP, channels, A/B testing, LTV/CAC
Fractional CFO	`cfo`	Unit economics, burn, runway, 3-scenario forecasts
Pitch Coach	`pitch-coach`	One-liner, story arc, 15-slide deck rule, Q&A prep
Legal Advisor	`legal`	Entity choice, founder agreements, IP, GDPR/CCPA
Customer Researcher	`researcher`	Mom Test, hypothesis-driven, cohort retention

Creating a worker in a project

Via UI (default): click "+ Add" in the WorkerSelector pill bar. The WorkerCreationWizard opens with 3 steps:

Identity — pick preset card OR "From scratch"
Capabilities — model + tools + smart warnings (e.g. "read-only role + Write tool = misconfig")
Instructions — system prompt + skills picker + live preview
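A "smart warning" like the one in the Capabilities step can be expressed as a simple check over the selected tools. The following is a minimal sketch. It assumes a hypothetical `readOnlyRole` flag on the draft and treats `Write`, `Edit` and `Bash` as mutating tools. The wizard's actual rule set is not documented here.

```typescript
// Assumed set of tools that can modify files or run commands.
const MUTATING_TOOLS: string[] = ["Write", "Edit", "Bash"];

interface CapabilityDraft {
  readOnlyRole: boolean;   // hypothetical flag, e.g. set for an advisory role like Consultant
  tools: "all" | string[]; // same shape as the registry's `tools` field
}

// Returns human-readable warnings; an empty array means no misconfiguration was detected.
function capabilityWarnings(draft: CapabilityDraft): string[] {
  if (!draft.readOnlyRole) return [];
  if (draft.tools === "all") return ['read-only role + "all" tools = misconfig'];
  return draft.tools
    .filter((tool) => MUTATING_TOOLS.indexOf(tool) !== -1)
    .map((tool) => `read-only role + ${tool} tool = misconfig`);
}
```

These warnings are advisory. The user can still proceed, which matches the wizard's role of flagging rather than blocking.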

The SYSTEM_PROTOCOL baseline (see below) is injected automatically every time the worker is spawned, so the preset only needs to cover role-specific expertise.

Via CLI / API: POST /api/crm/projects/:name/workers with the full worker body. The same fields are available in the UI through the legacy form, reached via the "Show advanced form →" link in the wizard.
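The following sketch builds a request for this endpoint. It assumes the body mirrors the `config/workers_registry.json` fields and that JSON is accepted. The base URL, the subset of body fields, and `buildCreateWorkerRequest` are illustrative assumptions. Authentication, if the API requires it, is not shown.

```typescript
// Assumed body shape: a subset of the registry fields described under "Configuration fields".
interface CreateWorkerBody {
  id: string;
  label: string;
  type: "chat" | "terminal";
  model: string;
  tools: "all" | string[];
  system_prompt?: string;
}

// Hypothetical helper returning a URL and request options usable with fetch().
function buildCreateWorkerRequest(baseUrl: string, project: string, worker: CreateWorkerBody) {
  // Encode the project name so spaces or slashes stay inside a single path segment.
  const url = `${baseUrl.replace(/\/+$/, "")}/api/crm/projects/${encodeURIComponent(project)}/workers`;
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(worker),
    },
  };
}
```

The result can be sent with `fetch(req.url, req.init)`. Keeping request construction separate from sending makes it easy to inspect the payload before it is sent.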

Worker types

chat — turn-based conversation with full context history. The worker receives the entire previous conversation and responds as a dialogue partner.
terminal — streaming execution with tool events. The worker operates as a terminal session, running tools sequentially and broadcasting progress in real time.

Creating a custom worker

Custom workers are described in the config/workers_registry.json file. Each entry defines the agent's behavior:

{
  "id": "my-worker",
  "label": "My Worker",
  "icon": "🔧",
  "type": "chat",
  "model": "claude-sonnet-4-5",
  "max_turns": 10,
  "tools": ["Read", "Glob", "Grep"],
  "system_prompt": "You are...",
  "focus_dirs": ["src/"],
  "builtin": false
}

Configuration fields

Field	Type	Description
`id`	string	Unique worker identifier, used in commands (`/w:id`)
`label`	string	Display name in the UI
`icon`	string	Emoji icon for the avatar
`type`	`"chat"` \| `"terminal"`	Operating mode (see above)
`model`	string	Claude model (`claude-sonnet-4-5`, `claude-opus-4-6`, `claude-haiku-4-5`)
`max_turns`	number	Maximum number of tool-use cycles per response
`tools`	`"all"` \| string[]	Available tools. `"all"` grants the full set
`system_prompt`	string	Inline system prompt
`system_prompt_skill`	string	Path to a file containing the system prompt (alternative to inline)
`prompt_style`	`"history"` \| `"gsd"`	Prompting style: `history` keeps context, `gsd` is task-oriented
`output_format`	`"text"` \| `"stream-json"`	Output format
`focus_dirs`	string[]	Directories the worker focuses on
`log_category`	string	Logging category
`builtin`	boolean	`true` for built-in workers (cannot be deleted via UI)
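The field table above maps directly onto a TypeScript type. The sketch below adds a small validator. The choice of which fields are optional, the ID pattern, and the "exactly one prompt source" rule are assumptions inferred from the table. The registry's real validation logic is not documented here, and `validateWorkerConfig` is hypothetical.

```typescript
// Registry entry as a type; optional markers are assumptions based on the example entry.
interface WorkerConfig {
  id: string;
  label: string;
  icon?: string;
  type: "chat" | "terminal";
  model: string;
  max_turns?: number;
  tools: "all" | string[];
  system_prompt?: string;
  system_prompt_skill?: string;
  prompt_style?: "history" | "gsd";
  output_format?: "text" | "stream-json";
  focus_dirs?: string[];
  log_category?: string;
  builtin?: boolean;
}

// Hypothetical validator: returns a list of problems (empty when the entry looks valid).
function validateWorkerConfig(cfg: WorkerConfig): string[] {
  const errors: string[] = [];
  // Assumed ID format, matching the preset IDs (e.g. "pitch-coach") and the /w:id command.
  if (!/^[a-z0-9-]+$/.test(cfg.id)) errors.push("id must use lowercase letters, digits or hyphens");
  if (cfg.max_turns !== undefined && (Math.floor(cfg.max_turns) !== cfg.max_turns || cfg.max_turns < 1)) {
    errors.push("max_turns must be a positive integer");
  }
  // system_prompt_skill is an alternative to the inline prompt: assume exactly one is set.
  const promptSources = [cfg.system_prompt, cfg.system_prompt_skill].filter((p) => p !== undefined).length;
  if (promptSources !== 1) errors.push("set exactly one of system_prompt or system_prompt_skill");
  return errors;
}
```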

SYSTEM_PROTOCOL — Baseline for all workers

While worker.system_prompt defines role-specific expertise (the analyst does TAM/SAM/SOM, the sentinel does SQL injection audits), there are 15 cross-cutting rules that every worker must follow — from developer to pitch-coach. Instead of duplicating them in every preset, they live in a single constant (shared/cli-routes.ts:SYSTEM_PROTOCOL) and are auto-injected on every worker spawn via child-bot/claude-runner.ts.
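The injection amounts to prepending the shared baseline to each role prompt at spawn time. The sketch below illustrates this. It assumes the baseline comes first and is separated from the role prompt by a blank line. `SYSTEM_PROTOCOL_EXCERPT` is an abbreviated stand-in for the real constant, and `buildSystemPrompt` is hypothetical, not the code in child-bot/claude-runner.ts.

```typescript
// Abbreviated stand-in for shared/cli-routes.ts:SYSTEM_PROTOCOL (two of the 15 rules only).
const SYSTEM_PROTOCOL_EXCERPT = [
  "## Mandatory workflow",
  "- Every new task MUST be registered via `arc issue create`.",
  "## Quality baseline",
  "- Cite sources for any fact/number; \"I don't know\" beats fabrication.",
].join("\n");

interface SpawnableWorker {
  id: string;
  system_prompt: string;
}

// Hypothetical composition: shared baseline first, then the role-specific preset prompt.
function buildSystemPrompt(worker: SpawnableWorker, protocol: string = SYSTEM_PROTOCOL_EXCERPT): string {
  return `${protocol}\n\n# Role: ${worker.id}\n${worker.system_prompt}`;
}
```

Because composition happens on every spawn, editing the shared constant changes the prompt of every worker the next time it starts, while the presets themselves stay untouched.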

5 Mandatory Workflow rules

Every new task MUST be registered via arc issue create
Any plan change MUST update ROADMAP.md via arc roadmap sync
Before starting work, read ROADMAP.md + open issues (arc issues)
After significant changes, sync knowledge via arc memory refresh
Log meaningful progress on issues via arc issue log <id> "<text>"

10 Quality Baseline rules (#229)

Priorities: P0 > P1 > P2 > P3 — always know what's next and why
Session report: close meaningful work with arc report --summary
Definition of Done includes documentation, not just the commit
Trade-offs explicit: scope vs deadline vs quality — recommend one path + 1-2 alternatives
Format: concise, tables/numbers where possible, actionable beats descriptive
Cite sources for any fact/number; "I don't know" beats fabrication
No silent failures: state blockers explicitly, don't continue down wrong path
Honest progress: report what actually shipped (done vs attempted vs failed)
Convention over invention: follow existing patterns, explain deviations
Learnings feedback loop: append to learnings.md when corrected on recurring mistake

Effect

Thanks to this automatic injection, presets became roughly 50-70% shorter. For example, product-owner dropped from 733 to 404 characters (about 45% shorter): only its "User-first lens" (the role-specific frame) remains, while the rest (priorities, roadmap, issues, DoD, trade-offs) is now covered by the baseline.

Admins can extend the baseline in shared/cli-routes.ts — the change automatically applies to all workers on the next spawn.

Binary Evals — Response Validation

What is it?

Declarative rules for checking the quality of worker responses. Each rule is deterministic (no AI involved), runs instantly, and never blocks the response. Every result has a severity of warning or info: rules inform rather than stop.

6 rule types

Type	Description	Example
`string_contains`	Response contains a substring	`"verdict"` in code review
`string_not_contains`	Response does NOT contain a substring	No `--force` in output
`regex_match`	Response matches a regex	Contains a metric (`disk\|RAM\|CPU`)
`regex_not_match`	Response does NOT match a regex	No credentials in output
`max_length`	Length <= value	Response up to 5000 characters
`min_length`	Length >= value	Response at least 1000 characters

Evals file format

The file is placed next to the skill: skills/{skill_name}/{skill_name}.evals.json

{
  "version": 1,
  "skill": "code-review",
  "rules": [
    {
      "id": "cr-001",
      "name": "Must return JSON verdict",
      "type": "string_contains",
      "value": "\"verdict\"",
      "severity": "warning"
    }
  ]
}

Each rule has a unique id, a human-readable name, one of the 6 types, a value to compare against, and a severity (warning or info).
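Since every rule type is a pure string check, an evaluator fits in a single `switch`. The sketch below mirrors the evals file format. It assumes `max_length`/`min_length` values are numbers and that length means `String.length` (UTF-16 code units). `checkRule` and `runEvals` are illustrative names, not the Arc OS implementation.

```typescript
type RuleType =
  | "string_contains" | "string_not_contains"
  | "regex_match" | "regex_not_match"
  | "max_length" | "min_length";

interface EvalRule {
  id: string;
  name: string;
  type: RuleType;
  value: string | number;
  severity: "warning" | "info";
}

interface EvalResult {
  id: string;
  passed: boolean;
  severity: "warning" | "info";
}

// Deterministic check for one rule; an invalid regex in `value` makes new RegExp throw.
function checkRule(response: string, rule: EvalRule): boolean {
  switch (rule.type) {
    case "string_contains": return response.indexOf(String(rule.value)) !== -1;
    case "string_not_contains": return response.indexOf(String(rule.value)) === -1;
    case "regex_match": return new RegExp(String(rule.value)).test(response);
    case "regex_not_match": return !new RegExp(String(rule.value)).test(response);
    case "max_length": return response.length <= Number(rule.value);
    case "min_length": return response.length >= Number(rule.value);
    default: throw new Error(`unknown rule type: ${String(rule.type)}`);
  }
}

// Results are reported alongside the response; nothing here blocks or rewrites it.
function runEvals(response: string, rules: EvalRule[]): EvalResult[] {
  return rules.map((rule) => ({ id: rule.id, passed: checkRule(response, rule), severity: rule.severity }));
}
```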

Context Router — Automatic Skill Selection

How does it work?

On every message, the Context Router scores all skills from skills/_registry.json and automatically selects the most relevant ones:

Trigger match (+2 points) — a direct occurrence of a trigger word from the message
Keyword match (+1 point) — semantic proximity by keywords
The top 5 skills by total score are injected as SKILLS_HINT into the worker's prompt

Example

Message: "review the git commit for security"

code-review: trigger "review" found → +2 points
git-manager: keyword "commit" found → +1 point
Result: code-review (2), git-manager (1) injected into the prompt

Skill registry format

{
  "name": "code-review",
  "triggers": ["review", "audit", "security"],
  "keywords": ["vulnerability", "OWASP", "XSS"],
  "agents": ["summer"],
  "category": ["complex"]
}

triggers — words that directly indicate the skill (high priority)
keywords — additional terms for semantic association
agents — which workers may use this skill
category — classification (simple, complex, critical)
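The scoring can be reproduced in a few lines. The sketch below assumes each skill earns +2 once if any of its triggers appears in the message and +1 once if any keyword appears. Matching is assumed to be case-insensitive on whole words, and zero-score skills are assumed not to be injected. These assumptions are chosen because they reproduce the example above. The `git-manager` registry entry is hypothetical, and `routeSkills` is not the actual router code.

```typescript
interface SkillEntry {
  name: string;
  triggers: string[];
  keywords: string[];
}

// Lowercase the message and split it into words (hyphens kept, e.g. "cherry-pick").
function tokenize(message: string): string[] {
  return message.toLowerCase().split(/[^a-z0-9-]+/).filter((w) => w.length > 0);
}

// Assumed scoring: +2 if any trigger occurs, +1 if any keyword occurs; top `limit` by score.
function routeSkills(message: string, registry: SkillEntry[], limit = 5): { name: string; score: number }[] {
  const words = tokenize(message);
  const has = (terms: string[]) => terms.some((t) => words.indexOf(t.toLowerCase()) !== -1);
  return registry
    .map((skill) => ({
      name: skill.name,
      score: (has(skill.triggers) ? 2 : 0) + (has(skill.keywords) ? 1 : 0),
    }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const registry: SkillEntry[] = [
  { name: "code-review", triggers: ["review", "audit", "security"], keywords: ["vulnerability", "OWASP", "XSS"] },
  // Hypothetical entry for illustration only.
  { name: "git-manager", triggers: ["rebase", "cherry-pick"], keywords: ["commit", "branch"] },
];

const ranked = routeSkills("review the git commit for security", registry);
```

Here `ranked` lists code-review with score 2 ahead of git-manager with score 1, matching the example. Note that under the "once per category" assumption, the second trigger hit ("security") does not add further points. A per-word scheme would score code-review 4 instead.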

Learnings — Correction Memory

How are they created?

Learnings are accumulated rules that emerge from feedback:

Thumbs-down (👎) — a learning with source "negative" is automatically created based on the problematic response
Fix It — re-running a task generates a learning with source "fixit"
Manual — architectural decisions and rules, source "manual" or "architecture"

File format

The learnings.md file in the project root:

# Learnings
> Auto-generated. Injected into GSD prompt at session start.

## Rules
- [2026-04-03T20:00:00Z] [architecture] Rule text here...
- [2026-04-04T10:00:00Z] [security] Another rule...

How are they used?

Loaded at the start of each worker session
Injected into the Developer's GSD prompt (budget: 2000 characters)
Newest rules come first (time-based priority)
They act as immune memory: a mistake corrected once should not be repeated in subsequent sessions
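The loading behavior (parse, sort newest first, fit into the budget) can be sketched as follows. This assumes every rule line has the form `- [ISO timestamp] [source] text`, that non-matching lines are ignored, and that the first rule which would overflow the budget ends the selection. The helper names are hypothetical.

```typescript
interface Learning {
  timestamp: number; // milliseconds since epoch, from Date.parse
  source: string;
  text: string;
  line: string;      // the original (trimmed) markdown line
}

// Assumed line format: "- [2026-04-04T10:00:00Z] [security] Another rule..."
const RULE_LINE = /^- \[([^\]]+)\] \[([^\]]+)\] (.+)$/;

function parseLearnings(markdown: string): Learning[] {
  const out: Learning[] = [];
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    const m = RULE_LINE.exec(line);
    if (m === null) continue; // headings, the blockquote and blank lines are skipped
    const timestamp = Date.parse(m[1]);
    if (isNaN(timestamp)) continue; // ignore rules with an unreadable timestamp
    out.push({ timestamp, source: m[2], text: m[3], line });
  }
  return out;
}

// Newest rules first; stop at the first rule that would exceed the character budget.
function selectForPrompt(learnings: Learning[], budget = 2000): string {
  const newestFirst = learnings.slice().sort((a, b) => b.timestamp - a.timestamp);
  const picked: string[] = [];
  let used = 0;
  for (const l of newestFirst) {
    const cost = l.line.length + 1; // +1 for the newline separator
    if (used + cost > budget) break;
    picked.push(l.line);
    used += cost;
  }
  return picked.join("\n");
}
```

Stopping at the first overflow keeps the priority strictly time-based. Skipping the oversized rule and continuing would fit more rules but could let older ones displace newer ones.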

Karpathy Loop — Nightly Self-Improvement

An automatic skill-improvement cycle, inspired by Andrej Karpathy's ideas on iterative self-improvement.

How does it work?

Every night at 3:00 UTC an automatic pipeline runs:

Metrics collection: reads each project's quality-metrics.json
Finding problematic skills: selects skills whose success rate is below 80% or that have received more negative than positive feedback
Sage analysis: Haiku generates an improved version of the skill based on the collected mistakes
Blind A/B test: 3 scenarios, randomized order, dual scoring with eval rules (60% weight) and an LLM judge (40% weight)
PR creation: if the new version wins more scenarios (new_wins > old_wins), a pull request is created
CEO report: results are sent to Telegram for the final decision
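The A/B decision can be sketched as weighted scoring per scenario followed by a win count. This assumes eval and judge scores are normalized to [0, 1], that a scenario is won by the higher weighted score, and that ties count for neither side. `presentBlind` illustrates the randomized order with an injectable random source so it can be tested deterministically. All names here are hypothetical.

```typescript
interface ScenarioScores {
  evalScore: number;  // assumed: share of eval rules passed, 0..1
  judgeScore: number; // assumed: LLM judge score normalized to 0..1
}

interface Scenario {
  oldVersion: ScenarioScores;
  newVersion: ScenarioScores;
}

const EVAL_WEIGHT = 0.6;
const JUDGE_WEIGHT = 0.4;

function weighted(s: ScenarioScores): number {
  return EVAL_WEIGHT * s.evalScore + JUDGE_WEIGHT * s.judgeScore;
}

// Randomize which output the judge sees first so position bias cannot favor either version.
function presentBlind(oldOut: string, newOut: string, rng: () => number = Math.random) {
  const newFirst = rng() < 0.5;
  return { first: newFirst ? newOut : oldOut, second: newFirst ? oldOut : newOut, newFirst };
}

function decide(scenarios: Scenario[]): { newWins: number; oldWins: number; openPr: boolean } {
  let newWins = 0;
  let oldWins = 0;
  for (const s of scenarios) {
    const a = weighted(s.newVersion);
    const b = weighted(s.oldVersion);
    if (a > b) newWins++;
    else if (b > a) oldWins++; // equal scores: a tie, counted for neither
  }
  // A pull request is opened only when new_wins > old_wins.
  return { newWins, oldWins, openPr: newWins > oldWins };
}
```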

Quality metrics

Each project accumulates statistics in quality-metrics.json:

{
  "total_invocations": 42,
  "total_successes": 40,
  "total_feedback_positive": 35,
  "total_feedback_negative": 2,
  "avg_duration_ms": 15000,
  "skills": [
    {
      "name": "code-review",
      "applied_count": 5,
      "success_count": 4
    }
  ]
}

These metrics let the system objectively determine which skills need improvement and track progress after updates.
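The "finding problematic skills" step of the nightly pipeline can be sketched against this file format. The per-skill `feedback_positive`/`feedback_negative` fields are assumptions, since the sample above records feedback totals only at project level. `findProblematicSkills` is a hypothetical helper.

```typescript
interface SkillMetrics {
  name: string;
  applied_count: number;
  success_count: number;
  feedback_positive?: number; // assumed per-skill field, not in the sample
  feedback_negative?: number; // assumed per-skill field, not in the sample
}

interface QualityMetrics {
  total_invocations: number;
  total_successes: number;
  total_feedback_positive: number;
  total_feedback_negative: number;
  avg_duration_ms: number;
  skills: SkillMetrics[];
}

const SUCCESS_THRESHOLD = 0.8; // "success rate < 80%"

function findProblematicSkills(metrics: QualityMetrics): string[] {
  return metrics.skills
    .filter((s) => {
      if (s.applied_count === 0) return false; // never applied: no evidence either way
      const successRate = s.success_count / s.applied_count;
      const negativeDominates = (s.feedback_negative || 0) > (s.feedback_positive || 0);
      return successRate < SUCCESS_THRESHOLD || negativeDominates;
    })
    .map((s) => s.name);
}
```

With the sample above, code-review has a success rate of exactly 4/5 = 80%, so it is not flagged, because the threshold is strictly "below 80%".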