Problems We Solve

Five real pains of AI-assisted development — and how Arc OS eliminates each one.

Pain 1: "AI forgets everything between sessions"

The Problem

You spend 30 minutes teaching Claude your project conventions. Next session — blank slate. You correct a mistake. Tomorrow — same mistake. Every session starts from zero.

How Others Handle It

ChatGPT: Custom Instructions (200 words, one set for everything)
Cursor: .cursorrules file (manual, no feedback loop)
Manual: Copy-paste your "rules" into every conversation

How Arc OS Solves It

Reflect Loop — automatic persistent memory from corrections.

You press "Fix It" or "thumbs-down"
    → System writes rule to learnings.md
    → Rule survives restarts
    → Injected into EVERY future prompt automatically

Example learnings.md after 2 weeks:

- [2026-03-20] [fixit] Always use t-call for translations in Odoo QWeb
- [2026-03-21] [negative] Avoid sudo in deployment scripts
- [2026-03-25] [fixit] Use server components by default in Next.js 15
- [2026-04-01] [negative] Don't suggest rm -rf without confirmation
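To make the loop concrete, here is a minimal Python sketch of how corrections could be appended to learnings.md and injected into later prompts. It assumes only the line format shown in the example above. The function names, the `VALID_TAGS` set and the `LEARNINGS` prompt header are hypothetical illustrations, not Arc OS's actual code.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional

# Hypothetical: tag values mirror the [fixit] / [negative] entries in the example file.
VALID_TAGS = {"fixit", "negative"}


def record_learning(path: Path, tag: str, rule: str, day: Optional[date] = None) -> str:
    """Append one correction as '- [YYYY-MM-DD] [tag] rule' and return the line."""
    if tag not in VALID_TAGS:
        raise ValueError(f"unknown tag: {tag}")
    day = day or date.today()
    line = f"- [{day.isoformat()}] [{tag}] {rule.strip()}"
    # Append mode keeps every earlier rule; the file on disk survives restarts.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


def load_learnings(path: Path) -> List[str]:
    """Return every rule line currently stored in the file."""
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8")
    return [ln for ln in text.splitlines() if ln.startswith("- [")]


def build_prompt(path: Path, user_message: str) -> str:
    """Prefix the user's message with all stored rules (hypothetical header text)."""
    rules = load_learnings(path)
    if not rules:
        return user_message
    return "LEARNINGS (always follow):\n" + "\n".join(rules) + "\n\n" + user_message
```

Because the rules are read from disk on every call to `build_prompt`, a rule recorded in one session is present in every later prompt without any manual copy-paste.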

Result: The system builds a kind of "immune memory": one correction becomes a permanent rule, so you never have to make the same correction twice.

Pain 2: "AI doesn't understand my project's tech stack"

The Problem

Your Odoo project uses Bootstrap, the Owl framework, QWeb templates, and Python. Your SaaS uses Tailwind, React, Next.js, and TypeScript. A generic AI bot confuses the two: Odoo advice leaks into React context, and React patterns show up in Odoo code.

How Others Handle It

ChatGPT: One conversation per project (no enforcement)
Cursor: Workspace-aware but single context window
Manual: Constantly remind the AI what project you're in

How Arc OS Solves It

Federated Architecture — one child bot per project, complete isolation.

Master Bot
    ├── Child: odoo-site     (CLAUDE.md: Odoo 17, Bootstrap, QWeb)
    │   ├── skills/library/odoo-expert.md
    │   ├── skills/library/odoo-owl-expert.md
    │   └── learnings.md: "Use t-call for i18n"
    │
    └── Child: saas-app      (CLAUDE.md: Next.js 15, React, Tailwind)
        ├── skills/library/react-patterns.md
        ├── skills/library/tailwind-expert.md
        └── learnings.md: "Prefer server components"

Different Telegram bots. Different working directories. Different skills. Different memory. They never see each other's context.
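The isolation idea can be sketched in Python as a child bot that only ever reads files under its own working directory. This sketch assumes the layout from the tree above (CLAUDE.md, skills/library/*.md, learnings.md). The `ChildBot` class and its methods are hypothetical and do not represent Arc OS's actual implementation.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class ChildBot:
    """Hypothetical per-project bot: everything it reads lives under workdir."""
    name: str
    workdir: Path

    def scoped_path(self, relative: str) -> Path:
        # Resolve the path and refuse anything that escapes this project's directory.
        root = self.workdir.resolve()
        target = (root / relative).resolve()
        if target != root and root not in target.parents:
            raise PermissionError(f"{self.name}: {relative} is outside {root}")
        return target

    def skill_files(self) -> List[Path]:
        library = self.scoped_path("skills/library")
        return sorted(library.glob("*.md")) if library.is_dir() else []

    def context_files(self) -> List[Path]:
        # Project instructions, skills, and memory, all scoped to this child only.
        files = [self.scoped_path("CLAUDE.md"), *self.skill_files(), self.scoped_path("learnings.md")]
        return [f for f in files if f.exists()]
```

In this sketch, a request such as `odoo.scoped_path("../saas-app/learnings.md")` raises `PermissionError`, so one project's memory cannot be loaded into another project's context even by mistake.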

Result: Each project gets advice scoped to its own stack. The full setup is described in the Multi-Project Skill Isolation guide.

Pain 3: "AI generates unsafe code and nobody catches it"

The Problem

AI suggests git push --force. Outputs a password in a code snippet. Recommends rm -rf /. You don't always catch it. The response goes to production.

How Others Handle It

ChatGPT / Copilot: No output validation at all
Cursor: Syntax checking only
Manual: Human review of every response (doesn't scale)

How Arc OS Solves It

Binary Eval Engine — declarative rules that check every response before delivery.

{
  "rules": [
    { "name": "No force push", "type": "string_not_contains", "value": "--force" },
    { "name": "No credentials", "type": "regex_not_match", "pattern": "(password|token)\\s*[:=]\\s*\\w{8,}" },
    { "name": "Response under 5000 chars", "type": "max_length", "value": 5000 }
  ]
}

Failures appear as footnotes on the response:

[Claude's response here]
---
Eval: ⚠️ No force push | ⚠️ No credentials
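As an illustration, here is a minimal Python evaluator for the three rule types in the JSON config above. It checks a response against each rule and appends the footnote. The function names are hypothetical, and the sketch assumes a rule passes when the response satisfies it. Arc OS's real engine may support more rule types or behave differently.

```python
import json
import re


def check_rule(rule: dict, response: str) -> bool:
    """Return True if the response passes the rule (hypothetical semantics)."""
    kind = rule["type"]
    if kind == "string_not_contains":
        return rule["value"] not in response
    if kind == "regex_not_match":
        return re.search(rule["pattern"], response) is None
    if kind == "max_length":
        return len(response) <= rule["value"]
    raise ValueError(f"unknown rule type: {kind}")


def eval_footnote(config_json: str, response: str) -> str:
    """Append an 'Eval:' footnote naming each failed rule; unchanged if all pass."""
    rules = json.loads(config_json)["rules"]
    failed = [r["name"] for r in rules if not check_rule(r, response)]
    if not failed:
        return response
    return response + "\n---\nEval: " + " | ".join(f"⚠️ {name}" for name in failed)
```

Using the config above, a response that contains `git push --force` and `password = hunter2secret` gets the footnote `Eval: ⚠️ No force push | ⚠️ No credentials`, while a clean response is returned unchanged.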

Rules are defined per skill and per project. Your Odoo project can check for QWeb compliance, while your React project can flag direct DOM manipulation.

Result: An automated quality gate on every AI output. Basic safety checks no longer depend on a human reading every response.

Pain 4: "I have no idea if the AI is performing well"

The Problem

You've been using AI for 3 months. Is it actually good? Which skills work? Which fail? Is it getting better or worse? No data. No metrics. Just vibes.

How Others Handle It

ChatGPT: Conversation history (unstructured, no metrics)
Copilot: Acceptance rate (one number, no detail)
Manual: Gut feeling

How Arc OS Solves It

Quality Tracker + Karpathy Loop — per-skill metrics with automated improvement proposals.

Every response is logged:

{
  "type": "execution",
  "skills": ["code-review"],
  "success": true,
  "duration_ms": 12340,
  "response_length": 2847
}

Feedback (thumbs-up/thumbs-down) is also tracked per response, and both are aggregated per skill:

/quality command shows:
  code-review: 45x, 91% ok, thumbs-up 12/thumbs-down 2, avg 8.3s
  git-manager: 23x, 78% ok, thumbs-up 5/thumbs-down 4, avg 3.1s
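One way to produce /quality lines like these is to fold the execution log and the feedback events into per-skill counters. The Python sketch below assumes execution records follow the log format shown above. The shape of the feedback record (`{"type": "feedback", "skills": [...], "value": "up" | "down"}`) and the `quality_report` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def quality_report(records: List[dict]) -> List[str]:
    """Aggregate execution and (assumed) feedback records into /quality-style lines."""
    stats: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"runs": 0, "ok": 0, "ms": 0, "up": 0, "down": 0}
    )
    for rec in records:
        for skill in rec.get("skills", []):
            s = stats[skill]
            if rec["type"] == "execution":
                s["runs"] += 1
                s["ok"] += 1 if rec["success"] else 0
                s["ms"] += rec["duration_ms"]
            elif rec["type"] == "feedback":
                s["up" if rec["value"] == "up" else "down"] += 1
    lines = []
    for skill, s in sorted(stats.items()):
        if s["runs"] == 0:
            continue  # feedback but no executions yet: nothing to average
        ok_pct = round(100 * s["ok"] / s["runs"])
        avg_s = s["ms"] / s["runs"] / 1000
        lines.append(
            f"{skill}: {s['runs']}x, {ok_pct}% ok, "
            f"thumbs-up {s['up']}/thumbs-down {s['down']}, avg {avg_s:.1f}s"
        )
    return lines
```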

At 3:00 AM, the Karpathy Loop runs:

Finds skills with under 80% success or more negative than positive feedback
Sends the CEO a proposal card in Telegram
One tap: Approve (back up the skill, then apply the improvement) or Reject (discard the proposal)
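The selection step of the nightly run can be sketched in Python using the two criteria above: success rate below 80%, or more thumbs-down than thumbs-up. The `SkillStats` class and the function names are hypothetical. Building the Telegram proposal card and handling Approve/Reject are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SkillStats:
    name: str
    runs: int
    ok: int
    thumbs_up: int
    thumbs_down: int

    @property
    def success_rate(self) -> float:
        return self.ok / self.runs if self.runs else 0.0


def needs_proposal(s: SkillStats, min_success: float = 0.80) -> bool:
    # Criteria from the text: under 80% success, or more negative than positive feedback.
    return s.success_rate < min_success or s.thumbs_down > s.thumbs_up


def nightly_candidates(all_stats: List[SkillStats]) -> List[str]:
    # Skip skills that have never run: there is no data to judge them on yet.
    return [s.name for s in all_stats if s.runs and needs_proposal(s)]
```

Plugging in the /quality numbers above (code-review at 41/45 ≈ 91%, git-manager at 18/23 ≈ 78%), only git-manager falls under the 80% threshold and would get a proposal card.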

Result: Data-driven AI management. You know exactly what works and what doesn't.

Pain 5: "25 skills loaded at once = confused AI"

The Problem

You have 25 skills covering git, deployment, code review, Figma, Odoo, testing, and security. Loading all of them into every prompt wastes the context window and confuses the model, which may then apply deployment advice to a code review question.

How Others Handle It

ChatGPT: No skill system at all
Cursor: All rules always loaded
Manual: Comment out irrelevant rules per task

How Arc OS Solves It

Context Router — intelligent skill selection per message.

User: "Review this code for XSS vulnerabilities"

Context Router scores:
  code-review:          trigger "review" (2) + keyword "XSS" (1) = 3
  code-review-protocol: trigger "code review" (2)                 = 2
  system-audit:         no match                                  = 0
  git-manager:          no match                                  = 0

Injects into prompt:
  SKILLS_HINT (focus on these):
  - code-review: Security audit and code quality review...
  - code-review-protocol: Structured code review with OWASP...

Only the top five relevant skills are suggested. Claude still has access to every skill but is steered toward the most relevant ones. The hint is advisory, not restrictive, so a low score never locks Claude out of a skill it actually needs.
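The scoring example can be reproduced with a small Python router: triggers are worth 2 points, keywords 1 point, and the top five non-zero scores go into the hint. The sketch assumes a phrase matches when all of its words appear in the message, in any order. That assumption is what lets the trigger "code review" match "Review this code…". The `Skill` class, the matching rule and the function names are hypothetical, not Arc OS's actual router.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

TRIGGER_WEIGHT = 2  # weights mirror the scoring example above
KEYWORD_WEIGHT = 1


@dataclass
class Skill:
    name: str
    summary: str
    triggers: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _matches(phrase: str, words: Set[str]) -> bool:
    # Assumed rule: a phrase matches when all of its words occur in the message.
    return _words(phrase) <= words


def score(skill: Skill, message: str) -> int:
    words = _words(message)
    return (TRIGGER_WEIGHT * sum(_matches(t, words) for t in skill.triggers)
            + KEYWORD_WEIGHT * sum(_matches(k, words) for k in skill.keywords))


def skills_hint(skills: List[Skill], message: str, top_k: int = 5) -> str:
    scored = [(score(s, message), s) for s in skills]
    # Keep only matching skills, highest score first; ties keep their input order.
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)[:top_k]
    if not ranked:
        return ""
    return "SKILLS_HINT (focus on these):\n" + "\n".join(f"- {s.name}: {s.summary}" for _, s in ranked)
```

With code-review defined as trigger "review" plus keyword "XSS" and code-review-protocol as trigger "code review", the message "Review this code for XSS vulnerabilities" scores 3 and 2, matching the example above. Skills that score 0 never appear in the hint.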

Result: Focused, relevant responses. No context pollution from irrelevant skills.

Summary

Pain	Arc OS Solution	Mechanism
AI forgets corrections	Persistent learning rules	Reflect Loop (`learnings.md`)
Wrong tech stack context	Isolated child bots	Federated Architecture
Unsafe output	Declarative validation	Binary Eval Engine
No performance data	Per-skill metrics + nightly analysis	Quality Tracker + Karpathy Loop
Context dilution	Smart skill selection	Context Router