Problems We Solve
Five real pains of AI-assisted development — and how Arc OS eliminates each one.
Pain 1: "AI forgets everything between sessions"
The Problem
You spend 30 minutes teaching Claude your project conventions. Next session — blank slate. You correct a mistake. Tomorrow — same mistake. Every session starts from zero.
How Others Handle It
- ChatGPT: Custom Instructions (200 words, one set for everything)
- Cursor: .cursorrules file (manual, no feedback loop)
- Manual: Copy-paste your "rules" into every conversation
How Arc OS Solves It
Reflect Loop — automatic persistent memory from corrections.
You press "Fix It" or "thumbs-down"
→ System writes rule to learnings.md
→ Rule survives restarts
→ Injected into EVERY future prompt automatically
Example learnings.md after 2 weeks:
- [2026-03-20] [fixit] Always use t-call for translations in Odoo QWeb
- [2026-03-21] [negative] Avoid sudo in deployment scripts
- [2026-03-25] [fixit] Use server components by default in Next.js 15
- [2026-04-01] [negative] Don't suggest rm -rf without confirmation
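The loop above can be sketched in a few lines of Python. This is a minimal illustration, not Arc OS's actual code: the function names `record_learning` and `build_prompt` and the `LEARNINGS (always follow):` header are assumptions made for the example. Only the `- [date] [kind] rule` line format comes from the sample file above.

```python
from datetime import date
from pathlib import Path


def record_learning(path: Path, kind: str, rule: str, today: date) -> str:
    """Append one rule using the '- [date] [kind] rule' format shown above."""
    line = f"- [{today.isoformat()}] [{kind}] {rule}"
    # Appending to a file on disk is what lets the rule survive restarts.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


def build_prompt(path: Path, user_message: str) -> str:
    """Prepend every stored rule to the outgoing prompt (hypothetical header)."""
    rules = path.read_text(encoding="utf-8") if path.exists() else ""
    if not rules.strip():
        return user_message
    return f"LEARNINGS (always follow):\n{rules}\n{user_message}"
```

Calling `record_learning(Path("learnings.md"), "fixit", "...", date.today())` from the "Fix It" handler and routing every outgoing message through `build_prompt` would reproduce the three arrows above.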
Result: The system builds "immune memory". One correction = permanent rule. The same mistake never happens twice.
Pain 2: "AI doesn't understand my project's tech stack"
The Problem
Your Odoo project uses Bootstrap, Owl framework, QWeb templates, Python. Your SaaS uses Tailwind, React, Next.js, TypeScript. A generic AI bot confuses the two. Odoo advice leaks into React context. React patterns appear in Odoo code.
How Others Handle It
- ChatGPT: One conversation per project (no enforcement)
- Cursor: Workspace-aware but single context window
- Manual: Constantly remind the AI what project you're in
How Arc OS Solves It
Federated Architecture — one child bot per project, complete isolation.
Master Bot
├── Child: odoo-site (CLAUDE.md: Odoo 17, Bootstrap, QWeb)
│ ├── skills/library/odoo-expert.md
│ ├── skills/library/odoo-owl-expert.md
│ └── learnings.md: "Use t-call for i18n"
│
└── Child: saas-app (CLAUDE.md: Next.js 15, React, Tailwind)
├── skills/library/react-patterns.md
├── skills/library/tailwind-expert.md
└── learnings.md: "Prefer server components"
Different Telegram bots. Different working directories. Different skills. Different memory. They never see each other's context.
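One way to picture the isolation is as a per-child record whose paths all hang off that child's own working directory. The sketch below is an assumption-laden illustration: the `ChildBot` class, the `/srv/bots/...` paths, and the placement of CLAUDE.md inside the working directory are hypothetical. Only the `skills/library` and `learnings.md` names come from the tree above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ChildBot:
    name: str
    workdir: Path  # hypothetical layout: one directory per child bot

    @property
    def claude_md(self) -> Path:
        return self.workdir / "CLAUDE.md"

    @property
    def skills_dir(self) -> Path:
        return self.workdir / "skills" / "library"

    @property
    def learnings(self) -> Path:
        return self.workdir / "learnings.md"

    def context_files(self) -> set[Path]:
        """Everything this child reads when building a prompt."""
        return {self.claude_md, self.skills_dir, self.learnings}


def shares_context(a: ChildBot, b: ChildBot) -> bool:
    """True if two children would read any of the same files."""
    return bool(a.context_files() & b.context_files())


odoo = ChildBot("odoo-site", Path("/srv/bots/odoo-site"))
saas = ChildBot("saas-app", Path("/srv/bots/saas-app"))
```

With this layout, `shares_context(odoo, saas)` is false as long as the two working directories differ, which is the property the architecture relies on.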
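Result: Each project gets advice for its own stack only. Odoo patterns stay out of React code, and React patterns stay out of Odoo code. Full guide in Multi-Project Skill Isolation.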
Pain 3: "AI generates unsafe code and nobody catches it"
The Problem
AI suggests git push --force. Outputs a password in a code snippet. Recommends rm -rf /. You don't always catch it. The response goes to production.
How Others Handle It
- ChatGPT / Copilot: No output validation at all
- Cursor: Syntax checking only
- Manual: Human review of every response (doesn't scale)
How Arc OS Solves It
Binary Eval Engine — declarative rules that check every response before delivery.
{
"rules": [
{ "name": "No force push", "type": "string_not_contains", "value": "--force" },
{ "name": "No credentials", "type": "regex_not_match", "pattern": "(password|token)\\s*[:=]\\s*\\w{8,}" },
{ "name": "Response under 5000 chars", "type": "max_length", "value": 5000 }
]
}
Failures appear as footnotes on the response:
[Claude's response here]
---
Eval: ⚠️ No force push | ⚠️ No credentials
Rules are per-skill, per-project. Your Odoo project checks for QWeb compliance. Your React project checks for direct DOM manipulation.
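A rule checker for the three rule types in the config above could look roughly like this. It is a sketch rather than the engine's real implementation: the function names are hypothetical, and treating `max_length` as inclusive (a response of exactly 5000 characters passes) is an assumption.

```python
import re

RULES = [
    {"name": "No force push", "type": "string_not_contains", "value": "--force"},
    {"name": "No credentials", "type": "regex_not_match",
     "pattern": r"(password|token)\s*[:=]\s*\w{8,}"},
    {"name": "Response under 5000 chars", "type": "max_length", "value": 5000},
]


def rule_passes(rule: dict, response: str) -> bool:
    """Each rule is binary: it passes or it fails, with no score in between."""
    kind = rule["type"]
    if kind == "string_not_contains":
        return rule["value"] not in response
    if kind == "regex_not_match":
        return re.search(rule["pattern"], response) is None
    if kind == "max_length":
        return len(response) <= rule["value"]  # assumption: limit is inclusive
    raise ValueError(f"unknown rule type: {kind}")


def failed_rules(rules: list, response: str) -> list:
    """Names of all rules the response violates, in config order."""
    return [r["name"] for r in rules if not rule_passes(r, response)]


def with_footnote(rules: list, response: str) -> str:
    """Append the 'Eval:' footnote only when at least one rule fails."""
    failures = failed_rules(rules, response)
    if not failures:
        return response
    notes = " | ".join(f"⚠️ {name}" for name in failures)
    return f"{response}\n---\nEval: {notes}"
```

Note that a plain substring check on `--force` also flags `--force-with-lease`; a project that allows the safer flag would need a regex rule instead.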
Result: Automated quality gate on every AI output. No human review needed for basic safety.
Pain 4: "I have no idea if the AI is performing well"
The Problem
You've been using AI for 3 months. Is it actually good? Which skills work? Which fail? Is it getting better or worse? No data. No metrics. Just vibes.
How Others Handle It
- ChatGPT: Conversation history (unstructured, no metrics)
- Copilot: Acceptance rate (one number, no detail)
- Manual: Gut feeling
How Arc OS Solves It
Quality Tracker + Karpathy Loop — per-skill metrics with automated improvement proposals.
Every response is logged:
{
"type": "execution",
"skills": ["code-review"],
"success": true,
"duration_ms": 12340,
"response_length": 2847
}
Every feedback button press (thumbs-up/thumbs-down) is tracked per response.
The /quality command shows:
code-review: 45x, 91% ok, thumbs-up 12/thumbs-down 2, avg 8.3s
git-manager: 23x, 78% ok, thumbs-up 5/thumbs-down 4, avg 3.1s
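The summary lines above can be derived from the execution log with a simple aggregation. The sketch below assumes the log record shape shown earlier. The feedback record shape (`{"skill": ..., "vote": "up" | "down"}`) and the function names are hypothetical, since the source does not show how feedback is stored.

```python
from collections import defaultdict


def summarize(executions: list, feedback: list) -> dict:
    """Aggregate per-skill counters from execution and feedback records."""
    stats = defaultdict(lambda: {"runs": 0, "ok": 0, "ms": 0, "up": 0, "down": 0})
    for rec in executions:
        # A response may involve several skills; credit each one.
        for skill in rec["skills"]:
            s = stats[skill]
            s["runs"] += 1
            s["ok"] += 1 if rec["success"] else 0
            s["ms"] += rec["duration_ms"]
    for fb in feedback:  # hypothetical shape: {"skill": str, "vote": "up" | "down"}
        stats[fb["skill"]][fb["vote"]] += 1
    return dict(stats)


def format_line(skill: str, s: dict) -> str:
    """Render one line in the style of the /quality output above."""
    pct = round(100 * s["ok"] / s["runs"]) if s["runs"] else 0
    avg = s["ms"] / s["runs"] / 1000 if s["runs"] else 0.0
    return (f"{skill}: {s['runs']}x, {pct}% ok, "
            f"thumbs-up {s['up']}/thumbs-down {s['down']}, avg {avg:.1f}s")
```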
At 3:00 AM the Karpathy Loop runs:
- Finds skills with <80% success or more negative than positive feedback
- Sends the CEO a proposal card in Telegram
- One tap: Approve (backup + improve) or Reject (discard)
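The selection step of the nightly run reduces to a two-condition filter. The following sketch uses a hypothetical `SkillStats` record and function names; only the two thresholds (under 80% success, or more negative than positive feedback) come from the list above. The sample counts are chosen to match the /quality output shown earlier (41 of 45 is about 91%, 18 of 23 is about 78%).

```python
from dataclasses import dataclass


@dataclass
class SkillStats:
    name: str
    runs: int
    ok: int    # successful runs
    up: int    # thumbs-up count
    down: int  # thumbs-down count


def needs_proposal(s: SkillStats, min_success: float = 0.80) -> bool:
    """Flag a skill with <80% success or more negative than positive feedback."""
    # Assumption: a skill with no runs yet is not flagged on success rate.
    success_rate = s.ok / s.runs if s.runs else 1.0
    return success_rate < min_success or s.down > s.up


def nightly_candidates(all_stats: list) -> list:
    """Names of skills that should receive an improvement proposal."""
    return [s.name for s in all_stats if needs_proposal(s)]


sample = [
    SkillStats("code-review", runs=45, ok=41, up=12, down=2),
    SkillStats("git-manager", runs=23, ok=18, up=5, down=4),
]
```

With the sample data, only `git-manager` is selected, because its success rate falls below the 80% threshold.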
Result: Data-driven AI management. You know exactly what works and what doesn't.
Pain 5: "25 skills loaded at once = confused AI"
The Problem
You have 25 skills covering git, deployment, code review, Figma, Odoo, testing, security. Loading all into every prompt wastes context window and confuses the model. It tries to apply deployment advice to a code review question.
How Others Handle It
- ChatGPT: No skill system at all
- Cursor: All rules always loaded
- Manual: Comment out irrelevant rules per task
How Arc OS Solves It
Context Router — intelligent skill selection per message.
User: "Review this code for XSS vulnerabilities"
Context Router scores:
code-review: trigger "review" (2) + keyword "XSS" (1) = 3
code-review-protocol: trigger "code review" (2) = 2
system-audit: no match = 0
git-manager: no match = 0
Injects into prompt:
SKILLS_HINT (focus on these):
- code-review: Security audit and code quality review...
- code-review-protocol: Structured code review with OWASP...
Only the top 5 relevant skills are suggested. Claude still has access to all skills but focuses on the right ones. The hint is advisory, not restrictive, so it cannot block a skill that the model actually needs.
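The scoring in the example can be reproduced with a small router sketch. Several details here are assumptions: the registry shape, the triggers listed for system-audit and git-manager, and the matching rule (a phrase matches when all of its words appear in the message, which is what lets the trigger "code review" match "Review this code"). The weights of 2 per trigger and 1 per keyword, and the top-5 cutoff, come from the text above.

```python
import re

SKILLS = {  # hypothetical registry; first two entries mirror the example above
    "code-review": {"triggers": ["review"], "keywords": ["XSS"]},
    "code-review-protocol": {"triggers": ["code review"], "keywords": []},
    "system-audit": {"triggers": ["audit"], "keywords": []},
    "git-manager": {"triggers": ["commit", "branch"], "keywords": ["git"]},
}


def _words(text: str) -> set:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _matches(phrase: str, message_words: set) -> bool:
    """Assumed rule: every word of the phrase occurs somewhere in the message."""
    phrase_words = _words(phrase)
    return bool(phrase_words) and phrase_words <= message_words


def score(skill: dict, message: str) -> int:
    """2 points per matching trigger, 1 point per matching keyword."""
    words = _words(message)
    triggers = sum(2 for t in skill["triggers"] if _matches(t, words))
    keywords = sum(1 for k in skill["keywords"] if _matches(k, words))
    return triggers + keywords


def route(message: str, skills: dict, top_n: int = 5) -> list:
    """Names of the highest-scoring skills, best first, zero scores dropped."""
    scored = [(score(cfg, message), name) for name, cfg in skills.items()]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: -pair[0])  # stable sort keeps registry order on ties
    return [name for _, name in relevant[:top_n]]
```

Because the result is only a hint list, a miss in the router costs some focus but does not remove any capability.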
Result: Focused, relevant responses. No context pollution from irrelevant skills.
Summary
| Pain | Arc OS Solution | Mechanism |
|---|---|---|
| AI forgets corrections | Persistent learning rules | Reflect Loop (learnings.md) |
| Wrong tech stack context | Isolated child bots | Federated Architecture |
| Unsafe output | Declarative validation | Binary Eval Engine |
| No performance data | Per-skill metrics + nightly analysis | Quality Tracker + Karpathy Loop |
| Context dilution | Smart skill selection | Context Router |