Cloud Chat Routing — Guide

Phase 70. When you chat with a worker from CRM or Telegram, that conversation now runs inside your Hetzner Cloud container, not on the master server. Close your laptop, switch to Telegram, keep working — the session, the files, and the open repos all live in your Cloud workspace.


TL;DR


Why this matters

Phase 60 shipped Standard Cloud as "always-on Linux box with Claude Code". Phase 69 wired your repos into the bot org. But until Phase 70, the chat workflow still spawned claude -p on the master VPS — your Cloud container was sitting idle and your chat work lived on the master.

Phase 70 routes chat into the container so the value prop actually delivers:


How routing works

Every chat message routes through this decision tree:

chat msg arrives → child-bot on master
  │
  ▼
getWorkerTarget(ownerChatId)
  │
  ├─ user.plan === 'cloud' AND container.status ∈ (ready, paused) ?
  │   │ YES
  │   ▼
  │   ensureAwake(target)        ← Phase 70.4: wake if paused (~1s)
  │   │
  │   ▼
  │   spawnWorker → docker exec  ← Phase 70.2: --workdir /workspace/<slug>
  │                                 (Phase 70.3 maps project_name → slug)
  │   │
  │   ▼
  │   bash -lc 'exec "$@"' bash claude -p "..."
  │                                ↑ login shell sources ~/.profile / ~/.bashrc
  │                                  For BYOK / free plans: ANTHROPIC_API_KEY
  │                                  is set via ~/.bashrc (injected at provision)
  │                                  For cloud plan (no BYOK): docker exec adds
  │                                  -e ANTHROPIC_API_KEY= which clears the key
  │                                  → claude CLI uses OAuth session from
  │                                  `claude login` (Claude Code subscription)
  │
  └─ otherwise → Bun.spawn(["claude", ...]) on master (today's default)

That single decision happens once per message. The worker log emits a single line telling you the chosen target:

claude spawn: container/ready
claude spawn: container/ready (woke from paused)
claude spawn: local

If you ever wonder where your last message ran, that's the line to grep for.
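
The decision tree above can be sketched as a small pure function. This is a minimal illustration, not the actual implementation: `chooseTarget`, `spawnLogLine`, the `Plan` and `ContainerStatus` types, and the `"none"` status are hypothetical names. Only the plan check, the two routable statuses, and the three log lines come from this guide.

```typescript
// Hypothetical types. The guide itself only names the 'cloud' plan and the
// 'ready' / 'paused' container statuses.
type Plan = "cloud" | "free" | "byok";
type ContainerStatus = "ready" | "paused" | "none";

type Target =
  | { kind: "container"; wokeFromPaused: boolean }
  | { kind: "local" };

// Route into the container only for Cloud-plan users whose container is
// ready or paused; everything else falls back to the master.
function chooseTarget(plan: Plan, status: ContainerStatus): Target {
  if (plan === "cloud" && (status === "ready" || status === "paused")) {
    return { kind: "container", wokeFromPaused: status === "paused" };
  }
  return { kind: "local" };
}

// Format the spawn line in the three shapes listed above.
function spawnLogLine(target: Target): string {
  if (target.kind === "local") return "claude spawn: local";
  return target.wokeFromPaused
    ? "claude spawn: container/ready (woke from paused)"
    : "claude spawn: container/ready";
}

console.log(spawnLogLine(chooseTarget("cloud", "paused")));
```

Keeping the decision separate from the spawn itself makes the "once per message" property easy to check: one call, one target, one log line.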


How to verify it's working

1. Header pill

After provisioning your Cloud Workspace in Settings, reload CRM and look at the top-right header. You should see one of:

Pill                       Meaning
Cloud (green dot)          Container alive, chat will route into it
Cloud · asleep (grey dot)  Container paused; next chat message wakes it
(nothing)                  No container provisioned yet

The pill polls container status every 30 seconds, so it lags a wake/pause event by up to half a minute.

2. Worker log

If you have terminal access to the master (tmux attach -t citadel-child for the Arc OS dev bot), tail the worker output and watch for the spawn line on every message.

3. Files actually inside the container

Open the Cloud terminal from /cloud, then:

cd /workspace/<your-project>
git log -3
ls -lh

If chat-driven edits appear here (and arc cloud sync, run from your laptop, reports the project as behind by the matching number of commits), routing is wired correctly.


Lifecycle: pause, wake, push, fetch

The Phase 69 lifecycle still applies:

  1. Idle 30 min → cron pauses the container.
  2. Before pause: snapshotAndPush auto-commits dirty trees in every repo under /workspace/ and pushes them to the bot org.
  3. Next chat message: ensureAwake unpauses the container (~1 s), then fires fetchAll in the background so every repo's origin/main pointer is fresh.
  4. arc pull <project> on your laptop reads those auto-commits.

The pause/wake cost is folded into the latency of the next chat — you don't need to think about it.
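
The lifecycle above amounts to a two-state machine. Here is a minimal sketch under stated assumptions: `step`, `LifecycleEvent`, and the action strings are hypothetical labels for illustration. Only the 30-minute idle threshold and the ordering (snapshot and push before pause, unpause before the background fetch) are taken from the steps listed.

```typescript
type ContainerState = "ready" | "paused";

// Hypothetical event shapes: a cron tick carrying the idle time, or a chat message.
type LifecycleEvent =
  | { type: "cronTick"; idleMs: number }
  | { type: "chatMessage" };

const IDLE_LIMIT_MS = 30 * 60 * 1000; // 30-minute idle threshold from step 1

// Returns the next state plus the actions to run, in order.
function step(
  state: ContainerState,
  event: LifecycleEvent
): { state: ContainerState; actions: string[] } {
  if (state === "ready" && event.type === "cronTick" && event.idleMs >= IDLE_LIMIT_MS) {
    // Dirty trees are committed and pushed before the container is paused.
    return { state: "paused", actions: ["snapshotAndPush", "pause"] };
  }
  if (state === "paused" && event.type === "chatMessage") {
    // Unpause first; fetchAll is then started in the background.
    return { state: "ready", actions: ["unpause", "fetchAll"] };
  }
  return { state, actions: [] };
}

console.log(step("paused", { type: "chatMessage" }));
```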


Session continuity

Claude Code stores conversation history at ~/.claude/projects/<cwd-hash>/<session-id>.jsonl. Inside the container, that directory sits on a volume mount, so it survives docker pause / docker unpause. As long as your chat keeps targeting the same project, claude's --resume <session-id> finds the same JSONL after wake and continues the thread.

One caveat: cwd hash changes across local ↔ cloud

The hash is derived from the working directory path. On the master, claude runs in /opt/repos/<slug>; in the container, it runs in /workspace/<slug>. Two different paths → two different hashes → two different conversation histories.
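
To make the effect concrete, here is a sketch that derives a history directory from a working directory. The hash function is a stand-in (32-bit FNV-1a) chosen only for illustration; this guide does not specify how Claude Code actually derives `<cwd-hash>`. `historyDir` and the `demo` slug are hypothetical.

```typescript
// Stand-in hash (32-bit FNV-1a). ASSUMPTION: the real <cwd-hash> derivation
// is not specified in this guide; any path-derived key behaves the same way.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  // Left-pad to 8 hex digits.
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Hypothetical helper: where the session JSONL files for a given cwd would live.
function historyDir(cwd: string): string {
  return `~/.claude/projects/${fnv1a(cwd)}`;
}

const onMaster = historyDir("/opt/repos/demo");
const inContainer = historyDir("/workspace/demo");

// Same project, different paths: the two keys differ (barring a hash
// collision), so --resume on one side cannot see the other side's JSONL.
console.log(onMaster, inContainer);
```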

What this means in practice:

We considered symlinking /opt/repos/workspace on the master to keep hashes aligned, but that conflated the two file trees and broke the master bots that run on Free plans. Two histories is the lesser evil.


Troubleshooting

"claude spawn: local" but I'm on Cloud plan

Check, in order:

  1. subscription.plan === 'cloud' in the user dropdown billing pill?
  2. /cloud/status returns status: 'ready' or 'paused'?
  3. Did you create the project AFTER provisioning? Containers only clone project repos that existed at provision time (Phase 69.3). Re-provision or re-trigger the bootstrap to pick up newly-created projects.
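
The checklist can be read as a short-circuiting diagnostic. A sketch under assumptions: the `Snapshot` fields and `diagnoseLocalSpawn` are hypothetical, and the two timestamps stand in for whatever records project creation and provisioning times in your setup.

```typescript
// Hypothetical snapshot of the three things the checklist inspects.
interface Snapshot {
  plan: string;             // e.g. 'cloud'
  containerStatus: string;  // the status value returned by /cloud/status
  projectCreatedAt: number; // epoch ms (hypothetical field)
  provisionedAt: number;    // epoch ms (hypothetical field)
}

// Returns the first failing check, in the same order as the list above.
function diagnoseLocalSpawn(s: Snapshot): string {
  if (s.plan !== "cloud") {
    return "plan is not 'cloud'";
  }
  if (s.containerStatus !== "ready" && s.containerStatus !== "paused") {
    return "container is neither ready nor paused";
  }
  if (s.projectCreatedAt > s.provisionedAt) {
    return "project was created after provisioning; re-provision or re-trigger the bootstrap";
  }
  return "no cause found in this checklist";
}

console.log(
  diagnoseLocalSpawn({
    plan: "cloud",
    containerStatus: "ready",
    projectCreatedAt: 2000,
    provisionedAt: 1000,
  })
);
```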

Chat hangs for ~5 seconds, then responds

That's the first wake after a long idle. The wake is ~1 s; the rest is claude's own startup + first inference. Subsequent messages while the container stays warm are normal latency.

"Not logged in" from claude

Phase 70.5 fixed a bug where ~/.bashrc early-returned for non-interactive shells, so ANTHROPIC_API_KEY never reached claude. Fix: the key now lives in ~/.profile (sourced by the login shell wrapper). If you upgraded mid-Phase-70 and never re-saved your token, go to /cloud Step 1 and paste your sk-ant-api03-… again. Newly-saved tokens land in the right file.
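
For reference, the login-shell wrapper can be sketched as an argv builder. This is an illustration under assumptions, not the actual spawn code: `buildExecArgv`, `ExecOptions`, and the container name are hypothetical. `docker exec --workdir` and `-e` are standard Docker CLI flags, and the `bash -lc 'exec "$@"' bash …` form is the one shown in the routing diagram.

```typescript
// Hypothetical options bag for building the docker exec command line.
interface ExecOptions {
  container: string;    // container name or ID (hypothetical)
  slug: string;         // project slug under /workspace
  prompt: string;       // text passed to claude -p
  clearApiKey: boolean; // true for Cloud plan without BYOK
}

function buildExecArgv(o: ExecOptions): string[] {
  // An empty value clears the key so claude falls back to its OAuth session.
  const env: string[] = o.clearApiKey ? ["-e", "ANTHROPIC_API_KEY="] : [];
  return [
    "docker", "exec",
    "--workdir", `/workspace/${o.slug}`,
    ...env,
    o.container,
    // -l makes bash a login shell, so it sources the user's profile before
    // running the -c script; 'exec "$@"' then replaces the shell with claude.
    "bash", "-lc", 'exec "$@"', "bash", "claude", "-p", o.prompt,
  ];
}

console.log(
  buildExecArgv({ container: "demo-container", slug: "demo", prompt: "hello", clearApiKey: false })
);
```

Passing the prompt as its own argv element, rather than interpolating it into the `-c` string, means it never needs shell quoting.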

Container says paused for hours after a chat msg

The 30-second polling on the header pill is one cause. The other: if wakeContainer() failed silently, the DB row reads ready but Docker says paused. Run arc cloud sync <project> — it will surface the divergence in the matrix.
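
The divergence described above is a mismatch between two sources of truth. A minimal sketch, assuming hypothetical names (`detectDivergence`, `dbStatus`, `dockerStatus`); how arc cloud sync detects this internally is not covered in this guide.

```typescript
type Status = "ready" | "paused";

// Returns a human-readable description when the DB row and Docker disagree,
// or null when they match.
function detectDivergence(dbStatus: Status, dockerStatus: Status): string | null {
  if (dbStatus === dockerStatus) return null;
  return `DB row says ${dbStatus}, Docker says ${dockerStatus}`;
}

// The silent-wake-failure case: the DB was updated but the unpause never happened.
console.log(detectDivergence("ready", "paused"));
```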


See also