aboutsummaryrefslogtreecommitdiff
path: root/.ai/scripts/cross-agent-comms/cross-agent-halt.md
blob: b817fbcfc2673962de787e867799e49c8636f711 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
# cross-agent-halt

**Purpose.** Failsafe stop for all cross-agent activity on the local machine
(or, with `--tailnet`, across all configured peers). Creates the HALT file
that every component in the protocol checks; within one polling cadence
(~5 min) all polling, sending, watching, and receiving stops.

This is the user's emergency brake. Use when something is misbehaving and
visiting individual sessions is too slow.

## Usage

```
cross-agent-halt [reason] [--tailnet] [--no-stop-watcher]
```

### Positional argument

| Position | Meaning | Example |
|---|---|---|
| 1 | Optional human-readable reason for the halt. Written into the HALT file's body. Helps future-you remember why you stopped things. | `"investigating runaway poll loop, 2026-04-27"` |

### Flags

| Flag | Default | Purpose |
|---|---|---|
| `--tailnet` | local only | Propagate halt to every peer in `peers.toml` via SSH over Tailscale. |
| `--no-stop-watcher` | (stops watcher) | Skip stopping the `cross-agent-watch.path` systemd unit. Useful if the watcher is intentionally separate from comms (rare). |

## Behavior

### Local halt (default)

1. Write the HALT file: `~/.config/cross-agent-comms/HALT`. If a `[reason]` was
   passed, write it as the file's body. Otherwise the file is empty (existence
   alone triggers halt).
2. Stop the watcher service: `systemctl --user stop cross-agent-watch.path`
   (and the corresponding `.service` if running).
3. Print a summary:
   ```
   ✓ HALT file written: ~/.config/cross-agent-comms/HALT
   ✓ Watcher service stopped (cross-agent-watch.path)
   - In-flight sends will complete their current rsync step (~seconds), then
     stop. New sends are blocked.
   - Active agent polling sessions stop within one cadence (~5 min).
   - Use `cross-agent-resume` to clear HALT.
     Per-session polling does NOT auto-resume — you re-engage each session by
     telling its agent to resume polling.
   ```
4. Exit 0.

### Cross-tailnet halt (`--tailnet`)

1. Apply local halt steps 1-2 first.
2. Read `peers.toml` for the list of remote machines.
3. For each peer, SSH and write the HALT file:
   ```
   ssh <user>@<host> "echo '<reason>' > ~/.config/cross-agent-comms/HALT && \
                       systemctl --user stop cross-agent-watch.path"
   ```
4. Track per-peer success/failure. Print results:
   ```
   Halting velox.local       ✓ (HALT file written)
   Halting bastion.local     ✗ (ssh exit 255: no route to host)
   Halting locally           ✓ (HALT file written)

   PARTIAL HALT: 2/3 machines halted. bastion.local needs manual halt.
   ```
5. Exit 0 if all peers halted; exit 1 if any peer failed (so scripts can
   detect partial halt). The local halt always succeeds — even on `--tailnet`,
   if remote peers fail, local is still halted.

## What "halt active" means for each component

| Component | Behavior under HALT |
|---|---|
| `cross-agent-send` | Refuses to send. Exits 5 with "halt active; remove ~/.config/cross-agent-comms/HALT to resume." Checks HALT at start AND between each retry/rsync step, so an in-flight send completes its current step then stops. |
| `cross-agent-recv` | Refuses to verify or dedup. Exits 5 with same message. Inbound files are **left in place** — not moved, not rejected — so resume picks them up cleanly via cold-start. |
| `cross-agent-watch` | Continues running but suppresses notifications. Logs each event with `(suppressed by HALT)` so the operator can see what would have fired. |
| `cross-agent-status` | Prints prominent `⚠ HALT ACTIVE` banner before normal output. Continues to enumerate (read-only). |
| `cross-agent-discover` | Same banner. Continues (read-only). |
| Agent polling loops | Check HALT on every wake. If set: write a final `progress` note to any active conversation ("HALT fired locally; pausing"), surface "(HALT active; cross-agent comms paused)" in every user response, and stop rescheduling. Polling decays naturally within one cadence. |
| Conversation initiator | Refuses to write sequence 1 of any new conversation. Surfaces refusal to user. |
| Startup workflow (Phase A) | Checks HALT at session boot. If set, surfaces immediately and skips cross-agent inbox checks. |

## Failure modes

| Symptom | Cause | Fix |
|---|---|---|
| `~/.config/cross-agent-comms/HALT` already exists | Halt was already active | OK — running halt again refreshes the reason text. Safe. |
| `systemctl --user stop` fails | Watcher service not installed, or systemd not available | The HALT file is still written — components that check HALT will still stop. The systemctl failure surfaces as a non-fatal warning. |
| `--tailnet` halts some peers but not others | One or more peers unreachable | Exit 1 with per-peer status. Manually halt the unreachable peers (visit each machine, `touch ~/.config/cross-agent-comms/HALT`), or fix the network and re-run. |
| Permission denied writing the HALT file | `~/.config/cross-agent-comms/` doesn't exist or is owned by another user | `mkdir -p ~/.config/cross-agent-comms/`; check ownership. |

## What halt does NOT do

- Does not kill running Claude sessions. Polling stops within ~5 min, but the
  session itself stays alive and can be re-engaged after resume.
- Does not delete pending messages. Inbound files in `inbox/from-agents/`
  remain; they get processed when polling resumes.
- Does not abort in-flight rsync push mid-byte. Atomic-write semantics
  guarantee in-flight messages either complete cleanly or leave only `.tmp.*`
  files (which receivers ignore).

## Examples

```bash
# Quick halt with no reason
cross-agent-halt

# Halt with a memo
cross-agent-halt "runaway poll loop in homelab session, debugging"

# Halt all tailnet peers + local
cross-agent-halt --tailnet "shutting down for system update"

# Halt protocol comms but leave the watcher service running
cross-agent-halt --no-stop-watcher
```

## Recovery

Always pair with `cross-agent-resume` when the situation is resolved:

```bash
cross-agent-resume                # local
cross-agent-resume --tailnet      # all peers
```

## See also

- `cross-agent-resume` — counterpart that clears HALT.
- `cross-agent-status` — see HALT state at a glance.
- `cross-agent-comms.org` — protocol spec, `* Halt mechanism` section.