#+TITLE: Proposal — Promote the ntfy phone channel into a general agent-comms tool
#+AUTHOR: Craig Jennings & Claude (home project)
#+DATE: 2026-06-17

* Why this is in rulesets' inbox

The home project built a working, private phone-notification channel for Craig on 2026-06-17 (self-hosted ntfy over Tailscale). Craig wants rulesets to consider promoting it from a one-way notification system into a *general two-way communication tool* between him and his agents — and, critically, to move it off pure polling toward event-driven delivery (an inbound message can trigger an action or notify an agent, not just sit in a queue waiting to be polled).

This is a proposal, not a change to anything rulesets owns. It documents exactly what exists, what ntfy makes possible, and the open design decisions rulesets would own. It also relates directly to the cross-agent-comms scripts that were retired from the templates in this same session — ntfy may be the transport layer that effort was missing.

* Part 1 — What exists now (as-built, verified)

- *Server:* ntfy in Docker on =ratio= at =~/docker/ntfy/= (=compose.yml= + =server.yml=, =data/= volume, =restart: unless-stopped=, healthcheck on =/v1/health=). The container listens on port 80, published to =127.0.0.1:2586=.
- *Tailnet exposure:* =tailscale serve --bg --http=80 http://127.0.0.1:2586= → reachable at =http://ratio.tailf3bb8c.ts.net= (tailnet only, no public exposure). Disable with =tailscale serve --http=80 off=.
- *Transport security:* plain HTTP, but every byte rides inside the WireGuard mesh (Tailscale), so it is encrypted end to end. The Tailscale account does not support TLS certs, and TLS would be redundant on the tailnet anyway. If ever exposed publicly, TLS + stronger auth become mandatory.
- *Auth:* =auth-default-access: deny-all=. User =cj= has read-write on topics =claude= and =infra=. Anonymous is denied — verified 403 on both publish AND subscribe without a token. Token =tk_…= never expires. App login is username =cj= + a short password.
- *Phone:* Pixel 6, ntfy F-Droid build (no Firebase / Google Play Services), WebSocket instant delivery. Already on the tailnet.
- *Publisher wrapper:* =~/.local/bin/phone-notify= (on ratio only). Reads =~/.config/phone-notify/config= (chmod 600: URL, token, topic). Supports =-t/--title=, =-p/--priority=, =-T/--tags=, =--topic=, =--click=, =--url=.
- *Verified two-way:* agent → phone push lands instantly; phone → publish to the topic lands on the server and is readable by the agent (Craig sent "It did, in fact, land." from the app and the agent polled and saw it).

* Part 2 — The ntfy building blocks rulesets can use

** Publish (agent → phone), already wired
- =curl -H "Authorization: Bearer <token>" -d "msg" <url>/<topic>= or =phone-notify=.
- Rich features available, unused so far: =Priority= (1-5), =Tags= (emoji/keywords), =Click= (URL opened on tap), =Actions= (tappable buttons — =view= a URL, =http= fire a request, =broadcast= an Android intent), =Attach= (files/images), =Markdown=, scheduled/delayed delivery (=At:= / =Delay:= header), and email/call forwarding.
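The publish call above can also be sketched in Python. This is a minimal illustration, not the =phone-notify= implementation: the function name =build_publish_request= and its parameters are hypothetical, and only the =Title=, =Priority=, =Tags=, and =Click= headers are covered. It builds the request without sending it, so the header mapping can be inspected offline.

#+begin_src python
import urllib.request


def build_publish_request(base_url, topic, token, message, *,
                          title=None, priority=None, tags=None, click=None):
    """Build (but do not send) an ntfy publish request.

    Hypothetical helper: the message is the request body and the optional
    features travel as HTTP headers.
    """
    if priority is not None and not 1 <= priority <= 5:
        raise ValueError("priority must be between 1 and 5")

    # Header values must be latin-1 encodable for http.client to send them.
    headers = {"Authorization": f"Bearer {token}"}
    if title:
        headers["Title"] = title
    if priority is not None:
        headers["Priority"] = str(priority)
    if tags:
        headers["Tags"] = ",".join(tags)
    if click:
        headers["Click"] = click

    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )
#+end_src

Sending is then a single =urllib.request.urlopen(request, timeout=10)= call; keeping construction separate from sending makes the wrapper easy to test without a server.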

** Read (agent ← phone)
- One-shot poll, all cached: =GET /<topic>/json?poll=1= (needs the token).
- Only-new since a point: =?since=<id|timestamp|duration>= (e.g. =?since=5m= or =?since=<last-seen-id>=). This is the basis of a =phone-recv= helper that prints only messages newer than the last one seen.
- Cache window is 12h (=cache-duration= in server.yml), so an on-demand poll sees every message published within the last 12 hours; anything older has already left the cache.
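A =phone-recv= helper along these lines might look like the sketch below. The names =poll_url= and =new_messages= are hypothetical; the parsing assumes the poll endpoint returns one JSON object per line and that real messages carry =event= set to =message=. Persisting the last-seen ID between runs is left out.

#+begin_src python
import json
from urllib.parse import urlencode


def poll_url(base_url, topic, since=None):
    """Build the one-shot poll URL, optionally limited by a since value."""
    params = {"poll": "1"}
    if since:
        params["since"] = since
    return f"{base_url.rstrip('/')}/{topic}/json?{urlencode(params)}"


def new_messages(ndjson_text, last_seen_id=None):
    """Return (messages, newest_id) from a newline-delimited JSON response.

    Non-message events (open, keepalive) and blank lines are skipped.
    """
    messages = []
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("event") != "message":
            continue
        # Defensive: drop the last-seen message if the server echoes it back.
        if last_seen_id is not None and event.get("id") == last_seen_id:
            continue
        messages.append(event)
    newest_id = messages[-1]["id"] if messages else last_seen_id
    return messages, newest_id
#+end_src

The caller would store =newest_id= (for example in a small state file) and pass it as =since= on the next run, so each invocation prints only what arrived since the previous one.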

** Subscribe with side effects (the event-driven primitive)
- =ntfy subscribe <topic> '<command>'= holds a persistent connection (WebSocket / JSON stream) and runs =<command>= for *every* inbound message, with fields exposed as environment variables (=$message=, =$title=, =$topic=, =$tags=, =$priority=, etc.).
- This is the answer to "not all polling": a long-running subscriber reacts the instant a message arrives.
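As a rough illustration of what the subscribed command sees, the sketch below collects those environment variables into one record. The =Inbound= class and =inbound_from_env= are hypothetical names. Two details are assumptions worth checking against the ntfy CLI: that =$tags= arrives comma-separated, and that an unset priority may arrive empty or as =0= (both are mapped to ntfy's default priority of 3 here).

#+begin_src python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Inbound:
    topic: str
    title: str
    message: str
    tags: tuple
    priority: int


def inbound_from_env(env=None):
    """Collect the per-message variables set by `ntfy subscribe`."""
    env = os.environ if env is None else env

    # Assumption: tags arrive as one comma-separated string.
    raw_tags = env.get("tags", "")
    tags = tuple(t.strip() for t in raw_tags.split(",") if t.strip())

    # Assumption: a missing, empty, or out-of-range priority means "default".
    raw_priority = env.get("priority", "")
    if raw_priority.isdigit() and 1 <= int(raw_priority) <= 5:
        priority = int(raw_priority)
    else:
        priority = 3

    return Inbound(
        topic=env.get("topic", ""),
        title=env.get("title", ""),
        message=env.get("message", ""),
        tags=tags,
        priority=priority,
    )
#+end_src

Taking the environment as an optional argument keeps the handler testable: a plain dict can stand in for a real inbound message.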

* Part 3 — Making it event-driven (Craig's core ask)

Three tiers, increasing capability and difficulty:

** Tier A — Subscriber daemon routes inbound (clean, doable now)
A systemd *user* service on ratio (always-on):
#+begin_src sh
# <last> is a placeholder; substitute a real since value before running.
ntfy subscribe --since=<last> claude /usr/local/bin/ntfy-inbound-handler
#+end_src
=ntfy-inbound-handler= classifies the message and routes it:
- Append to a watched queue (a project =inbox/= or a dedicated comms file) → the next agent session picks it up at a task boundary (already in protocols: inbox check at task boundaries).
- Fire a desktop =notify= so a human at a screen sees it immediately.
- Tag-based dispatch: =#task= → file as a TODO; =#infra= → infra queue; etc.

This gets us instant reaction with zero polling, and it degrades gracefully — if nothing is listening, the message still sits in the queue.
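The tag-based dispatch could be sketched as follows. Everything concrete here is an assumption for illustration: the =ROUTES= table, the queue file names (=todo.org=, =infra-queue.org=, =inbox.org=), and the one-list-item-per-message format. The real routing convention is one of the decisions listed for rulesets.

#+begin_src python
from pathlib import Path

# Hypothetical routing table: tag -> queue file name.
ROUTES = {"task": "todo.org", "infra": "infra-queue.org"}
DEFAULT_DESTINATION = "inbox.org"


def destination_for(tags):
    """Pick the queue file for the first tag with a known route."""
    for tag in tags:
        key = tag.lstrip("#").lower()
        if key in ROUTES:
            return ROUTES[key]
    return DEFAULT_DESTINATION


def append_to_queue(queue_dir, tags, message):
    """Append the message as one list item to the routed queue file."""
    target = Path(queue_dir) / destination_for(tags)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Collapse whitespace so a multi-line message stays a single list item.
    one_line = " ".join(message.split())
    with target.open("a", encoding="utf-8") as fh:
        fh.write(f"- {one_line}\n")
    return target
#+end_src

Because the handler only appends to files, nothing is lost if no agent session is running; the next session reads the queue at its usual inbox check.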

** Tier B — Inbound spawns a *new* agent session
The handler invokes the =ai= launcher (or a scheduled/cron Claude run) to process the message autonomously. An inbound phone text becomes an agent action — "remind me to X" from anywhere, "what's the status of Y", "approve the pending commit". This is where it stops being a notifier and becomes a remote control for the agent fleet. Ties into the harness cron/schedule features and the retired cross-agent-comms intent.

** Tier C — Notify / interrupt a *live* agent session (hardest, harness-dependent)
A turn-based session has no native external interrupt. Honest options to explore:
- The session runs a background subscriber/poll loop; the harness re-invokes the agent when backgrounded work emits or completes (the background-Bash + Monitor + ScheduleWakeup / =/loop= dynamic-pacing mechanisms).
- A =/loop= that polls the topic every N seconds (still polling, but bounded and cheap).
- Whatever the harness exposes for inbound push into a live session (e.g. a RemoteTrigger / inbound-PushNotification path) — needs experimentation.

Recommendation: ship Tier A first (high value, low risk), prototype Tier B, treat Tier C as research.

* Part 4 — The general-comms vision (beyond notifications)

- *Channels as topics:* =claude= (agent ↔ Craig), =infra= (server/health/backup alerts — the DEGRADED-pool class), per-project topics, a cross-agent topic.
- *Bidirectional chat:* Craig texts his agent from anywhere over Tailscale; the agent replies. Effectively private, self-hosted "SMS with your agent."
- *Approval buttons:* the publish =Actions= feature can render Approve / Reject buttons on the phone. For the commits.md approval gates (commit message, PR body, PR review) when Craig is away from the desk, a tapped button fires a webhook the handler turns into "proceed." This is a concrete, high-value use.
- *Attachments:* agent sends a generated screenshot/report to the phone; Craig sends a photo to the agent.
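For the approval buttons, a JSON publish payload along these lines is one way to express two =http= actions. This is a sketch under stated assumptions: =approval_payload=, =callback_url=, =request_id=, and the body format are all hypothetical, and the callback endpoint that turns a tap into "proceed" does not exist yet. Note that the phone itself fires the request, so the callback URL would have to be reachable from the phone over the tailnet.

#+begin_src python
import json


def approval_payload(topic, prompt, callback_url, request_id):
    """Build an ntfy JSON publish payload with Approve / Reject buttons."""

    def button(label, decision):
        return {
            "action": "http",          # the phone sends an HTTP request on tap
            "label": label,
            "url": callback_url,       # hypothetical endpoint on the tailnet
            "method": "POST",
            "body": json.dumps({"request_id": request_id, "decision": decision}),
            "clear": True,             # dismiss the notification after the tap
        }

    return {
        "topic": topic,
        "title": "Approval needed",
        "message": prompt,
        "actions": [button("Approve", "approve"), button("Reject", "reject")],
    }
#+end_src

The payload would be serialized with =json.dumps= and POSTed to the server's root URL (not the topic URL) with the usual bearer token. Including a =request_id= lets the handler match a tap to the specific gate that is waiting.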

* Part 5 — What rulesets would own / decide

1. *Canonical tooling:* promote =phone-notify= (send) and add =phone-recv= (check-since) as rulesets bin scripts, synced to all machines via dotfiles/templates. Today =phone-notify= lives only on ratio.
2. *Config + secret convention:* where the server URL + token live per machine (=~/.config/phone-notify/config= chmod 600 today), and whether the token should be a rulesets-managed GPG-encrypted secret distributed via dotfiles.
3. *The subscriber daemon:* a reference =ntfy-inbound-handler= + a systemd user-unit template, plus the routing convention (tags → destinations).
4. *Protocol conventions:* topic naming, a message format/tag vocabulary for routing, and how inbound maps to the existing =inbox/= and (retired) cross-agent-comms protocols.
5. *Harness integration:* how — if at all — to wake or notify a live/new agent session on inbound. The Tier C research.
6. *Relationship to cross-agent-comms:* decide whether ntfy is the transport that replaces the just-retired scripts, and whether agent↔agent messaging rides the same server (a dedicated topic) or stays separate.
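For item 2, a shared loader could enforce the permission convention before any script touches the token. The sketch below assumes a simple =KEY=VALUE= file with =url=, =token=, and =topic= keys; the actual format of =~/.config/phone-notify/config= is not specified here, so treat the parsing as a placeholder. =load_config= and =REQUIRED_KEYS= are hypothetical names.

#+begin_src python
import stat
from pathlib import Path

REQUIRED_KEYS = ("url", "token", "topic")


def load_config(path):
    """Load KEY=VALUE settings, refusing files readable by group or others."""
    path = Path(path)
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} has mode {mode:o}; expected 600")

    config = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines that are not KEY=VALUE
        config[key.strip().lower()] = value.strip().strip('"')

    missing = [k for k in REQUIRED_KEYS if k not in config]
    if missing:
        raise KeyError(f"missing config keys: {', '.join(missing)}")
    return config
#+end_src

If the token is later distributed as a GPG-encrypted secret, only the file-reading step would change; the permission check and key validation could stay as they are.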

* Part 6 — Open questions

- Multi-machine token distribution (per-machine config vs encrypted-in-dotfiles).
- Daemon placement: one always-on subscriber on ratio vs per-machine subscribers.
- Inbound integration with the existing inbox + the retired cross-agent protocols.
- Live-session interrupt feasibility (entirely harness-dependent — needs a spike).
- Whether agent↔agent comms and agent↔Craig comms share a server or are isolated.

* Companion artifact

The full as-built runbook (concrete values, server.yml, the verification checklist, the security model) lives in the home project at =working/phone-notifications/spec.org=. This proposal is the forward-looking half; that file is the operational record of what was deployed.