Diffstat (limited to 'docs/agent-knowledge-base-spec.org')
-rw-r--r-- docs/agent-knowledge-base-spec.org 236
1 files changed, 236 insertions, 0 deletions
diff --git a/docs/agent-knowledge-base-spec.org b/docs/agent-knowledge-base-spec.org
new file mode 100644
index 0000000..c59c33b
--- /dev/null
+++ b/docs/agent-knowledge-base-spec.org
@@ -0,0 +1,236 @@
+#+TITLE: Agent Knowledge Base on Org-roam — Spec
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-06-10
+
+* Metadata
+| Status | ready with caveats — Codex review incorporated, D7 ratified keep (Craig, 2026-06-10); caveat: confirm work-root denylist contents; implementation awaiting Craig's go |
+| Owner | Craig Jennings |
+| Reviewer | Craig Jennings; Codex (2026-06-10) |
+| Related | [[file:../todo.org][todo.org — "Check that memories are sync'd across machines via git"]] |
+
+This spec supersedes the 2026-06-05 draft (formerly docs/design/2026-06-05-org-roam-knowledge-base-spec.org, removed; content in git history), folding in Craig's 2026-06-10 ratification answers and restructuring to the spec-create format.
+
+* Summary
+
+Agents adopt Craig's existing org-roam knowledge base (=~/sync/org/roam/=, ~490 org files, Syncthing-synced since 2023) as the shared, cross-project store for durable knowledge. Per-project harness memory stays as a fast capture layer; durable facts get promoted into the KB, which already syncs across machines. This replaces two abandoned designs (a dedicated git repo, a two-tier rules split) with a substrate that already exists.
+
+* Problem / Context
+
+Per-project agent memory lives at =~/.claude/projects/<encoded-cwd>/memory/=, a harness-owned path that is unmanaged and unsynced, so it doesn't survive a new-machine setup or dotfiles restore. Anything durable an agent learns is at-risk by default.
+
+Two fixes were built or designed and dropped. A dedicated =claude-memory.git= repo was built then reversed because it pooled work-confidential and personal memory into one store. A two-tier split (general lessons to rules, project memory to each project's =.ai/=) left public-remote code projects' memory at-risk by design.
+
+The simplification: cross-machine sync is already solved for anything living in =~/sync/org/roam/=. The task stops being "build a memory-sync system" and becomes "point agents at the knowledge base that already syncs."
+
+* Goals and Non-Goals
+
+** Goals
+- Durable, cross-machine agent knowledge with no new sync mechanism.
+- Agents query the KB before relying on remembered project facts, prior decisions, or reference material.
+- Agent-written notes are distinguishable from Craig's, and index cleanly in his org-roam.
+- The work/personal confidentiality boundary is explicit and enforced on the write side.
+
+** Non-Goals
+- The rules layer (=claude-rules/=, =CLAUDE.md=) is untouched. The KB replaces the memory tier, not the rules tier.
+- No Emacs/org-roam package integration; agents never touch the SQLite cache.
+- No autonomy expansion. Free agent writes apply to the KB only — email, Linear comments, PRs, and every other public or external channel still require Craig's review and consent (D6).
+
+** Scope tiers
+- v1: the pointer rule, the write schema, the boundary, one verified seed node.
+- Out of scope: migrating historical harness memory wholesale into the KB; auto-promotion.
+- vNext: promotion tooling (a wrap-up prompt or =/promote= pass), KB hygiene reports (orphan =:agent:= nodes).
+
+* Design
+
+The KB is a directory of plain org files with =#+title=, =#+filetags=, =:ID:= property drawers, and =[[id:UUID]]= links. That is the entire interface.
+
+For a *reader* (any agent): query with ripgrep over content and tags; follow a link by grepping for its target =:ID:=. The backlink graph and database are Craig's Emacs conveniences — the files are the agent's interface. Before relying on a remembered project fact, a prior decision, or reference material, search the KB first.
+
+Conflict-file exclusion is part of the command contract, not a prose reminder — the KB carries dozens of Syncthing =*.sync-conflict-*= files (63 at last count) whose contents are stale or duplicated. The canonical commands:
+
+#+begin_src sh
+# content/tag search
+rg --glob '*.org' --glob '!*sync-conflict*' '<query>' ~/sync/org/roam/
+# follow an [[id:UUID]] link to its node
+rg --glob '*.org' --glob '!*sync-conflict*' ':ID:[[:space:]]+<uuid>' ~/sync/org/roam/
+#+end_src
+
+For a *writer* (personal-project agents only, per D5): create one node per fact, following roam conventions so the next =org-roam-db-sync= indexes it:
+
+#+begin_example
+:PROPERTIES:
+:ID: <generated uuid>
+:END:
+#+title: <concise title>
+#+filetags: :agent:<scope>:
+
+<the fact, with [[id:...]] links to related nodes>
+#+end_example
+
+Filename follows roam's timestamp-prefix convention (=YYYYMMDDHHMMSS-slug.org=). The =:agent:= filetag makes =rg '#\+filetags:.*:agent:'= a clean inventory of everything agents wrote, so Craig can review or prune at will.
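+
+A minimal sketch of a writer that follows this schema, for illustration only. The function name =kb_write_node=, the =KB_ROOT= override, and the underscore slug rule are assumptions, not part of the ratified design. It also assumes the caller has already established that the project is allowed to write (D5), and that =uuidgen= or the Linux =/proc= fallback is available.
+
+#+begin_src sh
+# Hypothetical helper: write one roam-valid, :agent:-tagged node.
+# Usage: kb_write_node <scope-tag> <title> <body>
+kb_write_node() {
+    kb_root=${KB_ROOT:-$HOME/sync/org/roam}   # KB_ROOT override is an assumption
+    scope=$1 title=$2 body=$3
+    # Never create the KB directory: a missing path means no write.
+    if [ ! -d "$kb_root" ]; then
+        echo "KB not found at $kb_root; nothing written" >&2
+        return 1
+    fi
+    # Lowercase UUID from uuidgen, or the Linux kernel source as a fallback.
+    id=$(if command -v uuidgen >/dev/null 2>&1; then uuidgen
+         else cat /proc/sys/kernel/random/uuid
+         fi | tr '[:upper:]' '[:lower:]')
+    [ -n "$id" ] || return 1
+    # Slug: lowercase the title, collapse non-alphanumeric runs to "_".
+    slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '_')
+    slug=${slug#_}; slug=${slug%_}
+    file="$kb_root/$(date +%Y%m%d%H%M%S)-$slug.org"
+    if [ -e "$file" ]; then
+        echo "refusing to overwrite $file" >&2
+        return 1
+    fi
+    printf ':PROPERTIES:\n:ID: %s\n:END:\n#+title: %s\n#+filetags: :agent:%s:\n\n%s\n' \
+        "$id" "$title" "$scope" "$body" > "$file"
+    echo "$file"   # report the path so the write is visible to the caller
+}
+#+end_src
+
+The function prints the path it wrote, so a wrap-up report can name the new node. It refuses to create the KB directory or overwrite an existing file, which keeps a machine without Syncthing from growing a stray local copy.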
+
+** Project classification and write routing (v1)
+
+D5's boundary needs an executable answer to "is this project allowed to write?" — inference from cwd names, remotes, or task content is too much discretion for a confidentiality boundary. The v1 source of truth is an explicit *work-root denylist* carried in =knowledge-base.md= (initially =~/projects/work=; contents confirmed with Craig before the rule ships). Classification:
+
+- *Work* — the project root is, or sits under, a denylisted work root. No KB write, ever. The agent records durable facts per that project's own conventions (work already keeps its knowledge in its project tree); v1 adds no new work-side store.
+- *Personal* — the project root sits under a known project parent (=~/code/=, =~/projects/=, =~/.emacs.d=) and is not denylisted. KB writes allowed per D6.
+- *Unknown* — anything else. No KB write. Refuse and report.
+
+The refusal message contract (work and unknown alike): state the classification, name the durable fact in a one-line redacted summary, and say where it was or wasn't written — so Craig can re-route it deliberately instead of losing it silently.
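+
+The three-way classification can be sketched as a pure function of the project root. This is an illustration under stated assumptions: =classify_project= is a hypothetical name, the denylist holds only the initial =~/projects/work= entry (contents still to be confirmed with Craig), and the path is assumed to be absolute and already normalized (no symlinks or =..= segments).
+
+#+begin_src sh
+# Hypothetical helper: print "work", "personal", or "unknown" for a project root.
+classify_project() {
+    root=${1%/}   # drop one trailing slash, if any
+    # Work: the root is, or sits under, a denylisted work root.
+    # Extend this pattern list once the denylist is confirmed.
+    case "$root" in
+        "$HOME/projects/work"|"$HOME/projects/work"/*)
+            echo work; return 0 ;;
+    esac
+    # Personal: the root sits under a known project parent (or is ~/.emacs.d
+    # itself, an assumption about how that entry is meant) and is not denylisted.
+    case "$root" in
+        "$HOME/code"/*|"$HOME/projects"/*|"$HOME/.emacs.d"|"$HOME/.emacs.d"/*)
+            echo personal ;;
+        *)
+            # Unknown: anything else. The caller refuses and reports.
+            echo unknown ;;
+    esac
+}
+
+classify_project "$HOME/projects/work/acme"   # prints: work
+classify_project "$HOME/projects/workshop"    # prints: personal
+classify_project /tmp/scratch                 # prints: unknown
+#+end_src
+
+Checking the denylist first is what makes =~/projects/work= win over its parent =~/projects/=, and matching on a trailing =/= keeps a sibling such as =~/projects/workshop= from being caught by prefix alone.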
+
+** Harness memory: capture, then promote
+
+Harness memory keeps its current role but is redefined as an ephemeral working set: fast, automatic, per-project, relevance-recalled, and allowed to be at-risk because nothing durable depends on it surviving. Durable or cross-machine-valuable facts get promoted into the KB as a deliberate step (wrap-up, a task audit, or an explicit prompt) — the same capture-on-landing / promote-on-review cadence the pattern catalog uses.
+
+A new =claude-rules/knowledge-base.md= rule (auto-installs via the Makefile RULES glob, like =patterns.md= — no Makefile change expected) is the bridge: it carries the KB path, the query commands, the write schema, the classification denylist, and the D5/D6 boundary with its refusal contract.
+
+* Alternatives Considered
+
+** Dedicated private git repo (=claude-memory.git=)
+- Good, because git gives history, review, and a deliberate sync step.
+- Bad, because it pooled work-confidential and personal memory into one all-machines store — the reason it was built and then reversed (2026-05-23/24).
+- Bad, because it added a new clone + symlink mechanism every machine must maintain.
+
+** Two-tier split (rules file + per-project =.ai/memory/=)
+- Good, because general lessons would load natively into every session via the rules layer.
+- Bad, because project memory in gitignored-=.ai/= projects stays at-risk by design — it solved sync only where =.ai/= was tracked.
+- Neutral, because the promote-general-lessons instinct survives in this design as KB promotion.
+
+** Org-roam KB (chosen)
+- Good, because sync already works (Syncthing, since 2023) — zero new infrastructure.
+- Good, because the KB is already Craig's curated knowledge home; agent knowledge lands where he actually looks.
+- Bad, because Syncthing has no review gate (accepted in D6) and no history — a bad write propagates immediately.
+- Neutral, because agents read it as plain files; no org-roam tooling required or used.
+
+* Decisions
+
+** D1 — The KB is a queried substrate, accessed as files
+- State: accepted (Craig, 2026-06-10)
+- Context: an agent is a harness process, not an Emacs session; it cannot call org-roam's Elisp API or read its SQLite cache.
+- Decision: We will treat =~/sync/org/roam/= as a directory of plain org files — ripgrep for search, grep-for-=:ID:= to follow links.
+- Consequences: easier — works in every runtime, no Emacs dependency; harder — no backlink graph or db-backed queries for agents (acceptable: ~490 tagged, linked text files grep well).
+
+** D2 — Capture in harness memory, promote into the KB
+- State: accepted (Craig, 2026-06-10)
+- Context: harness memory is fast and auto-recalled but unsynced; the KB is durable but query-only.
+- Decision: We will keep both with distinct roles — harness memory captures, the KB holds what's promoted. Promotion is deliberate (wrap-up, task audit, or explicit prompt), never automatic.
+- Consequences: easier — the at-risk problem dissolves (what stays unsynced is by definition the regenerable hot set); harder — promotion is a discipline that has to actually happen, or value silts up in the capture layer. D7 (resolved: keep) confirms the capture layer stays, so this decision stands as written.
+
+** D3 — Surfacing via a pointer rule
+- State: accepted (Craig, 2026-06-10)
+- Context: agents need to know the KB exists, where it lives, and how to use it — in every project, on every machine.
+- Decision: We will ship =claude-rules/knowledge-base.md= carrying path, query method, write schema, and boundary. It auto-installs via the existing Makefile RULES glob.
+- Consequences: easier — one rule, machine-wide, same mechanism as =patterns.md=; harder — nothing material.
+
+** D4 — Write schema: roam-valid, =:agent:=-tagged, one node per fact
+- State: accepted (Craig, 2026-06-10: "per fact")
+- Context: agent writes must index cleanly in Craig's org-roam and stay distinguishable from his hand-authored notes. Granularity was open: per-fact nodes vs a per-project appended notes file.
+- Decision: We will write one roam-valid node per fact (=:ID:= drawer, =#+title=, =#+filetags: :agent:<scope>:=, timestamp-prefixed filename), linking related nodes by =[[id:]]=.
+- Consequences: easier — roam-native, linkable, =rg :agent:= inventories everything agents wrote; harder — more files (accepted; that's what roam is).
+
+** D5 — Write boundary: read-shared, write-scoped (option C)
+- State: accepted (Craig, 2026-06-10: "Your recommendation C is the right one.")
+- Context: the KB is personal and replicates to every Syncthing machine — Craig confirmed that includes a work machine. The leak risk is asymmetric and lives on the write side: a work agent writing confidential facts would pool them into the personal store.
+- Decision: We will let any project read the shared KB; only personal projects write to it. Work agents write to work's own project tree, never the shared KB. Craig confirmed C handles the work-machine replication acceptably.
+- Consequences: easier — reading value lands everywhere, confidential work data stays physically out of the KB; harder — work knowledge has no shared home (status quo, unchanged), and the inverse risk (personal facts surfacing in work artifacts) remains governed by the existing content-scope rules in =commits.md=.
+
+** D6 — Agent writes land freely in the KB, and only there
+- State: accepted (Craig, 2026-06-10)
+- Context: Syncthing has no git-style review gate; the alternative was a staging tag (=:agent:inbox:=) Craig promotes from.
+- Decision: We will let agent writes land freely in the KB without a review gate. This autonomy is scoped to the KB alone — it is not permission to send email, comment on Linear tickets, or post to any public or external channel; those still require Craig's review and consent.
+- Consequences: easier — no promotion queue to tend, knowledge lands immediately; harder — a bad write syncs everywhere before anyone reviews it (mitigated by the =:agent:= inventory and Craig's normal roam curation).
+
+** D7 — Harness memory stays as the capture layer
+- State: accepted (Craig, 2026-06-10: "keep")
+- Context: with the KB live, harness memory (=~/.claude/projects/<enc>/memory/= — the per-project store the harness auto-loads into context at session start) could either stay or retire. *Keep* preserves automatic relevance recall at zero query cost, at the price of two stores plus the promotion habit. *Retire* would mean one store and no promotion step, but recall stops being automatic and session start gets heavier.
+- Decision: We will keep harness memory as the ephemeral capture layer. D2 stands as written, and Phase 3's promotion cadence is required, not optional — it's what keeps the capture layer from silting up.
+- Consequences: easier — automatic recall keeps working, no harness behavior changes; harder — two stores and a promotion discipline (mitigated by Phase 3's mechanical wrap-up trigger).
+
+* Implementation phases
+
+Not started — Craig has explicitly held implementation pending his go-ahead.
+
+** Phase 1 — Pointer rule
+Confirm the work-root denylist contents with Craig, then write =claude-rules/knowledge-base.md=: path, the canonical query commands (conflict-file exclusion included), the D4 schema, the classification + write-routing rules, the refusal contract, and the D5/D6 boundary. =make install= links it machine-wide via the existing RULES glob — no Makefile change. Tree stays working throughout (pure addition).
+
+** Phase 2 — Seed node + index verification
+Craig supplies or approves the durable fact; the implementer writes exactly one node under =~/sync/org/roam/= per the schema (a genuine durable fact, not a test stub). Craig runs =org-roam-db-sync= and confirms it indexes and displays cleanly. Rollback if the schema fails: delete that one timestamped =:agent:= file. This validates the schema end-to-end before agents write at volume.
+
+** Phase 3 — Promotion cadence
+Wire the promotion prompt into the wrap-up workflow (an "anything worth promoting to the KB?" check), and note the cadence in =knowledge-base.md=. Resolves D2's discipline risk with a mechanical trigger.
+
+* Acceptance criteria
+
+- [ ] =claude-rules/knowledge-base.md= exists with path, query method, write schema, and the ratified D5/D6 boundary.
+- [ ] An agent in a personal project can find a relevant prior note by querying the KB.
+- [ ] An agent-written node indexes cleanly in Craig's org-roam on the next =org-roam-db-sync= and is identifiable via the =:agent:= filetag.
+- [ ] The capture/promote split and its trigger are documented.
+- [ ] A work-project agent, asked to store a durable fact, writes it to work's own tree, not the KB.
+- [ ] An unknown-classification project, asked to store a durable fact, refuses the KB write and reports the redacted fact per the refusal contract rather than guessing.
+- [ ] The documented query commands find a known note and exclude =*.sync-conflict-*= files.
+
+* Readiness dimensions
+
+- Data model & ownership: KB nodes are user-curated (Craig) or agent-authored (=:agent:= tag); agents never edit Craig's hand-authored nodes, only link to them. Harness memory stays agent-owned and ephemeral.
+- Errors, empty states & failure: a missing =~/sync/org/roam/= (machine without Syncthing) means the rule's query step finds nothing — agents proceed without the KB and say so rather than fabricate recall. No write occurs to a nonexistent path.
+- Security & privacy: D5/D6 are the whole story — work never writes; KB writes never extend to public channels. No credentials live in the KB.
+- Observability: =rg '#\+filetags:.*:agent:'= inventories all agent writes; roam's UI shows them tagged in Craig's normal browsing.
+- Performance & scale: ~490 files today; ripgrep searches even a few thousand org files in milliseconds. Not a concern.
+- Reuse & lost opportunities: maximal — the entire design is reusing an existing synced, curated store instead of building one.
+- Architecture fit & weak points: mirrors the patterns.md pointer-rule shape; the existing Makefile RULES glob installs the new rule with no Makefile change. Weak point is Syncthing conflict files (=*.sync-conflict-*=, 63 today) — excluded by the canonical query commands, not left to prose.
+- Config surface: one path constant in =knowledge-base.md=. No knobs.
+- Documentation plan: =knowledge-base.md= is the documentation; the spec records the why.
+- Dev tooling: N/A because the interface is ripgrep and Write — no build, no tests beyond Phase 2's manual index check.
+- Rollout, compatibility & rollback: pure addition; rollback is deleting the rule file and (optionally) the =:agent:=-tagged nodes, which the tag makes a one-command sweep.
+- External APIs & deps: none — plain files. Verified: =~/sync/org/roam/= exists with ~490 org files plus 63 conflict files, Syncthing-synced (2026-06-05 ground-truth check; recount in the 2026-06-10 Codex review).
+
+* Risks, Rabbit Holes, and Drawbacks
+
+- Un-reviewed writes propagate instantly (D6 accepted this). Dodge: the =:agent:= inventory keeps cleanup cheap.
+- Promotion discipline may not stick (D2). Dodge: Phase 3 makes it a mechanical wrap-up step rather than a memory burden.
+- Syncthing conflict files could confuse queries. Dodge: exclusion is baked into the canonical commands.
+- An incomplete work-root denylist would let a work project classify as personal. Dodge: Phase 1 starts by confirming the denylist with Craig, and the classification's safe default (unknown → refuse) covers anything outside the known parents.
+
+* Testing / Verification
+
+From the 2026-06-10 review, the verification surface for v1:
+
+- =make install= links =knowledge-base.md= into =~/.claude/rules/=.
+- In a personal repo, the documented =rg= command finds a known note.
+- In a work repo, a durable-storage request produces no write under =~/sync/org/roam/= and the refusal report names the fact.
+- In an unknown project, the agent refuses the KB write and reports per the refusal contract rather than guessing.
+- One approved seed node indexes via =org-roam-db-sync= and appears in the =rg '#\+filetags:.*:agent:'= inventory.
+- A =*.sync-conflict-*= file containing a unique token is excluded by the documented query.
+
+The first, second, and last checks are agent-runnable; the org-roam display check and the work/unknown behavioral checks are Craig's manual validation (tracked in todo.org).
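+
+The conflict-file check could be run against a throwaway directory rather than the live KB. The sketch below is one way to do that, not a prescribed test: the function name =check_conflict_exclusion=, the token format, and the sample filenames are all hypothetical, and it assumes =rg= is installed.
+
+#+begin_src sh
+# Hypothetical check: the documented query must find a known note and
+# must not return a *.sync-conflict-* copy carrying the same token.
+check_conflict_exclusion() {
+    kb=$(mktemp -d) || return 1
+    token="kbcheck-$$"
+    printf '#+title: real note\n%s\n' "$token" \
+        > "$kb/20260610000000-real.org"
+    printf '#+title: stale copy\n%s\n' "$token" \
+        > "$kb/20260610000000-real.sync-conflict-20260610-120000-ABCDEFG.org"
+    # Same globs as the canonical content search, listing matching files only.
+    hits=$(rg --files-with-matches --glob '*.org' --glob '!*sync-conflict*' \
+              "$token" "$kb/")
+    rm -r "$kb"
+    case "$hits" in
+        *sync-conflict*) echo "FAIL: conflict file leaked into results"; return 1 ;;
+        *real.org)       echo "PASS: known note found, conflict file excluded" ;;
+        *)               echo "FAIL: known note not found"; return 1 ;;
+    esac
+}
+#+end_src
+
+Running it in a temporary directory means the check writes nothing under =~/sync/org/roam/=, so it is safe in any project classification.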
+
+* Review dispositions
+
+Modified recommendations from the 2026-06-10 Codex review, with reasons. Everything else was accepted as written.
+
+- *Drop-in =[#B]= implementation tasks as standalone top-level TODOs* — modified: the phase tasks hang as children under the existing parent task ("Check that memories are sync'd across machines via git"), per the one-parent-owns-the-effort convention in the response workflow. Content carried over intact.
+- *Update the file count to the exact recursive number* — modified: the spec now says ~490 (Codex's own alternative), so routine KB growth doesn't churn the spec.
+- *Define the exact work-side write destination* — modified within the review's own options: v1 adds no new work-side store. Work projects keep their existing project-tree conventions, and the KB rule's only work-side behavior is the refusal + report.
+
+* Review and iteration history
+
+** 2026-06-05 Fri @ 05:57:35 -0500 — Claude (rulesets session) — author
+- What: initial one-page draft (five decisions, mechanics recommended), after Craig redirected the memory-sync task onto the existing org-roam KB.
+- Why: the dedicated-repo and two-tier designs both failed on the work/personal boundary or left memory at-risk; the KB already syncs.
+- Artifacts: original draft at docs/design/2026-06-05-org-roam-knowledge-base-spec.org (superseded by this file; content in git history).
+
+** 2026-06-10 Wed @ 14:29:20 -0500 — Craig Jennings (cj annotations) + Claude — author revision
+- What: folded in Craig's ratification answers (D5 = option C; Syncthing does replicate to a work machine and C stands; per-fact node granularity; free KB-only writes with the explicit no-public-channels boundary) and rewrote into the spec-create format. D7 (harness memory's fate) held open with the fuller explanation Craig requested.
+- Why: all but one decision ratified; the 2026-06-05 draft predated the spec-create template.
+- Artifacts: this file; implementation explicitly deferred pending Craig's go-ahead.
+
+** 2026-06-10 Wed @ 14:35:40 -0500 — Codex — reviewer
+- What: reviewed implementation readiness and wrote a blocking review. The main blockers are unresolved D7 and the missing executable personal/work/unknown write-boundary classifier; medium notes cover concrete =rg= commands for conflict-file exclusion and seed-node approval/rollback mechanics.
+- Why: implementation would otherwise force the agent to invent memory architecture and confidentiality-boundary behavior at write time.
+- Artifacts: docs/agent-knowledge-base-spec-review.org (deleted on disposition completion per the response workflow; content summarized here and in Review dispositions).
+
+** 2026-06-10 Wed @ 14:39:41 -0500 — Claude Code (rulesets) — responder
+- What: processed the Codex review with Craig's D7 ratification ("keep") as a pre-agreed input. Both blockers cleared: D7 is accepted (harness memory stays the capture layer, Phase 3 mandatory), and a new "Project classification and write routing" design subsection was added (work-root denylist as source of truth, unknown → refuse, refusal message contract, no new work-side store). Mediums accepted: canonical =rg= commands with conflict-file exclusion baked in, Phase 2 approval/rollback mechanics, Makefile no-change note, ~490 file count, Testing/Verification section. Three recommendations modified (see Review dispositions); none rejected.
+- Why: converge to implementation-ready. Rubric: ready with caveats — the one caveat is confirming the work-root denylist contents with Craig before Phase 1 ships the rule.
+- Artifacts: this file; implementation-task breakdown under the parent task in todo.org; review file deleted.