#+TITLE: Brainstorm: org-roam as shared human + agent knowledge base
#+DATE: 2026-05-28
#+SOURCE: rulesets discussion with Codex
#+TARGET: .emacs.d / ai-kb / org-roam agent tooling

* Prompt

If the rulesets agents had full access to org-roam, how would they best be able to use it? Would it be a good idea? Consider not only project-related documents, but also using org-roam as a shared knowledge base for Craig and the agents.

* Bottom line

Yes, this is a good idea, but only if org-roam is treated as a curated shared memory system rather than a dumping ground, a transcript archive, or a second task tracker.

The best shape is:

- org-roam holds durable linked knowledge;
- =.ai/= holds operational/session workflow state;
- =todo.org= holds current work;
- =docs/= holds formal project documentation;
- agents access org-roam through a small structured tool layer;
- human and agent writes use the same safety boundary;
- curation is periodic and human-gated.

This is close to the current =ai-kb= direction in =.emacs.d=: a private git-backed org-roam store with a shared write protocol, summaries, provenance, lint, index generation, and Emacs browsing.

* What org-roam should be for agents

** Durable cross-project memory

Org-roam is strongest as the place for knowledge that should follow the agent across projects:

- Craig's preferences;
- recurring procedures;
- engineering lessons;
- architecture principles;
- known gotchas;
- durable decisions;
- reusable workflows;
- project relationships;
- tool conventions;
- "we tried X, rejected it because Y";
- notes that later agents should discover by relationship, not just by keyword.

Why this helps:

- Agents stop rediscovering context.
- Preferences become durable and linkable.
- Lessons learned in one project can influence another project.
- The graph structure lets an agent discover consequences and related decisions rather than reading a flat memory file top to bottom.
** Project memory with links

For project-related documents, org-roam should not replace the current project surfaces. It should complement them.

Suggested split:

| Surface                   | Purpose                                                         |
|---------------------------+-----------------------------------------------------------------|
| =.ai/session-context.org= | current session facts, live recovery, wrap-up archive           |
| =todo.org=                | current tasks and commitments                                   |
| =docs/=                   | formal project docs, specs, architecture notes                  |
| org-roam / ai-kb          | durable concepts, decisions, procedures, lessons, relationships |

Example:

- A session discovers "startup must pull rulesets before project repos." That might be logged in =.ai/session-context.org= today.
- If it becomes a durable rule or reusable lesson, the agent writes an org-roam node linking to related startup, sync, and failure-recovery nodes.
- A future project can point to that node by ID instead of duplicating the rule.

** Shared human + agent knowledge base

The best version is not "agent memory" separate from Craig's tools. It is a shared knowledge base:

- Craig can browse and edit with org-roam, backlinks, graph, node-find, and normal Emacs affordances.
- Agents can query, show, link, remember, and curate through structured tools.
- Both sides see the same source of truth.
- Every write has provenance so later readers know whether a fact was user-stated, observed, inferred, or externally sourced.

This is the main value over a flat =MEMORY.md=.

* How agents should use it

** Query before acting

Agents should query org-roam before:

- choosing a convention;
- making an architectural recommendation;
- writing a new workflow;
- changing rulesets behavior;
- touching personal preferences;
- solving a problem that looks familiar;
- giving advice where Craig's past decisions may matter;
- contradicting an existing decision;
- starting a multi-step procedure that may already exist.

They should not load the whole graph. The retrieval path should be:

1. read the tiny adapter rule;
2. query the generated index / CLI;
3. inspect summaries;
4. open only the relevant nodes;
5. follow backlinks only when they look useful.

Why this helps:

- Keeps token usage low.
- Gives agents the right memory at the right time.
- Avoids start-of-session context floods.

** Read backlinks as context

Backlinks are where org-roam gives agents more than a search index. When an agent reads a node, it should be able to ask:

- What decisions depend on this?
- What procedures reference this?
- Which projects are affected?
- Is this preference superseded?
- What gotchas are related?
- What unresolved contradictions exist?

This supports reasoning by graph neighborhood rather than by raw text search.

** Write only durable, general knowledge

Agents should write unprompted only when the knowledge is:

- durable: useful beyond this session;
- general enough: useful across projects or likely to recur;
- not re-derivable cheaply from code/git/docs;
- not a secret;
- not routine status.

Good writes:

- "Craig prefers no popup choice menus; present numbered options inline."
- "When moving an org subtree to roam, write and verify the target before cutting the source."
- "Rulesets install artifacts are symlinks globally but copied per-project for language bundles."
- "Use ID-first pointers to ai-kb nodes because titles and filenames can change."

Bad writes:

- today's status;
- every session summary;
- task lists;
- raw chat transcripts;
- temporary debugging observations;
- guesses with no provenance;
- secrets or credentials.

** Use contradiction handling, not silent overwrite

If a new observation conflicts with an existing node, agents should not silently replace the old node.

Better flow:

1. mark the new claim and old claim as contested, or create a contested note;
2. explain the contradiction;
3. ask Craig whether to update, scope as an exception, supersede, or reject.

Example: Existing memory says "no popup menus." A new workflow proposes popup choice menus.
The agent should surface the conflict and ask whether this is an exception or a rejected design.

Why this helps:

- Prevents model drift from rewriting preferences.
- Makes changing a durable preference explicit.
- Keeps the KB trustworthy.

* Tool layer agents should get

Agents should not use raw unrestricted file access as their primary interface. They should get a compact API over the org-roam store.

Suggested tools / CLI commands:

- =ai-kb query =: ranked search over index, titles, tags, summaries, properties, and body.
- =ai-kb show =: resolve ID first and print/open the node.
- =ai-kb backlinks =: list nodes linking to a node, excluding generated index and raw captures.
- =ai-kb remember=: write using the full protocol.
- =ai-kb lint=: structural and semantic validation.
- =ai-kb index=: regenerate the index.
- =ai-kb status=: fast state for dashboard/startup.
- =ai-kb doctor=: deeper health check.
- =ai-kb curate --dry-run=: report duplicates, orphans, contested nodes, stale nodes, raw bloat.

Why this helps:

- Agents compose predictable operations.
- Humans can test behavior.
- Token usage drops because agents can request structured summaries instead of reading many org files.
- Safety gates live in one place.

* Required node shape

Every shared KB node should have enough structure for retrieval, trust, and maintenance.

Suggested required properties:

#+begin_src org
:PROPERTIES:
:ID:
:PROJECTS: :general: :rulesets:
:CREATED: 2026-05-28
:UPDATED: 2026-05-28
:CREATED_BY: codex
:CONFIDENCE: user-stated
:VISIBILITY: personal
:SOURCE: chat 2026-05-28
:STATUS: current
:SUMMARY: One sentence written for retrieval and index display.
:END:
#+title: Concise node title
#+filetags: :principle:preference:
#+end_src

Important conventions:

- =:ID:= is the durable identity. Titles and filenames may change.
- =:SUMMARY:= is required because query/index should not infer it.
- =:CREATED_BY:= and =:CONFIDENCE:= separate user-stated knowledge from model inference.
- =:STATUS:= supports =current=, =contested=, =superseded=.
- =:VISIBILITY:= keeps privacy boundaries visible.
- relation labels in body links can express =SUPERSEDES=, =CONTRADICTS=, =RELATES_TO=, =IMPLEMENTS=, =DERIVED_FROM=.

* Human and agent writes need one safety boundary

If both Craig and agents edit the KB, there should be exactly one write path.

For agents:

1. fetch / fast-forward if safe;
2. write;
3. regenerate index;
4. run full lint;
5. scan for secrets;
6. commit locally;
7. push later or via timer;
8. surface push failures.

For human Emacs edits:

- an =ai-kb= minor mode should run the same post-save sequence;
- save should not be blocked or made read-only;
- lint failure should leave the buffer editable, avoid committing, and surface findings in a buffer/modeline/dashboard;
- a clean re-save commits.

Why this helps:

- Human edits cannot bypass the integrity model.
- Agent writes cannot introduce malformed nodes silently.
- The git history becomes the recovery layer.

* Personal roam boundary

There are two different ideas that should not be collapsed accidentally:

1. =ai-kb=: shared human/agent operational knowledge.
2. Craig's personal org-roam: personal notes, journals, recipes, dailies, knowledge graph.

Recommended default:

- keep =ai-kb= as a separate private org-roam repo;
- give agents rich access to =ai-kb=;
- give agents narrower, permissioned access to personal roam;
- bridge personal roam explicitly only when desired.

Why:

- Personal journals and private notes should not become agent scratch space.
- Agent writes can pollute a personal graph if not isolated.
- A separate repo makes sync, recovery, curation, and privacy easier.

Still valuable personal-roam tools:

- resolve topic to node;
- return node body plus backlinks;
- list nodes by tag;
- surface dailies for a date range;
- create notes via org-capture templates.

But these should be structured affordances, not freeform agent mutation.
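
The access split described above could be encoded as a small policy table that every tool call passes through. The following is only a sketch under assumptions: the store names, the operation names, and the three-way =allow= / =confirm= / =deny= result are hypothetical and exist to illustrate the boundary. In particular, treating capture into personal roam as needing confirmation is an assumption, not something this brainstorm decides.

#+begin_src python
# Hypothetical store identifiers, used for illustration only.
AI_KB = "ai-kb"
PERSONAL = "personal-roam"

# Assumed policy: operations an agent may run without asking, per store.
ALLOWED = {
    AI_KB: {"query", "show", "backlinks", "remember", "lint", "index", "status"},
    PERSONAL: {"resolve", "show", "backlinks", "list-by-tag", "dailies"},
}

# Assumed policy: operations that always wait for Craig's confirmation.
NEEDS_CONFIRMATION = {
    AI_KB: {"merge", "split", "delete"},
    PERSONAL: {"capture"},
}


def check(store: str, operation: str) -> str:
    """Return 'allow', 'confirm', or 'deny' for an agent request."""
    if operation in ALLOWED.get(store, set()):
        return "allow"
    if operation in NEEDS_CONFIRMATION.get(store, set()):
        return "confirm"
    # Anything not listed, including unknown stores, is denied by default.
    return "deny"
#+end_src

Denying by default means a new tool has to be added to the table deliberately before any agent can use it against either store.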
* Curation workflow

Any agent-written KB will rot unless curated.

Add a periodic curation workflow that reports:

- duplicate nodes;
- orphan nodes;
- stale nodes;
- contested nodes;
- superseded nodes still referenced;
- over-broad nodes to split;
- raw captures with no compiled node;
- raw files that are too large;
- external pointers that need repointing after a merge.

Rules:

- agents can propose merges/splits/deletions;
- Craig confirms destructive changes;
- merges must repoint inbound =[[id:]]= and external =ai-kb: Title (UUID)= pointers;
- curation stamps =:LAST_CURATED:= or equivalent.

Why this helps:

- Keeps the graph useful.
- Prevents "AI memory" from turning into sediment.
- Makes trust a maintained property, not a one-time design claim.

* What to avoid

- Do not let every session summary become a roam node.
- Do not store secrets, credentials, tokens, or private keys.
- Do not treat org-roam as the task system.
- Do not load the whole graph at startup.
- Do not let agents rewrite/delete/merge nodes without human confirmation.
- Do not mix personal journals and agent KB by default.
- Do not rely on org-roam's SQLite database as the agent source of truth. Files and IDs should be canonical; SQLite is a browsing cache.
- Do not let generated index files create semantic backlinks.
- Do not let raw external captures become primary query results.

* Best architecture

Use three layers:

** Operational layer

=.ai/=, workflows, session logs, =todo.org=.

This layer answers:

- What are we doing now?
- What happened this session?
- What workflow should run?
- What tasks are open?

** Knowledge layer

=ai-kb= as a private git-backed org-roam repo.

This layer answers:

- What should agents remember long-term?
- What principles, procedures, and preferences apply?
- What related decisions exist?
- What has been superseded or contested?

** Adapter layer

Thin rules/tools for Claude, Codex, local models, and Emacs.

This layer answers:

- How does this runtime query memory?
- How does this runtime write safely?
- How does this runtime respect the same contract?

* Best ideas to carry forward

1. Treat org-roam as shared long-term semantic memory, not transcript storage.
2. Keep =ai-kb= separate from personal roam by default.
3. Give agents structured tools: query, show, backlinks, remember, lint, status, curate.
4. Require summaries and provenance on every node.
5. Use ID-first links and pointers.
6. Query before acting when durable preferences or prior decisions may matter.
7. Use backlinks for graph-neighborhood context discovery.
8. Make contradiction handling explicit.
9. Run human and agent writes through the same lint/index/commit path.
10. Make curation periodic and human-gated.

* Possible next task

Convert this brainstorm into a concrete design delta for the existing =docs/design/ai-kb.org= and the open =Implement ai-kb= task:

- add agent query triggers;
- specify personal-roam access boundaries;
- define the structured tool interface for personal roam vs =ai-kb=;
- add contradiction handling to the agent contract;
- add curation acceptance criteria;
- decide whether any subset of personal roam should be readable by default.
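
As one possible starting point for that design delta, the required node shape described earlier could be checked mechanically. The sketch below is a minimal illustration, not the actual =ai-kb lint=: the function names =parse_properties= and =lint_node= are hypothetical, the required-property list is copied from the suggested node shape, and the parser assumes one property per line inside the first =:PROPERTIES:= drawer.

#+begin_src python
import re

# Property names taken from the suggested node shape in this document.
REQUIRED = ("ID", "PROJECTS", "CREATED", "UPDATED", "CREATED_BY",
            "CONFIDENCE", "VISIBILITY", "SOURCE", "STATUS", "SUMMARY")
# Status values named in the conventions list.
STATUSES = {"current", "contested", "superseded"}

PROPERTY_LINE = re.compile(r"^:([A-Z_]+):[ \t]*(.*)$")


def parse_properties(text: str) -> dict:
    """Collect key/value pairs from the first :PROPERTIES: drawer."""
    props = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == ":PROPERTIES:":
            inside = True
            continue
        if inside and line == ":END:":
            break
        if inside:
            match = PROPERTY_LINE.match(line)
            if match:
                props[match.group(1)] = match.group(2).strip()
    return props


def lint_node(text: str) -> list:
    """Return human-readable findings; an empty list means the node passes."""
    props = parse_properties(text)
    findings = [f"missing or empty :{key}:"
                for key in REQUIRED if not props.get(key)]
    status = props.get("STATUS")
    if status and status not in STATUSES:
        findings.append(f"unknown :STATUS: {status}")
    return findings
#+end_src

Returning findings as a list rather than raising an error fits the rule that a lint failure should leave the buffer editable and surface findings instead of blocking the save.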