docs: reshape todo backlog and add buttercup evaluation

Walked the open-task backlog twice tonight. The first pass was a content audit (is each task still factually accurate?). The second was a relevance/priority review. Together they surfaced enough drift to be worth landing as one batch rather than dribbling into the next session.

The audit filed three new completion-task parents, each with an audit-finding body and child-task recommendations: F-key Completion (roughly 75% shipped per evidence), Terminal GPG pinentry Completion (no trace of the prior branch on this machine, treat as fresh), and Localrepo Documentation (build is shipped, docs land in three artifacts, four gap-fix tasks spin out as siblings). Headline-indicators-wrap and Buttercup closed DONE, Rework-dev-F-keys cancelled as superseded.

Manual Testing and Validation became its own top-level task with the 10 misfiled verify children moved in. Walk started tonight (tests 1 and 2 verified, two signel bugs surfaced and fixed in the same session), deferred to 2026-05-29 for the message-sending tests.

The Buttercup eval doc captures the rubric I came to during the brainstorm: adopt the moment a project crosses the "test reader is no longer the test author at write-time" threshold. ERT stays the right default until then. None of my projects have crossed yet.

Lint pass resolved all 21 org-lint follow-ups inline rather than letting them accumulate in a hidden inbox: 5 wrong-prefix design-doc links (../docs/design/X.org should have been docs/design/X.org), 4 file:line bare references wrapped in code formatting, 1 timestamp moved out of org-link brackets, 1 nested src block converted to begin_example, the wttrin diagnostic's stale link replaced with a note about where the surviving record lives, 8 markdown-bold patterns converted to org italics, 2 verbatim ** TODO references trimmed so the linter stops misreading them as headings.
author: Craig Jennings <c@cjennings.net> 2026-05-28 02:46:24 -0500
committer: Craig Jennings <c@cjennings.net> 2026-05-28 02:46:24 -0500
commit: 1c93820893a3aa831684e2322c606e54b0bba40c (patch)
tree: b60a93a12dfb87a1a29231e1ab13687fa05afff6 /docs
parent: a00628ac96d5f09f96be868755304f7bdfdc0d99 (diff)
1 files changed, 117 insertions, 0 deletions
diff --git a/docs/design/buttercup-evaluation.org b/docs/design/buttercup-evaluation.org
new file mode 100644
index 00000000..394f0dbe
--- /dev/null
+++ b/docs/design/buttercup-evaluation.org
@@ -0,0 +1,117 @@
+#+TITLE: Buttercup Evaluation
+#+AUTHOR: Craig Jennings
+#+DATE: 2026-05-28
+
+* Purpose
+
+Decide whether to adopt Buttercup for BDD-style testing over the project's current ERT-only baseline. This doc is the output of the 2026-04-26 brainstorm reminder, which flagged the original one-liner Buttercup task as too thin to act on.
+
+The verdict is here at the top; the rubric, evidence, and trigger conditions follow. Re-read this when a project crosses the threshold described below.
+
+* Verdict
+
+Not yet — for any project Craig owns as of 2026-05-28. ERT is enough.
+
+Adopt Buttercup the moment a project crosses the adoption threshold described below. Until then the migration cost is real and the value it would unlock is theoretical.
+
+* Adoption Threshold
+
+The test reader is no longer the test author at write-time.
+
+That one line is the whole rubric. Every project archetype where Buttercup wins reduces to a way this threshold gets crossed.
+
+** What crossing the threshold looks like
+
+- An outside contributor opens a PR — they read the tests to understand the API surface before touching code.
+- The project ships through MELPA / a package archive — consumers read tests as documentation when something breaks.
+- An upstream PR is opened against another package — the reviewer reads the diff and any accompanying tests.
+- A second machine starts pulling the same local checkout — future-Craig on a different host counts.
+- A README declares a stated public API (=cj/foo-thing= is a "supported command", not just internal scratch) — consumers exist by definition.
+- A solo project ages past working-memory windows (six-plus months between substantive sessions) — future-self counts as a second reader and the spec-as-documentation value of describe/it grows with age.
+
+* Strengths Buttercup Brings Over ERT
+
+** Narrative test structure
+Nested =describe / it / expect= reads top-to-bottom as a scenario. "Given a connected daemon, given a contact selected, when the user sends a message, expect…" — the structure IS the spec.
+
+ERT's flat, name-based test definitions leave related tests as siblings with no visible grouping. =test-signel-connect=, =test-signel-message=, =test-signel-disconnect= are visually adjacent in a file, but the relationship between them doesn't show up to a cold reader.
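+
+A minimal sketch of the nested shape, assuming Buttercup is installed. The =my/greet= function is a hypothetical stand-in for illustration, not code from any project discussed here.
+
+#+begin_src emacs-lisp
+(require 'buttercup)
+
+;; Hypothetical function under test.
+(defun my/greet (name)
+  "Return a greeting for NAME, or a generic one when NAME is nil."
+  (if name (format "Hello, %s" name) "Hello, stranger"))
+
+;; Each nested `describe' names a precondition; each `it' names a behavior.
+(describe "my/greet"
+  (describe "given a name"
+    (it "greets the person by name"
+      (expect (my/greet "Ada") :to-equal "Hello, Ada")))
+  (describe "given no name"
+    (it "falls back to a generic greeting"
+      (expect (my/greet nil) :to-equal "Hello, stranger"))))
+#+end_src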
+
+** Built-in spies
+=spy-on= with =:and-return-value=, =:and-call-fake=, or =:and-call-through=, paired with the =:to-have-been-called-with= and =:to-have-been-called-times= matchers, replaces =cl-letf= scaffolding that otherwise grows linearly with mock count. Six lines of =cl-letf= for one mock become two of =spy-on=. The savings compound for tests that touch multiple side-effect boundaries: RPC sends, file writes, process sentinels.
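+
+The same mock written both ways, as a rough sketch that assumes Buttercup is installed. =my/transport-send= and =my/send-message= are hypothetical names invented for the example.
+
+#+begin_src emacs-lisp
+(require 'buttercup)
+(require 'cl-lib)
+(require 'ert)
+(require 'subr-x)
+
+;; Hypothetical side-effect boundary: never safe to call for real in a test.
+(defun my/transport-send (text)
+  (error "No real transport in tests: %s" text))
+
+;; Hypothetical function under test: trims TEXT, then hands it off.
+(defun my/send-message (text)
+  (my/transport-send (string-trim text)))
+
+;; ERT: hand-rolled mock that records its arguments via `cl-letf'.
+(ert-deftest test-my-send-message-trims ()
+  (let (sent)
+    (cl-letf (((symbol-function 'my/transport-send)
+               (lambda (text) (push text sent) t)))
+      (my/send-message "  hi  ")
+      (should (equal sent '("hi"))))))
+
+;; Buttercup: the spy replaces the function and records calls itself.
+(describe "my/send-message"
+  (it "trims text before sending"
+    (spy-on 'my/transport-send :and-return-value t)
+    (my/send-message "  hi  ")
+    (expect 'my/transport-send :to-have-been-called-with "hi")))
+#+end_src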
+
+** Expressive matchers
+=:to-be=, =:to-equal=, =:to-match=, =:to-throw=, =:to-contain=, =:to-have-been-called-with= self-describe their failures: "expected X but got Y". Under ERT, =should= reports the failing form and its evaluated arguments, but any prose explanation has to be added by hand, for example with =ert-info=.
+
+** Async support
+The =done= callback finalizes asynchronous tests cleanly — process sentinels, network handlers, timers, event-loop work. ERT's equivalent is =accept-process-output= polling, =while-no-input= idioms, or =sit-for= sleeps.
+
+** Setup hooks at every depth
+=before-each=, =after-each=, =before-all=, =after-all= scope to each =describe= block, with guaranteed cleanup even on test failure. The pattern of =unwind-protect= around every test collapses to one declaration.
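+
+A small sketch of scoped setup and teardown, assuming Buttercup is installed. The suite and its temp-file fixture are made up for illustration.
+
+#+begin_src emacs-lisp
+(require 'buttercup)
+
+(describe "scratch-file handling"
+  :var (tmp)
+  ;; Runs before every `it' in this block: create a fresh temp file.
+  (before-each
+    (setq tmp (make-temp-file "buttercup-sketch-")))
+  ;; Runs after every `it' in this block: remove the temp file.
+  (after-each
+    (when (file-exists-p tmp) (delete-file tmp)))
+
+  (it "starts with an empty file"
+    (expect (file-attribute-size (file-attributes tmp)) :to-equal 0))
+
+  (it "can be written to"
+    (with-temp-file tmp (insert "hello"))
+    (expect (with-temp-buffer
+              (insert-file-contents tmp)
+              (buffer-string))
+            :to-equal "hello")))
+#+end_src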
+
+** Random test order
+On by default. Catches order-dependent tests. ERT runs in declaration order — coupling between tests silently survives until the day someone reorders.
+
+** Ecosystem alignment
+Buttercup is the de facto MELPA-package testing standard. Clean JUnit XML output, runner CLI, GitHub Actions templates exist for it. The community CI workflows assume Buttercup, not ERT.
+
+* Project Archetypes Where Buttercup Wins
+
+Each is a different shape of "the test reader is no longer the test author at write-time."
+
+** MELPA-bound packages with outside contributors
+Community standard; new contributors expect =describe= / =it= / =expect=. CI templates land working.
+
+** Libraries with a public API surface
+The test file IS the spec. =(describe "cj/signel-message" ...)= reads as documentation for the next reader.
+
+** Heavily-integrated wrappers
+Slack, Signal, email, RPC, IPC. Anywhere four or five external boundaries get touched per test, =spy-on= eliminates the =cl-letf= weight that grows linearly with mock count.
+
+** Async-heavy code
+Process sentinels, network handlers, polling loops. The =done= callback is cleaner than sleep-and-assert.
+
+** Outside-in / BDD TDD
+The spec is written first as scenarios; tests evolve. =xit= / =xdescribe= pending markers and nested =describe= blocks let the whole spec be visible while only part of it is green.
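+
+A sketch of a partly green spec, assuming Buttercup is installed. The suite name and =my/not-yet-defined= are hypothetical; the body of an =xit= spec is not run, so the undefined function is never called.
+
+#+begin_src emacs-lisp
+(require 'buttercup)
+
+(describe "a hypothetical feature"
+  ;; Already implemented: runs and is expected to pass.
+  (it "handles the case already implemented"
+    (expect (+ 1 1) :to-equal 2))
+  ;; Not implemented yet: reported as pending instead of failing.
+  (xit "handles the case not written yet"
+    (expect (my/not-yet-defined) :to-be-truthy)))
+#+end_src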
+
+** Multi-developer projects
+Narrative test structure is easier for new contributors to read than a wall of =test-foo-bar-baz= function names.
+
+** Solo projects that outlive working memory
+Future-self counts as a second reader. The longer the gap between sessions, the more the spec-as-documentation value of =describe= / =it= matters.
+
+* Why ERT Stays the Right Default for Craig Today
+
+** ~/.emacs.d
+Public-mirrored to cjennings.net and GitHub, but the audience is "Craig + occasional curious onlooker," not "package consumer expecting reproducible install + reliable behavior." Single test consumer. ERT.
+
+** signel fork (~/code/signel)
+Local checkout. No remote. No upstream PR yet. Single test consumer. ERT.
+
+** Chime, org-msg, other local packages
+Local. No MELPA submission. No stated public-API README. Single test consumer. ERT.
+
+** Idiom alignment
+ERT is what Emacs itself uses. Code-near-Emacs-core stays cheaper to read and write in ERT.
+
+** Migration cost
+Sixty-plus ERT files in =~/.emacs.d/tests/= alone, plus the local package suites. Buttercup is a framework swap per project — not a per-file convert-as-you-go path the way some test migrations are.
+
+* When To Reach For This Doc Again
+
+Open this file the day any of the following happens:
+
+- Craig opens an upstream PR against keenban/signel
+- A Chime / org-msg / signel-fork MELPA recipe is submitted
+- Someone files an issue or opens a PR against =~/.emacs.d=
+- A second machine starts pulling a local-package checkout as a consumer
+- A README is added to a project declaring a public API surface
+- Six-plus months pass on a project and re-reading the ERT suite cold feels harder than it should
+
+That's the moment to revisit the verdict. The rubric doesn't change; the threshold flips.
+
+* References
+
+- Original task: =todo.org= — "Evaluate and integrate Buttercup for behavior-driven integration tests" (marked DONE 2026-05-28 with this doc as the evaluation deliverable)
+- Triggering reminder: 2026-04-26 entry in =.ai/notes.org= Active Reminders ("Buttercup eval brainstorm")
+- Buttercup project: https://github.com/jorgenschaefer/emacs-buttercup
+- ERT documentation: =info:ert=