chore(todo): file local-llm and uv install tasks; process inbox

Filed two new [#B] parent tasks. The local offline LLM runtime task has two sets of children: design decisions to resolve first, and implementation work gated on those decisions. The uv install task follows the shape of the existing eask/signal-cli tooling-codification tasks: load-bearing for other projects, manually installed today, so codify it for fresh installs. Moved four cross-project handoffs to the outbox.
author: Craig Jennings <c@cjennings.net> 2026-05-29 21:11:06 -0500
committer: Craig Jennings <c@cjennings.net> 2026-05-29 21:11:06 -0500
commit: 39970b462c8198220f33ef7323725982723d2233 (patch)
tree: 391c10e2be3207dbea2c1f4e01ca8e2944e18df8
parent: 06b2c0716b51eb73298f569752dd1d81947d9961 (diff)
5 files changed, 232 insertions, 0 deletions
diff --git a/assets/outbox/2026-05-28-from-rulesets-local-llm-install.org b/assets/outbox/2026-05-28-from-rulesets-local-llm-install.org
new file mode 100644
index 0000000..c3cbdaa
--- /dev/null
+++ b/assets/outbox/2026-05-28-from-rulesets-local-llm-install.org
@@ -0,0 +1,88 @@
+#+TITLE: Install local offline LLM runtime and model cache
+#+DATE: 2026-05-28
+#+SOURCE_PROJECT: rulesets
+#+REQUEST_TYPE: install-feature
+#+STARTUP: showall
+
+* Request
+
+Please add local offline LLM support to =archsetup='s normal install process so
+machines can run a local coding agent when there is no network.
+
+This came from the =rulesets= generic-agent-runtime design pass. =rulesets=
+should become runtime-neutral, but it needs =archsetup= to provision the local
+model runtime and prefetch model files while network is available.
+
+* Hardware-specific recommendations
+
+** High-end Strix Halo machine
+
+Detected with =inxi=:
+
+- AMD Ryzen AI Max+ 395
+- 128 GiB RAM
+- Radeon 8060S / Strix Halo unified memory
+
+Install:
+
+- Default offline coding model:
+  =Qwen3-Coder-30B-A3B-Instruct-GGUF=, prefer =Q6_K= on this machine.
+- Compatibility quant:
+  =Qwen3-Coder-30B-A3B-Instruct-GGUF Q4_K_M=.
+- Larger general/long-context fallback:
+  =Qwen3-Next-80B-A3B-Instruct-GGUF Q4_K_M=.
+
+** velox
+
+Detected with =ssh velox inxi -C -G -m -S --filter=:
+
+- Intel Core i7-1370P
+- 64 GiB RAM
+- Intel Iris Xe integrated graphics
+
+Install:
+
+- Strongest practical offline coding default:
+  =Qwen3-Coder-30B-A3B-Instruct-GGUF Q4_K_M=.
+- Add an 8B fallback model for quick edits and low-latency triage.
+
+Expect =velox= to be CPU/low-end-iGPU bound. The 30B model fits, but latency
+will be the limiting factor.
+
+* Runtime stack
+
+Recommended packages/components:
+
+- =llama.cpp= with CPU and Vulkan support where practical.
+- Optional =ollama= as a simple model manager/API for workflows that prefer it.
+- A shared local model cache, e.g. =~/.local/share/llm/models= or
+  =/srv/models/llm=.
+- OpenAI-compatible local endpoints:
+  - coding model on =127.0.0.1:8081=
+  - larger/general model on =127.0.0.1:8082= when installed
+  - leave =127.0.0.1:11434= for =ollama= if used
+
+* Install behavior
+
+- Install runtime packages during normal setup.
+- Prefetch model files when network is available.
+- Make model download idempotent: skip if exact file already exists.
+- Do not make the install fail hard if model download is unavailable; surface a
+  clear follow-up saying local offline LLM support is incomplete.
+- Add a smoke test command that starts the local endpoint and asks a short prompt.
+
+* Why this belongs in archsetup
+
+=rulesets= can provide the runtime manifests, launcher behavior, and project
+instructions, but it should not own machine provisioning. =archsetup= already
+owns package installation and per-host setup, so it is the right place to install
+=llama.cpp=/=ollama= and maintain the machine-local model inventory.
+
+* Sources checked
+
+- Qwen3-Coder 30B GGUF quant listings show Q4_K_M around 18.6 GB and Q6_K around
+  25.1 GB.
+- Qwen3-Next 80B GGUF model card shows Q4_K_M around 48.4 GB and native 262K
+  context.
+- =llama.cpp= supports CPU and GPU backends including Vulkan/HIP/ROCm; keep the
+  backend configurable per host.
diff --git a/assets/outbox/2026-05-29-1111-from-health-todo-b-install-python-genanki-system.org b/assets/outbox/2026-05-29-1111-from-health-todo-b-install-python-genanki-system.org
new file mode 100644
index 0000000..3cf53ec
--- /dev/null
+++ b/assets/outbox/2026-05-29-1111-from-health-todo-b-install-python-genanki-system.org
@@ -0,0 +1,29 @@
+#+TITLE: * TODO [#B] Install ~python-genanki~ system-wide :install:py
+#+SOURCE: from health
+#+DATE: 2026-05-29 11:11:48 -0500
+
+* TODO [#B] Install ~python-genanki~ system-wide :install:python:
+
+Add ~genanki~ to the Python package set so projects can generate Anki ~.apkg~ decks from org-drill files without per-project venvs.
+
+** Why
+A converter script in the health project (~health-drill-to-anki.py~) emits an Anki package from an org-drill source file — same workflow as ~/sync/org/drill/~ but pushed into Anki for mobile review. The script is ~100 lines and runs anywhere Python has ~genanki~ available. The pattern is likely to spread: deepsat already has a drill file, more will follow.
+
+Right now Arch's PEP 668 enforcement blocks ~pip install --user genanki~, so the script needs a throwaway venv to run. Solving once at the system level removes that friction across every project.
+
+** Install options (pick one)
+- *Arch official repo or AUR.* ~pacman -S python-genanki~ if it's in extra/community, or ~yay -S python-genanki~ from AUR. Cleanest. Auto-updates with the rest of the system.
+- *pipx.* ~pipx install genanki~ — but genanki is a library, not a CLI app, so pipx is a stretch. Skip.
+- *System-wide pip with ~--break-system-packages~.* Works but circumvents PEP 668. Last resort.
+
+Recommendation: try pacman/AUR first, fall back to a managed venv at a known path (e.g. ~/opt/python-tools/~) that scripts can shebang into.
+
+** Verification
+After install:
+#+begin_src bash
+python3 -c "import genanki; print(genanki.__version__)"
+#+end_src
+Should print a version like ~0.13.1~ without traceback.
+
+** Cross-reference
+- A companion message went to rulesets' inbox proposing the export script as a template script under ~claude-templates/.ai/scripts/~. That decision depends on this install landing first — without ~genanki~ available system-wide, the template script can't run out of the box.
diff --git a/assets/outbox/2026-05-29-1114-from-health-cancelled-install-python-genanki-system.org b/assets/outbox/2026-05-29-1114-from-health-cancelled-install-python-genanki-system.org
new file mode 100644
index 0000000..877eb18
--- /dev/null
+++ b/assets/outbox/2026-05-29-1114-from-health-cancelled-install-python-genanki-system.org
@@ -0,0 +1,23 @@
+#+TITLE: * CANCELLED Install ~python-genanki~ system-wide — supersede
+#+SOURCE: from health
+#+DATE: 2026-05-29 11:14:51 -0500
+
+* CANCELLED Install ~python-genanki~ system-wide — superseded :install:python:
+
+The prior message in this inbox (also dated 2026-05-29) asked for a system-wide ~python-genanki~ install to remove pip/venv friction. That request is no longer needed.
+
+** Why
+The converter script in the health project moved to ~uv~ + PEP 723 inline script metadata. Shebang is ~#!/usr/bin/env -S uv run --script~ and the deps are declared in the script header. ~uv~ handles resolve / install / cache transparently on first run (~5ms cached). No system package needed for ~genanki~ specifically.
+
+** Implication for archsetup
+- *No genanki install required.* The earlier TODO can be deleted or marked CANCELLED.
+- *~uv~ is now a load-bearing tool for this pattern.* Make sure ~uv~ stays in the Arch base setup. Currently installed via ~/usr/bin/uv~ (version 0.11.15). If it's already part of the standard package set, nothing to do. If it's only installed ad-hoc on this machine, worth adding to the install playbook.
+
+** Verification
+#+begin_src bash
+which uv && uv --version
+#+end_src
+Should print a path and version like ~uv 0.11.15~ without error.
+
+** Cross-reference
+A follow-up went to rulesets' inbox with the updated implementation pattern and a renewed recommendation to formalize the converter as a template script.
diff --git a/assets/outbox/2026-05-29-1127-from-health-todo-a-add-uv-to-the-install-playbook.org b/assets/outbox/2026-05-29-1127-from-health-todo-a-add-uv-to-the-install-playbook.org
new file mode 100644
index 0000000..9828f0b
--- /dev/null
+++ b/assets/outbox/2026-05-29-1127-from-health-todo-a-add-uv-to-the-install-playbook.org
@@ -0,0 +1,40 @@
+#+TITLE: * TODO [#A] Add ~uv~ to the install playbook — load-bearing 
+#+SOURCE: from health
+#+DATE: 2026-05-29 11:27:58 -0500
+
+* TODO [#A] Add ~uv~ to the install playbook — load-bearing for project tooling :install:python:
+
+Add ~uv~ (Astral's Python package + script runner) to the archsetup install steps so every future machine setup picks it up automatically. Currently installed on this machine at ~/usr/bin/uv~ (~0.11.15~), but it's not yet part of the standard set — if a fresh install skipped it, project scripts that depend on it would fail silently or break at first run.
+
+** Why this matters now
+Several projects are moving Python tooling onto ~uv~ with PEP 723 inline script metadata. The pattern is: a script declares its deps in a ~# /// script~ header and shebangs ~#!/usr/bin/env -S uv run --script~, so it self-runs without venvs or pip. First example landed in the health project (~health-drill-to-anki.py~), and the rulesets follow-up recommends promoting that pattern as a template script under ~claude-templates/.ai/scripts/~ — meaning ~uv~ would become a dependency for any project pulling templates.
+
+Without ~uv~ in the install playbook, fresh machines would hit ~env: uv: No such file or directory~ on any script using this pattern. The PEP 668 detour (system pip blocked, manual venvs everywhere) was exactly what ~uv~ eliminates.
+
+** Install
+Arch official: ~pacman -S uv~. If extra/community doesn't carry it, AUR has it as ~uv-bin~ or build-from-source. Astral's official installer is also an option but adds a non-pacman path to maintain.
+
+Pacman is cleanest — auto-updates with the rest of the system, lives under ~/usr/bin/uv~.
+
+** Verification (post-install)
+#+begin_src bash
+which uv && uv --version
+#+end_src
+Should print ~/usr/bin/uv~ and a version string. To verify the PEP 723 pattern works end-to-end:
+#+begin_src bash
+cat > /tmp/uv-test.py <<'EOF'
+#!/usr/bin/env -S uv run --script
+# /// script
+# requires-python = ">=3.11"
+# dependencies = ["requests"]
+# ///
+import requests
+print("ok")
+EOF
+chmod +x /tmp/uv-test.py && /tmp/uv-test.py
+#+end_src
+First run resolves and caches ~requests~; subsequent runs are instant.
+
+** Related
+- Earlier message in this inbox CANCELLED the ~python-genanki~ install request — that one is no longer needed because ~uv~ + PEP 723 covers it.
+- The rulesets inbox carries the broader template-script proposal.
diff --git a/todo.org b/todo.org
index ae086a3..e2cfaae 100644
--- a/todo.org
+++ b/todo.org
@@ -96,6 +96,58 @@ A custom waybar module providing three time-keeping functions, surfaced in the b
 
 Implementation notes (to flesh out when picked up): waybar =custom= module(s) with =exec= polling or a persistent =exec= script emitting JSON; click actions to start/pause/reset; a small state file under =~/.local/state= or =~/.local/var=. Lives in the hyprland tier (=dotfiles/hyprland/.config/waybar/= + a backing script in =hyprland/.local/bin/=). TDD the backing script per testing.md.
 
+** TODO [#B] Local offline LLM runtime + per-host model cache :tooling:llm:
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-29
+:END:
+Add a local-LLM provisioning track so machines can run an offline coding agent when there's no network. Install =llama.cpp= (CPU + Vulkan where practical) and prefetch per-host model files while network is available; expose OpenAI-compat local endpoints (=127.0.0.1:8081= coding, =:8082= general; =:11434= reserved for =ollama= if used). Per the rulesets generic-agent-runtime design pass — rulesets becomes runtime-neutral and owns the runtime manifests + project instructions; archsetup owns machine provisioning + the per-machine model inventory. Source: handoff from rulesets 2026-05-28 ([[file:assets/outbox/2026-05-28-from-rulesets-local-llm-install.org][outbox copy]]).
+
+Per-host model targets (from the handoff):
+- *ratio* (Strix Halo, 128 GiB) — Qwen3-Coder-30B Q6_K (default) + Q4_K_M (compat) + Qwen3-Next-80B Q4_K_M (long-context fallback).
+- *velox* (i7-1370P, 64 GiB iGPU) — Qwen3-Coder-30B Q4_K_M + an 8B fallback for low-latency triage.
+
+Install behavior: prefetch idempotent (skip if file exists, match size/hash); download failure must NOT fail the install — surface a clear "local LLM support incomplete" follow-up instead. Ship a smoke-test command (boot endpoint + short prompt).
+
+Decisions to resolve before code:
+*** TODO Decide model cache location: per-user vs system-wide
+Handoff lists both =~/.local/share/llm/models= (per-user) and =/srv/models/llm= (system-wide). Per-user matches the existing archsetup user-config style and avoids root ownership of large model files. System-wide matches the "machine-local model inventory" phrasing and shares cache across users on multi-user boxes (not the case here — single user per machine). Pick one as the default; the other stays available via =LLM_MODEL_CACHE=.
+*** TODO Decide whether =ollama= ships by default or is opt-in
+Handoff calls =ollama= "optional". Likely shape: =llama.cpp= is the only mandatory runtime; =ollama= behind =INSTALL_OLLAMA= (default no) for users who prefer its model-manager API. Confirm.
+*** TODO Define config keys for the LLM block in =archsetup.conf.example=
+Likely: =INSTALL_LOCAL_LLM= (default yes), =LLM_RUNTIME= (=llama.cpp= / =ollama=), =LLM_MODEL_CACHE= (path), =LLM_MODELS= (space-separated, or empty → per-host autodetect). Lock names + defaults before writing install code.
+*** TODO Decide per-host model selection: auto-detect by =uname -n= vs explicit =LLM_MODELS=
+Auto-detect against a known-host table (ratio → Q6_K + 80B, velox → Q4_K_M + 8B) is simple for current machines but brittle for any new host (silently picks no models). Explicit =LLM_MODELS= per machine in =archsetup.conf= is more verbose but never surprises. Pick the default; the other stays available.
+*** TODO Decide network-down behavior for model prefetch
+Three shapes: (a) emit =error_warn= and write =/var/lib/archsetup/state/llm-models-pending= for inspection; (b) install a one-shot systemd unit that retries on next boot with network; (c) just log and forget — user re-runs the prefetch helper manually when network returns.
+
+Implementation work (gated on the decisions above):
+*** TODO Install =llama.cpp= with CPU + Vulkan backend where supported
+Add to the appropriate install section in =archsetup= (=llama.cpp= / =llama.cpp-vulkan= in AUR). Decide CPU-only vs Vulkan per host from the hardware detection already used for GPU drivers.
+*** TODO Install =ollama= behind config flag (if Decision 2 = opt-in)
+Add =ollama= package install gated on =INSTALL_OLLAMA=yes=.
+*** TODO Configure shared model cache + OpenAI-compat local endpoints
+Create =$LLM_MODEL_CACHE= with the right ownership; configure llama.cpp (and ollama if installed) to serve =127.0.0.1:8081= (coding) and =:8082= (general). Likely systemd user units; decide launcher pattern when implementing.
+*** TODO Prefetch per-host models (idempotent, non-fatal on network failure)
+Download the per-host model set (from Decision 4) into the cache; skip files that exist with matching size/hash. On failure, fall back per Decision 5. Models from HuggingFace GGUF mirrors (URLs locked at implementation time).
+*** TODO Ship a local-LLM smoke-test command
+Boot the configured endpoint and send a short prompt; surface success/failure + timing. Useful as both a post-install check and a triage tool when something later breaks. Likely =scripts/llm-smoke-test.sh=; runs at end of install if =INSTALL_LOCAL_LLM=yes=.
+
+Acceptance: fresh VM install of the ratio profile reaches an endpoint on =:8081= that answers a smoke prompt; velox profile gets Q4_K_M + 8B and answers a prompt within reasonable laptop latency; network-down install completes successfully with the pending-models warning surfaced.
+
+** TODO [#B] Add =uv= to the install playbook :tooling:python:
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-29
+:END:
+Add =uv= (Astral's Python package + script runner) to archsetup so fresh machines pick it up automatically. Currently installed by hand on ratio + velox (=/usr/bin/uv= 0.11.15), not in the standard set — a fresh install would skip it, and project scripts using PEP 723 inline-script metadata (=#!/usr/bin/env -S uv run --script= shebangs) would fail with =env: uv: No such file or directory=. Source: handoff from health 2026-05-29 ([[file:assets/outbox/2026-05-29-1127-from-health-todo-a-add-uv-to-the-install-playbook.org][outbox copy]]).
+
+Health requested [#A] (load-bearing for the PEP 723 pattern they're promoting + the rulesets template-script proposal). Demoted to [#B] for archsetup: no current install is broken (uv is pre-installed everywhere it's needed), and the shape matches the existing [#B] tooling-codification tasks (eask, signal-cli) — load-bearing for other projects, manually installed today, codify so fresh installs pick it up.
+
+- *Install via pacman* — =uv= is in extra (=pacman -S uv=). Cleanest path; auto-updates with the rest of the system. AUR =uv-bin= and Astral's official installer are alternatives but add a non-pacman path to maintain.
+- *Placement* — alongside the existing language-tooling block in =archsetup= (near =rustup=, =nvm=, or the Python set). Decide the exact section at implementation time.
+- *Verification* — post-install =which uv && uv --version=; PEP 723 end-to-end check per the health handoff (=/tmp/uv-test.py= shebang script with inline =requests= dep).
+
+Related: the new [#B] LLM task above may grow scripts that benefit from PEP 723 (e.g. =scripts/llm-smoke-test.sh= if Python-based). =uv= landing here removes that friction.
+
 ** DOING [#A] Separate dotfiles from archsetup
 SCHEDULED: <2026-05-21 Thu>
 :PROPERTIES: