chore(todo): file local-llm and uv install tasks; process inbox

Filed two new [#B] parent tasks. The local offline LLM runtime task carries design-decision children for the open questions, plus implementation children gated on those decisions. The uv install task matches the existing eask/signal-cli tooling-codification shape: load-bearing for other projects, manually installed today, to be codified so fresh installs pick it up. Four cross-project handoffs moved to outbox.
author: Craig Jennings <c@cjennings.net> 2026-05-29 21:11:06 -0500
committer: Craig Jennings <c@cjennings.net> 2026-05-29 21:11:06 -0500
commit: 39970b462c8198220f33ef7323725982723d2233 (patch)
tree: 391c10e2be3207dbea2c1f4e01ca8e2944e18df8 /todo.org
parent: 06b2c0716b51eb73298f569752dd1d81947d9961 (diff)
1 files changed, 52 insertions, 0 deletions
diff --git a/todo.org b/todo.org
index ae086a3..e2cfaae 100644
--- a/todo.org
+++ b/todo.org
@@ -96,6 +96,58 @@ A custom waybar module providing three time-keeping functions, surfaced in the b
 
 Implementation notes (to flesh out when picked up): waybar =custom= module(s) with =exec= polling or a persistent =exec= script emitting JSON; click actions to start/pause/reset; a small state file under =~/.local/state= or =~/.local/var=. Lives in the hyprland tier (=dotfiles/hyprland/.config/waybar/= + a backing script in =hyprland/.local/bin/=). TDD the backing script per testing.md.
 
+** TODO [#B] Local offline LLM runtime + per-host model cache :tooling:llm:
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-29
+:END:
+Add a local-LLM provisioning track so machines can run an offline coding agent when there's no network. Install =llama.cpp= (CPU + Vulkan where practical) and prefetch per-host model files while network is available; expose OpenAI-compat local endpoints (=127.0.0.1:8081= coding, =:8082= general; =:11434= reserved for =ollama= if used). Per the rulesets generic-agent-runtime design pass — rulesets becomes runtime-neutral and owns the runtime manifests + project instructions; archsetup owns machine provisioning + the per-machine model inventory. Source: handoff from rulesets 2026-05-28 ([[file:assets/outbox/2026-05-28-from-rulesets-local-llm-install.org][outbox copy]]).
+
+Per-host model targets (from the handoff):
+- *ratio* (Strix Halo, 128 GiB) — Qwen3-Coder-30B Q6_K (default) + Q4_K_M (compat) + Qwen3-Next-80B Q4_K_M (long-context fallback).
+- *velox* (i7-1370P, 64 GiB iGPU) — Qwen3-Coder-30B Q4_K_M + an 8B fallback for low-latency triage.
+
+Install behavior: prefetch idempotent (skip if file exists, match size/hash); download failure must NOT fail the install — surface a clear "local LLM support incomplete" follow-up instead. Ship a smoke-test command (boot endpoint + short prompt).
+
+Decisions to resolve before code:
+*** TODO Decide model cache location: per-user vs system-wide
+Handoff lists both =~/.local/share/llm/models= (per-user) and =/srv/models/llm= (system-wide). Per-user matches the existing archsetup user-config style and avoids root ownership of large model files. System-wide matches the "machine-local model inventory" phrasing and shares cache across users on multi-user boxes (not the case here — single user per machine). Pick one as the default; the other stays available via =LLM_MODEL_CACHE=.
+*** TODO Decide whether =ollama= ships by default or is opt-in
+Handoff calls =ollama= "optional". Likely shape: =llama.cpp= is the only mandatory runtime; =ollama= behind =INSTALL_OLLAMA= (default no) for users who prefer its model-manager API. Confirm.
+*** TODO Define config keys for the LLM block in =archsetup.conf.example=
+Likely: =INSTALL_LOCAL_LLM= (default yes), =LLM_RUNTIME= (=llama.cpp= / =ollama=), =LLM_MODEL_CACHE= (path), =LLM_MODELS= (space-separated, or empty → per-host autodetect). Lock names + defaults before writing install code.
+*** TODO Decide per-host model selection: auto-detect by =uname -n= vs explicit =LLM_MODELS=
+Auto-detect against a known-host table (ratio → Q6_K + 80B, velox → Q4_K_M + 8B) is simple for current machines but brittle for any new host (silently picks no models). Explicit =LLM_MODELS= per machine in =archsetup.conf= is more verbose but never surprises. Pick the default; the other stays available.
+*** TODO Decide network-down behavior for model prefetch
+Three shapes: (a) emit =error_warn= and write =/var/lib/archsetup/state/llm-models-pending= for inspection; (b) install a one-shot systemd unit that retries on next boot with network; (c) just log and forget — user re-runs the prefetch helper manually when network returns.
+
+Implementation work (gated on the decisions above):
+*** TODO Install =llama.cpp= with CPU + Vulkan backend where supported
+Add to the appropriate install section in =archsetup= (=llama.cpp= / =llama.cpp-vulkan= in AUR). Decide CPU-only vs Vulkan per host from the hardware detection already used for GPU drivers.
+*** TODO Install =ollama= behind config flag (if Decision 2 = opt-in)
+Add =ollama= package install gated on =INSTALL_OLLAMA=yes=.
+*** TODO Configure shared model cache + OpenAI-compat local endpoints
+Create =$LLM_MODEL_CACHE= with the right ownership; configure llama.cpp (and ollama if installed) to serve =127.0.0.1:8081= (coding) and =:8082= (general). Likely systemd user units; decide launcher pattern when implementing.
+*** TODO Prefetch per-host models (idempotent, non-fatal on network failure)
+Download the per-host model set (from Decision 4) into the cache; skip files that exist with matching size/hash. On failure, fall back per Decision 5. Models from HuggingFace GGUF mirrors (URLs locked at implementation time).
+*** TODO Ship a local-LLM smoke-test command
+Boot the configured endpoint and send a short prompt; surface success/failure + timing. Useful as both a post-install check and a triage tool when something later breaks. Likely =scripts/llm-smoke-test.sh=; runs at end of install if =INSTALL_LOCAL_LLM=yes=.
+
+Acceptance: fresh VM install of the ratio profile reaches an endpoint on =:8081= that answers a smoke prompt; velox profile gets Q4_K_M + 8B and answers a prompt within reasonable laptop latency; network-down install completes successfully with the pending-models warning surfaced.
+
+** TODO [#B] Add =uv= to the install playbook :tooling:python:
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-29
+:END:
+Add =uv= (Astral's Python package + script runner) to archsetup so fresh machines pick it up automatically. Currently installed by hand on ratio + velox (=/usr/bin/uv= 0.11.15), not in the standard set — a fresh install would skip it, and project scripts using PEP 723 inline-script metadata (=#!/usr/bin/env -S uv run --script= shebangs) would fail with =env: uv: No such file or directory=. Source: handoff from health 2026-05-29 ([[file:assets/outbox/2026-05-29-1127-from-health-todo-a-add-uv-to-the-install-playbook.org][outbox copy]]).
+
+Health requested [#A] (load-bearing for the PEP 723 pattern they're promoting + the rulesets template-script proposal). Demoted to [#B] for archsetup: no current install is broken (uv is pre-installed everywhere it's needed), and the shape matches the existing [#B] tooling-codification tasks (eask, signal-cli) — load-bearing for other projects, manually installed today, codify so fresh installs pick it up.
+
+- *Install via pacman* — =uv= is in extra (=pacman -S uv=). Cleanest path; auto-updates with the rest of the system. AUR =uv-bin= and Astral's official installer are alternatives but add a non-pacman path to maintain.
+- *Placement* — alongside the existing language-tooling block in =archsetup= (near =rustup=, =nvm=, or the Python set). Decide the exact section at implementation time.
+- *Verification* — post-install =which uv && uv --version=; PEP 723 end-to-end check per the health handoff (=/tmp/uv-test.py= shebang script with inline =requests= dep).
+
+Related: the new [#B] LLM task above may grow scripts that benefit from PEP 723 (e.g. =scripts/llm-smoke-test.sh= if Python-based). =uv= landing here removes that friction.
+
 ** DOING [#A] Separate dotfiles from archsetup
 SCHEDULED: <2026-05-21 Thu>
 :PROPERTIES:
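
The prefetch behavior the new LLM task asks for (skip files that already match, never fail the install on a download error, leave a pending marker) could be sketched roughly as below. This is not part of the commit; it is a minimal sketch under stated assumptions. The names =prefetch_model=, =fetch_url=, =file_sha256= and =LLM_PENDING_FILE= are hypothetical, the cache default is only one of the two locations the task still has to choose between, and the pending file stands in for whichever network-down behavior is eventually picked.

```shell
#!/usr/bin/env bash
# Sketch only. Function and variable names here are hypothetical,
# not taken from archsetup.

# Assumed default: the per-user location; the task has not decided this yet.
LLM_MODEL_CACHE="${LLM_MODEL_CACHE:-$HOME/.local/share/llm/models}"
# Hypothetical marker file listing models that still need downloading.
LLM_PENDING_FILE="${LLM_PENDING_FILE:-$LLM_MODEL_CACHE/.pending}"

# Print the hex SHA-256 digest of a file.
file_sha256() {
    sha256sum "$1" | cut -d' ' -f1
}

# Download URL ($1) to PATH ($2). Kept separate so it can be swapped out.
fetch_url() {
    curl -fL --retry 3 -o "$2" "$1"
}

# prefetch_model NAME URL SHA256
# Always returns 0 so a failed download cannot fail the install.
prefetch_model() {
    local name="$1" url="$2" want="$3"
    local dest="$LLM_MODEL_CACHE/$name"
    mkdir -p "$LLM_MODEL_CACHE"

    # Idempotence: an existing file with the expected hash is left alone.
    if [ -f "$dest" ] && [ "$(file_sha256 "$dest")" = "$want" ]; then
        echo "skip $name"
        return 0
    fi

    # Download to a temp name and only move into place once the hash matches.
    if fetch_url "$url" "$dest.part" && [ "$(file_sha256 "$dest.part")" = "$want" ]; then
        mv "$dest.part" "$dest"
        echo "fetched $name"
    else
        rm -f "$dest.part"
        echo "$name $url" >> "$LLM_PENDING_FILE"
        echo "pending $name"
    fi
    return 0
}
```

A caller would invoke =prefetch_model= once per model with a real URL and digest (both locked at implementation time, per the task) and could print the "local LLM support incomplete" follow-up whenever the pending file is non-empty at the end of the run.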
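
The per-host model selection decision in the LLM task (known-host table versus explicit =LLM_MODELS=) can be illustrated with a small sketch that supports both and refuses to pick silently for an unknown host. It is not part of the commit. The function name =select_models= and the model identifiers are hypothetical labels derived from the per-host targets in the task; =8b-fallback= is a placeholder because the task does not name the 8B model.

```shell
#!/usr/bin/env bash
# Sketch only. select_models and the model labels are hypothetical.

# select_models [HOST]
# Prints a space-separated model list. An explicit LLM_MODELS wins;
# otherwise the host name is looked up in a small table.
select_models() {
    local host="${1:-$(uname -n)}"

    if [ -n "${LLM_MODELS:-}" ]; then
        echo "$LLM_MODELS"
        return 0
    fi

    case "$host" in
        ratio)
            echo "qwen3-coder-30b-q6_k qwen3-coder-30b-q4_k_m qwen3-next-80b-q4_k_m"
            ;;
        velox)
            # "8b-fallback" is a placeholder for the unnamed 8B model.
            echo "qwen3-coder-30b-q4_k_m 8b-fallback"
            ;;
        *)
            # Fail loudly rather than silently selecting no models.
            echo "no model table entry for host '$host'; set LLM_MODELS" >&2
            return 1
            ;;
    esac
}
```

Returning non-zero for an unknown host addresses the brittleness the task calls out: a new machine gets an explicit message instead of an empty model set.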