diff options
| author | Craig Jennings <c@cjennings.net> | 2026-05-29 21:11:06 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-05-29 21:11:06 -0500 |
| commit | 39970b462c8198220f33ef7323725982723d2233 (patch) | |
| tree | 391c10e2be3207dbea2c1f4e01ca8e2944e18df8 /todo.org | |
| parent | 06b2c0716b51eb73298f569752dd1d81947d9961 (diff) | |
| download | archsetup-39970b462c8198220f33ef7323725982723d2233.tar.gz archsetup-39970b462c8198220f33ef7323725982723d2233.zip | |
chore(todo): file local-llm and uv install tasks; process inbox
Filed two new [#B] parent tasks. The local offline LLM runtime task carries design-decision and implementation children for resolving the open design questions alongside implementation work. The uv install task matches the existing eask/signal-cli tooling-codification shape — load-bearing for other projects, manually installed today, codify so fresh installs pick it up. Four cross-project handoffs moved to outbox.
Diffstat (limited to 'todo.org')
| -rw-r--r-- | todo.org | 52 |
1 files changed, 52 insertions, 0 deletions
@@ -96,6 +96,58 @@ A custom waybar module providing three time-keeping functions, surfaced in the b Implementation notes (to flesh out when picked up): waybar =custom= module(s) with =exec= polling or a persistent =exec= script emitting JSON; click actions to start/pause/reset; a small state file under =~/.local/state= or =~/.local/var=. Lives in the hyprland tier (=dotfiles/hyprland/.config/waybar/= + a backing script in =hyprland/.local/bin/=). TDD the backing script per testing.md. +** TODO [#B] Local offline LLM runtime + per-host model cache :tooling:llm: +:PROPERTIES: +:LAST_REVIEWED: 2026-05-29 +:END: +Add a local-LLM provisioning track so machines can run an offline coding agent when there's no network. Install =llama.cpp= (CPU + Vulkan where practical) and prefetch per-host model files while network is available; expose OpenAI-compat local endpoints (=127.0.0.1:8081= coding, =:8082= general; =:11434= reserved for =ollama= if used). Per the rulesets generic-agent-runtime design pass — rulesets becomes runtime-neutral and owns the runtime manifests + project instructions; archsetup owns machine provisioning + the per-machine model inventory. Source: handoff from rulesets 2026-05-28 ([[file:assets/outbox/2026-05-28-from-rulesets-local-llm-install.org][outbox copy]]). + +Per-host model targets (from the handoff): +- *ratio* (Strix Halo, 128 GiB) — Qwen3-Coder-30B Q6_K (default) + Q4_K_M (compat) + Qwen3-Next-80B Q4_K_M (long-context fallback). +- *velox* (i7-1370P, 64 GiB iGPU) — Qwen3-Coder-30B Q4_K_M + an 8B fallback for low-latency triage. + +Install behavior: prefetch idempotent (skip if file exists, match size/hash); download failure must NOT fail the install — surface a clear "local LLM support incomplete" follow-up instead. Ship a smoke-test command (boot endpoint + short prompt). + +Decisions to resolve before code: +*** TODO Decide model cache location: per-user vs system-wide +Handoff lists both =~/.local/share/llm/models= (per-user) and =/srv/models/llm= (system-wide). Per-user matches the existing archsetup user-config style and avoids root ownership of large model files. System-wide matches the "machine-local model inventory" phrasing and shares cache across users on multi-user boxes (not the case here — single user per machine). Pick one as the default; the other stays available via =LLM_MODEL_CACHE=. +*** TODO Decide whether =ollama= ships by default or is opt-in +Handoff calls =ollama= "optional". Likely shape: =llama.cpp= is the only mandatory runtime; =ollama= behind =INSTALL_OLLAMA= (default no) for users who prefer its model-manager API. Confirm. +*** TODO Define config keys for the LLM block in =archsetup.conf.example= +Likely: =INSTALL_LOCAL_LLM= (default yes), =LLM_RUNTIME= (=llama.cpp= / =ollama=), =LLM_MODEL_CACHE= (path), =LLM_MODELS= (space-separated, or empty → per-host autodetect). Lock names + defaults before writing install code. +*** TODO Decide per-host model selection: auto-detect by =uname -n= vs explicit =LLM_MODELS= +Auto-detect against a known-host table (ratio → Q6_K + 80B, velox → Q4_K_M + 8B) is simple for current machines but brittle for any new host (silently picks no models). Explicit =LLM_MODELS= per machine in =archsetup.conf= is more verbose but never surprises. Pick the default; the other stays available. +*** TODO Decide network-down behavior for model prefetch +Three shapes: (a) emit =error_warn= and write =/var/lib/archsetup/state/llm-models-pending= for inspection; (b) install a one-shot systemd unit that retries on next boot with network; (c) just log and forget — user re-runs the prefetch helper manually when network returns. + +Implementation work (gated on the decisions above): +*** TODO Install =llama.cpp= with CPU + Vulkan backend where supported +Add to the appropriate install section in =archsetup= (=llama.cpp= / =llama.cpp-vulkan= in AUR). Decide CPU-only vs Vulkan per host from the hardware detection already used for GPU drivers. +*** TODO Install =ollama= behind config flag (if Decision 2 = opt-in) +Add =ollama= package install gated on =INSTALL_OLLAMA=yes=. +*** TODO Configure shared model cache + OpenAI-compat local endpoints +Create =$LLM_MODEL_CACHE= with the right ownership; configure llama.cpp (and ollama if installed) to serve =127.0.0.1:8081= (coding) and =:8082= (general). Likely systemd user units; decide launcher pattern when implementing. +*** TODO Prefetch per-host models (idempotent, non-fatal on network failure) +Download the per-host model set (from Decision 4) into the cache; skip files that exist with matching size/hash. On failure, fall back per Decision 5. Models from HuggingFace GGUF mirrors (URLs locked at implementation time). +*** TODO Ship a local-LLM smoke-test command +Boot the configured endpoint and send a short prompt; surface success/failure + timing. Useful as both a post-install check and a triage tool when something later breaks. Likely =scripts/llm-smoke-test.sh=; runs at end of install if =INSTALL_LOCAL_LLM=yes=. + +Acceptance: fresh VM install of the ratio profile reaches an endpoint on =:8081= that answers a smoke prompt; velox profile gets Q4_K_M + 8B and answers a prompt within reasonable laptop latency; network-down install completes successfully with the pending-models warning surfaced. + +** TODO [#B] Add =uv= to the install playbook :tooling:python: +:PROPERTIES: +:LAST_REVIEWED: 2026-05-29 +:END: +Add =uv= (Astral's Python package + script runner) to archsetup so fresh machines pick it up automatically. Currently installed by hand on ratio + velox (=/usr/bin/uv= 0.11.15), not in the standard set — a fresh install would skip it, and project scripts using PEP 723 inline-script metadata (=#!/usr/bin/env -S uv run --script= shebangs) would fail with =env: uv: No such file or directory=. Source: handoff from health 2026-05-29 ([[file:assets/outbox/2026-05-29-1127-from-health-todo-a-add-uv-to-the-install-playbook.org][outbox copy]]). + +Health requested [#A] (load-bearing for the PEP 723 pattern they're promoting + the rulesets template-script proposal). Demoted to [#B] for archsetup: no current install is broken (uv is pre-installed everywhere it's needed), and the shape matches the existing [#B] tooling-codification tasks (eask, signal-cli) — load-bearing for other projects, manually installed today, codify so fresh installs pick it up. + +- *Install via pacman* — =uv= is in extra (=pacman -S uv=). Cleanest path; auto-updates with the rest of the system. AUR =uv-bin= and Astral's official installer are alternatives but add a non-pacman path to maintain. +- *Placement* — alongside the existing language-tooling block in =archsetup= (near =rustup=, =nvm=, or the Python set). Decide the exact section at implementation time. +- *Verification* — post-install =which uv && uv --version=; PEP 723 end-to-end check per the health handoff (=/tmp/uv-test.py= shebang script with inline =requests= dep). + +Related: the new [#B] LLM task above may grow scripts that benefit from PEP 723 (e.g. =scripts/llm-smoke-test.sh= if Python-based). =uv= landing here removes that friction. + ** DOING [#A] Separate dotfiles from archsetup SCHEDULED: <2026-05-21 Thu> :PROPERTIES: |
