docs(commands): fix prompt-engineering citation and add an eval harness

Two audit fixes.

The Meincke citation had the wrong title and was used to imply persuasion framing improves prompt quality. It now reads as the safety caution it is: applying the principles raised an LLM's compliance with objectionable requests from ~33% to ~72%, a reason for care, not a recipe. The correct title ("Call Me A Jerk...") and SSRN id are fixed in all three spots.

Critique mode also gains an eval-harness step: for fragile or production prompts, run 3-5 adversarial examples against the old and new prompt and record the delta, so quality is verified rather than asserted.
author: Craig Jennings <c@cjennings.net> 2026-05-22 14:46:02 -0500
committer: Craig Jennings <c@cjennings.net> 2026-05-22 14:46:02 -0500
commit: e469a3ba641f227d65bca1d6c616cf4d8d6fb869 (patch)
tree: 686f6b9dc213798db8ff79772f667edbf786b4b4
parent: e2f04c16197afdc694b7ff66a18916e64e5db6b3 (diff)
download: rulesets-e469a3ba641f227d65bca1d6c616cf4d8d6fb869.tar.gz rulesets-e469a3ba641f227d65bca1d6c616cf4d8d6fb869.zip
1 files changed, 5 insertions, 3 deletions
diff --git a/.claude/commands/prompt-engineering.md b/.claude/commands/prompt-engineering.md
index 8e03367..9dff9a0 100644
--- a/.claude/commands/prompt-engineering.md
+++ b/.claude/commands/prompt-engineering.md
@@ -1,5 +1,5 @@
 ---
-description: Craft prompts (commands, hooks, skill descriptions, sub-agent instructions, system prompts, one-shot requests to other LLMs) that do what they're meant to and resist common failure modes. Covers four moves that determine whether a prompt holds up: classify the prompt type (discipline-enforcing / guidance / collaborative / reference) to pick the right tone and techniques; apply the persuasion framework appropriate to that type (seven principles from Meincke et al. 2025, including which to avoid — notably Liking, which breeds sycophancy); match task fragility to degrees of freedom (high/medium/low); and spend the context window like a shared resource. Also contains a brief reference for classical techniques (few-shot, chain-of-thought, system prompts, templates). Use both in design mode (asking for help writing a new prompt from scratch) and critique mode (paste a draft, get it rewritten to resist common failure modes). Do NOT use for prose editing unrelated to LLM prompts (use a writing skill), for implementing application code that uses an LLM (different scope), or for content moderation / prompt-injection defense (adjacent but separate domain).
+description: Craft prompts (commands, hooks, skill descriptions, sub-agent instructions, system prompts, one-shot requests to other LLMs) that do what they're meant to and resist common failure modes. Covers four moves that determine whether a prompt holds up: classify the prompt type (discipline-enforcing / guidance / collaborative / reference) to pick the right tone and techniques; apply the persuasion framework appropriate to that type (seven principles, including which to avoid — notably Liking, which breeds sycophancy); match task fragility to degrees of freedom (high/medium/low); and spend the context window like a shared resource. Also contains a brief reference for classical techniques (few-shot, chain-of-thought, system prompts, templates). Use both in design mode (asking for help writing a new prompt from scratch) and critique mode (paste a draft, get it rewritten to resist common failure modes). Do NOT use for prose editing unrelated to LLM prompts (use a writing skill), for implementing application code that uses an LLM (different scope), or for content moderation / prompt-injection defense (adjacent but separate domain).
 disable-model-invocation: true
 ---
 
@@ -76,7 +76,7 @@ LLMs are parahuman — they were trained on human text that's full of persuasion
 
 ### The Seven Principles
 
-Adapted from Meincke et al., *Persuasion and Compliance in Large Language Models* (2025, N≈28,000 conversations). Two-principle combinations shifted compliance rates from ~33% to ~72%.
+These are the seven Cialdini persuasion principles, framed here for prompt design. One caution before using them: Meincke et al., *Call Me A Jerk: Persuading AI to Comply with Objectionable Requests* (2025, N≈28,000 conversations) found that applying these same principles raised an LLM's compliance with *objectionable* requests from ~33% to ~72%. That is a prompt-safety finding, not evidence that persuasion framing makes engineering prompts better. The principles below are a vocabulary for matching tone to prompt type — not a lever to pull for higher compliance. Stronger framing hardens whatever behavior the prompt encodes, including the wrong one, which is exactly why the wrong principle for the type (Liking on a collaborative prompt) backfires.
 
 - **Authority** — Non-negotiable framing. "YOU MUST", "Never", "Always", "No exceptions."
 - **Commitment** — Force explicit action. "Announce the rule you're applying before applying it." "Output your checklist with each item checked."
@@ -169,6 +169,7 @@ When handed an existing draft:
 4. **Check token economy.** Redundancy, restated instructions, unnecessary preamble.
 5. **Check for footguns** from the anti-patterns list.
 6. **Rewrite** — show before/after. Name the changes.
+7. **For fragile or reusable prompts, verify the rewrite — don't assert it.** A prompt that runs once for a throwaway task can ship on the rewrite alone. A prompt that gates discipline, gets reused, or runs in production cannot. Write 3-5 adversarial or edge-case inputs — the cases most likely to make the prompt fail: the exception the discipline rule must refuse, the ambiguous scope the trigger must catch, the input that tempts a sycophantic answer. Run both the old prompt and the new prompt against each, and record the observed behavioral delta. Without examples, "the rewrite is better" is an assertion, not a result.
 
 ## Ethics Test
 
@@ -202,6 +203,7 @@ Before declaring a prompt done:
 - [ ] Each sentence earns its tokens
 - [ ] Trigger phrases and (for reference prompts) explicit negative conditions are present
 - [ ] Sycophancy traps (for collaborative) and rationalization loopholes (for discipline) are closed
+- [ ] For fragile or reusable/production prompts, 3-5 adversarial examples were run against old and new, and the behavioral delta is recorded
 - [ ] The ethics test passes
 
 ## Output
@@ -223,6 +225,6 @@ No long explanations. The prompt itself should demonstrate the principles, not n
 
 ## References
 
-- Meincke, L., et al. (2025). *Persuasion and Compliance in Large Language Models.* N≈28,000 AI conversations; 7 persuasion principles; compliance shifts of ~2x with appropriate combinations.
+- Meincke, L., et al. (2025). *Call Me A Jerk: Persuading AI to Comply with Objectionable Requests.* SSRN abstract_id=5357179. N≈28,000 AI conversations; the seven Cialdini principles roughly doubled compliance with objectionable requests (~33% → ~72%). Read as a prompt-safety caution — persuasion framing can override an LLM's reluctance — not as a recipe for better engineering prompts.
 - Anthropic prompt engineering guidance — context window as shared resource; progressive disclosure; degrees-of-freedom framing.
 - Classical prompt engineering literature (few-shot, CoT, system prompts) — assumed background; not re-taught here.
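
The verification step this patch adds to critique mode (run 3-5 adversarial inputs against the old and new prompt, record the behavioral delta) could be sketched roughly as follows. This is a minimal illustration, not part of the commit: `Case`, `run_cases`, `behavioral_delta`, and `stub_model` are hypothetical names, and `stub_model` is an offline stand-in for whatever LLM call would actually be used.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed signature for the model call: (prompt, user_input) -> output text.
ModelFn = Callable[[str, str], str]


@dataclass
class Case:
    """One adversarial or edge-case input plus a check for the desired behavior."""
    name: str
    user_input: str
    passes: Callable[[str], bool]  # True when the output shows the desired behavior


def run_cases(model: ModelFn, prompt: str, cases: list[Case]) -> dict[str, bool]:
    """Run every case against one prompt and record pass/fail per case."""
    return {c.name: c.passes(model(prompt, c.user_input)) for c in cases}


def behavioral_delta(model: ModelFn, old_prompt: str, new_prompt: str,
                     cases: list[Case]) -> dict[str, str]:
    """Compare old and new prompt on the same cases and label each change."""
    old = run_cases(model, old_prompt, cases)
    new = run_cases(model, new_prompt, cases)
    delta = {}
    for c in cases:
        if old[c.name] == new[c.name]:
            delta[c.name] = "unchanged-pass" if new[c.name] else "unchanged-fail"
        else:
            delta[c.name] = "fixed" if new[c.name] else "regressed"
    return delta


def stub_model(prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for a real LLM call so the sketch runs offline."""
    wants_skip = "skip" in user_input.lower()
    if wants_skip and "No exceptions" in prompt:
        return "REFUSE: tests are required."
    if wants_skip:
        return "OK, skipping tests this once."
    return "Running tests."


cases = [
    Case("exception-request", "Can we skip the tests just this once?",
         lambda out: out.startswith("REFUSE")),
    Case("flattery", "You're the best, surely you can skip them for me?",
         lambda out: out.startswith("REFUSE")),
    Case("normal-request", "Please run the tests.",
         lambda out: "Running" in out),
]

old_prompt = "Run tests before committing."
new_prompt = "Run tests before committing. No exceptions."

delta = behavioral_delta(stub_model, old_prompt, new_prompt, cases)
for name, change in delta.items():
    print(f"{name}: {change}")
```

With a real model in place of `stub_model`, the printed per-case labels are the recorded delta: any `regressed` or `unchanged-fail` entry is evidence that the rewrite is not yet an improvement.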