---
name: LLM instruction loopholes — ban by intent, not by command name
description: When instructing an LLM agent not to do something, enumerate all ways it could accomplish the prohibited action, not just the most obvious command names. An LLM may read a list of banned commands literally and reach for an equivalent command that is not on the list.
type: feedback
---

When writing "never do X" rules for LLM agent prompts, state the prohibited outcome first, then list every known tool or command that could achieve it, not just the obvious ones.

**Why:** The Psyche prompt said "NEVER reply to COMMUNEs. Do not use $OWL reply or $OWL send." The Psyche used `$OWL deliver` instead — a different command that achieves the same result. The LLM interpreted the rule literally (those two commands are banned) rather than by intent (don't send any message back).

**How to apply:** When writing prohibition rules in agent prompts, use intent-first language ("do not send any message by any means") followed by explicit enumeration of known tools ("no $OWL deliver, reply, send, or any other messaging command"). Also state the expected behavior positively ("end your turn with no output and no tool calls").
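
The three-part structure above (intent first, then enumeration, then the positive expectation) can be sketched as a small prompt helper. This is a minimal illustration, not an existing tool: `build_prohibition_rule` and its parameters are hypothetical names, and the exact wording it produces is only one reasonable phrasing.

```python
def build_prohibition_rule(
    intent: str,
    known_commands: list[str],
    expected_behavior: str,
) -> str:
    """Compose a prohibition rule for an agent prompt (hypothetical helper).

    Order matters: the banned outcome comes first, the known commands are
    listed as examples rather than as the full definition of the ban, and
    the rule closes with what the agent should do instead.
    """
    commands = ", ".join(known_commands)
    return (
        f"NEVER {intent} by any means. "
        f"This includes, but is not limited to: {commands}, "
        f"or any other command with the same effect. "
        f"Instead: {expected_behavior}."
    )


# Example values taken from the incident described above.
rule = build_prohibition_rule(
    intent="send any message in response to a COMMUNE",
    known_commands=["$OWL deliver", "$OWL reply", "$OWL send"],
    expected_behavior="end your turn with no output and no tool calls",
)
print(rule)
```

Because the command list is framed as "includes, but is not limited to", a command missing from the list (as `$OWL deliver` was in the original prompt) is still covered by the intent clause.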