Prompt injection exploits the fact that LLMs process instructions and data in the same input stream, making it hard to distinguish legitimate from malicious commands.
Attack Vectors
- Direct injection in user messages
- Indirect injection via retrieved content
- Jailbreaks that disable safety features
- Context manipulation
Defenses
- Input sanitization
- Instruction hierarchy
- Output filtering
- Anomaly detection