Red teaming borrows from cybersecurity practices to stress-test AI systems. Red teams attempt to find vulnerabilities before malicious actors do.
Approaches
- Manual red teaming: Human experts craft adversarial inputs
- Automated red teaming: AI systems generate attack vectors
- Hybrid: AI-generated attacks refined by humans
What Red Teams Test
- Safety guardrail bypasses
- Harmful content generation
- Prompt injection vulnerabilities
- Factual accuracy under pressure