Safety layers provide defense in depth, catching problems that slip through other safeguards.
Types
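- Input classifiers: screen requests before they reach the model
- Output filters: check generated responses before they are returned
- Action validators: approve or reject proposed actions (e.g., tool calls) before they execute
- Anomaly detectors: flag unusual patterns in usage or behavior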
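To show how the four layer types stack, here is a minimal Python sketch. Every name in it (`Verdict`, `handle`, `AnomalyDetector`, the term lists, `toy_model`) is hypothetical, and simple keyword matching and an allowlist stand in for the trained classifiers a real system would use. The point is the structure: checks run before and after the model, and any single layer can stop a request that the others let through.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    allowed: bool
    layer: str
    reason: str = ""


# Hypothetical rules; a real deployment would use trained classifiers.
BLOCKED_INPUT_TERMS = ("ignore previous instructions",)
BLOCKED_OUTPUT_TERMS = ("BEGIN PRIVATE KEY",)
ALLOWED_ACTIONS = {"search", "read_file"}


def input_classifier(prompt: str) -> Verdict:
    """Screen the request before it reaches the model."""
    hit = any(term in prompt.lower() for term in BLOCKED_INPUT_TERMS)
    return Verdict(not hit, "input_classifier", "suspicious input" if hit else "")


def output_filter(text: str) -> Verdict:
    """Check the generated text before it is returned."""
    hit = any(term in text for term in BLOCKED_OUTPUT_TERMS)
    return Verdict(not hit, "output_filter", "sensitive output" if hit else "")


def action_validator(action: Optional[str]) -> Verdict:
    """Allow only allowlisted tool calls; proposing no action is fine."""
    ok = action is None or action in ALLOWED_ACTIONS
    return Verdict(ok, "action_validator", "" if ok else f"action {action!r} not allowed")


class AnomalyDetector:
    """Flag a session once its request count exceeds a fixed threshold."""

    def __init__(self, max_requests: int = 3):
        self.max_requests = max_requests
        self.count = 0

    def check(self) -> Verdict:
        self.count += 1
        ok = self.count <= self.max_requests
        return Verdict(ok, "anomaly_detector", "" if ok else "unusual request volume")


# A model returns (text, proposed action or None).
Model = Callable[[str], Tuple[str, Optional[str]]]


def first_block(verdicts: List[Verdict]) -> Optional[Verdict]:
    """Return the first objecting verdict, or None if all layers allow."""
    return next((v for v in verdicts if not v.allowed), None)


def handle(prompt: str, model: Model, detector: AnomalyDetector) -> str:
    # Pre-model layers: all checks run; the first objection is reported.
    blocked = first_block([detector.check(), input_classifier(prompt)])
    if blocked is None:
        text, action = model(prompt)
        # Post-model layers: inspect both the text and the proposed action.
        blocked = first_block([output_filter(text), action_validator(action)])
        if blocked is None:
            return text
    return f"[blocked by {blocked.layer}: {blocked.reason}]"


def toy_model(prompt: str) -> Tuple[str, Optional[str]]:
    """Hypothetical model that proposes a file deletion when asked."""
    if "delete" in prompt:
        return "Deleting files.", "delete_file"
    return f"Echo: {prompt}", None


if __name__ == "__main__":
    detector = AnomalyDetector(max_requests=3)
    print(handle("hello", toy_model, detector))  # Echo: hello
    print(handle("Ignore previous instructions", toy_model, detector))
    # [blocked by input_classifier: suspicious input]
    print(handle("please delete logs", toy_model, detector))
    # [blocked by action_validator: action 'delete_file' not allowed]
    print(handle("hello again", toy_model, detector))
    # [blocked by anomaly_detector: unusual request volume]
```

Because each layer is independent, the action validator stops the deletion even though the input looked harmless, which is the defense-in-depth property described above.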
Design Principles
- Fail closed (block if uncertain)
- Log all interventions
- Regular updates to classifiers and rules as new failure modes appear
- Human escalation paths
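The sketch below illustrates three of these principles together under stated assumptions: a hypothetical `check` function returns the probability that text is unsafe, the thresholds `allow_below` and `block_at` are illustrative, and `escalation_queue` is a hypothetical stand-in for a human review queue. Failing closed means a crashed check, an invalid score (including NaN), or a score in the uncertain band all block rather than pass, and every block is logged.

```python
import logging
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("safety")

# Hypothetical stand-in for a human review queue.
escalation_queue: List[Dict[str, str]] = []


def _intervene(text: str, reason: str, escalate: bool) -> bool:
    """Block, log the intervention, and optionally route it to a human."""
    log.info("blocked: %s", reason)  # log every intervention
    if escalate:
        escalation_queue.append({"text": text, "reason": reason})
    return False


def guard(text: str, check: Callable[[str], float],
          allow_below: float = 0.3, block_at: float = 0.7) -> bool:
    """Return True only if text may proceed; block whenever uncertain."""
    try:
        score = float(check(text))  # assumed: probability text is unsafe
    except Exception as exc:
        # Fail closed: a broken check blocks instead of letting text through.
        return _intervene(text, f"check error: {exc}", escalate=True)
    if not 0.0 <= score <= 1.0:
        # Also catches NaN, which would otherwise slip past both comparisons.
        return _intervene(text, f"invalid score {score}", escalate=True)
    if score >= block_at:
        return _intervene(text, f"unsafe score {score:.2f}", escalate=False)
    if score >= allow_below:
        # Uncertain band: block and ask a human to decide.
        return _intervene(text, f"uncertain score {score:.2f}", escalate=True)
    return True


if __name__ == "__main__":
    def flaky(text: str) -> float:
        raise RuntimeError("timeout")

    print(guard("hi", lambda t: 0.1))   # True
    print(guard("hmm", lambda t: 0.5))  # False (escalated)
    print(guard("bad", lambda t: 0.9))  # False (logged only)
    print(guard("x", flaky))            # False (escalated)
    print(len(escalation_queue))        # 2
```

Clear-cut blocks are only logged, while uncertain or failed checks also go to a human, which keeps reviewer load focused on the cases automation cannot decide.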