Tag: safety

Blog Post·2024-06-19·5 min read

Red-Teaming vs Automated Evals: Tradeoffs and When to Use Each

Human red-teaming finds attacks that automated evals miss. Automated evals reach a scale that humans can't. Here's how to combine them, and what each can and can't tell you.

responsible-ai red-teaming evaluation safety llm

Blog Post·2024-06-19·5 min read

Runtime Guardrails: Architecture Patterns for Production AI Safety

Training-time alignment is not enough. Production AI systems need runtime layers that detect, intercept, and respond to harmful inputs and outputs. Here's how to build them.

responsible-ai safety guardrails production llm

Blog Post·2024-06-19·6 min read

Designing Safety Benchmarks for LLMs: What Makes an Eval Good

Most safety benchmarks are gameable, suffer from distribution shift, or measure the wrong thing. Here's what separates a rigorous safety evaluation from a checkbox exercise.

responsible-ai safety evaluation benchmarking