Ethics & Safety
AI Alignment
Definition
The challenge of ensuring AI systems behave in accordance with human intentions, values, and goals.

In-Depth Explanation
Alignment research aims to solve problems like specification (defining what we want), robustness (maintaining behavior under distribution shift), and assurance (verifying alignment). As AI systems become more capable, alignment becomes increasingly critical for safety.
Real-World Example
RLHF (reinforcement learning from human feedback) is one alignment technique used to make language models follow instructions helpfully and safely.
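To make the technique concrete, here is a minimal, hypothetical sketch of the first stage of RLHF: fitting a reward model on human preference pairs with the Bradley-Terry loss, so that responses humans preferred receive higher scores. The linear model, features, and data below are toy assumptions standing in for a neural network trained on real comparisons; the full pipeline would then optimize the language model against this reward.

```python
import math

def reward(w, x):
    """Toy linear reward model r(x) = w · x (stand-in for a neural net)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def preference_loss(w, chosen, rejected):
    """Bradley-Terry loss -log sigmoid(r(chosen) - r(rejected)):
    small when the human-preferred response scores higher."""
    margin = reward(w, chosen) - reward(w, rejected)
    return math.log(1.0 + math.exp(-margin))

# One hypothetical preference pair: humans preferred A over B.
chosen, rejected = [1.0, 0.2], [0.1, 1.0]
w = [0.0, 0.0]
lr = 0.5

for _ in range(100):
    # Gradient derived analytically for the linear model:
    # d(loss)/d(margin) = -sigmoid(-margin)
    margin = reward(w, chosen) - reward(w, rejected)
    sig = 1.0 / (1.0 + math.exp(margin))
    for i in range(len(w)):
        w[i] -= lr * (-sig) * (chosen[i] - rejected[i])
```

After training, `reward(w, chosen)` exceeds `reward(w, rejected)`, i.e. the model has internalized the human preference, and this learned reward signal is what the policy-optimization stage of RLHF maximizes.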