Blog Post · 7 min read
RL as a Skill Acquisition Engine
The reward signal determines what the model learns to do. Swap the reward, swap the capability. Here's how RL elicits reasoning, code generation, math, and tool use.