
Blog Post · 2024-06-19 · 7 min read

RL as a Skill Acquisition Engine

The reward signal determines what the model learns to do. Swap the reward, swap the capability. Here's how RL elicits reasoning, code generation, math, and tool use.
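One way to read "swap the reward, swap the capability" is that the RL loop stays fixed and only the scoring function changes. Here is a minimal sketch, assuming simple verifiable rewards: an exact match on the final number for math, passing unit tests for code, and a well-formed JSON tool call for tool use. The function names, the `REWARDS` registry, and the scoring rules are hypothetical illustrations, not the method described in the post.

```python
import json
import re
from typing import Callable, Dict

# Hypothetical reward signature: (model completion, task-specific target) -> scalar reward.
RewardFn = Callable[[str, str], float]


def math_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the last number in the completion equals the reference answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == reference.strip() else 0.0


def code_reward(completion: str, tests: str) -> float:
    """Return 1.0 if the generated code runs and passes assert-based tests."""
    namespace: dict = {}
    try:
        # Executes untrusted model output; a real setup would sandbox this.
        exec(completion, namespace)
        exec(tests, namespace)
    except Exception:
        return 0.0
    return 1.0


def tool_reward(completion: str, expected_tool: str) -> float:
    """Return 1.0 if the completion is a JSON object calling the expected tool."""
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if isinstance(call, dict) and call.get("name") == expected_tool else 0.0


# Swapping the entry used here is the only change between capabilities.
REWARDS: Dict[str, RewardFn] = {
    "math": math_reward,
    "code": code_reward,
    "tool_use": tool_reward,
}


def score(task: str, completion: str, target: str) -> float:
    """Look up the reward function for a task and score one completion."""
    return REWARDS[task](completion, target)


if __name__ == "__main__":
    print(score("math", "So the answer is 42", "42"))
    print(score("code", "def add(a, b):\n    return a + b", "assert add(2, 3) == 5"))
    print(score("tool_use", '{"name": "search", "arguments": {"q": "RL"}}', "search"))
```

The policy-update step (PPO, GRPO, or similar) would consume these scalars unchanged. Only the dictionary entry decides which behavior gets reinforced.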

rl reasoning code llm-training