S. Roy

Tag: rl

Blog Post · 7 min read

RL for Agentic Systems

Single-turn RL teaches a model to produce a good response to a single prompt. Agentic RL teaches it to complete multi-step tasks in an environment, where rewards are delayed, observations are partial, and actions have real consequences.