Continuous Batching: How vLLM Serves Thousands of Requests
Static batching wastes GPU capacity whenever sequences in a batch finish at different times: slots held by finished sequences sit idle until the longest one completes. Continuous batching fixes this by treating the decode loop as a queue, admitting a new request the moment a slot opens up.
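To make the difference concrete, here is a toy simulation that counts decode iterations under both strategies. It is a minimal sketch, not vLLM's actual scheduler: the function names are hypothetical, every request is reduced to the number of tokens it still has to generate, and prefill, KV-cache memory limits, and preemption are all ignored.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest sequence finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # The whole batch is held until the slowest member is done.
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a waiting request fills a slot as soon as it frees up."""
    waiting = deque(lengths)  # requests not yet admitted, in arrival order
    running = []              # remaining tokens for each active sequence
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots before the next iteration.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode iteration: every active sequence emits one token.
        running = [remaining - 1 for remaining in running]
        steps += 1
        # Drop finished sequences so their slots can be reused.
        running = [remaining for remaining in running if remaining > 0]
    return steps


# Hypothetical workload: output lengths in tokens, two slots on the GPU.
lengths = [2, 8, 3, 8]
print(static_batching_steps(lengths, batch_size=2))      # 16
print(continuous_batching_steps(lengths, batch_size=2))  # 13
```

In this made-up workload the short requests finish early, and continuous batching reuses their slots right away instead of waiting for the 8-token sequences, which is where the saved iterations come from.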