vLLM Cache Metrics: KV Cache Usage, Prefix Cache Hit Rate, and the Block Pool
Two numbers indicate whether a vLLM deployment is healthy: KV cache usage and prefix cache hit rate. This post explains what they measure, how vLLM computes them from its block pool, and what the LRU evictor does when memory runs out.