KV Cache Memory: Quantization, Eviction, and the Long-Context Problem
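The KV cache is often the main memory bottleneck in LLM inference. Its size grows linearly with context length and batch size, so at long contexts it can take up more GPU memory than the model weights. Here's how quantization, eviction policies, and architectural changes keep it in check.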
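To make the scale concrete, here is a rough back-of-the-envelope sketch of cache size. It uses the standard accounting (one key tensor and one value tensor per layer, per token); the function name `kv_cache_bytes` and the model configuration (32 layers, 128-dim heads, 32k-token context) are illustrative assumptions, not figures from any specific model discussed here.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate the bytes needed to store keys and values for all layers."""
    # Factor of 2: one tensor for keys and one for values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


GIB = 1024 ** 3

# Hypothetical configuration, chosen only for illustration.
fp16 = kv_cache_bytes(32, 32, 128, seq_len=32_768)                    # 16-bit cache
int8 = kv_cache_bytes(32, 32, 128, seq_len=32_768, bytes_per_elem=1)  # 8-bit cache
gqa = kv_cache_bytes(32, 8, 128, seq_len=32_768)                      # 8 KV heads, 16-bit

print(f"fp16, 32 KV heads: {fp16 / GIB:.1f} GiB")
print(f"int8, 32 KV heads: {int8 / GIB:.1f} GiB")
print(f"fp16,  8 KV heads: {gqa / GIB:.1f} GiB")
```

Under these assumptions, a single 32k-token sequence needs 16 GiB of cache in 16-bit precision. Storing it in 8 bits halves that, and sharing keys and values across heads (8 KV heads instead of 32) cuts it to a quarter. Those are the kinds of savings that quantization and architectural changes aim for.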