Distributed Memory (KV) Manager: Offloads inference data, particularly the “keys and values” (KV) cache data from prior token generation, to lower-cost memory or storage tiers when appropriate, and reloads it when it is needed again.
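
To make the offload-and-reload idea concrete, here is a minimal Python sketch of a two-tier cache. It is an illustration only, not the manager's actual implementation or API: the class name `TieredKVCache`, the `fast_capacity` parameter, the least-recently-used eviction policy, and the string placeholders standing in for KV tensors are all assumptions made for this example.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a small fast tier backed by a larger, cheaper tier."""

    def __init__(self, fast_capacity: int):
        if fast_capacity < 1:
            raise ValueError("fast_capacity must be at least 1")
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()   # stands in for GPU memory, kept in LRU order
        self.slow = {}              # stands in for host memory or storage

    def put(self, seq_id, kv_block):
        """Store a sequence's KV block in the fast tier, offloading if it is full."""
        self.slow.pop(seq_id, None)      # avoid keeping a stale copy in the slow tier
        self.fast[seq_id] = kv_block
        self.fast.move_to_end(seq_id)    # mark as most recently used
        self._offload_if_needed()

    def get(self, seq_id):
        """Return a KV block, reloading it from the slow tier when necessary."""
        if seq_id in self.fast:
            self.fast.move_to_end(seq_id)
            return self.fast[seq_id]
        if seq_id in self.slow:
            kv_block = self.slow.pop(seq_id)   # reload into the fast tier
            self.put(seq_id, kv_block)
            return kv_block
        return None                            # never cached: caller must recompute

    def _offload_if_needed(self):
        # Move least recently used blocks to the slow tier until the fast tier fits.
        while len(self.fast) > self.fast_capacity:
            victim_id, victim_block = self.fast.popitem(last=False)
            self.slow[victim_id] = victim_block


cache = TieredKVCache(fast_capacity=2)
cache.put("chat-a", "kv-a")
cache.put("chat-b", "kv-b")
cache.put("chat-c", "kv-c")          # fast tier is full, so "chat-a" is offloaded
print(sorted(cache.slow))            # ['chat-a']
print(cache.get("chat-a"))           # kv-a (reloaded; "chat-b" is offloaded in turn)
print(sorted(cache.fast))            # ['chat-a', 'chat-c']
```

The point of the sketch is that a reload is a cheap lookup in the slower tier rather than a recomputation of the keys and values from the original tokens. A production system would likely also weigh transfer cost, block size, and the chance that a sequence is revisited when deciding what to offload.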