High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction
12 points by jchandra 3 days ago | 1 comments
vivahir215 3 days ago
Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
jchandra 3 days ago
[dead]