\begin{figure*}[htbp]
    \centering
    % Subfigure (a): marginal gain
    \begin{subfigure}[b]{0.32\textwidth}
        \centering
        \includegraphics[width=\textwidth]{pics/insight_fig1.pdf}
        \caption{Marginal Spatial Gain}
        \label{fig:empirical_a}
    \end{subfigure}
    \hfill
    % Subfigure (b): autocorrelation
    \begin{subfigure}[b]{0.32\textwidth}
        \centering
        \includegraphics[width=\textwidth]{pics/insight_fig2.pdf}
        \caption{Locality of Importance}
        \label{fig:empirical_b}
    \end{subfigure}
    \hfill
    % Subfigure (c): cumulative coverage comparison
    \begin{subfigure}[b]{0.32\textwidth}
        \centering
        \includegraphics[width=\textwidth]{pics/insight_fig3.pdf}
        \caption{Cumulative Context Coverage}
        \label{fig:empirical_c}
    \end{subfigure}
    
    \caption{\textbf{Empirical Motivation for Redundancy-Penalized Eviction.} 
    \textbf{(a)} The marginal spatial coverage of Top-K declines due to score clustering. 
    \textbf{(b)} Autocorrelation analysis indicates that proxy importance scores are statistically localized. 
    \textbf{(c)} By suppressing local score clusters, 1D-NMS improves global segment coverage.}
    \label{fig:empirical_analysis}
\end{figure*}

\begin{figure}[htbp]
    \centering
    \includegraphics[width=0.48\textwidth]{pics/raw_attention_heatmaps.pdf}
    \caption{\textbf{Visualization of Localized Attention Mass.} Raw attention weights across heads reveal clustered high-attention anchors (vertical stripes, highlighted by red dashed boxes) alongside diffuse background attention (horizontal variance), motivating head-aware score calibration.}
    \label{fig:raw_attention_heatmaps}
\end{figure}
    
\subsection{Empirical Analysis and Observations}
\label{sec:empirical}

Current score-based KV cache eviction methods predominantly rely on Top-K selection, which implicitly assumes that each token contributes information independently of the others, i.e., that cache utility is modular (additive) over tokens~\cite{li2024snapkv,feng2026ada,kim2026kvzip,fastkv2026}. In this section, we show through a proxy-based spatial analysis that this modularity assumption breaks down.

To isolate the structural properties of token importance from heuristic biases, we use context reconstruction~\cite{kim2026kvzip} as a proxy metric. We evaluate Qwen3-8B on the RULER benchmark, measuring each KV pair's contribution to the reconstruction of the original prompt. We extract importance scores across 500 samples, which serve as the reference profile for our spatial analysis.
Appendix~\ref{app:observation_details} provides the capture and analysis details for these observation experiments.

\textbf{Observation 1: Declining Marginal Spatial Coverage of Top-K.}
We first examine the efficiency of the standard Top-K strategy by measuring its \textit{marginal spatial gain}---the number of new document segments (from a total of 64) covered per selected token. As illustrated in Figure~\ref{fig:empirical_analysis}(a), the marginal spatial gain of Top-K exhibits a rapid decline, dropping significantly after only a few tokens ($K \approx 4$). 
This decline is most consequential for \textbf{query-agnostic} compression, where the information requested by a future query is unknown. Under such uncertainty, an ideal strategy should act as a representative summary that preserves diverse semantic information across the context~\cite{lin2011documentsummarization}. Random selection trivially achieves high spatial coverage, but it spends much of the strict cache budget on low-utility tokens and therefore lacks \textit{information density}. Conversely, Top-K tends to concentrate the budget on a few semantic clusters, potentially leaving a large portion of the document segments under-represented.
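
One way to make the marginal spatial gain precise is sketched below. The notation is introduced purely for illustration, and we assume only that each token belongs to exactly one of the 64 segments. Let $\sigma(i) \in \{1, \dots, 64\}$ denote the segment containing token $i$, and let $\pi_1, \pi_2, \dots$ be the token indices sorted by descending importance score. The cumulative coverage and the marginal gain after $k$ selections can then be written as
\[
    C(k) = \bigl| \{ \sigma(\pi_j) : 1 \le j \le k \} \bigr|, \qquad g(k) = C(k) - C(k-1), \qquad C(0) = 0,
\]
so that $g(k) \in \{0, 1\}$ for a single sample and takes fractional values once averaged over samples. Under this reading, score clustering appears as long runs of $g(k) = 0$: consecutive top-ranked tokens fall into segments that are already covered.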

\textbf{Observation 2: Statistical Locality and Functional Diversity.}
To investigate the cause of this clustering, we compute the normalized autocorrelation $R(d)$ of the proxy importance scores at token distance $d$. Figure~\ref{fig:empirical_analysis}(b) shows that token importance is highly correlated at short distances ($R(1) \approx 0.87$), suggesting that high importance scores are statistically localized. Under the modeling assumption that nearby tokens tend to share contextual information, this locality motivates a local-overlap view of cache utility. Furthermore, Figure~\ref{fig:raw_attention_heatmaps} visualizes raw attention weights, revealing: (1) shared high-attention columns (indicating tokens attended by many heads), and (2) head-specific concentration patterns (indicating that some heads are more selective than others).
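
For reference, the normalized autocorrelation $R(d)$ discussed above can be read as the standard sample estimator below. This form is an assumption made for illustration, and the estimator actually used may differ in its normalization:
\[
    R(d) = \frac{\sum_{i=1}^{n-d} (s_i - \bar{s})(s_{i+d} - \bar{s})}{\sum_{i=1}^{n} (s_i - \bar{s})^2}, \qquad \bar{s} = \frac{1}{n} \sum_{i=1}^{n} s_i,
\]
where $s_i$ is the proxy importance score of token $i$ and $n$ is the context length. By construction $R(0) = 1$, and a value of $R(1)$ close to $1$ indicates that a token's score is strongly linearly related to that of its immediate neighbor.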

\textbf{Observation 3: Balancing Importance and Coverage via Local Suppression.}
The identified locality suggests that we can improve global coverage while maintaining information density by penalizing local score clustering. We apply a 1D Non-Maximum Suppression (1D-NMS) filter to the proxy scores. This mechanism encourages the selection process to skip nearby neighbors of a selected ``hub'' and move to the next high-scoring region. 
As demonstrated in Figure~\ref{fig:empirical_analysis}(c), 1D-NMS achieves roughly $2.6\times$ higher cumulative segment coverage than Top-K. Unlike random selection, 1D-NMS retains local peaks (the highest-importance tokens within their neighborhoods), thereby improving spatial diversity while still favoring high-scoring representatives. We use this hard-suppression experiment only as a diagnostic pilot; the final HubKV method softens the suppression through SMD and separately calibrates head-wise selectivity.
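
For concreteness, greedy 1D-NMS can be sketched as follows, assuming a fixed suppression radius $w$ (a hypothetical parameter, introduced here only for illustration). With $s_i$ the proxy score of token $i$ and $n$ the context length, the $t$-th selected index is
\[
    i_t = \arg\max_{i \in A_{t-1}} s_i, \qquad A_t = A_{t-1} \setminus \{ i : |i - i_t| \le w \}, \qquad A_0 = \{1, \dots, n\},
\]
so each selection removes itself and up to $w$ neighbors on either side from the candidate set $A_t$. Setting $w = 0$ recovers plain Top-K, while a larger $w$ forces successive selections to lie at least $w + 1$ positions apart.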
