Attention Calibration for Position-Fair Dense Information Retrieval
Abstract:
Existing dense retrieval models suffer from positional bias, a phenomenon where retrieval performance declines if relevant content is located toward the end of a passage (Zeng et al., 2025). This study investigates whether such bias can be mitigated during inference without requiring retraining or compromising overall retrieval quality. We adapt the inference-time attention calibration technique (Schuhmacher et al., 2026) for downstream retrieval tasks, introducing a strength coefficient ($\lambda$) to interpolate between uncalibrated and fully calibrated attention distributions.
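As an illustration of the strength coefficient, the sketch below blends an uncalibrated and a fully calibrated attention distribution. It assumes the interpolation is linear (a convex combination), which the abstract does not state explicitly; the function name `interpolate_attention` and the example weights are hypothetical and are not taken from the released codebase.

```python
from typing import List, Sequence


def interpolate_attention(
    uncalibrated: Sequence[float],
    calibrated: Sequence[float],
    lam: float,
) -> List[float]:
    """Blend two attention distributions over the same tokens.

    Assumption: linear interpolation, where lam=0 returns the uncalibrated
    weights and lam=1 returns the fully calibrated weights.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(uncalibrated) != len(calibrated):
        raise ValueError("both distributions must cover the same tokens")
    return [(1.0 - lam) * u + lam * c for u, c in zip(uncalibrated, calibrated)]


# Hypothetical attention weights over four tokens (illustrative values only).
uncalibrated = [0.1, 0.2, 0.3, 0.4]
calibrated = [0.25, 0.25, 0.25, 0.25]
blended = interpolate_attention(uncalibrated, calibrated, lam=0.5)
```

Because both inputs sum to one, any convex combination of them also sums to one, so under this assumption the blended weights need no renormalization.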
Evaluating three embedding models on the SQuAD-PosQ and FineWeb-PosQ datasets, we analyze the impact of basket size, the selection of calibrated layers, and the strength coefficient on the balance between positional fairness and retrieval effectiveness. Our results indicate that partial calibration often yields better outcomes than full calibration. Specifically, a standardized configuration (basket size $B=128$, $\lambda=0.5$, and calibration at 50% layer depth) enhances the harmonic mean of nDCG@10 across positional groups on FineWeb-PosQ for all tested models. This approach requires no per-model tuning and is compatible with both -pooled and last-token-pooled architectures.
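To make the fairness-aware metric concrete, the following sketch computes a harmonic mean over per-group nDCG@10 scores. The group names and score values are hypothetical placeholders, since the abstract does not list the positional groups or report per-group numbers.

```python
from typing import Iterable


def harmonic_mean(scores: Iterable[float]) -> float:
    """Harmonic mean of non-negative scores; returns 0.0 if any score is 0."""
    values = list(scores)
    if not values:
        raise ValueError("at least one score is required")
    if any(v < 0 for v in values):
        raise ValueError("scores must be non-negative")
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical nDCG@10 per positional group (illustrative values only).
ndcg_by_group = {"beginning": 0.8, "end": 0.4}

fair_score = harmonic_mean(ndcg_by_group.values())
plain_average = sum(ndcg_by_group.values()) / len(ndcg_by_group)
```

The harmonic mean is pulled toward the weakest group, so it falls below the arithmetic mean whenever the groups differ. That property makes it a reasonable summary when the goal is to reward balanced performance across positions rather than strong performance on one group alone.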
Furthermore, this default configuration transfers seamlessly to PosIR, a benchmark covering 10 languages and 31 domains. It successfully reduces the Position Sensitivity Index across all 16 combinations of length quartiles, models, and retrieval settings, while maintaining or improving aggregate nDCG@10 scores. We have open-sourced our extended codebase at https://github.com/impresso/fair-sentence-transformers.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



