MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining
Original: arXiv:2604.24374v2 Announce Type: replace
Abstract: Representation learning is a cornerstone of Natural Language Processing (NLP), yet building embeddings that remain effective under varying computational constraints is still difficult. Matryoshka Representation Learning (MRL) addresses this with nested embeddings that allow flexible inference. Learning such nested structures, however, requires deliberate control over how information is distributed across both embedding dimensionality and model depth. To this end, we introduce MIPIC (Matryoshka Representation Learning via Self-Distilled Intra-Relational Alignment and Progressive Information Chaining), a training framework for producing Matryoshka representations that are both structurally coherent and semantically dense.
MIPIC enforces structural consistency across dimensions through Self-Distilled Intra-Relational Alignment (SIA), which aligns token-level geometric and attention-based relationships between full and truncated representations using top-k CKA self-distillation. In parallel, it consolidates semantics across depth via Progressive Information Chaining (PIC), a scaffolded alignment strategy that progressively transfers established task semantics from deeper layers to earlier ones. Evaluations on STS, NLI, and classification benchmarks, spanning models from TinyBERT to BGE-M3 and Qwen3, show that MIPIC yields Matryoshka representations that are highly competitive across all capacity levels, with substantial gains under extreme low-dimensional constraints.
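
The two ingredients named in the abstract, nested (prefix-truncated) embeddings and CKA between full and truncated representations, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the standard linear form of CKA, uses random vectors in place of real token representations, and omits the top-k selection, the attention-based relations, and the PIC layer-wise transfer, none of which are specified in the abstract. The names `truncate` and `linear_cka` are hypothetical helpers introduced here for illustration.

```python
import numpy as np


def truncate(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of each row and L2-normalize.

    This is the usual way a Matryoshka-style embedding is shortened:
    the low-dimensional vector is a prefix of the full one.
    """
    prefix = emb[:, :dim]
    norms = np.linalg.norm(prefix, axis=1, keepdims=True)
    return prefix / np.clip(norms, 1e-12, None)


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two views of the same n items.

    x has shape (n, d1) and y has shape (n, d2); rows must correspond.
    Returns a similarity in [0, 1], with 1 meaning identical relational
    structure up to rotation and isotropic scaling.
    """
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(x.T @ y, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Stand-in data (assumption): 32 token vectors of width 64.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(32, 64))

# Compare the full representation with each nested prefix.
for dim in (8, 16, 32, 64):
    score = linear_cka(tokens, truncate(tokens, dim))
    print(f"dim={dim:3d}  CKA(full, truncated)={score:.3f}")
```

In a self-distillation setting, one plausible use of such a score is as an alignment term that pushes the CKA between full and truncated representations toward 1, with the full representation acting as the teacher; how MIPIC actually forms and weights this term is described in the paper, not here.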
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC





