arXiv

SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization

Abstract:

Although pretraining techniques have expanded the context window sizes of large language models (LLMs), these models continue to struggle with effectively processing real-world long-context data. This limitation stems largely from inadequate long-context alignment, which is driven by poor data quality, training inefficiencies, and the absence of well-structured optimization objectives. To overcome these hurdles, we introduce Short-to-Long Preference Optimization (SoLoPO). Backed by both theoretical analysis and empirical validation, our framework decouples long-context preference optimization (PO) into two distinct phases: short-context PO and short-to-long reward alignment (SoLo-RA).

The short-context PO component utilizes preference pairs derived from shorter contexts to bolster the model’s capacity to utilize contextual information. Concurrently, SoLo-RA promotes consistency in reward scores for responses conditioned on both short and long contexts, provided they contain the same task-relevant information. This mechanism effectively transfers the model’s proficiency with short contexts into long-context scenarios. SoLoPO is designed to be compatible with existing mainstream preference optimization algorithms, significantly streamlining both data construction and training workflows. Our experiments demonstrate that integrating SoLoPO improves all tested algorithms, yielding superior generalization across length and domain in various long-context benchmarks, while also delivering substantial gains in computational and memory efficiency.
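The decoupling described above can be illustrated with a minimal sketch. The abstract does not state the actual objective, so everything here is an assumption made for illustration: a DPO-style implicit reward, an absolute-difference penalty standing in for SoLo-RA, and the hypothetical names `implicit_reward`, `short_context_po_loss`, `solo_ra_penalty`, `combined_loss`, `beta`, and `alpha`.

```python
import math


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO-style implicit reward (assumed form): beta * (log pi - log pi_ref)."""
    return beta * (logp_policy - logp_ref)


def short_context_po_loss(r_chosen_short: float, r_rejected_short: float) -> float:
    """Preference loss on the short context only: -log sigmoid(r_w - r_l)."""
    margin = r_chosen_short - r_rejected_short
    return math.log1p(math.exp(-margin))


def solo_ra_penalty(r_short: float, r_long: float) -> float:
    """Assumed alignment term: penalize a gap between the rewards of the same
    response under the short context and under the long context."""
    return abs(r_short - r_long)


def combined_loss(r_chosen_short: float, r_rejected_short: float,
                  r_chosen_long: float, alpha: float = 1.0) -> float:
    """Short-context PO plus a weighted short-to-long alignment penalty.
    alpha is a hypothetical weighting coefficient."""
    po = short_context_po_loss(r_chosen_short, r_rejected_short)
    ra = solo_ra_penalty(r_chosen_short, r_chosen_long)
    return po + alpha * ra


# Toy log-probabilities of a response under the policy and a reference model.
r_w_short = implicit_reward(-10.0, -12.0)  # chosen response, short context
r_l_short = implicit_reward(-15.0, -14.0)  # rejected response, short context
r_w_long = implicit_reward(-11.0, -12.0)   # chosen response, long context
loss = combined_loss(r_w_short, r_l_short, r_w_long)
```

In this sketch the preference comparison only ever uses short-context rewards, and the long context enters solely through the alignment penalty. That mirrors the abstract's claim that the long-context objective is split into two parts; the specific functional forms are placeholders.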


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
