arXiv

MTA: Multi-Granular Trajectory Alignment for Large Language Model Distillation

Original: arXiv:2605.01374v2 Announce Type: replace

Abstract: While knowledge distillation is a primary method for compressing large language models (LLMs), current approaches typically restrict alignment to fixed layers or to token-level outputs. This narrow focus overlooks how representations evolve across network depth. As a result, students receive weak guidance for replicating the teacher’s internal relational structures, which ultimately hinders knowledge transfer. To overcome this bottleneck, we introduce Multi-Granular Trajectory Alignment (MTA), a framework that aligns teacher and student representations throughout their layer-wise transformation. MTA employs a layer-adaptive mechanism: it aligns lower layers at the word level to preserve lexical detail, and higher layers at the phrase level (for example, noun and verb phrases) to better capture compositional semantics. We realize this idea through a Dynamic Structural Alignment loss, which synchronizes the relative geometry of semantic units within each layer. This design is supported by empirical evidence that Transformer representations become more abstract with depth, and it is consistent with linguistic theories holding that higher-level meaning arises from the composition of basic lexical elements. Additionally, we integrate a Hidden Representation Alignment loss that directly aligns specific teacher and student layers. Experimental results show that MTA consistently surpasses state-of-the-art baselines on standard benchmarks, and ablation studies validate the contribution of each component.
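The abstract does not give the loss formulas, so the following is only a minimal sketch of how the two described losses could look, under several assumptions that are not from the paper: teacher and student share a tokenizer (so the same token spans index both); each semantic unit is the mean of its token vectors; "relative geometry" is taken to be the matrix of pairwise cosine similarities, compared with a mean squared difference; the word-to-phrase switch happens at a hypothetical `switch_ratio` of network depth; and the Hidden Representation Alignment loss is a mean squared error after a linear projection `proj` (hypothetical, learned in practice) from student to teacher width. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # half-open token range [start, end); assumed non-empty


def mean_pool(tokens: Sequence[Vector], spans: Sequence[Span]) -> List[Vector]:
    """Average the token vectors inside each span: one vector per semantic unit (assumed pooling)."""
    units = []
    for start, end in spans:
        dim = len(tokens[start])
        units.append([sum(tokens[t][d] for t in range(start, end)) / (end - start) for d in range(dim)])
    return units


def cosine_matrix(units: Sequence[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities between units; a zero vector is given norm 1 to avoid division by zero."""
    norms = [math.sqrt(sum(x * x for x in u)) or 1.0 for u in units]
    return [[sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(units, norms)]
            for u, nu in zip(units, norms)]


def structural_loss(teacher_tokens, student_tokens, spans) -> float:
    """Mean squared difference between teacher and student similarity matrices.

    Only unit-to-unit similarities are compared, so teacher and student hidden sizes may differ.
    """
    gt = cosine_matrix(mean_pool(teacher_tokens, spans))
    gs = cosine_matrix(mean_pool(student_tokens, spans))
    n = len(spans)
    return sum((gt[i][j] - gs[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def spans_for_layer(layer, num_layers, word_spans, phrase_spans, switch_ratio=0.5):
    """Hypothetical layer-adaptive schedule: word units in lower layers, phrase units higher up."""
    return word_spans if layer < switch_ratio * num_layers else phrase_spans


def hidden_alignment_loss(teacher_tokens, student_tokens, proj) -> float:
    """MSE between teacher vectors (width d_t) and student vectors projected by proj (d_t rows x d_s cols).

    Which teacher layer is paired with which student layer is left to the caller (an assumption here).
    """
    total, count = 0.0, 0
    for t_vec, s_vec in zip(teacher_tokens, student_tokens):
        projected = [sum(w * x for w, x in zip(row, s_vec)) for row in proj]
        total += sum((a - b) ** 2 for a, b in zip(t_vec, projected))
        count += len(t_vec)
    return total / count


if __name__ == "__main__":
    # Toy sentence "the cat sat": 3 tokens, teacher width 3, student width 2.
    teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
    student = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
    word_spans = [(0, 1), (1, 2), (2, 3)]
    phrase_spans = [(0, 2), (2, 3)]  # NP "the cat", VP "sat"
    num_layers = 12
    for layer in (3, 9):
        spans = spans_for_layer(layer, num_layers, word_spans, phrase_spans)
        print(layer, len(spans), structural_loss(teacher, student, spans))
    proj = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # hypothetical fixed projection
    print(hidden_alignment_loss(teacher, student, proj))
```

Comparing similarity matrices rather than raw vectors is one common way to align models of different widths without a projection; the direct hidden-state loss then needs the projection, which is why the two terms complement each other in this sketch. How the paper weights and combines the two losses is not stated in the abstract.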


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
