DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer
Title: DuDi: Leveraging Dual-Signal Distillation and Cross-Lingual Verbalizers
Abstract: While small language models (SLMs) offer efficiency and scalability, their multilingual proficiency deteriorates significantly at sub-billion parameter scales, particularly for Southeast Asian (SEA) languages. To address this, we present DuDi, a novel multilingual distillation framework that integrates an online sequence-level signal with both off-policy and on-policy token-level signals. DuDi also employs a cross-lingual verbalizer to enhance teacher feedback and improve transfer between teacher and student models in multilingual settings. Experiments on the SEA-HELM benchmark, spanning multiple model families, scales, and teacher-student configurations, show that DuDi consistently outperforms competitive distillation baselines. Ablation studies and further analyses confirm that sequence-level optimization, token-level supervision, and cross-lingual verbalization provide complementary and transferable learning signals that improve multilingual SLMs.
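The abstract describes combining token-level distillation signals, computed on off-policy and on-policy sequences, with an online sequence-level signal. It does not give the exact objective. The following is therefore only a minimal sketch of how such a weighted combination could look under stated assumptions. Token-level supervision is assumed to be forward KL between teacher and student next-token distributions. The sequence-level term is a generic REINFORCE-style surrogate over a hypothetical scalar `reward`. The weights `alpha`, `beta`, and `gamma` are hypothetical, and the cross-lingual verbalizer is omitted entirely. All function names are illustrative and are not taken from the paper.

```python
import math
from typing import List, Sequence

# A "distribution" here is a list of probabilities over a toy vocabulary.
Dist = Sequence[float]


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """Forward KL(p || q); vocabulary entries with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def token_level_loss(teacher: List[Dist], student: List[Dist]) -> float:
    """Mean per-token KL between teacher and student next-token distributions."""
    if len(teacher) != len(student) or not teacher:
        raise ValueError("need equal-length, non-empty sequences of distributions")
    return sum(kl_divergence(t, s) for t, s in zip(teacher, student)) / len(teacher)


def sequence_level_loss(student_logprob: float, reward: float, baseline: float = 0.0) -> float:
    """REINFORCE-style surrogate: -(reward - baseline) * log p_student(sequence).

    Assumption: 'reward' is a hypothetical scalar score for a whole student
    sequence. In a real trainer it is treated as a constant, so gradients
    flow only through student_logprob.
    """
    return -(reward - baseline) * student_logprob


def combined_loss(off_teacher: List[Dist], off_student: List[Dist],
                  on_teacher: List[Dist], on_student: List[Dist],
                  student_logprob: float, reward: float,
                  alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Hypothetical weighted sum of the three signals."""
    l_off = token_level_loss(off_teacher, off_student)  # tokens from fixed or teacher-written text
    l_on = token_level_loss(on_teacher, on_student)     # tokens from student-sampled text
    l_seq = sequence_level_loss(student_logprob, reward)
    return alpha * l_off + beta * l_on + gamma * l_seq
```

With identical teacher and student distributions and a zero reward, `combined_loss` returns `0.0`. Keeping the off-policy and on-policy terms separate makes it straightforward to reweight or ablate each signal independently.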
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC