Segment-driven Structural Induction and Semantic Alignment for Heterogeneous Tabular Representation
Title: Structural Induction and Semantic Alignment in Heterogeneous Tabular Data via Segments
Abstract: Real-world tables are often heterogeneous: their headers vary even when the underlying attribute semantics are shared. This heterogeneity makes it difficult to recover domain-specific meaning from table-local evidence alone. Current encoders address parts of this challenge, but they frequently neglect column-level value distributions and apply a uniform objective to attributes that play different semantic roles. To address these limitations, we introduce NAVI, a segment-centered pretraining framework that treats each header-value pair as the fundamental unit for combining schema-level structural evidence with column-level distributional evidence. We realize this design through Masked Segment Modeling and Entropy-driven Segment Alignment, which together enforce structured coupling between headers and values and support semantic alignment across both stable and instance-specific attributes. Experiments on heterogeneous in-domain tables show improved reconstruction, semantic consistency, and downstream utility across multiple evaluation metrics.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
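
The abstract names three ingredients: header-value segments, masking at the segment level, and an entropy signal that separates stable from instance-specific attributes. The sketch below illustrates those ideas on a toy table. It is not NAVI's implementation, since the abstract gives no architectural details. The function names (to_segments, column_entropy, mask_segments), the choice to mask only the value side of a segment, the [MASK] token, the 1.0-bit threshold, and the sample rows are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def to_segments(row):
    """Turn one table row (a dict) into a list of (header, value) segments."""
    return [(header, str(value)) for header, value in row.items()]


def column_entropy(rows, header):
    """Shannon entropy, in bits, of one column's empirical value distribution."""
    values = [str(row[header]) for row in rows if header in row]
    total = len(values)
    counts = Counter(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def mask_segments(segments, mask_ratio, rng, mask_token="[MASK]"):
    """Hide the value of a random subset of segments; headers stay visible.

    Masking only the value side is an assumption of this sketch.
    """
    n_mask = max(1, round(mask_ratio * len(segments)))
    masked_idx = set(rng.sample(range(len(segments)), n_mask))
    return [
        (header, mask_token if i in masked_idx else value)
        for i, (header, value) in enumerate(segments)
    ]


# Toy rows, invented for illustration only.
rows = [
    {"brand": "Acme", "category": "laptop", "serial": "A-001"},
    {"brand": "Acme", "category": "laptop", "serial": "A-002"},
    {"brand": "Acme", "category": "phone", "serial": "B-003"},
    {"brand": "Acme", "category": "phone", "serial": "B-004"},
]

# Column-level distributional evidence: brand 0.0, category 1.0, serial 2.0 bits.
entropies = {header: column_entropy(rows, header) for header in rows[0]}

# Hypothetical threshold: low-entropy columns are treated as stable attributes.
STABLE_MAX_BITS = 1.0
stable = [h for h, e in entropies.items() if e <= STABLE_MAX_BITS]
instance_specific = [h for h, e in entropies.items() if e > STABLE_MAX_BITS]

# Segment-level masking of the first row, seeded for repeatability.
masked = mask_segments(to_segments(rows[0]), mask_ratio=0.34, rng=random.Random(0))
```

In this toy setting, the serial column takes a distinct value in every row and so has the highest entropy, which is the kind of signal an entropy-driven objective could use to treat it differently from brand or category. How NAVI actually uses entropy during alignment is not specified in the abstract.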





