Semimage: HSV-Based Semantic Image Encoding for Disentangled Text Representation
Abstract:
This paper introduces SemImage, an innovative approach that transforms text documents into two-dimensional semantic images suitable for analysis by Convolutional Neural Networks (CNNs). Within this framework, every word in a document is mapped to a single pixel within a 2D grid. The structure organizes sentences into rows, with dynamically inserted boundary rows serving as markers for semantic transitions between sentences. Rather than utilizing standard RGB values, each pixel is defined as a vector within a disentangled HSV color space, where specific channels capture distinct linguistic attributes: the Hue channel, comprising components H_cos and H_sin to handle circularity, represents the topic; Saturation indicates sentiment; and Value reflects intensity or certainty.
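The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `hsv_pixel` and `build_grid`, the zero-valued padding pixel, the truncation to a fixed width, and the hand-picked hue angles (in radians) are all assumptions made for the example. In the paper the per-word HSV values come from a learned ColorMapper network rather than being supplied by hand.

```python
import math

# Assumed padding pixel for rows shorter than the grid width (not from the paper).
PAD = (0.0, 0.0, 0.0, 0.0)

def hsv_pixel(hue_angle, saturation, value):
    """Return a 4-channel pixel (H_cos, H_sin, S, V).

    Encoding the hue angle as (cos, sin) keeps the representation circular:
    angles 0 and 2*pi map to the same point instead of opposite ends of a scale.
    """
    return (math.cos(hue_angle), math.sin(hue_angle), saturation, value)

def build_grid(sentences, width):
    """Build a 2D grid with one row per sentence and one pixel per word.

    `sentences` is a list of sentences; each sentence is a list of
    (hue_angle, saturation, value) triples, one per word.
    Rows are truncated or padded to `width` (an assumption of this sketch).
    """
    grid = []
    for words in sentences:
        row = [hsv_pixel(*w) for w in words[:width]]
        row += [PAD] * (width - len(row))
        grid.append(row)
    return grid

# Toy document: two sentences with hand-picked HSV values.
doc = [
    [(0.0, 0.8, 0.5), (math.pi / 2, 0.2, 0.9)],
    [(math.pi, 0.5, 0.5)],
]
grid = build_grid(doc, width=3)
```

Each row of `grid` then corresponds to one sentence, and the four channels of each pixel can be fed to a CNN in place of the usual three RGB channels.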
To maintain this disentanglement, we employ a multi-task learning architecture. A ColorMapper network projects word embeddings into the HSV space, while auxiliary supervision on the Hue and Saturation channels predicts topic and sentiment labels alongside the primary task objective. The dynamically calculated boundary rows create distinct visual separations in the image where adjacent sentences are semantically dissimilar, thereby highlighting paragraph breaks. We demonstrate the approach by pairing SemImage with conventional 2D CNNs, such as ResNet, for document classification. Evaluations on both multi-label datasets (with topic and sentiment annotations) and single-label benchmarks show that SemImage matches or exceeds strong text classification baselines, including hierarchical attention networks and BERT, while offering greater interpretability. Ablation studies confirm the importance of the multi-channel HSV representation and the dynamic boundary rows. Visualizations of SemImage also reveal clear patterns associated with shifts in topic and sentiment, indicating that the representation makes these linguistic features accessible to both human observers and machine learners.
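One way to picture the dynamic boundary rows is the sketch below. The abstract does not state how the rows are computed, so everything specific here is an assumption: sentence vectors are compared with cosine similarity, the boundary strength is taken as one minus that similarity clamped to [0, 1], and the strength is written into the Value channel of a 4-channel pixel (H_cos, H_sin, S, V). The helper names `cosine`, `boundary_strengths`, and `interleave` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def boundary_strengths(sentence_vecs):
    """Dissimilarity in [0, 1] for each pair of adjacent sentences (assumed formula)."""
    strengths = []
    for prev, nxt in zip(sentence_vecs, sentence_vecs[1:]):
        sim = cosine(prev, nxt)
        strengths.append(min(1.0, max(0.0, 1.0 - sim)))
    return strengths

def interleave(rows, strengths, width):
    """Insert a boundary row after every sentence row except the last.

    The boundary pixel stores the strength in the Value channel; this
    channel choice is an assumption of the sketch.
    """
    image = []
    for i, row in enumerate(rows):
        image.append(row)
        if i < len(strengths):
            image.append([(0.0, 0.0, 0.0, strengths[i])] * width)
    return image

# Toy example: sentences 1 and 2 are identical in meaning, sentence 3 differs.
vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
strengths = boundary_strengths(vecs)

# Placeholder sentence rows, two pixels wide.
rows = [[(1.0, 0.0, 0.5, 0.5)] * 2 for _ in vecs]
image = interleave(rows, strengths, width=2)
```

In this toy case the boundary between the first two sentences is dark and the one before the third sentence is bright, which is the kind of visual separation the abstract attributes to semantically dissimilar neighbours.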
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





