arXiv

AutoTail-BSFGM: Class-Balance-Aware Fine-Tuning for Chinese Scholarly Text Classification

June 3, 2026 · Anling Xiang, Yuwen Yang, Yang Shen

Abstract:

Scholarly text classification supports literature organization, subject indexing, and research intelligence, but Chinese scholarly corpora often combine class imbalance with semantically adjacent disciplinary labels. To address both problems, we introduce AutoTail-BSFGM, a class-balance-aware fine-tuning approach that integrates three components: an automatically gated tail-prior adjustment, a weak Balanced Softmax auxiliary loss, and Fast Gradient Method (FGM) adversarial regularization. AutoTail-BSFGM changes only the training objective and procedure; at inference it uses the same single base-size encoder and linear classifier as the corresponding label-smoothed baseline, and therefore adds no inference-time cost.
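The abstract names the training components but not their exact form. As a minimal sketch, assuming the tail-prior adjustment is an additive log-prior logit shift (in the style of logit-adjusted loss) applied inside a label-smoothed cross-entropy, and that the "weak" Balanced Softmax term is cross-entropy over prior-shifted logits scaled by a small weight, a per-example objective could look like the following. The gate rule in the hypothetical `tail_gate` (switch the adjustment on when the max/min class-prior ratio reaches a threshold) and the defaults for `tau`, `lam`, `smoothing`, and `epsilon` are illustrative assumptions, not the authors' settings. `fgm_perturbation` shows only the FGM step itself, r = epsilon * g / ||g||_2 added to the input embeddings, given the loss gradient g with respect to those embeddings.

```python
import math

# Minimal sketch. The gating rule and every hyperparameter default below
# are hypothetical; they are not taken from the AutoTail-BSFGM paper.

def log_softmax(logits):
    # Numerically stable log-softmax for a single example.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy(logits, target, smoothing=0.0):
    # Label-smoothed cross-entropy: (1 - smoothing) mass on the gold class,
    # `smoothing` spread uniformly over all classes.
    logp = log_softmax(logits)
    nll = -logp[target]
    uniform = -sum(logp) / len(logp)
    return (1.0 - smoothing) * nll + smoothing * uniform

def prior_adjusted(logits, priors, tau):
    # Additive log-prior shift (priors must be > 0): tail classes get a more
    # negative offset, so training must raise their raw logits.
    return [z + tau * math.log(p) for z, p in zip(logits, priors)]

def tail_gate(priors, threshold=10.0):
    # Hypothetical "automatic" gate: enable the adjustment only when the
    # class-prior imbalance ratio reaches the threshold.
    return 1.0 if max(priors) / min(priors) >= threshold else 0.0

def training_loss(logits, target, priors, tau=1.0, lam=0.1, smoothing=0.1):
    # Main term: label-smoothed CE on gated, prior-adjusted logits.
    gate = tail_gate(priors)
    main = cross_entropy(prior_adjusted(logits, priors, gate * tau), target, smoothing)
    # Weak auxiliary term: Balanced Softmax, i.e. CE over logits + log(prior).
    aux = cross_entropy(prior_adjusted(logits, priors, 1.0), target)
    return main + lam * aux

def fgm_perturbation(grad, epsilon=1.0):
    # FGM step: r_adv = epsilon * g / ||g||_2, where g is the loss gradient
    # w.r.t. the input embeddings; zero if the gradient vanishes.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]

def predict(logits):
    # Inference uses the raw logits from the unchanged classifier head.
    return max(range(len(logits)), key=lambda j: logits[j])

priors = [0.90, 0.05, 0.05]   # toy class priors (illustrative only)
logits = [2.0, 0.5, 0.1]      # toy classifier outputs for one example
print(training_loss(logits, target=2, priors=priors))
print(fgm_perturbation([3.0, 4.0], epsilon=1.0))
print(predict(logits))
```

Because the prior shift and the auxiliary term live only in the loss, `predict` reads the raw logits, which matches the abstract's point that inference is identical to the label-smoothed baseline. In a full FGM training step, the adversarial loss would be computed on the perturbed embeddings and its gradient accumulated before the embeddings are restored.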

We evaluate the method on two tasks derived from the CSL dataset: abstract-to-discipline classification with 67 labels and title-to-category classification with 13 categories. On the primary abstract task, AutoTail-BSFGM improved both validation and lockbox accuracy with both Chinese RoBERTa-WWM and MacBERT-base. With MacBERT-base, validation accuracy rose by 0.83 percentage points and lockbox accuracy by 0.49 points, and a pooled paired McNemar test found the validation improvement statistically significant (p = 0.023). On the title-to-category task, the method increased validation accuracy by 0.70 points and validation balanced accuracy by 2.64 points; lockbox accuracy remained roughly unchanged, while lockbox balanced accuracy improved by 1.22 points. These results indicate a bounded contribution: AutoTail-BSFGM improves class-balance-sensitive performance and delivers consistent gains for abstract-based scholarly classification, but it does not improve every metric on every split.
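The results above rely on balanced accuracy alongside plain accuracy, and on a pooled paired McNemar test. The sketch below shows the standard definitions: balanced accuracy as the mean per-class recall over classes present in the gold labels, and an exact (binomial) McNemar test on the discordant pairs between a baseline and a method scored on the same examples. The abstract does not say which McNemar variant was used or how runs were pooled; concatenating per-example correctness across runs, as in the usage lines, is an assumption, and all function names are illustrative.

```python
import math

def accuracy(y_true, y_pred):
    # Fraction of examples predicted correctly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recall over classes that occur in y_true, so each
    # class counts equally regardless of its frequency.
    hits, totals = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def mcnemar_exact(baseline_correct, method_correct):
    # Exact two-sided McNemar test on paired per-example correctness.
    # b: baseline right, method wrong; c: baseline wrong, method right.
    b = sum(1 for x, y in zip(baseline_correct, method_correct) if x and not y)
    c = sum(1 for x, y in zip(baseline_correct, method_correct) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference
    # Under H0 the discordant pairs split as Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)

# Toy labels: the majority-class predictor scores 0.75 accuracy but
# only 0.5 balanced accuracy.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))

# Hypothetical pooling: concatenate per-example correctness across runs.
run1_base, run1_method = [True, False, True], [True, True, True]
run2_base, run2_method = [False, True], [True, False]
print(mcnemar_exact(run1_base + run2_base, run1_method + run2_method))
```

The toy labels show why balanced accuracy can move while accuracy stays flat: a gain on a rare class changes one per-class recall substantially but barely changes the overall hit rate.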

Source: arXiv · Generated at: 2026-06-03 00:00:00 UTC