ALINC: Active Learning for Inductive Node Classification via Graph Sampling
Abstract:
Active learning (AL) for node classification has traditionally focused on selecting the most informative individual nodes to label within one or a few large graphs, a common scenario in social network analysis. In other fields, such as electronic design automation and molecular chemistry, datasets instead comprise thousands of distinct, independent graphs. In these inductive settings, annotating a single node requires a comprehensive graph-level analysis, which also yields labels for the remaining nodes of that graph. These settings therefore call for AL approaches that select entire graphs rather than isolated nodes, a problem that has so far remained unaddressed in the literature.
To bridge this gap, we present ALINC, an active learning framework for inductive node classification via graph sampling. ALINC turns node-level utility metrics into graph-level selection criteria by means of aggregation mechanisms. In a comprehensive benchmark covering four datasets, three aggregation methods, and ten strategies, we find that CoreSet, TypiClust, and BADGE are the most effective graph sampling strategies. Our analysis shows that the choice of aggregation method is critical, as it significantly influences both annotation cost and model performance. We further validate ALINC in two practical applications: predicting the site of metabolism in molecules and automating the design of printed circuit board schematics.
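The idea of lifting node-level utilities to graph-level selection criteria can be illustrated with a minimal sketch. It is not the ALINC implementation: the abstract does not name the three aggregation methods, so the `mean`, `sum`, and `max` aggregators below are assumptions. Predictive entropy stands in as a simple node-level utility, and the function names and the toy pool are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# One graph is represented only by its per-node class probabilities
# (a list of probability vectors); this representation is an assumption.
Graph = List[Sequence[float]]


def node_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of one node's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical aggregation choices; the source does not specify them.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "mean": lambda scores: sum(scores) / len(scores),
    "sum": sum,
    "max": max,
}


def graph_scores(graphs: List[Graph], aggregation: str = "mean") -> List[float]:
    """Aggregate node-level utilities into one score per graph."""
    agg = AGGREGATORS[aggregation]
    return [agg([node_entropy(p) for p in graph]) for graph in graphs]


def select_graphs(graphs: List[Graph], budget: int, aggregation: str = "mean") -> List[int]:
    """Return the indices of the `budget` highest-scoring graphs."""
    scores = graph_scores(graphs, aggregation)
    ranked = sorted(range(len(graphs)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy unlabeled pool of three graphs with binary node predictions.
pool: List[Graph] = [
    [[0.9, 0.1], [0.8, 0.2]],    # small graph, confident predictions
    [[0.5, 0.5], [0.95, 0.05]],  # small graph, one very uncertain node
    [[0.85, 0.15]] * 6,          # larger graph, moderately confident nodes
]

for name in AGGREGATORS:
    print(name, select_graphs(pool, budget=1, aggregation=name))
```

On this toy pool, `mean` and `max` pick the graph containing the most uncertain node, while `sum` picks the largest graph because its many moderate utilities add up. Larger graphs may also cost more to annotate, which suggests one way the aggregation choice could interact with annotation cost.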
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




