arXiv

Beyond Text and Tables: Vision-Language Model Integration in ComProScanner for Extracting Materials Data from Scientific Figures with High Accuracy

June 2, 2026 · Aritra Roy, Enrico Grisan, Chiara Gattinoni, John Buckeridge

Title: Enhancing Materials Data Extraction: Vision-Language Model Integration in ComProScanner for High-Accuracy Figure Analysis

The automated extraction of composition-property data from scientific literature has seen significant progress through large language model-based pipelines. However, current frameworks are largely confined to textual and tabular information, neglecting the vast amount of quantitative property data presented exclusively in scientific figures. To address this gap, we have enhanced ComProScanner—a fully end-to-end, multi-agent framework designed for the automated construction of composition-property databases—by integrating native vision-language model (VLM) capabilities for figure extraction.

This upgrade incorporates a FigureExtractor utility that filters figures based on captions and keywords across all supported publishers. Additionally, a GraphExtractorTool agent forwards these extracted figures to a configurable VLM to retrieve composition-property pairs from scientific charts and plots. We evaluated four VLMs, selected based on the LMArena Diagram leaderboard and a cost constraint of less than $1.50 per million tokens.
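The caption-and-keyword filtering step can be illustrated with a minimal Python sketch. This is not ComProScanner's actual FigureExtractor code: the `Figure` class, the `filter_figures_by_caption` function, and the sample captions and keywords are all hypothetical, and the matching rule (case-insensitive substring search) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Figure:
    """Hypothetical record for a figure pulled from an article."""
    figure_id: str
    caption: str


def filter_figures_by_caption(figures, keywords):
    """Keep figures whose caption mentions at least one keyword.

    Assumption: matching is a case-insensitive substring test.
    """
    lowered = [keyword.lower() for keyword in keywords]
    return [
        figure
        for figure in figures
        if any(keyword in figure.caption.lower() for keyword in lowered)
    ]


# Illustrative captions, not taken from the test corpus.
figures = [
    Figure("fig1", "XRD patterns of the sintered ceramics"),
    Figure("fig2", "Piezoelectric coefficient d33 as a function of composition x"),
    Figure("fig3", "SEM micrographs of fracture surfaces"),
]

selected = filter_figures_by_caption(figures, ["d33", "piezoelectric coefficient"])
selected_ids = [figure.figure_id for figure in selected]
```

Filtering on captions before calling the VLM would keep irrelevant images, such as micrographs or diffraction patterns, out of the paid model calls. Only the figures that pass the filter would be forwarded for extraction.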

Testing was conducted on a corpus of 50 piezoelectric ceramic articles from the established $d_{33}$ test dataset. Gemini-3-Flash-Preview delivered the best performance, achieving a composition accuracy of 0.97 and a normalized F1 score of 0.97. It was also the most cost-effective of the four models assessed. Furthermore, we added a range-based value error threshold parameter to the evaluation framework. Values read from a plot are inherently approximate, so this parameter offers a more physically meaningful evaluation of numeric property values extracted from figures than strict exact-value matching.
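To make the idea of a range-based threshold concrete, the sketch below compares an extracted value against a reference value under a tolerance. The abstract does not define how the threshold is applied, so the relative (fractional) form used here is an assumption, and the function name `within_range` and the sample numbers are hypothetical.

```python
def within_range(predicted, reference, threshold):
    """Return True if predicted lies within +/- threshold of reference.

    Assumption: threshold is a relative tolerance, e.g. 0.05 means 5 percent
    of the reference value. A threshold of 0.0 reduces to exact matching.
    """
    if reference == 0:
        # A relative tolerance is undefined at zero, so require an exact match.
        return predicted == 0
    return abs(predicted - reference) <= threshold * abs(reference)


# Hypothetical d33 values in pC/N: one read from a plot, one reported exactly.
exact_match = within_range(412.0, 420.0, 0.0)
tolerant_match = within_range(412.0, 420.0, 0.05)
```

Under exact matching the value 412 is scored as wrong against a reference of 420, while a 5 percent tolerance accepts it, since the difference of 8 is below the allowed 21. This mirrors the motivation given above: a point read off an axis rarely reproduces the reported number digit for digit.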

These advancements position VLM-integrated ComProScanner as the first materials-specific, fully automated, multimodal literature mining platform. It uniquely enables the extraction of structured composition-property data from text, tables, and figures within a single, unified pipeline.
