arXiv

Physical Plausibility Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance

June 3, 2026 · Zhiyuan Hu, Zheng Sun, Yi Wei, Long Yu

Title: Elevating Compact Models with Superior Performance through HCM-GRPO for Physical Plausibility Reasoning

Abstract:

While image generation capabilities have advanced remarkably in recent years, image screening remains underexplored. In particular, applying Multimodal Large Language Models (MLLMs) to this task yields unsatisfactory results, primarily because of data scarcity and the models' limited capacity for physical plausibility reasoning. To tackle these challenges, this study presents a holistic solution that addresses both data availability and methodology.

On the data front, we compiled an image screening dataset of over 128,000 samples comprising approximately 640,000 images: each entry pairs an original image with four generated variants. The dataset is designed to assess physical plausibility reasoning across four dimensions: appearance deformation, physical shadows, placement layout, and extension rationality. To acquire high-quality chain-of-thought (CoT) data as cost-effectively as possible, we explored several annotation strategies, ranging from purely manual and fully automated pipelines to answer-driven annotation.
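The dataset sizes are mutually consistent: one original plus four generated variants gives five images per entry, and 128,000 × 5 = 640,000. The minimal sketch below models one entry as a Python record under stated assumptions; the class name `ScreeningSample`, its fields, and the image IDs are hypothetical and do not describe the authors' released data format.

```python
from dataclasses import dataclass, field

# Per the abstract: each entry has one original image and four generated variants.
VARIANTS_PER_SAMPLE = 4


@dataclass
class ScreeningSample:
    """HYPOTHETICAL record layout: one original image plus its generated variants."""
    original: str                                       # assumed: path or ID of the original image
    variants: list[str] = field(default_factory=list)  # assumed: paths or IDs of generated variants

    def __post_init__(self) -> None:
        # Enforce the 1 original + 4 variants structure described in the abstract.
        if len(self.variants) != VARIANTS_PER_SAMPLE:
            raise ValueError(
                f"expected {VARIANTS_PER_SAMPLE} variants, got {len(self.variants)}"
            )

    @property
    def num_images(self) -> int:
        """Total images in this entry (original + variants)."""
        return 1 + len(self.variants)


def total_images(num_samples: int) -> int:
    """Images in a dataset of `num_samples` entries, each with 1 original + 4 variants."""
    return num_samples * (1 + VARIANTS_PER_SAMPLE)


sample = ScreeningSample("orig_0001.png", [f"gen_0001_{k}.png" for k in range(4)])
print(sample.num_images)       # 5
print(total_images(128_000))   # 640000
```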

Methodologically, we introduce Hard Cases Mining (HCM) and a Dynamic Proportional Accuracy (DPA) reward mechanism into the Group Relative Policy Optimization (GRPO) framework, yielding a novel approach termed HCM-GRPO. HCM-GRPO exhibits significantly stronger physical plausibility reasoning than standard GRPO. Our experiments show that even state-of-the-art closed-source MLLMs, including GPT5.2 and Gemini3-Pro, struggle with physical plausibility reasoning. In contrast, our compact model trained with HCM-GRPO outperforms both large-scale open-source models and top-tier closed-source alternatives.
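GRPO scores each sampled response relative to the other responses drawn for the same prompt, normalizing rewards by the group mean and standard deviation. The minimal sketch below shows that standard group-relative advantage, plus a hard-case filter written purely for illustration. The abstract does not specify how HCM selects hard cases or how the DPA reward is computed, so `mine_hard_cases`, its `max_accuracy` threshold, and the binary 0/1 correctness rewards are all assumptions, not the authors' method.

```python
import numpy as np


def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Standard GRPO-style advantage: normalize each reward within its own group.

    `rewards` has shape (num_prompts, group_size): one row per prompt,
    one column per sampled response. `eps` avoids division by zero when
    every response in a group receives the same reward.
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)


def mine_hard_cases(rewards: np.ndarray, max_accuracy: float = 0.5) -> np.ndarray:
    """HYPOTHETICAL hard-case filter, not the paper's HCM rule.

    Assumes binary rewards (1 = correct, 0 = wrong). Keeps the indices of
    prompts whose group accuracy is above 0 but at most `max_accuracy`.
    """
    accuracy = rewards.mean(axis=1)
    return np.flatnonzero((accuracy > 0.0) & (accuracy <= max_accuracy))


# Four prompts, four sampled responses each (illustrative values only).
rewards = np.array(
    [
        [1, 0, 0, 0],  # accuracy 0.25 -> kept as a hard case
        [1, 1, 1, 1],  # accuracy 1.00 -> all correct, dropped
        [0, 0, 0, 0],  # accuracy 0.00 -> all wrong, dropped
        [1, 1, 1, 0],  # accuracy 0.75 -> easier case, dropped
    ],
    dtype=float,
)
hard = mine_hard_cases(rewards)
adv = group_relative_advantages(rewards[hard])
print(hard.tolist())  # [0]
print(adv)            # roughly [[ 1.732 -0.577 -0.577 -0.577]]
```

The filter drops all-correct and all-wrong groups because their rewards have zero variance, so their group-relative advantages are zero and they contribute no advantage-weighted policy-gradient signal. Whether HCM uses a similar criterion, and how DPA reshapes the reward before normalization, is not stated in the abstract.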

Source: arXiv · Generated at: 2026-06-03 00:00:00 UTC