arXiv

Physical Plausibility Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance

Abstract:

Image generation has advanced remarkably in recent years, yet image screening remains underexplored. In particular, Multimodal Large Language Models (MLLMs) perform poorly on this task, primarily because of data scarcity and their limited capacity for physical plausibility reasoning. To address both problems, this study contributes a dataset and a training method.

On the data side, we compile an image screening dataset of over 128,000 samples comprising approximately 640,000 images; each sample pairs an original image with four generated variants. The dataset assesses physical plausibility reasoning along four dimensions: appearance deformation, physical shadows, placement layout, and extension rationality. To obtain high-quality chain-of-thought (CoT) data at the lowest cost, we explore several annotation strategies: purely manual, fully automated, and answer-driven annotation.
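The sample structure described above can be sketched as a small record type. This is only an illustration: the class and field names (ScreeningSample, original, variants, Dimension) are hypothetical and do not come from the paper, and the abstract does not say how files or labels are actually stored.

```python
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    """The four evaluation dimensions named in the abstract."""
    APPEARANCE_DEFORMATION = "appearance deformation"
    PHYSICAL_SHADOWS = "physical shadows"
    PLACEMENT_LAYOUT = "placement layout"
    EXTENSION_RATIONALITY = "extension rationality"


@dataclass
class ScreeningSample:
    """One dataset entry: an original image plus four generated variants.

    Field names are assumptions made for this sketch.
    """
    original: str          # path or ID of the original image
    variants: list[str]    # paths or IDs of the generated variants

    def __post_init__(self) -> None:
        # The abstract states that each entry has exactly four variants.
        if len(self.variants) != 4:
            raise ValueError("expected exactly four generated variants")

    def image_count(self) -> int:
        # One original plus its variants.
        return 1 + len(self.variants)


sample = ScreeningSample("orig.png", ["v1.png", "v2.png", "v3.png", "v4.png"])
images_per_sample = sample.image_count()
total_images = 128_000 * images_per_sample
```

With five images per sample, 128,000 samples correspond to 640,000 images, which matches the totals reported above.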

On the method side, we integrate Hard Cases Mining (HCM) and a Dynamic Proportional Accuracy (DPA) reward mechanism into the Group Relative Policy Optimization (GRPO) framework, yielding a method we term HCM-GRPO. HCM-GRPO exhibits significantly stronger physical plausibility reasoning than standard GRPO. Our experiments show that even state-of-the-art closed-source MLLMs, including GPT5.2 and Gemini3-Pro, struggle with physical plausibility reasoning. In contrast, our compact model trained with HCM-GRPO achieves scores exceeding those of both large-scale open-source models and top-tier closed-source alternatives.
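For readers unfamiliar with the baseline, the following is a minimal sketch of the group-relative advantage used in standard GRPO, where each sampled response's reward is normalized against the other responses to the same prompt. It does not implement HCM or the DPA reward, which the abstract names but does not define. The function name, the use of the population standard deviation, and the eps term are assumptions of this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Standard GRPO form: A_i = (r_i - mean(r)) / std(r).
    eps (an assumption here) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with mixed correctness yields positive and negative advantages.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every response earns the same reward yields all-zero advantages.
uniform = group_relative_advantages([1.0, 1.0, 1.0])
```

Note that a group with identical rewards contributes no learning signal under this normalization. That is a general property of standard GRPO, not a statement about how HCM-GRPO selects or weights cases.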


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
