arXiv

\textsc{CR-Seg}: Attention-Guided and CoT-Enhanced Coarse-to-Refined Reasoning Segmentation

Original: arXiv:2606.03564v1 (Announce Type: cross)

Abstract: Reasoning segmentation aims to segment target objects described by complex language through joint visual-textual reasoning. Existing methods typically rely on either learned semantic tokens to bridge Multimodal Large Language Models (MLLMs) and segmentation models, suffering from difficult cross-modal alignment, or explicit spatial prompts such as bounding boxes, which may lose holistic response semantics. To address these limitations, we propose Attention-Guided and CoT-Enhanced Coarse-to-Refined Reasoning Segmentation, termed CR-Seg, a two-stage framework for coarse-to-refined reasoning segmentation. Specifically, we design an Extract Attention Maps and Points (EAP) module to extract attention maps for coarse target localization and select informative points, both of which are fed into SAM for mask refinement. To alleviate reasoning--answer inconsistency, we further introduce Global-to-Local Chain-of-Thought (GLCoT), which guides the model to reason progressively from global scene context to local target details. Extensive experiments on reasoning segmentation benchmarks demonstrate the effectiveness of CR-Seg.

Rewritten:

Abstract: Reasoning segmentation segments target objects that are described by complex language, which requires joint visual-textual reasoning. Current approaches generally depend on one of two bridges between Multimodal Large Language Models (MLLMs) and segmentation models: learned semantic tokens, which are hampered by difficult cross-modal alignment, or explicit spatial prompts such as bounding boxes, which may lose the holistic semantics of the response. To address these limitations, we propose \textsc{CR-Seg} (Attention-Guided and CoT-Enhanced Coarse-to-Refined Reasoning Segmentation), a two-stage coarse-to-refined framework. Its Extract Attention Maps and Points (EAP) module extracts attention maps for coarse target localization and selects informative points; both are fed into SAM for mask refinement. To alleviate inconsistency between the reasoning and the final answer, we further introduce Global-to-Local Chain-of-Thought (GLCoT), which guides the model to reason progressively from global scene context to local target details. Extensive experiments on reasoning segmentation benchmarks demonstrate the effectiveness of \textsc{CR-Seg}.
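
The abstract does not say how EAP turns an attention map into a coarse localization and a set of points, so the following is only an illustrative sketch of one plausible scheme, not the authors' method. It assumes the attention map is a small 2D grid of non-negative scores; the names `normalize`, `coarse_mask`, and `select_points`, the 0.5 threshold, and the greedy minimum-distance rule are all hypothetical choices made for this example. Passing the resulting mask and points to SAM as prompts is not shown.

```python
from typing import List, Tuple

Grid = List[List[float]]
Point = Tuple[int, int]  # (row, col) on the attention grid


def normalize(attn: Grid) -> Grid:
    """Min-max scale a 2D attention map to the range [0, 1]."""
    flat = [v for row in attn for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no localization signal.
        return [[0.0 for _ in row] for row in attn]
    return [[(v - lo) / span for v in row] for row in attn]


def coarse_mask(attn: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Binarize the normalized map (hypothetical threshold) into a coarse mask."""
    return [[1 if v >= threshold else 0 for v in row] for row in normalize(attn)]


def select_points(attn: Grid, k: int = 3, min_dist: int = 2) -> List[Point]:
    """Greedily pick up to k high-attention cells that are at least
    min_dist apart (Chebyshev distance), so the points are spread out.
    This selection rule is an assumption made for illustration."""
    if k <= 0:
        return []
    scores = normalize(attn)
    # Sort cells by descending score; ties broken by row, then column.
    cells = sorted(
        ((scores[r][c], r, c) for r in range(len(scores)) for c in range(len(scores[r]))),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    chosen: List[Point] = []
    for _, r, c in cells:
        if all(max(abs(r - pr), abs(c - pc)) >= min_dist for pr, pc in chosen):
            chosen.append((r, c))
            if len(chosen) == k:
                break
    return chosen


if __name__ == "__main__":
    attn = [
        [0.0, 0.1, 0.0, 0.0],
        [0.1, 0.9, 0.8, 0.0],
        [0.0, 0.7, 0.6, 0.0],
        [0.0, 0.0, 0.0, 0.5],
    ]
    print(coarse_mask(attn))
    print(select_points(attn, k=2, min_dist=2))
```

In a real pipeline the grid coordinates would still need to be rescaled to image pixels before being used as point prompts.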


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
