arXiv

High-Quality Entity Segmentation and Grounding

Abstract:

This study introduces ESG, a pipeline for robust entity segmentation and grounding, together with a new dataset named EntitySeg. EntitySeg covers diverse image domains and entity types, pairing high-resolution images with precise mask annotations for both training and evaluation.

The ESG architecture consists of two modules: CropFormer, which performs high-quality entity segmentation, and GELLA, which extracts nouns from text and semantically matches them to visual regions. Unlike conventional methods that jointly train a segmentation network and a large language model, ESG uses a decoupled two-stage framework. This design preserves mask quality and keeps grounding robust, avoiding the compromises typically associated with joint training. CropFormer first generates the entity segmentation results, which are then encoded as input to GELLA for grounding.
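The decoupling described above can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_segmenter`, `toy_grounder`, `TOY_LEXICON`, and `run_pipeline` are hypothetical stand-ins, and the "image" is a small grid of integer codes rather than real pixels. The only point is that stage 2 consumes masks as plain data, so stage 1 can be swapped without retraining anything.

```python
from typing import Callable, Dict, List

Image = List[List[int]]   # toy image: grid of integer "colour" codes
Mask = List[List[bool]]   # binary mask with the same shape as the image


def toy_segmenter(image: Image) -> List[Mask]:
    """Stage 1 stand-in (hypothetical): one class-agnostic mask per pixel value."""
    values = sorted({v for row in image for v in row})
    return [[[v == target for v in row] for row in image] for target in values]


# Hypothetical lexicon standing in for learned language-vision semantics.
TOY_LEXICON: Dict[str, int] = {"sky": 1, "grass": 2}


def toy_grounder(text: str, image: Image, masks: List[Mask]) -> Dict[str, int]:
    """Stage 2 stand-in (hypothetical): map each known noun to its best mask index."""
    result: Dict[str, int] = {}
    for word in text.lower().replace(".", " ").replace(",", " ").split():
        if word not in TOY_LEXICON:
            continue  # not a noun this toy model knows
        target = TOY_LEXICON[word]

        def score(mask: Mask) -> float:
            # Fraction of the mask's pixels that carry the noun's code.
            covered = [v for mrow, irow in zip(mask, image)
                       for m, v in zip(mrow, irow) if m]
            return sum(v == target for v in covered) / len(covered) if covered else 0.0

        result[word] = max(range(len(masks)), key=lambda i: score(masks[i]))
    return result


def run_pipeline(
    image: Image,
    text: str,
    segment: Callable[[Image], List[Mask]],
    ground: Callable[[str, Image, List[Mask]], Dict[str, int]],
) -> Dict[str, int]:
    """Run the two stages in sequence; neither stage knows how the other works."""
    masks = segment(image)              # stage 1: masks only, no language involved
    return ground(text, image, masks)   # stage 2: grounding over the fixed masks


image = [[1, 1], [2, 2]]
grounded = run_pipeline(image, "Blue sky over green grass.", toy_segmenter, toy_grounder)
```

Because `run_pipeline` only passes a list of masks between the stages, replacing `toy_segmenter` with any other mask producer leaves the grounding stage untouched.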

Experiments validate the pipeline on five tasks: entity segmentation, panoptic segmentation, open-vocabulary segmentation, referring segmentation, and panoptic localized narratives. The GELLA module is also flexible: it can process mask inputs from any segmentation framework. This versatility is attributed to its lightweight colormap and vision encoder, combined with a language/mask decoder and an association module. The code for the entity segmentation dataset and grounding implementation will be made publicly available at https://github.com/qqlu/Entity.
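The abstract does not say how the association module scores noun-mask pairs. As a rough illustration only, one common way to associate two sets of embeddings is cosine similarity with a rejection threshold. Everything below is an assumption: the function names, the two-dimensional toy vectors, and the `threshold` value are invented for the sketch and are not taken from GELLA.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def associate(
    noun_embeddings: Dict[str, Sequence[float]],
    mask_embeddings: List[Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> Dict[str, Optional[int]]:
    """Assign each noun the index of its most similar mask, or None if too weak."""
    matches: Dict[str, Optional[int]] = {}
    for noun, vec in noun_embeddings.items():
        scores = [cosine(vec, m) for m in mask_embeddings]
        if not scores:
            matches[noun] = None
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        matches[noun] = best if scores[best] >= threshold else None
    return matches


# Toy embeddings; in a real system these would come from learned encoders.
nouns = {"dog": [1.0, 0.0], "ball": [0.0, 1.0], "cloud": [-1.0, -1.0]}
masks = [[0.9, 0.1], [0.2, 0.8]]
matches = associate(nouns, masks)
```

The mask embeddings are plain vectors here, which mirrors the claim that the grounding side is agnostic to which segmentation framework produced the masks.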


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
