arXiv

Flexible Control of 3D CT Generation via Text and Semantically-Defined Segmentation Prompts

Title: Achieving Flexible Control in 3D CT Generation Through Text and Semantic Segmentation Prompts

Abstract: Volumetric medical image generative models have established significant utility across various medical imaging tasks, including serving as priors for inverse problems and facilitating data augmentation. However, producing high-resolution 3D images with robust controllability remains a formidable challenge for these applications. Current methodologies generally rely on either text prompts derived from radiology reports or full-image segmentation maps for control. While text-based conditioning offers flexibility, it lacks precise spatial definition regarding the location, morphology, and boundaries of anomalies. Conversely, segmentation-driven approaches provide accurate spatial guidance but are constrained by the necessity for comprehensive organ annotations.

To address these limitations, we introduce a versatile multimodal framework for controllable volumetric image generation that accommodates both radiology reports and segmentation prompts, with either being optional. This system enables users to supply segmentation data for specific anatomical structures or pathologies without the need for complete organ-level annotations. The semantic context of each segmentation mask is clarified via an associated textual description, creating a highly adaptable and scalable conditioning strategy. Our architecture, built upon a modified diffusion transformer, is designed to be memory-efficient and simultaneously processes tokens for both images and segmentation data. Additionally, the model employs gated attention mechanisms to effectively manage long radiology reports.
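The conditioning scheme described above can be illustrated with a small NumPy sketch. It is not the paper's implementation: the abstract does not specify the form of the gated attention, so this assumes a tanh-gated residual cross-attention, and every name here (`build_token_sequence`, `gated_cross_attention`, the weight matrices, the token counts) is hypothetical. The sketch shows image and segmentation tokens placed in one sequence, with both the segmentation prompt and the report left optional.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def build_token_sequence(image_tokens, seg_tokens=None):
    """Concatenate image and (optional) segmentation tokens into one sequence.

    Assumption: joint processing is modeled as simple concatenation along
    the token axis; the paper may combine the two streams differently.
    """
    if seg_tokens is None:
        return image_tokens
    return np.concatenate([image_tokens, seg_tokens], axis=0)


def gated_cross_attention(tokens, text_tokens, w_q, w_k, w_v, gate):
    """Cross-attend from volume tokens (n, d) to report tokens (m, d).

    Assumption: the gate is a scalar passed through tanh and applied to the
    attention output before the residual sum. If no report is supplied,
    the tokens pass through unchanged.
    """
    if text_tokens is None:
        return tokens
    q = tokens @ w_q
    k = text_tokens @ w_k
    v = text_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attended = softmax(scores, axis=-1) @ v
    return tokens + np.tanh(gate) * attended


# Hypothetical sizes, chosen only to make the example run.
rng = np.random.default_rng(0)
d = 8
image_tokens = rng.normal(size=(16, d))
seg_tokens = rng.normal(size=(4, d))
report_tokens = rng.normal(size=(32, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

joint = build_token_sequence(image_tokens, seg_tokens)
closed = gated_cross_attention(joint, report_tokens, w_q, w_k, w_v, gate=0.0)
opened = gated_cross_attention(joint, report_tokens, w_q, w_k, w_v, gate=1.0)
no_text = gated_cross_attention(joint, None, w_q, w_k, w_v, gate=1.0)
```

With `gate=0.0` the report contributes nothing and the tokens are returned unchanged. A learned gate could therefore let a model regulate how strongly a long report influences each layer, though whether the paper's gating works this way is not stated in the abstract.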

Experimental results indicate that our approach delivers state-of-the-art perceptual and semantic performance, notably achieving a 24% relative improvement in mean FID. The model generates high-resolution CT volumes that maintain anatomical consistency, and it improves data efficiency when used for data augmentation. Furthermore, evaluations by radiologists confirm that the generated images align closely with authentic medical scans.
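For readers unfamiliar with how such a figure is typically derived, the sketch below computes a relative improvement in mean FID, where lower FID is better. The abstract does not say which scores are averaged or how the percentage is defined, so the formula (relative reduction of the mean) is an assumption, and the numbers are hypothetical values chosen only to reproduce a 24% result. They are not taken from the paper.

```python
import math


def relative_fid_improvement(baseline_fids, model_fids):
    """Relative reduction in mean FID (lower FID is better).

    Assumption: improvement = (mean_baseline - mean_model) / mean_baseline.
    """
    mean_baseline = sum(baseline_fids) / len(baseline_fids)
    mean_model = sum(model_fids) / len(model_fids)
    return (mean_baseline - mean_model) / mean_baseline


# Hypothetical scores, NOT results from the paper.
baseline_fids = [60.0, 40.0]   # mean 50.0
model_fids = [45.0, 31.0]      # mean 38.0

improvement = relative_fid_improvement(baseline_fids, model_fids)
print(f"{improvement:.0%}")
```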


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
