arXiv

VT-3DAD: Cross-Category 3D Anomaly Detection via Visual-Text Normal Space Alignment

Title: VT-3DAD: Aligning Visual-Text Normal Spaces for Cross-Category 3D Anomaly Detection

Abstract

Few-shot cross-category 3D anomaly detection aims to decide whether an unlabelled point cloud belongs to a given normal class, using only a small set of normal reference samples. Traditional approaches depend on category-specific training. Recent training-free techniques that use multi-view CLIP visual features avoid this, but they rely mainly on visual similarity and therefore often struggle with categories that share similar geometric structures. To address these limitations, we introduce VT-3DAD, a novel training-free framework for cross-category 3D anomaly detection through Visual-Text Normal Space Alignment.

VT-3DAD first renders the few-shot normal references and the test point cloud into realistic multi-view depth maps, then extracts view-wise features from them with a frozen CLIP visual encoder. The visual branch measures how far a test sample deviates from the references in this multi-view feature space. In parallel, a frozen CLIP text encoder processes depth-aware and 3D-aware prompts to create textual normal anchors, which impose semantic constraints on what counts as normal for the target category. The final anomaly score combines the visual deviation from the normal references with the semantic deviation from the textual normal space.
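The two-branch scoring described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the deviation measures or the fusion rule, so the cosine-based deviations, the nearest-reference rule, the weighted sum, and the names `visual_deviation`, `semantic_deviation`, `anomaly_score`, and `alpha` are all assumptions. Random arrays stand in for the frozen CLIP visual and text features.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def visual_deviation(test_feats, ref_feats):
    """Deviation of a test sample from its nearest normal reference.

    test_feats: (V, D) view-wise features of the test sample.
    ref_feats:  (K, V, D) view-wise features of K normal references.
    Assumed measure: 1 - (best mean view-wise cosine similarity).
    """
    t = l2_normalize(test_feats)
    r = l2_normalize(ref_feats)
    sims = np.einsum("vd,kvd->kv", t, r)  # cosine per reference and view
    return float(1.0 - sims.mean(axis=1).max())


def semantic_deviation(test_feats, text_anchors):
    """Deviation of a test sample from the textual normal anchors.

    text_anchors: (P, D) text features, one per normal prompt.
    Assumed measure: 1 - (mean cosine similarity over views and prompts).
    """
    t = l2_normalize(test_feats)
    a = l2_normalize(text_anchors)
    return float(1.0 - (t @ a.T).mean())


def anomaly_score(test_feats, ref_feats, text_anchors, alpha=0.5):
    """Weighted sum of both deviations; alpha is a hypothetical weight."""
    vis = visual_deviation(test_feats, ref_feats)
    sem = semantic_deviation(test_feats, text_anchors)
    return alpha * vis + (1.0 - alpha) * sem


# Stand-in features: 1 reference (one-shot), 6 views, 16-dim embeddings.
rng = np.random.default_rng(0)
refs = rng.normal(size=(1, 6, 16))
anchors = rng.normal(size=(4, 16))
test = refs[0] + 0.1 * rng.normal(size=(6, 16))
print(anomaly_score(test, refs, anchors))
```

In this sketch a higher score means "more anomalous". Swapping the weighted sum for another fusion rule only changes `anomaly_score`; the two deviation functions stay independent.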

Evaluation on the ShapeNetPart dataset indicates that VT-3DAD delivers state-of-the-art results. Notably, when compared to a visual-only baseline, VT-3DAD boosts the one-shot average AUC-ROC from 92.49% to 94.80% and significantly lowers the average standard deviation from 5.64 to 3.41.
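For readers unfamiliar with the metric, the sketch below shows one way to compute AUC-ROC from anomaly scores and to summarize it as a mean and standard deviation. The abstract does not state how the reported standard deviation is obtained; the sketch assumes it is taken over repeated runs, and the helper names `auc_roc` and `summarize` are hypothetical. The input values are made up for illustration and are not results from the paper.

```python
import numpy as np


def auc_roc(scores, labels):
    """AUC-ROC as the probability that an anomalous sample outscores a
    normal one (ties count half). labels: 1 = anomalous, 0 = normal."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def summarize(aucs):
    """Mean and (population) standard deviation of AUC values, in percent."""
    aucs = 100.0 * np.asarray(aucs, dtype=float)
    return float(aucs.mean()), float(aucs.std())


# Toy example: two normal and two anomalous samples.
print(auc_roc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise comparison is quadratic in the number of samples, which is fine for small evaluation sets; a rank-based formulation would scale better.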


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
