Prior Availability in Industrial Visual Sim-to-Real: A Review of CAD-Guided and CAD-Unavailable Regimes
Title: The Role of Prior Knowledge in Industrial Visual Sim-to-Real: An Analysis of CAD-Available and CAD-Unavailable Scenarios
Abstract:
Industrial visual sim-to-real is often described simply as the transfer of models from synthetic to real imagery, but actual industrial deployments face a broader mismatch between the information available and the decisions required. Systems may be trained on a variety of synthetic resources (CAD renderings, simulated RGB-D data, normal reference images, synthetic defects, pretrained feature spaces, or language prompts), yet they are frequently deployed in environments with differing sensors, lighting conditions, materials, fixtures, calibration standards, production variations, and rare defect modes.
This paper reframes industrial visual sim-to-real as a domain-gap challenge defined by the availability of prior information. We categorize these scenarios into three distinct regimes: CAD-available settings, where explicit object geometry facilitates rendering, calibration, pose estimation, segmentation, and geometric verification at test time; CAD-unavailable settings, where geometric data is substituted by normal-reference appearance, feature distributions, teacher-student residuals, synthetic anomaly assumptions, foundation features, or vision-language priors; and boundary-prior settings, where approximate models, templates, reference views, or semantic correspondences provide only partial support for the functions typically served by CAD.
This classification bridges two bodies of literature that are often treated as separate fields: CAD-based detection and 6D pose estimation on one side, and industrial anomaly detection and surface inspection on the other. To ground the taxonomy empirically, we draw on data from the T-LESS/BOP, MVTec AD, and VisA benchmarks. Our findings indicate that increasing the number of CAD renders alone is insufficient to bridge the transfer gap; the design of the source distribution, the capacity of the detector, and minimal real-world calibration play more critical roles. The results further indicate that using CAD during inference establishes a distinct verification channel based on mask, pose, and depth consistency, whereas inspection in CAD-unavailable contexts depends on calibrated normality and feature deviation. Consequently, this review questions the utility of a unified cross-task leaderboard and urges the field to focus instead on identifying which prior knowledge supports which deployment decisions.
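As an illustration only, the three-regime taxonomy described in the abstract can be sketched as a small decision rule over the priors a deployment has at test time. The names `Regime`, `Priors`, and `classify_regime`, the three boolean fields, and the precedence order (exact CAD first, then partial geometric priors) are hypothetical choices made for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Regime(Enum):
    CAD_AVAILABLE = "cad-available"
    BOUNDARY_PRIOR = "boundary-prior"
    CAD_UNAVAILABLE = "cad-unavailable"


@dataclass(frozen=True)
class Priors:
    """Hypothetical summary of the prior information available at test time."""
    exact_cad: bool = False           # explicit object geometry (CAD model)
    partial_geometry: bool = False    # approximate model, template, or reference views
    normal_references: bool = False   # defect-free images or feature statistics


def classify_regime(priors: Priors) -> Regime:
    """Map available priors to a regime (assumed precedence: CAD > partial > none)."""
    if priors.exact_cad:
        return Regime.CAD_AVAILABLE
    if priors.partial_geometry:
        return Regime.BOUNDARY_PRIOR
    # No geometric prior: inspection falls back on appearance-based normality.
    return Regime.CAD_UNAVAILABLE


if __name__ == "__main__":
    print(classify_regime(Priors(exact_cad=True)).value)
    print(classify_regime(Priors(partial_geometry=True, normal_references=True)).value)
    print(classify_regime(Priors(normal_references=True)).value)
```

The point of the sketch is that the regime is determined by which priors exist at deployment, not by the task label, which is the framing the abstract argues for.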
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC