Edge Prediction for Roof Wireframe Reconstruction with Transformers
Title: Transformer-Based Edge Prediction for Reconstructing Roof Wireframes
Abstract:
We present a solution to the S23DR Challenge 2026, a competition on reconstructing 3D wireframe models of residential roofs. The reconstruction relies on sparse Structure-from-Motion (SfM) point clouds together with ground-level semantic segmentations and depth maps. Our method uses an end-to-end Transformer encoder-decoder inspired by the DETR architecture.
To handle geometric and semantic inputs efficiently, we dynamically subsample the sparse SfM point cloud according to semantic importance. The retained points are then enriched with features derived from Gestalt and ADE20k classifications. To broaden the segmentation context, each point's features are combined with additional Gestalt encodings, obtained by projecting the point onto latent feature maps produced by a frozen autoencoder. Finally, learned query embeddings are decoded directly into 3D wireframe edges through cross-attention.
On the "HoHo 22k" dataset, our method outperformed both handcrafted and learned baselines. It achieved a Hybrid Structure Score (HSS) of 0.6476, ranking second on the challenge’s private leaderboard.
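
The two input-preparation steps described above, semantic subsampling and latent feature lookup, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the per-class weight table, the pinhole camera convention, and the nearest-cell lookup (used here in place of whatever interpolation the method actually uses) are all assumptions made for illustration.

```python
import numpy as np


def subsample_by_importance(points, class_ids, class_weights, n_samples, rng=None):
    """Sample points without replacement, with probability proportional to a
    hypothetical per-class importance weight (assumed, not from the paper)."""
    rng = np.random.default_rng() if rng is None else rng
    weights = class_weights[class_ids].astype(np.float64)
    probs = weights / weights.sum()
    # Sampling without replacement cannot return more points than have
    # non-zero probability, so cap the sample size accordingly.
    n = min(n_samples, int(np.count_nonzero(probs)))
    idx = rng.choice(len(points), size=n, replace=False, p=probs)
    return points[idx], idx


def sample_latent_features(points, K, R, t, feature_map, image_size):
    """Project world points with an assumed pinhole camera (x_cam = R x + t)
    and read the latent feature of the cell each point falls into."""
    cam = points @ R.T + t                      # (N, 3) camera-frame coordinates
    in_front = cam[:, 2] > 1e-6                 # points behind the camera are invalid
    z = np.where(in_front, cam[:, 2], 1.0)      # avoid division by zero
    uv = (cam @ K.T)[:, :2] / z[:, None]        # (N, 2) pixel coordinates

    h_img, w_img = image_size
    h_f, w_f, channels = feature_map.shape
    # Map pixel coordinates to latent-grid cells (nearest-cell lookup).
    col = np.floor(uv[:, 0] * w_f / w_img).astype(int)
    row = np.floor(uv[:, 1] * h_f / h_img).astype(int)
    valid = in_front & (col >= 0) & (col < w_f) & (row >= 0) & (row < h_f)

    feats = np.zeros((len(points), channels), dtype=feature_map.dtype)
    feats[valid] = feature_map[row[valid], col[valid]]
    return feats, valid
```

In this sketch, points that project outside the image or lie behind the camera receive a zero feature vector and are flagged in `valid`, so a downstream encoder could mask them rather than treat the zeros as real features.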
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





