arXiv

An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification using Vision Transformers

Abstract:

Vehicle body type is a critical factor in the severity of injuries that cyclists sustain during overtaking incidents, yet no open-source automated solution currently categorizes vehicles into these injury-relevant classes from naturalistic road footage. Standard object detection datasets typically offer only broad vehicle labels, such as "car" or "bus," and current fine-grained recognition models are generally trained on controlled imagery and have not been validated for robust deployment across diverse recording environments. To address this gap, we introduce an open-source, two-stage computer vision pipeline. The system combines a pre-trained RT-DETR detector for coarse vehicle localization with a fine-tuned Vision Transformer (ViT-Base/16) that classifies each detected vehicle into one of six body types: passenger cars, SUVs, pickup trucks, minivans, large vans, and commercial trucks.
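The two-stage structure described above can be sketched as a detect-then-classify loop. This is a minimal illustration rather than the released implementation: the names `run_pipeline`, `crop`, `VehicleResult`, `detect`, and `classify` are hypothetical, the image is a toy nested list instead of a real video frame, and the detector and classifier are passed in as plain callables that stand in for RT-DETR and the fine-tuned ViT.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]        # (x0, y0, x1, y1) in pixel coordinates
Image = Sequence[Sequence[float]]      # toy stand-in for a frame: rows of pixel values


@dataclass
class VehicleResult:
    """One detected vehicle with its fine-grained label (hypothetical container)."""
    box: Box
    body_type: str
    confidence: float


def crop(image: Image, box: Box) -> List[List[float]]:
    """Cut the detected region out of the frame."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], List[Box]],              # stage 1: coarse localization
    classify: Callable[[Image], Tuple[str, float]],    # stage 2: body-type classification
) -> List[VehicleResult]:
    """Run stage 1 on the full frame, then stage 2 on each cropped detection."""
    results = []
    for box in detect(image):
        label, confidence = classify(crop(image, box))
        results.append(VehicleResult(box, label, confidence))
    return results
```

Keeping the two stages behind separate callables means either one can be swapped or tested in isolation. In a real deployment, `detect` would wrap the detector and `classify` would wrap the classifier. Those wrappers are not shown here.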

To mitigate silent misclassifications, the pipeline employs a confidence-based abstention mechanism that assigns an "unknown" label when the top softmax probability falls below a threshold of 0.60. We evaluated this approach on 3,805 annotated overtaking events recorded along a bicycle-lane corridor in Ann Arbor, Michigan. In this in-distribution test, the system achieved an overall accuracy of 0.94, with per-class F1 scores ranging from 0.91 for minivans to 0.97 for SUVs.
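A minimal sketch of the abstention rule follows, assuming the confidence score is the largest softmax probability over the six classes. The function and constant names are hypothetical, and the ordering of `BODY_TYPES` is illustrative only. The 0.60 threshold is the value stated in the abstract.

```python
import math
from typing import List, Sequence, Tuple

# Class order is an assumption made for illustration only.
BODY_TYPES = ["passenger car", "SUV", "pickup truck", "minivan", "large van", "commercial truck"]
ABSTAIN_THRESHOLD = 0.60  # threshold stated in the abstract


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_abstention(
    logits: Sequence[float],
    labels: Sequence[str] = BODY_TYPES,
    threshold: float = ABSTAIN_THRESHOLD,
) -> Tuple[str, float]:
    """Return (label, confidence), or ("unknown", confidence) below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return labels[best], probs[best]
```

A single dominant logit yields a confident label, while two near-tied logits split the probability mass and trigger abstention. That is the behavior intended to turn silent errors into explicit "unknown" outputs.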

Further testing on an independent, out-of-distribution dataset of 311 events from an open cycling repository, with no retraining, yielded an accuracy of 0.89. Under this domain shift, three of the four well-represented categories sustained F1 scores of at least 0.90. The largest performance drop occurred in the minivan category, where the F1 score fell to 0.72. This decline was driven primarily by abstention rather than by active errors: the abstention rate rose from 2.4% in the in-distribution setting to 25.0% in the out-of-distribution setting, which reflects the model's appropriate handling of genuine uncertainty. The complete pipeline (inference scripts, training code, evaluation utilities, and model weights) has been released as open-source software to facilitate reproducibility and support research in cycling safety and roadside video analysis.
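To illustrate how abstention can lower a class's F1 score without producing active errors, the sketch below computes per-class precision, recall, F1, and abstention rate. It assumes that an "unknown" prediction counts as a miss (false negative) for the true class and never as a false positive for another class. The abstract does not spell out the scoring convention, so this is an assumption. The function name is hypothetical, and the labels in the usage note are toy data, not results from the paper.

```python
from typing import Dict, Sequence


def per_class_scores(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    label: str,
    unknown: str = "unknown",
) -> Dict[str, float]:
    """Precision, recall, F1, and abstention rate for one class.

    Assumption: abstentions ("unknown") on items of this class count as
    false negatives, so they reduce recall but not precision.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)  # includes abstentions
    abstained = sum(1 for t, p in pairs if t == label and p == unknown)
    support = tp + fn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / support if support else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "abstention_rate": abstained / support if support else 0.0,
    }
```

For example, with toy labels `y_true = ["minivan"] * 5 + ["SUV"] * 4` and predictions in which one minivan is labeled "unknown" and everything else is correct, the minivan class keeps a precision of 1.0 while its recall and F1 fall. The SUV scores are unaffected because the abstention adds no false positives to any other class.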


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
