arXiv

Train, Test, Re-evaluate: Schedule-Sensitive Evaluation of Generative Data for Hand Detection

June 2, 2026 · Atmika Bhardwaj, Silvia Vock, Nico Steckhan

Title: Iterative Training and Re-evaluation: Assessing Generative Data for Hand Detection Through a Schedule-Aware Lens

Abstract:

Generated or synthetic imagery is increasingly used to supplement or replace real-world training data when authentic samples are scarce, costly, or biased. In hand detection, especially in occupational safety settings, existing public datasets predominantly feature bare hands. They therefore fail to capture the visual variation introduced by gloves and other personal protective equipment (PPE), tattoos, and jewelry, which produces a distribution shift that degrades performance in real-world, safety-critical deployments. This study investigates whether generative inpainting, which modifies only the hand region of existing photographs to add accessories, can mitigate this shift.

We used a paired dataset of authentic images and their synthetic modifications to train YOLOv8n hand detectors under six training and scheduling configurations (Experiments A–F), with three random seeds each. Each model was evaluated on a standard real-world test set and on a separate test split of real images with gloves. We report mean average precision (mAP) at an intersection-over-union (IoU) threshold of 0.5 (mAP@0.5) and averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95), together with paired statistical analyses.
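To make the two metrics concrete, the sketch below shows the box-matching criterion that underlies them. It is an illustration only, not the authors' evaluation code: the function names are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the 0.05 threshold step follows the usual COCO-style convention for mAP@0.5:0.95. A full mAP computation would additionally rank predictions by confidence and integrate precision over recall.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 (assumed COCO-style convention).
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, true_box):
    """Return the IoU thresholds at which the prediction counts as a match."""
    score = iou(pred_box, true_box)
    return [t for t in THRESHOLDS if score >= t]


# Illustrative boxes (not from the paper): a prediction shifted 10 px to the
# right of a 100x100 ground-truth hand box.
truth = (0, 0, 100, 100)
pred = (10, 0, 110, 100)
passed = thresholds_passed(pred, truth)
```

A loosely fitting box like this one still counts as a hit at the 0.5 threshold but fails the stricter ones, which is why mAP@0.5:0.95 is the more sensitive measure of bounding-box tightness.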

The results indicate that the value of synthetic data for safety-critical hand detection depends strongly on the training schedule. A two-stage approach, which first trains on a mix of real and synthetic data and then fine-tunes the resulting weights on real-only data at a reduced learning rate, yielded higher mAP@0.5 on the standard test set than a real-only baseline while also narrowing the performance gap on out-of-distribution glove scenarios. A three-stage configuration best preserved bounding-box tightness, achieving the highest mAP@0.5:0.95 among all trials. These findings suggest that simple multi-stage training protocols can significantly enhance the utility of inpainted accessory data for real-world deployment.
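The two-stage schedule described above could be sketched roughly as follows with the Ultralytics YOLO Python API. This is a minimal sketch under stated assumptions, not the authors' configuration: the dataset YAML paths, epoch counts, learning rates, the `yolov8n.pt` starting checkpoint, and the helper names `Stage`, `two_stage_plan`, and `run_plan` are all hypothetical placeholders, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    data_yaml: str  # hypothetical dataset config path
    epochs: int     # placeholder value, not from the paper
    lr0: float      # initial learning rate for this stage


def two_stage_plan(mixed_yaml, real_yaml, epochs=(50, 20),
                   base_lr=0.01, finetune_lr_scale=0.1):
    """Stage 1: real + synthetic data. Stage 2: real-only at a reduced LR."""
    return [
        Stage("mixed_pretrain", mixed_yaml, epochs[0], base_lr),
        Stage("real_finetune", real_yaml, epochs[1], base_lr * finetune_lr_scale),
    ]


def run_plan(plan, seed, init_weights="yolov8n.pt"):
    """Train each stage in order, passing the best weights to the next stage."""
    from ultralytics import YOLO  # imported lazily; requires `pip install ultralytics`

    weights = init_weights
    for stage in plan:
        model = YOLO(weights)
        model.train(
            data=stage.data_yaml,
            epochs=stage.epochs,
            lr0=stage.lr0,
            optimizer="SGD",  # set explicitly: the default optimizer="auto" ignores lr0
            seed=seed,
            name=f"{stage.name}_seed{seed}",
        )
        # Assumption: the trainer exposes the best checkpoint path as `trainer.best`.
        weights = str(model.trainer.best)
    return weights


plan = two_stage_plan("mixed.yaml", "real_only.yaml")
```

Calling `run_plan(plan, seed=s)` for each of three seeds would mirror the three-seed protocol in the abstract. Keeping the plan as plain data also makes it straightforward to add a third stage or to compare schedules side by side.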
