Diagnosis of Human-Object Interaction Detectors for Real-World Educational Applications
Title: Diagnosing Human-Object Interaction Detectors for Practical Educational Deployment
Abstract
Accurately recognizing human-object interactions (HOI) is essential for the automated analysis of student behavior in complex educational settings. State-of-the-art (SOTA) HOI detectors perform well on standard benchmark datasets, but their accuracy often degrades when they are applied to real training environments. This drop is largely attributed to domain-specific objects, occlusions, and difficult lighting and other visual conditions. To address these challenges, this study presents a diagnosis-driven framework that combines a triplet-level HOI error taxonomy with error-factor attribution analysis, tailored to real-world educational video data.
We investigate this issue in the context of Critical Care Air Transport Team (CCATT) mixed-reality medical training. By examining specific HOI failure modes and their underlying causes, we formulate a diagnosis-informed refinement strategy that adapts pretrained HOI models to the target domain. Experiments on the CCATT dataset show that targeted refinement driven by the diagnosed error factors raises the macro-F1 score of a pretrained CDN model from 48.6 to 90.2. These findings underscore the importance of thorough diagnostic analysis in guiding the adaptation of HOI models for practical educational applications.
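For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-F1 score is commonly computed: a per-class F1 is calculated and then averaged with equal weight per class. It assumes a simplified setting with one predicted interaction label per sample; the function name `macro_f1` and the example labels are hypothetical, and the paper's actual evaluation protocol (for example, how detections are matched to ground-truth triplets) is not specified in the abstract. The sketch returns a value in [0, 1], whereas the abstract reports scores on a 0 to 100 scale.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (illustrative, single label per sample)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0  # no classes observed; return 0 by convention in this sketch

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    f1_scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class has no support
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)

    return sum(f1_scores) / len(f1_scores)


# Hypothetical interaction labels, for illustration only
y_true = ["hold", "hold", "press", "look_at"]
y_pred = ["hold", "press", "press", "look_at"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, rare interaction classes influence macro-F1 as much as frequent ones, which makes it a reasonable summary when class frequencies are imbalanced.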
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





