Training-Free Object-Agnostic Jam Detection in Fulfillment Centers
Title: Eliminating Annotation Effort: A Training-Free Method for Detecting Object Jams in Fulfillment Centers
Abstract:
In modern fulfillment centers, a continuous stream of diverse items moves from receiving to shipping. These items frequently become stuck because of excessive friction on conveyors, improper positioning, or mechanical malfunctions. Conventional jam-detection systems use a two-stage pipeline. First, an object detection model localizes specific items. Second, a tracking algorithm, such as a Kalman filter or Intersection over Union (IoU) overlap matching, follows their movement over time. This pipeline is resource-intensive: it requires thousands of manual annotations and roughly two weeks of labor, and it can detect only the object classes that were explicitly annotated.
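For context, the tracking stage of such a conventional pipeline often links detections across consecutive frames by bounding-box overlap. The sketch below shows generic greedy IoU association. It is not the baseline evaluated in the paper. The `(x1, y1, x2, y2)` box format, the helper names `iou` and `greedy_match`, and the `min_iou=0.3` cutoff are all illustrative assumptions. Under this scheme, a stalled item would appear as a track whose box keeps a high IoU with itself frame after frame. However, the scheme only works for objects the detector was trained to find.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(prev: Sequence[Box], curr: Sequence[Box],
                 min_iou: float = 0.3) -> List[Tuple[int, int]]:
    """Link detections in consecutive frames, highest-IoU pairs first.

    Hypothetical helper; min_iou is an assumed cutoff, not a paper value.
    """
    candidates = sorted(
        ((iou(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(curr)),
        reverse=True,
    )
    used_prev, used_curr, matches = set(), set(), []
    for score, i, j in candidates:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if i in used_prev or j in used_curr:
            continue  # each detection joins at most one track
        matches.append((i, j))
        used_prev.add(i)
        used_curr.add(j)
    return matches
```

For example, `greedy_match([(0, 0, 10, 10), (20, 0, 30, 10)], [(21, 0, 31, 10), (1, 0, 11, 10)])` pairs each box with its slightly shifted counterpart in the next frame.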
To address these limitations, we introduce AllTracker, a training-free jam detection technique that requires no labeled data. During periods when no objects are visible, AllTracker distributes reference points uniformly across the monitored area. As items move through the scene, they occlude these points, which lets the system register motion. If a significant fraction of the reference points remains occluded for longer than a specified temporal threshold, the system declares a jam.
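The detection rule above can be sketched as follows. This is a minimal illustration of one plausible reading, not the authors' implementation. It assumes the grid is laid out on an empty frame and that a per-frame boolean visibility flag for each reference point is already available, for example from a point tracker's visibility output. The names `make_grid_points` and `OcclusionJamDetector`, the grid `spacing`, the frame-count threshold `occlusion_frames`, and the fraction `min_fraction` are hypothetical. They stand in for the unspecified "significant fraction" and "temporal threshold".

```python
import numpy as np


def make_grid_points(height: int, width: int, spacing: float) -> np.ndarray:
    """Place reference points on a uniform grid over a height x width region.

    Returns an (N, 2) array of (x, y) pixel coordinates. `spacing` is an
    assumed parameter, not a value taken from the paper.
    """
    ys = np.arange(spacing / 2, height, spacing)
    xs = np.arange(spacing / 2, width, spacing)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)


class OcclusionJamDetector:
    """Flags a jam when enough reference points stay occluded for too long.

    Hypothetical interface: per-frame visibility flags for each point are
    assumed to come from an external point tracker or occlusion test.
    """

    def __init__(self, num_points: int, occlusion_frames: int, min_fraction: float):
        self.occlusion_frames = occlusion_frames  # temporal threshold, in frames
        self.min_fraction = min_fraction          # share of points that must be stuck
        self.run_length = np.zeros(num_points, dtype=int)

    def update(self, visible) -> bool:
        visible = np.asarray(visible, dtype=bool)
        # Occluded points extend their current run; visible points reset to 0,
        # so an item that merely passes over a point never builds a long run.
        self.run_length = np.where(visible, 0, self.run_length + 1)
        persistent = self.run_length > self.occlusion_frames
        return bool(persistent.mean() >= self.min_fraction)


if __name__ == "__main__":
    pts = make_grid_points(height=60, width=80, spacing=20)  # 3 rows x 4 cols
    det = OcclusionJamDetector(len(pts), occlusion_frames=5, min_fraction=0.25)
    visible = np.ones(len(pts), dtype=bool)
    visible[:6] = False  # a stalled item covers half of the grid
    print([t for t in range(8) if det.update(visible)])  # → [5, 6, 7]
```

The key design choice is resetting a point's counter whenever it becomes visible again. Items flowing normally hide points only briefly, while a stalled item keeps the same points hidden past the threshold.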
Standard point-tracking techniques treat occlusion as a tracking failure. Our approach instead uses persistent occlusion as the primary detection signal: rather than tracking point trajectories, we monitor whether points remain hidden. On a dataset of 1,069 videos, AllTracker achieved 100.00% precision and an F1 score of 93.33%. This significantly outperforms classical sparse tracking methods, and the system can still be deployed without any training phase. The approach offers three benefits: it requires no training data or manual labeling, it generalizes to any type of object, and it drastically reduces development time.
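Because precision is 100%, the two reported numbers also determine the recall. Assuming the standard definition of $F_1$ as the harmonic mean of precision $P$ and recall $R$:

$$
F_1 = \frac{2PR}{P + R}, \qquad P = 1 \;\Rightarrow\; R = \frac{F_1}{2 - F_1} = \frac{0.9333}{2 - 0.9333} \approx 0.875
$$

Under that definition, no flagged jam was a false alarm, while roughly one in eight actual jams went undetected.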
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC