Technical Program

Paper Detail

Paper ID D-3-2.7
Paper Title Efficient Human-In-The-Loop Object Detection using Bi-Directional Deep SORT and Annotation-Free Segment Identification
Authors Koki Madono, Waseda University, Japan; Teppei Nakano, Waseda University, Intelligent Framework Lab, Japan; Tetsunori Kobayashi, Tetsuji Ogawa, Waseda University, Japan
Session D-3-2: Multimedia Analysis and Others
Time Thursday, 10 December, 15:30 - 17:15
Presentation Time: Thursday, 10 December, 17:00 - 17:15
All times are in New Zealand Time (UTC +13)
Topic Image, Video, and Multimedia (IVM):
Abstract The present study proposes a method for detecting objects with a high recall rate for human-supported video annotation. In recent years, automatic annotation techniques such as object detection and tracking have become more powerful; however, detecting and tracking occluded, small, and blurred objects remains difficult. Annotating such objects inevitably requires manual work. For this reason, we envision a human-supported video annotation framework in which over-detected objects (i.e., false positives) are tolerated during automatic annotation in order to minimize oversights (i.e., false negatives), and the over-detected objects are then removed manually. This study addresses the former stage of this framework, human-in-the-loop object detection with an emphasis on suppressing oversights: bi-directional deep SORT is proposed to reliably capture missed objects, and annotation-free segment identification (AFSID) is proposed to identify video frames in which manual annotation is not required. The two methods reinforce each other, increasing the detection rate while reducing the burden of human intervention. Experimental comparisons using a pedestrian video dataset demonstrated that bi-directional deep SORT with AFSID captured object candidates with a higher recall rate than the existing deep SORT while reducing the labor cost compared to manual annotation at regular intervals.
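The abstract does not spell out how the forward and backward tracking results are combined, so the following is only an illustrative sketch, not the authors' method. It assumes that each tracking pass yields per-frame bounding boxes and that the two passes are merged as a recall-oriented union, dropping a backward-pass box only when it overlaps a forward-pass box. The names `iou`, `merge_passes`, and the `iou_thresh` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_passes(
    forward: Dict[int, List[Box]],
    backward: Dict[int, List[Box]],
    iou_thresh: float = 0.5,  # hypothetical duplicate threshold
) -> Dict[int, List[Box]]:
    """Union the boxes of both passes per frame, favoring recall.

    A backward-pass box is kept unless it overlaps an already kept box
    by at least iou_thresh, in which case it is treated as a duplicate.
    """
    merged: Dict[int, List[Box]] = {}
    for frame in sorted(set(forward) | set(backward)):
        boxes = list(forward.get(frame, []))
        for cand in backward.get(frame, []):
            if all(iou(cand, kept) < iou_thresh for kept in boxes):
                boxes.append(cand)
        merged[frame] = boxes
    return merged


# Toy example: the backward pass recovers an object missed in frame 1.
forward = {0: [(0.0, 0.0, 10.0, 10.0)], 1: []}
backward = {0: [(1.0, 1.0, 10.0, 10.0)], 1: [(20.0, 20.0, 30.0, 30.0)]}
merged = merge_passes(forward, backward)
```

A union of this kind deliberately admits extra candidates (false positives) so that fewer objects are missed, which matches the framework's premise that over-detections are removed manually afterwards.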