PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, vol.91, no.5, pp.339-364, 2023 (SCI-Expanded)
Detecting objects in Wide Area Motion Imagery (WAMI) is an essential task for many practical applications. It is particularly challenging in crowded scenes, such as areas with heavy traffic, because the pixel resolution of objects and the ground sampling distance are highly compromised and various factors disrupt the visual signal. To address this challenge, we design a framework that combines preprocessing operations with deep detectors. To train deep networks for detection in WAMI with improved performance, especially in crowded areas, we propose a novel crowd-aware thresholded loss (CATLoss) function. Moreover, we introduce a hard sample mining method to strengthen the discriminative ability of the proposed solution. Additionally, we extend networks used in prior work with novel spatio-temporal cascaded architectures that incorporate more contextual information without introducing additional parameters. Overall, our approach is causal, more generalizable, and more robust, even at reduced spatial sizes. On the WPAFB-2009 dataset, we show that our solution performs better than or on par with the state of the art without introducing any additional computational complexity during inference. The code and trained models will be released at https://github.com/poyrazhatipoglu/CATLoss.