JOURNAL OF REAL-TIME IMAGE PROCESSING, vol.16, no.2, pp.339-353, 2019 (SCI-Expanded)
The recently proposed tracking-learning-detection (TLD) method has become a popular visual tracking algorithm because it has been shown to provide promising long-term tracking results. However, the high computational cost of the algorithm prevents it from being used at higher resolutions and frame rates. In this paper, we describe the design and implementation of a heterogeneous CPU-GPU TLD (H-TLD) solution using OpenMP and CUDA. Leveraging the advantages of the heterogeneous architecture, serial parts are run asynchronously on the CPU while the most computationally costly parts are parallelized and run on the GPU. The design keeps data transfers between the CPU and GPU to a minimum and, whenever such transfers are necessary, applies stream compaction and overlaps data transfer with computation. The workload is balanced for a uniform work distribution across the GPU multiprocessors. Results show a 10.25-fold speed-up at 1920 × 1080 resolution compared to the baseline TLD. The source code is publicly available for download at http://gpuresearch.ii.metu.edu.tr/codes/.
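
As an illustration of the stream compaction mentioned in the abstract, the following is a minimal serial sketch in Python of the usual flag, exclusive-scan, and scatter pattern. It is not the H-TLD code, which runs on the GPU in CUDA; the function name `compact`, the score values, and the 0.5 threshold are hypothetical and chosen only for the example. The idea is that only the surviving elements are packed into a dense array, so less data has to cross the CPU-GPU boundary.

```python
from itertools import accumulate


def compact(items, keep):
    """Return the elements of `items` for which keep(x) is true, in order.

    Serial sketch of stream compaction: flag, exclusive scan, scatter.
    On a GPU each of the three steps would run in parallel.
    """
    # Step 1: flag each element (1 = keep, 0 = discard).
    flags = [1 if keep(x) else 0 for x in items]
    if not flags:
        return []

    # Step 2: exclusive prefix sum. offsets[i] is the number of kept
    # elements before position i, i.e. the output slot for element i.
    inclusive = list(accumulate(flags))
    offsets = [0] + inclusive[:-1]

    # Step 3: scatter every kept element to its output slot.
    out = [None] * inclusive[-1]
    for i, x in enumerate(items):
        if flags[i]:
            out[offsets[i]] = x
    return out


# Hypothetical confidence scores for candidate windows (assumed values).
scores = [0.1, 0.7, 0.4, 0.9, 0.2, 0.8]
survivors = compact(scores, lambda s: s >= 0.5)
```

Because every output slot is known after the scan, the scatter step has no write conflicts, which is what makes the pattern suitable for parallel hardware.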