Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit

Dulger, Özcan; Oguztuzun, Mehmet; Demirekler, Mübeccel

doi:10.1007/s11265-017-1254-6

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, vol.90, no.3, pp.433-447, 2018 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 90 Issue: 3
Publication Date: 2018
DOI Number: 10.1007/s11265-017-1254-6
Journal Name: JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp.433-447
Keywords: Coalesced global memory access, Graphics processing unit, Metropolis resampling, Parallel resampling, Particle filter, PARTICLE FILTERS, ARCHITECTURES
Middle East Technical University Affiliated: Yes

Abstract

Owing to the many cores in its architecture, the graphics processing unit (GPU) offers promise for parallel execution of the particle filter. A stage of the particle filter that is particularly challenging to parallelize is resampling. There are parallel resampling algorithms in the literature, such as Metropolis resampling, which does not require a collective operation such as a cumulative sum over the weights and does not suffer from numerical instability. However, with a large number of particles, Metropolis resampling becomes slow. This is because of the non-coalesced access problem on the global memory of the GPU. In this article, we offer solutions to this problem of Metropolis resampling. We introduce two implementation techniques, named Metropolis-C1 and Metropolis-C2, and compare them with the original Metropolis resampling on an NVIDIA Tesla K40 board. In the first scenario, where these two techniques achieve their fastest execution times, Metropolis-C1 is faster than the others but yields the worst results in quality, whereas Metropolis-C2 is closer to Metropolis resampling in quality. In the second scenario, where all three algorithms yield similar quality, Metropolis-C1 and Metropolis-C2 get slower but are still faster than the original Metropolis resampling.
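
To make the access pattern concrete, the following is a minimal sequential sketch of the baseline Metropolis resampler as it is commonly described in the parallel resampling literature. It is not the authors' GPU code and does not reproduce Metropolis-C1 or Metropolis-C2, whose details are not given in this record. The function name `metropolis_resample` and the parameters `num_iters` and `rng` are assumptions made for illustration. Each pass of the outer loop corresponds to the work one GPU thread would do for one particle.

```python
import random


def metropolis_resample(weights, num_iters, rng=None):
    """Return one ancestor index per particle (illustrative sketch only).

    weights   : non-negative particle weights; they need not be normalized,
                because only ratios of two weights are ever compared.
    num_iters : number of Metropolis steps per particle (hypothetical name).
    rng       : optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    n = len(weights)
    ancestors = []
    for i in range(n):          # on a GPU, one thread would handle each i
        k = i                   # every chain starts at its own particle
        for _ in range(num_iters):
            u = rng.random()
            j = rng.randrange(n)  # uniformly random candidate index
            # Accept j with probability min(1, w[j] / w[k]); the comparison
            # is written without a division so that w[k] == 0 is safe.
            # The read of weights[j] is the scattered access: neighbouring
            # threads would fetch unrelated addresses in global memory.
            if u * weights[k] <= weights[j]:
                k = j
        ancestors.append(k)
    return ancestors
```

No sum or prefix sum over the weights appears anywhere, which matches the abstract's point that no collective operation is needed. The cost is the read of `weights[j]` at a random index in every step, which is the kind of access that cannot be coalesced when many threads run side by side.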