PARTITIONING AND REORDERING FOR SPIKE-BASED DISTRIBUTED-MEMORY PARALLEL GAUSS-SEIDEL

Torun, Tugba; TORUN, FAHREDDİN; Mangulu, MURAT; Aykanat, Cevdet

doi:10.1137/21m1411603

PARTITIONING AND REORDERING FOR SPIKE-BASED DISTRIBUTED-MEMORY PARALLEL GAUSS-SEIDEL

Torun T., TORUN F. Ş., Mangulu M., Aykanat C.

SIAM JOURNAL ON SCIENTIFIC COMPUTING, cilt.44, sa.2, 2022 (SCI-Expanded, Scopus)

Yayın Türü: Makale / Tam Makale
Cilt numarası: 44 Sayı: 2
Basım Tarihi: 2022
Doi Numarası: 10.1137/21m1411603
Dergi Adı: SIAM JOURNAL ON SCIENTIFIC COMPUTING
Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, ABI/INFORM, Agricultural & Environmental Science Database, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, MathSciNet, zbMATH, Civil Engineering Abstracts
Anahtar Kelimeler: parallel Gauss-Seidel, distributed-memory, Spike algorithm, parallel sparse trian-gular solve, hypergraph partitioning, sparse matrix reordering, ALGEBRAIC MULTIGRID SOLVER, PRECONDITIONER, DECOMPOSITION, CONVERGENCE, SYSTEMS, MODELS
Orta Doğu Teknik Üniversitesi Adresli: Evet

Özet

Gauss-Seidel (GS) is a widely used iterative method for solving sparse linear sys-tems of equations and also known to be effective as a smoother in algebraic multigrid methods. Parallelization of GS is a challenging task since solving the sparse lower triangular system in GS constitutes a sequential bottleneck at each iteration. We propose a distributed-memory parallel GS (dmpGS) by implementing a parallel sparse triangular solver (stSpike) based on the Spike algorithm. stSpike decouples the global triangular system into smaller systems that can be solved concurrently and requires the solution of a much smaller reduced sparse lower triangular system which constitutes a sequential bottleneck. In order to alleviate this bottleneck and to reduce the communication over-head of dmpGS, we propose a partitioning and reordering model consisting of two phases. The first phase is a novel hypergraph partitioning model whose partitioning objective simultaneously encodes minimizing the reduced system size and the communication volume. The second phase is an in-block row reordering method for decreasing the nonzero count of the reduced system. Extensive experi-ments on a dataset consisting of 359 sparse linear systems verify the effectiveness of the proposed partitioning and reordering model in terms of reducing the communication and the sequential com-putational overheads. Parallel experiments on 12 large systems using up to 320 cores demonstrate that the proposed model significantly improves the scalability of dmpGS.