Document Type

Article

Publication Date

1-1-2013

DOI

http://dx.doi.org/10.1016/j.parco.2012.10.002

Abstract

We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore the efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism that uses either MPI or MPI-OpenMP for communications. We present three different strategies to overlap computations with communications, and systematically assess their impact on parallel performance on two different GPU clusters. Our results for strong and weak scaling analysis of incompressible flow computations demonstrate that GPU clusters offer significant benefits for large data sets, and that a dual-level MPI-CUDA implementation with maximum overlapping of computation and communication provides substantial benefits in performance. We also find that our tri-level MPI-OpenMP-CUDA parallel implementation does not offer a significant advantage in performance over the dual-level implementation on GPU clusters with two GPUs per node. On clusters with higher GPU counts per node, or with different domain decomposition strategies, a tri-level implementation may exhibit higher efficiency than a dual-level implementation; this needs to be investigated further.

Copyright Statement

NOTICE: this is the author's version of a work that was accepted for publication in Parallel Computing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Parallel Computing, 2012. DOI: 10.1016/j.parco.2012.10.002

Publication Information

Jacobsen, Dana A. and Senocak, Inanc. (2013). "Multi-Level Parallelism for Incompressible Flow Computations on GPU Clusters". Parallel Computing, 39(1), 1-20.

Previous Versions

Jun 16 2017

Included in

Biomedical Engineering and Bioengineering Commons, Mechanical Engineering Commons

ScholarWorks

Mechanical and Biomedical Engineering Faculty Publications and Presentations

Multi-Level Parallelism for Incompressible Flow Computations on GPU Clusters
