Overcoming GPU memory capacity limitations in hybrid MPI implementations of CFD
Citations: Web of Science 0 | Scopus 3

Abstract

In this paper, we describe a hybrid MPI implementation of a discontinuous Galerkin scheme in computational fluid dynamics that can utilize all available processing units (CPU cores or GPU devices) on each computational node. We describe the optimization techniques used in our GPU implementation, which make it up to 74.88x faster than the single-core CPU implementation in our machine environment. We also perform experiments on work partitioning between heterogeneous devices to measure the load balance that achieves optimal performance on a single node consisting of heterogeneous processing units. The key problem is that CFD workloads must allocate large amounts of both host and GPU device memory in order to compute accurate results. Simply scaling out by adding more nodes equipped with high-end scientific GPU devices imposes an economic burden, not to mention additional communication overhead. From a micro-management perspective, the workload size on each node is also limited by the capacity of its attached GPU memory. To overcome this, we use ZFP, a floating-point compression algorithm, to save at least 25% of data usage in our workloads, with less performance degradation than using NVIDIA Unified Memory (UM). © Springer Nature Switzerland AG 2019.
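The "at least 25%" saving reported in the abstract is consistent with a fixed-rate compression scheme. As a minimal illustration (not the paper's code, and the 24-bit rate is an assumption; the abstract reports only the aggregate saving), the arithmetic looks like this:

```python
# Illustrative arithmetic only: fraction of memory saved when 32-bit
# floating-point values are stored at a reduced fixed bit rate, as a
# fixed-rate compressor such as ZFP can do.
def compression_saving(bits_per_value: float, original_bits: int = 32) -> float:
    """Fraction of memory saved relative to uncompressed storage."""
    return 1.0 - bits_per_value / original_bits

# A hypothetical fixed rate of 24 bits per value yields a 25% saving,
# matching the lower bound quoted in the abstract.
print(f"{compression_saving(24):.0%}")  # prints "25%"
```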

Keywords

CFD; Compression; CUDA; GPU; Memory; MPI; Compaction; Computational fluid dynamics; Data storage equipment; Digital arithmetic; Galerkin methods; Communication overheads; Discontinuous Galerkin schemes; Floating-point compressions; Heterogeneous devices; Heterogeneous processing; Optimization techniques; Performance degradation; Graphics processing unit
Title
Overcoming GPU memory capacity limitations in hybrid MPI implementations of CFD
Authors
Choi, Jake; Kim, Yoonhee; Yeom, Heon-Young
DOI
10.1007/978-3-030-34914-1_10
Publication Date
2019-10
Type
Conference Paper
Journal
Lecture Notes in Computer Science, vol. 11874 LNCS
Pages
100–111