Detailed Information

Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment

Authors
Choi, HyeonSeong; Kim, Youngrang; Lee, Jaehwan; Kim, Yoonhee
Issue Date
Mar-2021
Publisher
Korean Society for Internet Information (KSII)
Keywords
Docker; Collective Communication; Distributed Deep Learning; Multi-GPU
Citation
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, v.15, no.3, pp. 911-931
Pages
21
Journal Title
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS
Volume
15
Number
3
Start Page
911
End Page
931
URI
https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/146186
DOI
10.3837/tiis.2021.03.006
ISSN
1976-7277
Abstract
Recently, most cloud services have adopted the Docker container environment to deliver their services. However, little prior research has evaluated the performance of communication libraries for multi-GPU distributed deep learning in a Docker container environment. In this paper, we propose an efficient communication architecture for multi-GPU deep learning in a Docker container environment by evaluating the performance of various communication libraries. We compare the two typical distributed deep learning architectures: the parameter server architecture and the all-reduce architecture. Further, we analyze two multi-GPU resource allocation policies: allocating a single GPU to each Docker container and allocating multiple GPUs to each Docker container. We also examine the scalability of collective communication by increasing the number of GPUs from one to four. In our experiments, we compare OpenMPI and MPICH, two representative open-source MPI libraries, with NCCL, NVIDIA's collective communication library for multi-GPU settings. We show that in the parameter server architecture, using CUDA-aware OpenMPI with multiple GPUs per Docker container reduces communication latency by up to 75%, and that in the all-reduce architecture, NCCL reduces communication latency by up to 93% compared with the other libraries.
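
For readers unfamiliar with the all-reduce pattern measured above, the sketch below illustrates the multi-GPU-per-container style of NCCL usage the abstract refers to: a single process drives several GPUs and sum-reduces a gradient buffer across them with ncclAllReduce. This is an illustrative sketch only, not the paper's benchmark code; the GPU count, buffer size, file name, and build line are assumptions, and error checking is omitted for brevity.

/* nccl_allreduce_demo.c: one process driving multiple GPUs (names assumed).
 * Build (assumed): nvcc nccl_allreduce_demo.c -lnccl -o nccl_allreduce_demo
 */
#include <stdio.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define NGPUS 4            /* the paper scales from one to four GPUs */
#define COUNT (1 << 20)    /* elements per GPU; an arbitrary demo size */

int main(void) {
    ncclComm_t comms[NGPUS];
    cudaStream_t streams[NGPUS];
    float *sendbuf[NGPUS], *recvbuf[NGPUS];
    int devs[NGPUS] = {0, 1, 2, 3};

    /* Allocate device buffers and one stream per GPU. */
    for (int i = 0; i < NGPUS; i++) {
        cudaSetDevice(devs[i]);
        cudaMalloc((void **)&sendbuf[i], COUNT * sizeof(float));
        cudaMalloc((void **)&recvbuf[i], COUNT * sizeof(float));
        cudaMemset(sendbuf[i], 1, COUNT * sizeof(float)); /* byte pattern, demo data only */
        cudaStreamCreate(&streams[i]);
    }

    /* One NCCL communicator per GPU, all owned by this single process. */
    ncclCommInitAll(comms, NGPUS, devs);

    /* Sum-reduce across all GPUs; grouping the calls lets NCCL launch
     * the per-GPU operations together without deadlocking. */
    ncclGroupStart();
    for (int i = 0; i < NGPUS; i++)
        ncclAllReduce(sendbuf[i], recvbuf[i], COUNT, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    /* Wait for completion, then release resources. */
    for (int i = 0; i < NGPUS; i++) {
        cudaSetDevice(devs[i]);
        cudaStreamSynchronize(streams[i]);
        cudaFree(sendbuf[i]);
        cudaFree(recvbuf[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("all-reduce across %d GPUs complete\n", NGPUS);
    return 0;
}

The CUDA-aware MPI counterpart the paper benchmarks against would pass the same device pointers directly to MPI_Allreduce, letting the MPI library move the data without first staging it through host memory.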
Appears in Collections
ETC > 1. Journal Articles

Related Researcher

Kim, Yoonhee
College of Engineering, Division of Software (Advanced)