Detailed Information

Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment

Authors
Choi, HyeonSeong; Kim, Youngrang; Lee, Jaehwan; Kim, Yoonhee
Issue Date
Mar-2021
Publisher
Korean Society for Internet Information (KSII)
Keywords
Docker; Collective Communication; Distributed Deep Learning; Multi-GPU
Citation
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, v.15, no.3, pp. 911-931
Pages
21
Journal Title
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS
Volume
15
Number
3
Start Page
911
End Page
931
URI
https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/146186
DOI
10.3837/tiis.2021.03.006
ISSN
1976-7277
Abstract
Recently, most cloud services have adopted the Docker container environment to deliver their services. However, little prior research has evaluated the performance of communication libraries for multi-GPU distributed deep learning in a Docker container environment. In this paper, we propose an efficient communication architecture for multi-GPU deep learning in a Docker container environment by evaluating the performance of various communication libraries. We compare the two typical distributed deep learning architectures: the parameter server architecture and the all-reduce architecture. Further, we analyze two multi-GPU resource allocation policies: allocating a single GPU to each Docker container and allocating multiple GPUs to each Docker container. We also examine the scalability of collective communication by increasing the number of GPUs from one to four. In our experiments, we compare OpenMPI and MPICH, two representative open-source MPI libraries, with NCCL, NVIDIA's collective communication library for multi-GPU settings. We show that in the parameter server architecture, using CUDA-aware OpenMPI with multiple GPUs per Docker container reduces communication latency by up to 75%, and that in the all-reduce architecture, NCCL reduces communication latency by up to 93% compared with the other libraries.
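
For readers unfamiliar with the all-reduce pattern measured above, the sketch below illustrates the multi-GPU-per-container style of NCCL usage the abstract refers to: a single process drives several GPUs and sum-reduces a gradient buffer across them with ncclAllReduce. This is an illustrative sketch only, not the paper's benchmark code; the GPU count, buffer size, file name, and build line are assumptions, and error checking is omitted for brevity.

/* nccl_allreduce_demo.c: one process driving multiple GPUs (names assumed).
 * Build (assumed): nvcc nccl_allreduce_demo.c -lnccl -o nccl_allreduce_demo
 */
#include <stdio.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define NGPUS 4            /* the paper scales from one to four GPUs */
#define COUNT (1 << 20)    /* elements per GPU; an arbitrary demo size */

int main(void) {
    ncclComm_t comms[NGPUS];
    cudaStream_t streams[NGPUS];
    float *sendbuf[NGPUS], *recvbuf[NGPUS];
    int devs[NGPUS] = {0, 1, 2, 3};

    /* Allocate device buffers and one stream per GPU. */
    for (int i = 0; i < NGPUS; i++) {
        cudaSetDevice(devs[i]);
        cudaMalloc((void **)&sendbuf[i], COUNT * sizeof(float));
        cudaMalloc((void **)&recvbuf[i], COUNT * sizeof(float));
        cudaMemset(sendbuf[i], 1, COUNT * sizeof(float)); /* byte pattern, demo data only */
        cudaStreamCreate(&streams[i]);
    }

    /* One NCCL communicator per GPU, all owned by this single process. */
    ncclCommInitAll(comms, NGPUS, devs);

    /* Sum-reduce across all GPUs; grouping the calls lets NCCL launch
     * the per-GPU operations together without deadlocking. */
    ncclGroupStart();
    for (int i = 0; i < NGPUS; i++)
        ncclAllReduce(sendbuf[i], recvbuf[i], COUNT, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    /* Wait for completion, then release resources. */
    for (int i = 0; i < NGPUS; i++) {
        cudaSetDevice(devs[i]);
        cudaStreamSynchronize(streams[i]);
        cudaFree(sendbuf[i]);
        cudaFree(recvbuf[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("all-reduce across %d GPUs complete\n", NGPUS);
    return 0;
}

The CUDA-aware MPI counterpart the paper benchmarks against would pass the same device pointers directly to MPI_Allreduce, letting the MPI library move the data without first staging it through host memory.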
Appears in Collections
ETC > 1. Journal Articles

Related Researcher

Kim, Yoonhee
College of Engineering, Division of Software (Advanced)