Co-scheML: Interference-aware Container Co-scheduling Scheme Using Machine Learning Application Profiles for GPU Clusters

Abstract

Efficient execution of applications on Graphics Processing Units (GPUs) has recently emerged as a research topic for increasing overall system throughput in cluster environments. Because current GPU cluster orchestration platforms support only exclusive execution of one application per GPU, they may leave GPU resources underutilized, depending on application characteristics. Co-executing GPU applications, however, causes interference from resource contention among them. If the diverse resource usage characteristics of GPU applications are not taken into account, a GPU cluster can suffer unbalanced use of computing resources and performance degradation. This study introduces Co-scheML, a scheme for co-executing various GPU applications such as High Performance Computing (HPC), Deep Learning (DL) training, and DL inference. Because interference is difficult to predict, an interference model is built by applying a Machine Learning (ML) model to GPU metrics. The Co-scheML scheduler then uses the predicted interference to decide where to deploy each application. Experimental results show that, compared with baseline schedulers, Co-scheML improves average job completion time by 23% and shortens makespan by 22% on average. © 2020 IEEE.
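The placement idea in the abstract (predict interference, then choose where to deploy) can be illustrated with a minimal sketch. This is not the authors' implementation: `AppProfile`, `predict_interference`, `place`, the two utilization metrics, the limit of two applications per GPU, and the 0.3 threshold are all hypothetical, and the hand-written heuristic merely stands in for a trained ML model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppProfile:
    """Hypothetical solo-run profile of a GPU application."""
    name: str
    sm_util: float      # fraction of GPU compute used when running alone (0..1)
    mem_bw_util: float  # fraction of memory bandwidth used when running alone (0..1)


def predict_interference(a: AppProfile, b: AppProfile) -> float:
    """Stand-in for a trained ML model (assumption, not the paper's model).

    Returns a contention score that grows once the combined demand of the
    two applications exceeds the capacity of a single GPU.
    """
    compute_over = max(0.0, a.sm_util + b.sm_util - 1.0)
    mem_over = max(0.0, a.mem_bw_util + b.mem_bw_util - 1.0)
    return compute_over + mem_over


def place(app: AppProfile,
          gpus: List[List[AppProfile]],
          max_interference: float = 0.3) -> Optional[int]:
    """Pick the GPU with the lowest predicted interference for `app`.

    `gpus[i]` lists the applications already running on GPU i. This sketch
    allows at most two applications per GPU and returns None when every
    candidate exceeds the threshold, meaning the job should wait.
    """
    best_idx, best_cost = None, None
    for idx, running in enumerate(gpus):
        if len(running) == 0:
            cost = 0.0  # an empty GPU causes no interference
        elif len(running) == 1:
            cost = predict_interference(app, running[0])
        else:
            continue  # GPU already shared by two applications
        if cost <= max_interference and (best_cost is None or cost < best_cost):
            best_idx, best_cost = idx, cost
    return best_idx


hpc = AppProfile("hpc", sm_util=0.9, mem_bw_util=0.7)
inference = AppProfile("dl-inference", sm_util=0.2, mem_bw_util=0.2)
training = AppProfile("dl-training", sm_util=0.8, mem_bw_util=0.6)

# GPU 0 runs the HPC job, GPU 1 runs the inference job.
chosen = place(training, [[hpc], [inference]])
```

Under these assumed profiles, the training job is placed next to the lightweight inference job rather than the compute-heavy HPC job. A real scheduler would replace the heuristic with a model trained on measured GPU metrics.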

Keywords

co-execution; Co-scheML; GPU applications; GPU utilization; interference; resource contention; Cluster computing; Computer aided instruction; Deep learning; Learning systems; Program processors; Scheduling; Cluster environments; Computing resource; Graphic processing unit(GPU); High performance computing (HPC); Interference modeling; Machine learning applications; Performance degradation; Resource contention; Graphics processing unit
Title
Co-scheML: Interference-aware Container Co-scheduling Scheme Using Machine Learning Application Profiles for GPU Clusters
Authors
Kim, Sejin; Kim, Yoonhee
DOI
10.1109/CLUSTER49012.2020.00020
Issue Date
2020-11
Type
Conference Paper
Journal
Proceedings - IEEE International Conference on Cluster Computing, ICCC
Pages
104-108