Interference-aware execution framework with Co-scheML on GPU clusters
- Authors
- Kim, Sejin; Kim, Yoonhee
- Issue Date
- Oct-2023
- Publisher
- SPRINGER
- Keywords
- GPU applications; Interference; Co-execution; Co-ScheML scheduler; Resource contention; GPU utilization
- Citation
- CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, v.26, no.5, pp. 2577-2589
- Pages
- 13
- Journal Title
- CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS
- Volume
- 26
- Number
- 5
- Start Page
- 2577
- End Page
- 2589
- URI
- https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/146180
- DOI
- 10.1007/s10586-021-03299-z
- ISSN
- 1386-7857; 1573-7543
- Abstract
- Recently, improving overall resource utilization through efficient scheduling of applications on graphics processing unit (GPU) clusters has become a concern. Traditional cluster-orchestration platforms allocate GPUs exclusively to individual applications, which limits resource utilization. Co-execution of GPU applications has been suggested as a way to make better use of limited resources. However, co-executing GPU applications without considering their diverse characteristics can lead to unpredictable performance, owing to interference that results from contention for, and unbalanced usage of, resources among applications. This paper proposes an interference-aware execution framework with Co-scheML for various GPU applications, such as high-performance computing (HPC), deep learning (DL) training, and DL inference. The resource-usage characteristics of GPU applications are analyzed and profiled to identify the varying degrees of interference among them. Because interference prediction is challenging owing to the complexity of GPU systems, an interference model is generated by applying defined GPU metrics to machine learning (ML) models. The Co-scheML scheduler then deploys applications so as to minimize interference, using the interference predicted by the constructed model. Experimental results show that, compared to baseline schedulers, the framework improved resource utilization by 24%, improved the average job completion time (JCT) by 23%, and shortened the makespan by 22% on average.
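The abstract describes a scheduler that places each application where the predicted interference is lowest. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a greedy placement policy, at most two co-running jobs per GPU, and hypothetical normalized metrics named `sm_util` and `mem_bw`. The function `predicted_interference` is a hand-written stand-in for the trained ML interference model, which the record does not describe in detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical profile: normalized GPU metrics per application (0.0 to 1.0).
Profile = Dict[str, float]


@dataclass
class Gpu:
    name: str
    running: List[Profile] = field(default_factory=list)


def predicted_interference(a: Profile, b: Profile) -> float:
    """Stand-in for a trained ML model (assumption, not the paper's model).

    Scores a pair of jobs by how far their combined demand on each
    metric exceeds the capacity of one GPU (1.0).
    """
    return sum(max(0.0, a[m] + b[m] - 1.0) for m in a)


def place(
    job: Profile,
    gpus: List[Gpu],
    max_per_gpu: int = 2,
    predict: Callable[[Profile, Profile], float] = predicted_interference,
) -> Gpu:
    """Greedily assign the job to the GPU with the lowest predicted interference."""
    candidates = [g for g in gpus if len(g.running) < max_per_gpu]
    if not candidates:
        raise RuntimeError("no GPU has a free co-execution slot")

    def cost(gpu: Gpu) -> float:
        # Total predicted interference with the jobs already on this GPU.
        return sum(predict(job, other) for other in gpu.running)

    best = min(candidates, key=cost)
    best.running.append(job)
    return best


if __name__ == "__main__":
    gpus = [
        Gpu("gpu0", [{"sm_util": 0.9, "mem_bw": 0.2}]),  # compute-heavy job
        Gpu("gpu1", [{"sm_util": 0.2, "mem_bw": 0.9}]),  # memory-heavy job
    ]
    new_job = {"sm_util": 0.8, "mem_bw": 0.1}  # another compute-heavy job
    print(place(new_job, gpus).name)
```

In this toy setup the compute-heavy job is steered away from the GPU that already runs a compute-heavy job, which is the kind of contention-avoiding placement the abstract refers to. A real predictor would be trained on profiled metrics rather than computed from a fixed formula.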