Interference-aware execution framework with Co-scheML on GPU clusters

Authors
Kim, Sejin; Kim, Yoonhee
Issue Date
Oct-2023
Publisher
SPRINGER
Keywords
GPU applications; Interference; Co-execution; Co-ScheML scheduler; Resource contention; GPU utilization
Citation
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, v.26, no.5, pp 2577 - 2589
Pages
13
Journal Title
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS
Volume
26
Number
5
Start Page
2577
End Page
2589
URI
https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/146180
DOI
10.1007/s10586-021-03299-z
ISSN
1386-7857
1573-7543
Abstract
Improving overall resource utilization through efficient scheduling of applications on graphics processing unit (GPU) clusters has recently become a concern. Traditional cluster-orchestration platforms allocate GPUs exclusively to individual applications, which limits resource utilization. Co-execution of GPU applications has been suggested as a way to make better use of limited resources. However, co-executing GPU applications without considering their diverse characteristics can lead to unpredictable performance, owing to interference that results from contention for, and unbalanced usage of, resources among applications. This paper proposes an interference-aware execution framework with Co-scheML for various GPU applications, such as high-performance computing (HPC), deep learning (DL) training, and DL inference. The resource-usage characteristics of GPU applications are analyzed and profiled to identify the varying degrees of interference between them. Because interference prediction is challenging owing to the complexity of GPU systems, an interference model is generated by applying defined GPU metrics to machine learning (ML) models. The Co-scheML scheduler uses the interference predicted by this model to deploy applications so that interference is minimized. Experimental results show that, compared to baseline schedulers, the framework improved resource utilization by 24%, improved the average job completion time (JCT) by 23%, and shortened the makespan by 22% on average.
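The scheduling idea in the abstract (predict interference between co-located applications, then place each application where the predicted interference is lowest) can be illustrated with a minimal sketch. This is not the paper's Co-scheML implementation: the names `AppProfile`, `Gpu`, `predict_interference`, and `place`, the two metrics used, and the simple over-subscription formula standing in for a trained ML model are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical per-application profile measured when the application runs alone.
# The two metrics are illustrative; they are not the paper's metric set.
@dataclass(frozen=True)
class AppProfile:
    name: str
    sm_util: float      # compute (SM) utilization, in the range 0..1
    mem_bw_util: float  # memory-bandwidth utilization, in the range 0..1


@dataclass
class Gpu:
    gpu_id: int
    running: List[AppProfile] = field(default_factory=list)


def predict_interference(a: AppProfile, b: AppProfile) -> float:
    """Stand-in for a trained ML model (assumption, not the paper's model).

    Returns a non-negative score that grows with how far the combined demand
    on each resource exceeds the GPU's capacity of 1.0. A real predictor
    would be fitted to profiled co-execution measurements.
    """
    sm_over = max(0.0, a.sm_util + b.sm_util - 1.0)
    bw_over = max(0.0, a.mem_bw_util + b.mem_bw_util - 1.0)
    return sm_over + bw_over


def place(
    app: AppProfile,
    gpus: List[Gpu],
    predictor: Callable[[AppProfile, AppProfile], float] = predict_interference,
    max_per_gpu: int = 2,
) -> Gpu:
    """Greedily place `app` on the GPU with the lowest predicted interference."""
    best = None
    best_cost = float("inf")
    for gpu in gpus:
        if len(gpu.running) >= max_per_gpu:
            continue  # no free co-execution slot on this GPU
        # Total predicted interference with everything already on this GPU.
        cost = sum(predictor(app, other) for other in gpu.running)
        if cost < best_cost:  # ties keep the earlier GPU in the list
            best, best_cost = gpu, cost
    if best is None:
        raise RuntimeError("no GPU has a free co-execution slot")
    best.running.append(app)
    return best
```

Under these assumptions, a compute-heavy job and a lightweight inference job are steered away from GPUs where their combined demand would oversubscribe a resource. Swapping `predictor` for a fitted regression model would keep the placement loop unchanged.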
Related Researcher

Kim, Yoonhee
College of Engineering (Division of Software (Advanced))