Interference-aware execution framework with Co-scheML on GPU clusters

Kim, Sejin; Kim, Yoonhee

doi:10.1007/s10586-021-03299-z

Detailed Information

Cited 0 times in Web of Science

Cited 0 times in Scopus



Authors: Kim, Sejin; Kim, Yoonhee

Issue Date: Oct-2023

Publisher: Springer

Keywords: GPU applications; Interference; Co-execution; Co-ScheML scheduler; Resource contention; GPU utilization

Citation: Cluster Computing: The Journal of Networks, Software Tools and Applications, v.26, no.5, pp. 2577-2589

Pages: 13

Journal Title: Cluster Computing: The Journal of Networks, Software Tools and Applications

Volume: 26

Number: 5

Start Page: 2577

End Page: 2589

URI: https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/146180

DOI: 10.1007/s10586-021-03299-z

ISSN: 1386-7857; 1573-7543

Abstract: Recently, improving overall resource utilization through efficient scheduling of applications on graphics processing unit (GPU) clusters has become a concern. Traditional cluster-orchestration platforms that allocate GPUs exclusively to individual applications limit resource utilization. Co-execution of GPU applications has been suggested as a way to make better use of limited resources. However, co-executing GPU applications without considering their diverse characteristics can make their performance unpredictable, owing to interference caused by contention for, and unbalanced usage of, resources among applications. This paper proposes an interference-aware execution framework with Co-scheML for various GPU applications, such as high performance computing (HPC), deep learning (DL) training, and DL inference. The resource-usage characteristics of GPU applications are analyzed and profiled to identify the varying degrees of interference among them. Because interference is difficult to predict owing to the complexity of GPU systems, an interference model is built by applying defined GPU metrics to machine learning (ML) models. The Co-scheML scheduler then deploys applications so as to minimize interference, using the interference predicted by the constructed model. Experimental results show that, compared with baseline schedulers, the framework improved resource utilization by 24% and average job completion time (JCT) by 23%, and shortened the makespan by 22% on average.
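The abstract describes scheduling driven by predicted interference: each incoming job is placed where the model expects the least contention with jobs already running. The sketch below illustrates that general idea only. It is not the authors' Co-scheML algorithm, and this record does not specify the model's features or decision rule. The `Profile` fields (`sm_util`, `mem_bw_util`), the `predicted_interference` formula (a simple stand-in for a trained ML model), and the `threshold` deferral rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Profile:
    # Hypothetical profiled metrics for one application, normalized to [0, 1].
    name: str
    sm_util: float      # streaming-multiprocessor utilization
    mem_bw_util: float  # memory-bandwidth utilization


@dataclass
class Gpu:
    gpu_id: str
    jobs: List[Profile] = field(default_factory=list)


def predicted_interference(a: Profile, b: Profile) -> float:
    """Hypothetical stand-in for a trained ML interference model.

    Contention is scored higher when both applications stress the same resource.
    """
    return a.sm_util * b.sm_util + a.mem_bw_util * b.mem_bw_util


def place(job: Profile, gpus: List[Gpu], threshold: float = 0.5) -> Optional[Gpu]:
    """Greedily place `job` on the GPU with the lowest predicted interference.

    Returns the chosen GPU, or None (defer the job) if even the best option
    exceeds the hypothetical `threshold`.
    """
    best: Optional[Gpu] = None
    best_score = float("inf")
    for gpu in gpus:
        # Worst-case predicted interference against co-located jobs; an idle GPU scores 0.
        score = max((predicted_interference(job, other) for other in gpu.jobs), default=0.0)
        if score < best_score:
            best, best_score = gpu, score
    if best is None or best_score > threshold:
        return None
    best.jobs.append(job)
    return best


if __name__ == "__main__":
    hpc = Profile("hpc", sm_util=0.9, mem_bw_util=0.3)
    train = Profile("dl-training", sm_util=0.8, mem_bw_util=0.7)
    infer = Profile("dl-inference", sm_util=0.2, mem_bw_util=0.2)
    cluster = [Gpu("gpu0", [hpc]), Gpu("gpu1", [train])]
    chosen = place(infer, cluster)
    print(chosen.gpu_id if chosen else "deferred")
```

In this toy setup, the light inference job is placed on `gpu0` because its predicted interference with the HPC job (0.24) is lower than with the training job (0.30). Deferring jobs above a threshold is one simple way to trade waiting time against co-execution slowdown. A real scheduler would replace the stand-in formula with the trained model's prediction.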


Related Researcher


Kim, Yoonhee: College of Engineering (Division of Software (Advanced))



