Performance Analysis of Concurrent Multitasking for Efficient Resource Utilization of GPUs (GPU의 효율적인 자원 활용을 위한 동시 멀티태스킹 성능 분석)

김세진; 진계신; 염헌영; 김윤희

doi:10.5626/JOK.2021.48.6.604

Title: GPU의 효율적인 자원 활용을 위한 동시 멀티태스킹 성능 분석

Other Titles: Performance Analysis of Concurrent Multitasking for Efficient Resource Utilization of GPUs

Authors: 김세진; 진계신; 염헌영; 김윤희

Issue Date: Jun-2021

Publisher: Korean Institute of Information Scientists and Engineers (KIISE, 한국정보과학회)

Keywords: GPU; multitasking; application classification; scheduling; smCompactor

Citation: Journal of KIISE (정보과학회논문지), v.48, no.6, pp. 604-611

Pages: 8

Journal Title: Journal of KIISE (정보과학회논문지)

Volume: 48

Number: 6

Start Page: 604

End Page: 611

URI: https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/146613

DOI: 10.5626/JOK.2021.48.6.604

ISSN: 2383-630X; 2383-6296

Abstract: As Graphics Processing Units (GPUs) are widely used to accelerate compute-intensive applications, they are increasingly deployed in data centers and clouds. However, research on sharing GPU resources efficiently when several applications request concurrent execution remains insufficient, and existing sharing methods cannot handle such requests while making effective use of the available resources. In addition, it is difficult to share resources within a GPU effectively without understanding each application's resource usage pattern. This paper proposes an application classification method based on execution patterns and explains, in terms of run-time characteristics, why an application's performance saturates beyond a certain point even as its resource allocation grows. We then use smCompactor, a thread block-based scheduling framework, to analyze the concurrent multitasking behavior of combinations of the classified applications, and identify combinations that utilize the available resources efficiently. In multitasking experiments on a GPU that took application execution characteristics into account, this approach improved performance by approximately 28% on average compared with NVIDIA MPS, an existing concurrent execution method.

Appears in Collections: ETC > 1. Journal Articles

Related Researcher

Kim, Yoonhee: College of Engineering (Division of Software (Advanced))
