Detail View
- Choi, Jake;
- Yeom, Heon Young;
- Kim, Yoonhee
Abstract
Popular deep learning frameworks like PyTorch utilize GPUs heavily for training and suffer from out-of-memory (OOM) problems if memory is not managed properly. CUDA Unified Memory (UM) allows the oversubscription of tensor objects in the GPU but suffers from heavy performance penalties. In this paper, we build upon our UM implementation and create and utilize a minimal-overhead CUPTI dynamic profiler to trace unified memory page-fault and memory-transfer statistics in PyTorch applications. We also implement the CUDA memory prefetch and advise APIs, which can be called directly from the PyTorch application based on the dynamically profiled statistics to improve oversubscription performance in various PyTorch models, including ResNet and BERT.
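The mechanism the abstract refers to (UM allocation combined with prefetch/advise hints) can be sketched with the standard CUDA runtime API as below. This is a minimal illustration, not the authors' actual PyTorch integration; the buffer size and device id are assumed for the example, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

int main() {
    int device = 0;                      // assumed device id
    size_t bytes = 1ull << 30;           // illustrative 1 GiB tensor buffer
    float *buf = nullptr;

    // Unified Memory allocation: the region may exceed physical GPU memory
    // (oversubscription); pages migrate on demand via page faults.
    cudaMallocManaged(&buf, bytes);

    // Advise the driver that the GPU is the preferred location for this
    // region, which can reduce page thrashing under oversubscription.
    cudaMemAdvise(buf, bytes, cudaMemAdviseSetPreferredLocation, device);

    // Prefetch the region to the GPU ahead of kernel launches, replacing
    // many individual page faults with a bulk transfer.
    cudaMemPrefetchAsync(buf, bytes, device, 0);
    cudaDeviceSynchronize();

    cudaFree(buf);
    return 0;
}
```

In the paper's setting, the sizes and timing of such prefetch/advise calls are chosen from the statistics gathered by the CUPTI profiler rather than hard-coded as here.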
Keywords
- Title
- Improving Oversubscribed GPU Memory Performance in the PyTorch Framework
- Authors
- Choi, Jake; Yeom, Heon Young; Kim, Yoonhee
- Publication Date
- 2023-10
- Type
- Article; Early Access
- Volume
- 26
- Issue
- 5
- Pages
- 2835 ~ 2850