Comparing unified, pinned, and host/device memory allocations for memory-intensive workloads on Tegra SoC
  • Choi, Jake
  • You, Hojun
  • Kim, Chongam
  • Yeom, Heon Young
  • Kim, Yoonhee
Citations: Web of Science 11 · Scopus 11

Abstract

Edge computing focuses on processing data near its source. Edge computing devices using the Tegra SoC architecture provide a physically distinct GPU memory architecture. In order to take advantage of this architecture, different modes of memory allocation need to be considered. Different GPU memory allocation techniques yield different results in memory usage and execution times of identical applications on Tegra devices. In this article, we implement several GPU application benchmarks, including our custom CFD code, with unified, pinned, and normal host/device memory allocation modes. We evaluate and compare the memory usage and execution time of such workloads on edge computing Tegra systems-on-chip (SoCs) equipped with integrated GPUs using a shared memory architecture, and on non-SoC machines with discrete GPUs equipped with dedicated VRAM. We discover that normal memory allocation methods on SoCs actually use double the required memory because of unnecessary device memory copies, even though device memory is physically shared with host memory. We show that GPU application memory usage can be reduced by up to 50%, and that performance can even improve, simply by replacing normal memory allocation and memory copy methods with managed unified memory or pinned memory allocation.
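The three allocation modes the abstract compares can be sketched in a minimal CUDA program. This is an illustrative sketch, not the authors' benchmark code: the kernel name `scale`, the buffer size, and the launch configuration are assumptions. The key point is that mode (1) keeps two copies of the data, which on a Tegra SoC both live in the same physical DRAM, while modes (2) and (3) use a single buffer with no explicit copies.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

// Hypothetical kernel standing in for any memory-intensive workload.
__global__ void scale(float *v, float a, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= a;
}

int main() {
    const size_t n = 1 << 20, bytes = n * sizeof(float);
    const int threads = 256, blocks = (int)((n + threads - 1) / threads);

    // (1) Normal host/device allocation: separate host and device buffers
    //     plus explicit copies. On Tegra, host and device share DRAM, so
    //     this doubles the memory footprint for no benefit.
    float *h = (float *)malloc(bytes), *d;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    scale<<<blocks, threads>>>(d, 2.f, n);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    free(h);

    // (2) Managed (unified) allocation: one buffer visible to both host
    //     and device; the runtime handles coherence, no explicit copies.
    float *u;
    cudaMallocManaged(&u, bytes);
    scale<<<blocks, threads>>>(u, 2.f, n);
    cudaDeviceSynchronize();  // make GPU results visible to the host
    cudaFree(u);

    // (3) Pinned (page-locked, mapped) host allocation: the GPU accesses
    //     the host buffer directly, again avoiding a duplicate copy.
    float *p;
    cudaHostAlloc(&p, bytes, cudaHostAllocMapped);
    scale<<<blocks, threads>>>(p, 2.f, n);  // valid under unified addressing
    cudaDeviceSynchronize();
    cudaFreeHost(p);
    return 0;
}
```

Passing the pinned host pointer straight to the kernel in (3) relies on unified virtual addressing, which is available on the integrated-GPU Tegra platforms the paper targets; on other systems the device pointer would be obtained with `cudaHostGetDevicePointer`.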

Keywords

benchmark, CFD, CUDA, GPU, memory, pinned, Rodinia, unified
Title
Comparing unified, pinned, and host/device memory allocations for memory-intensive workloads on Tegra SoC
Authors
Choi, Jake; You, Hojun; Kim, Chongam; Yeom, Heon Young; Kim, Yoonhee
DOI
10.1002/cpe.6018
Publication Date
2021-02
Journal
Concurrency and Computation: Practice and Experience
Volume 33, Issue 4
Pages
1–10