CNN-based Fast Split Mode Decision Algorithm for Versatile Video Coding (VVC) Inter Prediction
- Authors: Woon-Ha Yeo; Byung-Gyu Kim
- Issue Date: Sep-2021
- Publisher: Korea Multimedia Society (한국멀티미디어학회)
- Keywords: Versatile Video Coding (VVC); Inter Prediction; Fast algorithm; Convolutional Neural Network (CNN); Deep learning
- Citation: Journal of Multimedia Information System, v.8, no.3, pp. 147-158
- Pages: 12
- Journal Title: Journal of Multimedia Information System
- Volume: 8
- Number: 3
- Start Page: 147
- End Page: 158
- URI: https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/146156
- DOI: 10.33851/JMIS.2021.8.3.147
- ISSN: 2383-7632
- Abstract: Versatile Video Coding (VVC) is the latest video coding standard, developed by the Joint Video Experts Team (JVET). VVC adopts a quadtree plus multi-type tree (QT+MTT) structure for coding unit (CU) partitioning, and its computational complexity is considerably high because recursive rate-distortion (RD) optimization performs a brute-force search over the possible partitions. In this paper, we aim to reduce the time complexity of inter-picture prediction, since inter prediction accounts for a large portion of the total encoding time. The problem can be defined as classifying the split mode of each CU. To classify the split mode effectively, a novel convolutional neural network (CNN) architecture called the multi-level tree CNN (MLT-CNN) is introduced. To boost classification performance, we use additional information, including inter-picture information, while training the CNN. The overall algorithm, including the MLT-CNN inference process, is implemented on VVC Test Model (VTM) 11.0, and CUs of size 128×128 serve as inputs to the CNN. The sequences are encoded under the random access (RA) configuration with five QP values {22, 27, 32, 37, 42}. The experimental results show that the proposed algorithm reduces computational complexity by 11.53% on average and by up to 26.14%, with an average Bjøntegaard delta bit rate (BDBR) increase of 1.01%. In particular, the proposed method performs better on class A and class B sequences, reducing encoding time by 9.81% to 26.14% with BDBR increases of 0.95% to 3.28%.
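The abstract frames fast partitioning as classifying each CU's split mode. As a rough sketch of that label space, the Python below enumerates the six split choices a CU has under QT+MTT and the child CU sizes each one produces. The enum names, the `candidate_modes` helper, and its limits (minimum CU side of 4 luma samples, ternary splits only when both sides are at most 64) are assumptions for illustration; they are not VTM identifiers and not the paper's MLT-CNN class definitions.

```python
from enum import Enum


class SplitMode(Enum):
    """Hypothetical labels for the six split choices a CU has under QT+MTT."""
    NO_SPLIT = "no_split"  # keep the CU as is
    QT = "qt"              # quadtree: four equal quadrants
    BT_HOR = "bt_hor"      # binary tree, horizontal split line
    BT_VER = "bt_ver"      # binary tree, vertical split line
    TT_HOR = "tt_hor"      # ternary tree, 1:2:1 split of the height
    TT_VER = "tt_ver"      # ternary tree, 1:2:1 split of the width


def child_sizes(w, h, mode):
    """Return (width, height) of each child CU that `mode` produces."""
    if mode is SplitMode.NO_SPLIT:
        return [(w, h)]
    if mode is SplitMode.QT:
        return [(w // 2, h // 2)] * 4
    if mode is SplitMode.BT_HOR:
        return [(w, h // 2)] * 2
    if mode is SplitMode.BT_VER:
        return [(w // 2, h)] * 2
    if mode is SplitMode.TT_HOR:
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if mode is SplitMode.TT_VER:
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    raise ValueError(f"unknown split mode: {mode}")


def candidate_modes(w, h, min_size=4, max_tt_size=64):
    """Simplified legality filter (assumed limits, not the full VTM rules).

    - QT is only offered for square CUs.
    - TT is only offered when both dimensions are <= max_tt_size.
    - No child dimension may fall below min_size luma samples.
    Omitted: max MTT depth, minimum QT size, the ban on QT below an MTT
    split, other VPDU-related restrictions, and picture-boundary handling.
    """
    modes = []
    for mode in SplitMode:  # iterates in definition order
        if mode is SplitMode.QT and w != h:
            continue
        if mode in (SplitMode.TT_HOR, SplitMode.TT_VER) and max(w, h) > max_tt_size:
            continue
        children = child_sizes(w, h, mode)
        if all(cw >= min_size and ch >= min_size for cw, ch in children):
            modes.append(mode)
    return modes


if __name__ == "__main__":
    # A 128x128 CU (the CNN input size given in the abstract) under these rules.
    print([m.name for m in candidate_modes(128, 128)])
```

Running the script prints `['NO_SPLIT', 'QT', 'BT_HOR', 'BT_VER']`. Under the classification framing, a split-mode predictor tries to guess which of these candidates the full RD search would select, so the encoder can avoid evaluating every option recursively; how the MLT-CNN itself groups or prunes the modes is not specified in the abstract.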