Deep Transformer Based Video Inpainting Using Fast Fourier Tokenization
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Taewan | - |
dc.contributor.author | Kim, Jinwoo | - |
dc.contributor.author | Oh, Heeseok | - |
dc.contributor.author | Kang, Jiwoo | - |
dc.date.accessioned | 2024-04-09T02:30:31Z | - |
dc.date.available | 2024-04-09T02:30:31Z | - |
dc.date.issued | 2024-02 | - |
dc.identifier.issn | 2169-3536 | - |
dc.identifier.uri | https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/159847 | - |
dc.description.abstract | Bridging distant space-time interactions is important for high-quality video inpainting with large moving masks. Most existing techniques exploit patch similarities within the frames or leverage large-scale training data to fill the hole along the spatial and temporal dimensions. Recent works introduce the promising Transformer architecture into deep video inpainting to escape the dominance of nearby interactions and achieve superior performance over their baselines. However, such methods still struggle to complete larger holes containing complicated scenes. To alleviate this issue, we first employ fast Fourier convolutions, which cover the frame-wide receptive field, for token representation. Then, the tokens pass through a separated spatio-temporal transformer to explicitly model the long-range context relations and simultaneously complete the missing regions in all input frames. By formulating video inpainting as a directionless sequence-to-sequence prediction task, our model fills in visually consistent content, even under conditions such as large missing areas or complex geometries. Furthermore, our spatio-temporal transformer iteratively fills the hole from the boundary, enabling it to exploit rich contextual information. We validate the superiority of the proposed model using standard stationary masks and more realistic moving object masks. Both qualitative and quantitative results show that our model compares favorably against state-of-the-art algorithms. | - |
dc.format.extent | 14 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | Deep Transformer Based Video Inpainting Using Fast Fourier Tokenization | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.1109/ACCESS.2024.3361283 | - |
dc.identifier.scopusid | 2-s2.0-85184326166 | - |
dc.identifier.wosid | 001163607000001 | - |
dc.identifier.bibliographicCitation | IEEE ACCESS, v.12, pp 21723 - 21736 | - |
dc.citation.title | IEEE ACCESS | - |
dc.citation.volume | 12 | - |
dc.citation.startPage | 21723 | - |
dc.citation.endPage | 21736 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | Y | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Telecommunications | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Telecommunications | - |
dc.subject.keywordAuthor | video completion | - |
dc.subject.keywordAuthor | free-form inpainting | - |
dc.subject.keywordAuthor | object removal | - |
dc.subject.keywordAuthor | adversarial learning | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/10418237/ | - |
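The abstract's key building block is the fast Fourier convolution used for token representation: a pointwise transform applied in the frequency domain, which gives every output location a frame-wide receptive field in a single layer. The following is a minimal numpy sketch of that spectral-transform idea only, not the authors' implementation; the function name `fourier_unit` and the channel-mixing weight shapes are illustrative assumptions.

```python
import numpy as np

def fourier_unit(x, w_real, w_imag):
    """Pointwise linear map applied in the frequency domain.

    x       : (C_in, H, W) feature map.
    w_real,
    w_imag  : (C_out, C_in) real/imaginary parts of a complex
              1x1 channel-mixing weight (illustrative shapes).

    Because every spatial location contributes to every frequency
    bin, one such layer already has a frame-wide receptive field --
    the property the paper's fast Fourier tokenization relies on.
    """
    C, H, W = x.shape
    spec = np.fft.rfft2(x, axes=(-2, -1))  # (C_in, H, W//2 + 1), complex
    # Complex matrix-vector product over the channel axis:
    # (wr + i*wi) @ (sr + i*si) = (wr@sr - wi@si) + i*(wr@si + wi@sr)
    real = np.tensordot(w_real, spec.real, axes=1) - np.tensordot(w_imag, spec.imag, axes=1)
    imag = np.tensordot(w_real, spec.imag, axes=1) + np.tensordot(w_imag, spec.real, axes=1)
    # Back to the spatial domain with the original resolution.
    return np.fft.irfft2(real + 1j * imag, s=(H, W), axes=(-2, -1))
```

With an identity weight (`w_real = I`, `w_imag = 0`) the layer reduces to an FFT round trip and returns the input unchanged, which is a convenient sanity check; learned weights would instead mix channels globally across the whole frame.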