Skip-Gram-KR: Korean Word Embedding for Semantic Clustering (open access)
- Authors
- Ihm, Sun-Young; Lee, Ji-Hye; Park, Young-Ho
- Issue Date
- Mar-2019
- Publisher
- IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
- Keywords
- Word embedding; natural language processing; Korean word embedding; text mining; deep learning; semantic clustering; machine learning
- Citation
- IEEE ACCESS, v.7, pp 39948 - 39961
- Pages
- 14
- Journal Title
- IEEE ACCESS
- Volume
- 7
- Start Page
- 39948
- End Page
- 39961
- URI
- https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/3739
- DOI
- 10.1109/ACCESS.2019.2905252
- ISSN
- 2169-3536
- Abstract
- Deep learning algorithms are used in various applications such as pattern recognition, natural language processing, and speech recognition. Recently, neural network-based natural language processing techniques have used fixed-length word embeddings. Word embedding is a method of encoding a word at a specific position into a low-dimensional dense vector of fixed length while preserving the similarity of the distribution of its surrounding words. Currently, word embedding methods developed for other languages are applied to Korean; however, because these methods were originally designed for English, they do not reflect the order and structure of Korean words. In this paper, we propose a word embedding method for Korean, called Skip-gram-KR, along with a Korean affix tokenizer. Skip-gram-KR creates similar-word training data through backward mapping and a two-word skipping method. The experiment results show that the proposed method achieved the most accurate performance.
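The abstract does not detail the backward-mapping and two-word-skipping procedures, which are specific to the paper. As background, the conventional skip-gram scheme that Skip-gram-KR builds on generates (target, context) training pairs from a token sequence; a minimal sketch, assuming a standard context window of two words on each side (an illustrative assumption, not the paper's specification):

```python
# Illustrative sketch of conventional skip-gram training-pair generation.
# Skip-gram-KR's backward mapping and two-word skipping modify this scheme
# in paper-specific ways not reproduced here; the window size of 2 is an
# assumption for illustration only.

def skipgram_pairs(tokens, window=2):
    """Yield (target, context) pairs for each token and its neighbors
    within `window` positions on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((target, tokens[j]))
    return pairs

# Toy Korean token sequence ("I go to school"), tokenized per word unit:
sentence = ["나는", "학교", "에", "간다"]
print(skipgram_pairs(sentence))
```

Each pair then serves as one training example for the embedding network: the model learns to predict the context word from the target word, so words appearing in similar contexts end up with similar vectors.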
- Appears in Collections
- Division of ICT Convergence Engineering > Major in IT Engineering > 1. Journal Articles