Detail View
- Ihm, Sun-Young;
- Lee, Ji-Hye;
- Park, Young-Ho
- Citations: Web of Science 5; Scopus 10
Abstract
Deep learning algorithms are used in many applications, including pattern recognition, natural language processing, and speech recognition. Recently, neural network-based natural language processing techniques have relied on fixed-length word embeddings. Word embedding is a method of encoding a word at a specific position as a low-dimensional dense vector of fixed length while preserving the distributional similarity of its surrounding words. Currently, word embedding methods designed for foreign languages are applied to Korean words; however, because these methods were originally developed for English, they do not reflect the word order and structure of Korean. In this paper, we propose a word embedding method for Korean, called Skip-gram-KR, together with a Korean affix tokenizer. Skip-gram-KR creates training data of similar words through backward mapping and a two-word skipping method. The experimental results show that the proposed method achieves the most accurate performance.
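As background for the abstract above, a minimal sketch of how standard skip-gram generates (center word, context word) training pairs from a token sequence is shown below. This is the baseline technique the paper builds on, not the paper's own Skip-gram-KR; the function name, window parameter, and the details of backward mapping and two-word skipping are assumptions not taken from the source.

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in standard skip-gram.

    For each position i, every token within `window` positions of i
    (excluding i itself) is emitted as a context word for tokens[i].
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Example: a short Korean-like token sequence with window=1
print(skipgram_pairs(["나는", "학교에", "간다"], window=1))
```

These pairs would then feed a shallow network that learns to predict context words from the center word; Skip-gram-KR modifies how such pairs are constructed for Korean.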
Keywords
- Title
- Skip-Gram-KR: Korean Word Embedding for Semantic Clustering
- Authors
- Ihm, Sun-Young; Lee, Ji-Hye; Park, Young-Ho
- Publication Date
- 2019-04
- Type
- Article
- Journal
- IEEE Access
- Volume
- 7
- Pages
- 39948–39961