Detailed Information

Cited 0 times in Web of Science · Cited 1 time in Scopus

Skip-Gram-KR: Korean Word Embedding for Semantic Clustering (Open Access)

Authors
Ihm, Sun-Young; Lee, Ji-Hye; Park, Young-Ho
Issue Date
Mar-2019
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Keywords
Word embedding; natural language processing; Korean word embedding; text mining; deep learning; semantic clustering; machine learning
Citation
IEEE ACCESS, v.7, pp 39948 - 39961
Pages
14
Journal Title
IEEE ACCESS
Volume
7
Start Page
39948
End Page
39961
URI
https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/3739
DOI
10.1109/ACCESS.2019.2905252
ISSN
2169-3536
Abstract
Deep learning algorithms are used in various applications such as pattern recognition, natural language processing, and speech recognition. Recently, neural network-based natural language processing techniques have used fixed-length word embeddings. Word embedding is a method of encoding a word at a specific position as a low-dimensional dense vector of fixed length while preserving the distributional similarity of its surrounding words. Currently, word embedding methods developed for foreign languages are applied to Korean words; however, because these methods were originally developed for English, they do not reflect the order and structure of Korean words. In this paper, we propose a word embedding method for Korean, called Skip-gram-KR, and a Korean affix tokenizer. Skip-gram-KR creates similar-word training data through backward mapping and the two-word skipping method. The experimental results show that the proposed method achieves the most accurate performance.
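As background to the abstract, a minimal sketch of how skip-gram word embedding generates (center word, context word) training pairs from a sentence. This illustrates only the general skip-gram technique; the Korean-specific backward mapping and two-word skipping steps of Skip-gram-KR are not reproduced here, and the example sentence and function name are illustrative assumptions.

```python
# Minimal sketch of generic skip-gram training-pair generation.
# Skip-gram-KR's backward mapping and two-word skipping are NOT
# implemented here; this shows only the baseline windowing step.

def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for every token within `window`."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical (already tokenized) Korean sentence:
sentence = ["나는", "학교", "에", "간다"]
print(skipgram_pairs(sentence, window=1))
```

These pairs would then be fed to a shallow neural network that learns to predict context words from the center word, yielding the dense fixed-length vectors the abstract describes.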
Files in This Item
Go to Link
Appears in
Collections
Division of ICT Convergence Engineering > Major in IT Engineering > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Park, Young Ho
College of Engineering (Division of Artificial Intelligence Engineering)
