불균형 자료에서 불순도 지수를 활용한 분류 임계값 선택
Selecting the optimal threshold based on impurity index in imbalanced classification

Abstract

We propose a method for adjusting the classification threshold using impurity indices when the data are imbalanced. Consider imbalanced binary data in which the minority class is labeled Positive and the majority class Negative. When classes are assigned using the conventional threshold of 0.5, specificity tends to be high while sensitivity is relatively low. Raising sensitivity matters when correctly classifying minority-class cases is the priority, so we examine how to raise it by moving the threshold. Previous studies have tuned the threshold with measures such as the G-mean and the F1-score; we instead propose selecting the optimal threshold with the chi-square statistic of CHAID, the Gini index of CART, or the entropy of C4.5. We also describe how to obtain a single value when several thresholds are optimal. An empirical analysis uses classification performance metrics to compare the results with those obtained at the 0.5 threshold.
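The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It treats each distinct predicted probability as a candidate threshold, splits the cases into predicted Positive and predicted Negative, and scores the split by the weighted Gini impurity, the weighted entropy, or the Pearson chi-square statistic of the resulting 2x2 table. The function names, the toy data, and the tie rule (midpoint of the smallest and largest tied thresholds) are assumptions made for illustration; the abstract does not state the paper's tie-breaking rule.

```python
import math


def gini(pos, n):
    """Gini impurity of a node holding `pos` positives among `n` cases."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def entropy(pos, n):
    """Shannon entropy (base 2) of a node; 0 for an empty or pure node."""
    if n == 0 or pos == 0 or pos == n:
        return 0.0
    p = pos / n
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def confusion(y_true, scores, threshold):
    """Counts (tp, fp, fn, tn) when score >= threshold is predicted Positive."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        if s >= threshold:
            tp, fp = tp + (y == 1), fp + (y != 1)
        else:
            fn, tn = fn + (y == 1), tn + (y != 1)
    return tp, fp, fn, tn


def criterion_value(y_true, scores, threshold, criterion):
    """Score a threshold so that a larger value is always better."""
    tp, fp, fn, tn = confusion(y_true, scores, threshold)
    n, n_hi, n_lo = tp + fp + fn + tn, tp + fp, fn + tn
    if criterion == "chi2":
        # Pearson chi-square of the 2x2 table (predicted class x true class).
        denom = n_hi * n_lo * (tp + fn) * (fp + tn)
        return n * (tp * tn - fp * fn) ** 2 / denom if denom else 0.0
    impurity = {"gini": gini, "entropy": entropy}[criterion]
    weighted = (n_hi * impurity(tp, n_hi) + n_lo * impurity(fn, n_lo)) / n
    return -weighted  # lower impurity is better, so negate it


def select_threshold(y_true, scores, criterion="gini"):
    """Pick the candidate threshold that optimizes the chosen criterion."""
    candidates = sorted(set(scores))
    values = [criterion_value(y_true, scores, t, criterion) for t in candidates]
    best = max(values)
    tied = [t for t, v in zip(candidates, values)
            if math.isclose(v, best, abs_tol=1e-12)]
    # ASSUMPTION: ties are resolved by the midpoint of the tied range.
    # This is an illustrative rule, not the one proposed in the paper.
    return 0.5 * (tied[0] + tied[-1])


# Hypothetical toy data: 3 Positive cases out of 10, all scored below 0.5.
y = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p = [0.05, 0.1, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45]
chosen = {c: select_threshold(y, p, c) for c in ("gini", "entropy", "chi2")}
```

On this toy data every case falls below 0.5, so the default threshold yields a sensitivity of 0. All three criteria select 0.3 instead, which classifies the three Positive cases correctly at the cost of one false positive. In practice the scores would come from a fitted classifier.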

Keywords

imbalanced data (불균형 자료), binomial classification (이항 분류), threshold moving (분류 기준점 조정), impurity index (불순도 지수)
Authors
장서인, 여인권
DOI
10.5351/KJAS.2021.34.5.711
Publication date
2021-10
Type
Article
Journal
The Korean Journal of Applied Statistics (응용통계연구)
Volume
34
Issue
5
Pages
711-721