- 장서인;
- 여인권
Abstract
In this paper, we propose a method for adjusting classification thresholds using impurity indices when analyzing imbalanced data. Suppose that, in imbalanced binary data, the minority category is Positive and the majority category is Negative. When categories are assigned using the commonly used threshold of 0.5, specificity tends to be high in imbalanced data while sensitivity is relatively low. Increasing sensitivity matters when correctly classifying objects in the minority category is relatively important. We explore how to increase sensitivity by adjusting the threshold. Existing studies have adjusted thresholds based on measures such as the G-mean and the F1-score; in this paper, we propose selecting the optimal threshold using the chi-square statistic of CHAID, the Gini index of CART, and the entropy of C4.5. We also describe how to obtain a unique value when multiple optimal thresholds are obtained. An empirical analysis uses classification performance metrics to show how the results improve on those obtained with the 0.5 threshold.
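The idea of choosing a threshold by an impurity criterion can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy data, the rule "score >= threshold means Positive", and the use of the observed scores as candidate thresholds are all assumptions made for illustration. Each candidate threshold splits the observations into two groups; the sketch picks the threshold that minimizes the weighted Gini index or entropy of the groups, or maximizes the chi-square statistic of the resulting 2x2 table. Because the abstract does not state how ties are resolved, the sketch simply returns every tied threshold.

```python
import math

def split_counts(scores, labels, threshold):
    """Return (tp, fp, fn, tn), predicting Positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 1:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def gini(pos, neg):
    """Gini index of a group with pos Positives and neg Negatives."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)

def entropy(pos, neg):
    """Shannon entropy (base 2) of a group; an empty group contributes 0."""
    n = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c > 0:
            h -= (c / n) * math.log2(c / n)
    return h

def weighted_impurity(counts, impurity):
    """Size-weighted average impurity of the two predicted groups."""
    tp, fp, fn, tn = counts
    n = tp + fp + fn + tn
    return ((tp + fp) * impurity(tp, fp) + (fn + tn) * impurity(fn, tn)) / n

def chi_square(counts):
    """Pearson chi-square statistic of the 2x2 table (0 if a margin is empty)."""
    tp, fp, fn, tn = counts
    n = tp + fp + fn + tn
    denom = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    if denom == 0:
        return 0.0
    return n * (tp * tn - fp * fn) ** 2 / denom

def best_thresholds(scores, labels, criterion):
    """Return all candidate thresholds that attain the optimum of the criterion."""
    candidates = sorted(set(scores))  # assumption: candidates are observed scores
    values = []
    for t in candidates:
        counts = split_counts(scores, labels, t)
        if criterion == "gini":
            values.append(-weighted_impurity(counts, gini))     # minimize
        elif criterion == "entropy":
            values.append(-weighted_impurity(counts, entropy))  # minimize
        elif criterion == "chi2":
            values.append(chi_square(counts))                   # maximize
        else:
            raise ValueError("unknown criterion: " + criterion)
    best = max(values)
    return [t for t, v in zip(candidates, values) if abs(v - best) < 1e-12]

def sensitivity(scores, labels, threshold):
    """True positive rate at the given threshold."""
    tp, _, fn, _ = split_counts(scores, labels, threshold)
    return tp / (tp + fn)

# Toy imbalanced data (hypothetical): 3 Positives among 10 observations.
scores = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.45, 0.6, 0.8]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

for name in ("gini", "entropy", "chi2"):
    print(name, best_thresholds(scores, labels, name))
print(sensitivity(scores, labels, 0.5), sensitivity(scores, labels, 0.35))
```

On this toy data all three criteria select a threshold below 0.5, which raises sensitivity at the cost of one additional false positive. Real data may produce different thresholds per criterion, or several tied thresholds.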
- Title
- 불균형 자료에서 불순도 지수를 활용한 분류 임계값 선택
- Title (other language)
- Selecting the optimal threshold based on impurity index in imbalanced classification
- Authors
- 장서인; 여인권
- Publication date
- 2021-10
- Type
- Article
- Journal
- 응용통계연구
- Volume
- 34
- Issue
- 5
- Pages
- 711-721