Handling Imbalance Data in Classification Model

Authors: Kartika Fithriasari, Iswari Hariastuti, Kinanthi Sukma Wening

Abstract:

The decision tree, a classification method, can be used to identify the factors that predict an outcome while producing interpretable results. However, when the class of interest makes up only a small share of an imbalanced data set, the classifier tends to predict only the majority class. Class imbalance therefore needs to be handled. One method often used for data with nominal predictors is SMOTE-N. To improve accuracy, the hybrid SMOTE-N-ENN and ADASYN-N were developed. This study compares SMOTE-N, SMOTE-N-ENN, and ADASYN-N for handling class imbalance in the classification of premarital sex among adolescents, using CART as the base classifier. ADASYN-N is found to be the best of the three methods because it yields the highest AUC. The best decision tree indicates that the factors predicting premarital sexual relations among adolescents are dating style, knowledge of the fertile period, knowledge of the risks of early marriage, gender, latest education, and area of residence.

Keywords: ADASYN-N, CART, hybrid SMOTE-N, imbalanced data, premarital sex
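
To make the idea of oversampling nominal predictors concrete, the following is a minimal sketch, not the authors' implementation. It generates synthetic minority rows by taking a per-feature majority vote over a row and its nearest neighbours. As a simplifying assumption, it uses plain Hamming distance in place of the Value Difference Metric that SMOTE-N is usually described with; the function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count the positions at which two nominal rows differ."""
    return sum(x != y for x, y in zip(a, b))


def oversample_nominal(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows from a list of nominal minority rows.

    Simplification (assumption): neighbours are found with Hamming
    distance rather than the Value Difference Metric.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a base row, then find its k nearest minority neighbours.
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: hamming(base, row))[:k]
        group = [base] + neighbours
        # Each feature of the new row is the most common value in the group.
        new_row = tuple(
            Counter(column).most_common(1)[0][0] for column in zip(*group)
        )
        synthetic.append(new_row)
    return synthetic


# Toy minority class with two hypothetical nominal features.
minority_rows = [("urban", "male"), ("urban", "female"), ("rural", "male")]
new_rows = oversample_nominal(minority_rows, n_new=4, k=2)
```

The synthetic rows would then be appended to the training set before fitting a tree classifier such as CART. An ENN-style cleaning step or ADASYN-style adaptive sampling weights, as compared in the study, are not shown here.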

Published in: International Journal of Computing Science and Applied Mathematics 6(1):33

Article link/DOI: http://dx.doi.org/10.12962/j24775401.v6i1.6643
