Abstract:
With the development of the information society, vast amounts of data are generated continuously, and extracting valuable information from such data has long been a subject of research. Data classification, an important research field in machine learning, discovers classification rules by analyzing a training set with known categories and uses them to predict the categories of new data. To address the problem of imbalanced multi-class data, we use a data preprocessing method based on density peak clustering (DPC) to optimize the classifiers, and we analyze four mainstream classifiers: random forest, decision tree, k-nearest neighbor (KNN), and support vector machine (SVM). Notably, for the KNN model we use a kernel-function-based weighted KNN algorithm for training and prediction. Specifically, this paper first uses the DPC algorithm to quickly select cluster centers, determines the optimal clipping threshold experimentally and uses it to prune the training set, trains and optimizes the classifiers on the pruned training set, and finally uses the classifiers to predict the test set. To better evaluate model accuracy, we use 50% cross-validation. Experimental results show that the optimization improves classifier accuracy in 99.9% of cases. © 2023 IEEE.
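The two techniques named in the abstract can be illustrated with a minimal sketch. The record does not give the paper's density kernel, cutoff distance, clipping rule, or KNN kernel, so this sketch assumes a Gaussian local density and the decision value gamma = rho * delta from the standard density peak formulation, plus a Gaussian kernel weight on neighbor distances for the weighted KNN. The helper names `dpc_centers` and `kernel_weighted_knn`, the parameters `d_c` and `sigma`, and the toy data are hypothetical; the threshold-based clipping of the training set is omitted because the record does not describe it.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dpc_centers(points, d_c, n_centers):
    """Hypothetical helper: pick cluster centers in density peak style.

    rho[i]   = Gaussian-kernel local density of point i (assumed kernel, cutoff d_c)
    delta[i] = distance from i to the nearest point of strictly higher density
               (for the densest point: its distance to the farthest point)
    Centers are the n_centers points with the largest gamma = rho * delta.
    """
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(n), key=lambda i: gamma[i], reverse=True)[:n_centers]


def kernel_weighted_knn(train_x, train_y, query, k=3, sigma=1.0):
    """Hypothetical helper: each of the k nearest neighbors votes with an
    assumed Gaussian kernel weight exp(-d^2 / (2 * sigma^2)) instead of 1."""
    neighbors = sorted(
        ((euclidean(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda t: t[0],
    )[:k]
    votes = defaultdict(float)
    for d, label in neighbors:
        votes[label] += math.exp(-(d ** 2) / (2 * sigma ** 2))
    return max(votes, key=votes.get)


# Toy data (hypothetical): two tight groups, each with a dense core point.
points = [(0, 0), (0.2, 0), (0, 0.2), (-0.2, 0), (0, -0.2),
          (5, 5), (5.2, 5), (5, 5.2), (4.8, 5), (5, 4.8)]
labels = ["a"] * 5 + ["b"] * 5

print(sorted(dpc_centers(points, d_c=0.5, n_centers=2)))  # [0, 5]
print(kernel_weighted_knn(points, labels, (4.9, 5.0), k=3))  # b
print(kernel_weighted_knn([(0, 0), (2, 0), (0, 2)], ["a", "b", "b"],
                          (0.1, 0), k=3, sigma=0.5))  # a
```

In the last call, plain majority voting over the same three neighbors would return "b"; the kernel weights let the single very close neighbor dominate, which is the usual motivation for distance-weighted KNN.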
Year: 2023
Page: 332-338
Language: English
ESI Highly Cited Papers on the List: 0