This study takes two imbalanced manufacturing datasets as its research objects: one from the UCI database and one collected from a factory's machines. It aims to determine how imbalanced fault classification problems affect classifier performance, to propose solutions, and to compare the performance of each classifier.<br>Two sampling methods, the majority method and SMOTE, are used to address the data imbalance, and chi-square feature selection is used to identify the features that effectively improve classifier performance and training performance. Four machine learning algorithms (decision tree, random forest, XGBoost, and KNN) are used for classification. The classifiers are evaluated on the test set, with accuracy and AUC as the performance metrics.<br>
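As an illustration of the SMOTE idea mentioned above, the following is a minimal sketch in plain Python, not the implementation used in the study. It assumes numeric feature vectors and Euclidean distance; the function name `smote_oversample`, the parameters `n_new`, `k`, and `seed`, and the toy data are all hypothetical. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 3 fault samples (minority) vs. 6 normal samples (majority).
faults = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4)]
normals = [(5.0, 5.0), (5.5, 4.8), (6.0, 5.2), (5.2, 5.6), (4.8, 5.1), (5.9, 4.7)]

new_faults = smote_oversample(faults, n_new=len(normals) - len(faults))
balanced_faults = faults + new_faults
print(len(balanced_faults), len(normals))
```

In this sketch the minority class is grown until it matches the majority class in size. In practice, oversampling would be applied to the training split only, so that the test group data used for accuracy and AUC stays untouched.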
