How to do feature selection in Machine Learning | by Jay Hui | Oct, 2020


A more refined way is to do it by supervised learning. Back to our analogy of football candidate selection, we do it in two rounds:

In the first round, we test the football skills (the supervised way), such as penalty kicks, shooting, and short passes, for each candidate, and rank them. Suppose we select the top 50 candidates out of the 200 candidates now.

In the second round, since we want to find the best combination of 12 candidates out of those 50, we need to test how these 50 candidates cooperate. Finally, we find the best 12 candidates. (Why don’t we do the second round directly? Running all the iterations takes a lot of time, so we need the preliminary check in the first round.)

Technically speaking, the first round is “feature selection by model” and the second round is “Recursive Feature Elimination” (RFE). Let’s go back to machine learning and coding now.
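The first round (“feature selection by model”) can be sketched as follows: fit a few supervised models and rank the columns by the importance each model assigns. The dataset, model settings, and DataFrame layout below are illustrative assumptions, not the article’s original code.

```python
# Sketch of "feature selection by model": fit several supervised models and
# rank features by each model's importance scores.
# The synthetic dataset and hyper-parameters here are illustrative only.
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

models = {
    "LogisticRegression": LogisticRegression(penalty="l1",
                                             solver="liblinear", C=0.5),
    "ExtraTreesClassifier": ExtraTreesClassifier(n_estimators=100,
                                                 random_state=0),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100,
                                                     random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(X, y)
    # Tree ensembles expose impurity-based importances;
    # linear models expose coefficients (take absolute values).
    if hasattr(model, "feature_importances_"):
        scores = model.feature_importances_
    else:
        scores = np.abs(model.coef_).ravel()
    for col, score in zip(X.columns, scores):
        rows.append({"model": name, "feature": col, "importance": score})

importance_df = (pd.DataFrame(rows)
                 .sort_values(["model", "importance"],
                              ascending=[True, False]))
print(importance_df.groupby("model").head(10))  # top 10 features per model
```

The accuracy and top-10 listings below come from this kind of loop, one block per model.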

CPU times: user 15.9 s, sys: 271 ms, total: 16.2 s
Wall time: 14.3 s
========== LogisticRegression ==========
Accuracy in training: 0.4138598854833277
Accuracy in validation: 0.41020945163666983
Show top 10 important features:
Image by the Author
========== ExtraTreesClassifier ==========
Accuracy in training: 0.3467497473896935
Accuracy in validation: 0.3467213977476055
Show top 10 important features:
Image by the Author
========== RandomForestClassifier ==========
Accuracy in training: 0.3473391714381947
Accuracy in validation: 0.34581622987053995
Show top 10 important features:
Image by the Author

We also plot the feature importance scores for each model:

Image by the Author

Since L1-based logistic regression has the highest accuracy, we will select only the top 60 features (from the graph) ranked by logistic regression:

selected_model = 'LogisticRegression'
number_of_features = 60
selected_features_by_model = importance_fatures_sorted_all[importance_fatures_sorted_all['model'] == selected_model].index[:number_of_features].tolist()

  • Recursive Feature Elimination (RFE)
    The second part is to select the best combination of features. We do it by “Recursive Feature Elimination” (RFE). Instead of building one model, we now build n models (where n = the number of features). In the first iteration, we train the model with all 60 features and calculate the cross-validation accuracy and the feature importance of every column. Then we drop the least important feature, leaving 59 features. Based on these 59 features, we repeat the above process until we end at the last single feature. This approach takes time but gives you a reliable feature importance ranking. If time does not allow for a large dataset, one can consider doing it on a sample.
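The elimination loop described above is what scikit-learn’s `RFECV` implements: drop one least-important feature per iteration and score each feature count by cross-validation. A minimal sketch on a synthetic dataset (the 60-feature data and estimator settings are assumptions, not the article’s originals):

```python
# Minimal RFE-with-cross-validation sketch using scikit-learn's RFECV.
# The synthetic 60-feature dataset and estimator settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=60,
                           n_informative=10, random_state=0)

selector = RFECV(
    estimator=LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    step=1,              # drop one least-important feature per iteration
    cv=5,                # 5-fold cross-validation accuracy at each size
    scoring="accuracy",
)
selector.fit(X, y)

print("Optimal number of features:", selector.n_features_)
# selector.ranking_ records the elimination order:
# 1 = kept in the final subset, larger = eliminated earlier
```

`selector.support_` then gives the boolean mask of the surviving columns, which plays the role of the final candidate list in our analogy.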
Image by the Author
CPU times: user 7min 2s, sys: 334 ms, total: 7min 2s
Wall time: 26min 32s

As you can see, the validation accuracy eventually saturates (around 0.475) as the model is trained with more features. We can check our feature importance ranking now:

Image by the Author

Again, there is no Golden Rule for feature selection. In the business world with production systems, we have to balance hardware capacity, required time, stability of the model, and model performance. After we have found the best combination of columns, we can now select the best set of hyper-parameters.
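That last hyper-parameter step can be done with a plain grid search over the selected columns; the estimator and parameter grid below are illustrative assumptions, not from the article.

```python
# Sketch of hyper-parameter selection after feature selection, via GridSearchCV.
# The dataset and the parameter grid are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Stand-in for the data restricted to the selected columns.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

grid = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # regularisation strengths to try
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Keeping the column set fixed while tuning avoids re-running the expensive RFE loop for every hyper-parameter candidate.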
