Quick Tip: Return Column Names for sklearn’s SelectKBest
It is easy to return more than a Boolean list.
When you’re using Scikit-Learn’s SelectKBest, you want it to return the k best features for your model. Say you have a dataset with 7 variables and you want the three that relate most strongly to your label, which can give you a simpler and better model. That is when SelectKBest is the right tool.
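A minimal sketch of that scenario follows. It assumes synthetic data from `make_classification` (7 features, 3 informative) in place of a real dataset, and the column names `x1` to `x7` are hypothetical. It shows that picking the 3 best features works, but the output of `fit_transform` is a plain NumPy array with the column names gone.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data: 7 candidate features, 3 of them informative.
X_arr, y = make_classification(
    n_samples=200, n_features=7, n_informative=3,
    n_redundant=0, random_state=42,
)
# Hypothetical column names, only so the DataFrame has labels.
X = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(1, 8)])

# Keep the 3 features with the highest ANOVA F-scores against y.
selector = SelectKBest(score_func=f_classif, k=3)
X_new = selector.fit_transform(X, y)

# Prints "ndarray (200, 3)": the selected data lost its column names.
print(type(X_new).__name__, X_new.shape)
```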
When you use sklearn’s SelectKBest to select the best k features for your model, it scores each explanatory variable (x) against the explained variable (y), one at a time, with a scoring function (by default f_classif, the ANOVA F-test for classification), and marks as True the k features with the highest scores.
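To make the scoring step concrete, here is a small sketch on the same kind of synthetic data (an assumption, with hypothetical column names). It prints one F-score per column from the fitted selector’s `scores_` attribute and checks that the kept features are exactly the positions of the k largest scores.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X_arr, y = make_classification(n_samples=200, n_features=7, n_informative=3,
                               n_redundant=0, random_state=42)
X = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(1, 8)])  # hypothetical names

selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)

# One F-score per column: each feature is scored against y on its own.
scores = pd.Series(selector.scores_, index=X.columns)
print(scores.sort_values(ascending=False).round(2))

# The kept features are the positions of the 3 largest scores.
top3 = set(np.argsort(selector.scores_)[-3:])
kept = set(np.flatnonzero(selector.get_support()))
print(top3 == kept)  # True, assuming no tied scores
```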
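Once fitted, however, the selector does not hand you feature names. Its get_support() method returns a Boolean mask, like [True False True True] (with the number of Trues equal to k), and fit_transform returns a NumPy array without column names, instead of the list of features you actually want to use in your model.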
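The sketch below, again on assumed synthetic data, shows what that mask looks like in practice: a Boolean array with one entry per input column and exactly k `True` values. Passing `indices=True` to `get_support` returns the same choice as integer positions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X_arr, y = make_classification(n_samples=200, n_features=7, n_informative=3,
                               n_redundant=0, random_state=42)
X = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(1, 8)])  # hypothetical names

selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)

mask = selector.get_support()             # Boolean array, one entry per input column
idx = selector.get_support(indices=True)  # the same choice as integer positions

print(mask)        # length 7, exactly three True values
print(mask.sum())  # 3
print(idx)
```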
Basic knowledge of Pandas indexing and of how the sklearn selector works lets you fix that with a single line of code.
Coding
Let’s start by importing some modules.
import pandas as pd
import seaborn as sns
from…