Quick Tip: Return Column Names for sklearn’s SelectKBest

Gustavo R Santos
3 min read · Jan 4, 2023

It is easy to return more than a Boolean list.

This is the entire code. Image by the author.

When you use scikit-learn’s SelectKBest, you want the k best features for your model. Say you have a dataset with 7 variables but only want the three most strongly related to your label, so you end up with a simpler and, ideally, better model. That is when SelectKBest is the right tool.

To select the best K features, SelectKBest applies a scoring function (the `score_func` argument, by default the classification score `f_classif`) to each explanatory variable (x) against the explained variable (y), one at a time, and marks as True the K features with the highest scores.
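The scoring step described above can be sketched as follows. This is a minimal illustration, not the author's code: the synthetic data from `make_classification`, the column names `feat_0` … `feat_6`, and the choice of k=3 are all assumptions made to mirror the "7 variables, keep three" scenario.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: synthetic data standing in for a real dataset (7 features, 3 informative).
X_arr, y = make_classification(
    n_samples=200,
    n_features=7,
    n_informative=3,
    n_redundant=0,
    random_state=42,
)
# Hypothetical column names, used only for illustration.
X = pd.DataFrame(X_arr, columns=[f"feat_{i}" for i in range(7)])

# Score every feature against y and keep the 3 highest-scoring ones.
selector = SelectKBest(score_func=f_classif, k=3)
selector.fit(X, y)

# One score per feature, in the original column order.
scores = pd.Series(selector.scores_, index=X.columns)

# Boolean mask: True for the k features with the highest scores.
mask = selector.get_support()

print(scores.sort_values(ascending=False))
print(mask)
```

Sorting `scores` makes it easy to check that the positions marked True in `mask` are exactly the three features with the largest scores.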

Once fitted, however, the selector reports its choice (through its `get_support()` method) as a Boolean array, like [True False True True] (count of Trues == k), instead of the list of feature names you would expect to use for your model.

A little knowledge of pandas indexing and of how the sklearn selector works lets you fix that with a single line of code.
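One way to turn the Boolean mask into column names is to index the DataFrame's columns with it. The sketch below is an assumption about what such a one-liner could look like, not necessarily the author's exact code; the iris dataset bundled with scikit-learn and k=2 are chosen here only to keep the example self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: iris is used purely as a small, offline example dataset.
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Fit the selector; k=2 is an arbitrary choice for illustration.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# The one-liner: index the column labels with the Boolean mask.
selected_columns = X.columns[selector.get_support()]
print(list(selected_columns))

# The names can then be used to subset the original DataFrame.
X_selected = X[selected_columns]
print(X_selected.head())
```

In recent scikit-learn versions (1.0 and later), a selector fitted on a DataFrame can also report the names directly through `selector.get_feature_names_out()`. The pandas indexing approach has the advantage of working with any Boolean mask.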

Coding

Let’s start by importing some modules.

import pandas as pd
import seaborn as sns
from…
