
Use Partial Dependence Plot for Feature Selection

Gustavo R Santos
Nov 21, 2024

A good tool to use in Python for model evaluation and feature selection.

PDP Plot. Image created with DALL•E by OpenAI. https://openai.com. OpenAI DALL•E, 2024

Intro

Use Partial Dependence Plots (PDPs) to analyze each feature's impact on a trained model without retraining it!

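PDPs show how a single feature influences a model's predictions on average: the feature is set to each value on a grid for every row, the other features keep their observed values, and the resulting predictions are averaged.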
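
To make that averaging concrete, here is a minimal brute-force sketch with NumPy. The `predict` function is a hypothetical stand-in for a trained model (it uses features 0 and 1 and ignores feature 2), and `partial_dependence_brute` is an illustrative helper, not a library function.

```python
import numpy as np

def predict(X):
    # Hypothetical stand-in for a trained model: uses features 0 and 1, ignores feature 2
    return 3.0 * X[:, 0] + X[:, 1] ** 2

def partial_dependence_brute(predict_fn, X, feature, grid):
    """Average prediction when `feature` is set to each grid value for every row."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value                  # force the feature to one value everywhere
        averages.append(predict_fn(X_mod).mean())  # other columns keep their observed values
    return np.array(averages)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
grid = np.linspace(-1, 1, 5)  # the x-axis values of the PDP

pd0 = partial_dependence_brute(predict, X, feature=0, grid=grid)
pd2 = partial_dependence_brute(predict, X, feature=2, grid=grid)
print(np.diff(pd0))            # constant steps of 3 * 0.5 = 1.5: a straight line with slope 3
print(pd2.max() - pd2.min())   # 0.0: the curve for the ignored feature is flat
```

A flat curve like feature 2's is the signal that matters for feature selection. Keep in mind that averaging can also flatten a feature whose effects cancel out across rows (for example, through an interaction), so a flat PDP is a hint rather than proof.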

Key Components of the PDP

  • X-Axis (Feature Values): This represents the range of values for the feature being analyzed. For instance, if you're analyzing MedInc (median income in the California housing data), the x-axis will show different income levels.
  • Y-Axis (Predicted Response): This represents the model's predicted outcome for the corresponding feature value on the x-axis, averaged over all rows of the data.
  • Line (Partial Dependence): The line shows the relationship between the feature and the predicted target. It’s an aggregated view of how changing the feature affects the model’s predictions.

Here’s how to create PDPs in Python with scikit-learn.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay
from sklearn.datasets import fetch_california_housing

# Load data (downloaded and cached on first call) and train model
data = fetch_california_housing()
model = RandomForestRegressor().fit(data.data, data.target)
```

Written by Gustavo R Santos

Data Scientist | I solve business challenges through the power of data. | Visit my site: https://gustavorsantos.me
