Use Partial Dependence Plot for Feature Selection
2 min read · Nov 21, 2024
A handy Python tool for model evaluation and feature selection.
Intro
Use Partial Dependence Plots (PDPs) to analyze feature impact without retraining!
PDPs show how a single feature influences the model’s predictions on average, with the effect of the other features averaged out over the dataset.
Key Components of the PDP
- X-Axis (Feature Values): This represents the range of values for the feature being analyzed. For instance, if you’re analyzing MedianIncome, the x-axis will show different income levels.
- Y-Axis (Predicted Response): This represents the average predicted outcome of the model for the corresponding feature value on the x-axis.
- Line (Partial Dependence): The line shows the relationship between the feature and the predicted target. It’s an aggregated view of how changing the feature affects the model’s predictions.
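To make the "average predicted outcome" concrete, here is a minimal sketch of how one point on the line can be computed by hand: force the feature to a single value in every row, predict, and average. The helper `manual_partial_dependence` and the toy model `toy_predict` are hypothetical names made up for this illustration. They are not part of scikit-learn, and the library's own routine may differ in details such as how it picks the grid.

```python
import numpy as np

def manual_partial_dependence(predict, X, feature_idx, grid):
    """Average prediction when column `feature_idx` is forced to each grid value."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()                # leave the original data untouched
        X_mod[:, feature_idx] = value   # force every row to the same feature value
        averages.append(predict(X_mod).mean())  # average over all rows
    return np.array(averages)

# Toy stand-in for a fitted model (an assumption for the demo): y = 2*x0 + x1
def toy_predict(X):
    return 2.0 * X[:, 0] + X[:, 1]

X_demo = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])
grid = np.array([0.0, 1.0, 2.0])

pd_values = manual_partial_dependence(toy_predict, X_demo, feature_idx=0, grid=grid)
print(pd_values)
```

In this toy case the curve rises by 2 for every unit of the first feature, which is exactly that feature's coefficient in `toy_predict`. With a fitted scikit-learn estimator, its `predict` method could be passed in place of `toy_predict`.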
Here’s how to create PDPs in Python with scikit-learn.
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay
from sklearn.datasets import fetch_california_housing
# Load data and train model
data = fetch_california_housing()
model = RandomForestRegressor().fit(data.data, data.target)