Calculating Variance Inflation Factor (VIF) in Python

Gustavo R Santos
4 min readDec 11, 2024

Get rid of multicollinearity by calculating this great metric and improve your model using Python.

Variables inflated | Image generated by AI. Meta Llama 3, 2024. https://meta.ai.

Introduction

What should we do about highly correlated features in your dataset?

That is a problem that can reduce the effectiveness of a linear model. When more than one variable explains the same behavior of the y variable, we say that they are colinear.

That issue can bring problems to the model, such as inflating beta coefficients or decreasing the value of one beta to compensate for another variable — a colinear one — that explains the same behavior in y. In some cases, the distortion can be so high that could change the signal of the beta coefficient.

Solving the Problem

To solve this problem, we can reduce multicollinearity by calculating the Variance Inflation Factor, or just VIF, ensuring your model stays interpretable and robust.

VIF is calculated from the Regression between the explanatory variables.

VIF is calculated from Regression models between the explanatory variables. So, we drop our target y and calculate the R² from all the explanatory variables (our X matrix) against each other.

--

--

Gustavo R Santos
Gustavo R Santos

Written by Gustavo R Santos

Data Scientist | I solve business challenges through the power of data. | Visit my site: https://gustavorsantos.me

Responses (1)