Calculating Variance Inflation Factor (VIF) in Python
Get rid of multicollinearity by calculating this great metric and improve your model using Python.
Introduction
What should we do about highly correlated features in your dataset?
That is a problem that can reduce the effectiveness of a linear model. When more than one variable explains the same behavior of the y variable, we say that they are colinear.
That issue can bring problems to the model, such as inflating beta coefficients or decreasing the value of one beta to compensate for another variable — a colinear one — that explains the same behavior in y. In some cases, the distortion can be so high that could change the signal of the beta coefficient.
Solving the Problem
To solve this problem, we can reduce multicollinearity by calculating the Variance Inflation Factor, or just VIF, ensuring your model stays interpretable and robust.
VIF is calculated from the Regression between the explanatory variables.
VIF is calculated from Regression models between the explanatory variables. So, we drop our target y and calculate the R² from all the explanatory variables (our X matrix) against each other.