Start by preparing multiple regression models for plotting. First, we will build two models from the same dataset: one with moderately correlated predictors and another with highly correlated predictors.

Then we will slightly modify the data to see which model is affected more by the small change. Start by importing the libraries.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
```

## Getting data

This article works with the *mpg* (Cars) dataset, which has a public domain license and can be loaded directly with the `seaborn.load_dataset()` function.

The dataset contains the specifications of 392 cars manufactured between 1970 and 1982 in the USA, Europe, and Japan. More information about the dataset can be found here: link.

```python
data = sns.load_dataset('mpg')
data.dropna(inplace=True)  # remove rows with missing values
data.head()
```

Doing exploratory data analysis to understand the dataset is always a good idea.

```python
data.describe()
```

The ranges (max value − min value) of the columns are quite different, so standardizing the data will make the regression coefficients easier to compare and interpret later.

```python
data = data.iloc[:, 0:-2]  # keep the numeric columns (drop 'origin' and 'name')

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df_s = scaler.fit_transform(data)
df = pd.DataFrame(df_s, columns=data.columns)
df.head()
```
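As a quick sanity check (not part of the original walkthrough), we can verify that each standardized column ends up with mean ≈ 0 and standard deviation ≈ 1; note that `StandardScaler` uses the population standard deviation (`ddof=0`). The values below are hypothetical stand-ins for the cars data:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# hypothetical stand-in values for two of the columns
data = pd.DataFrame({'mpg': [18.0, 15.0, 36.0, 25.0],
                     'weight': [3504.0, 3693.0, 2065.0, 2790.0]})

df = pd.DataFrame(StandardScaler().fit_transform(data), columns=data.columns)

# every column now has mean ~0 and population std ~1
print(df.mean().round(6).tolist())
print(df.std(ddof=0).round(6).tolist())
```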

Continue with plotting a heat map to show the correlation among the variables.

```python
plt.figure(figsize=(9, 6))
sns.heatmap(df.corr(), cmap='coolwarm', annot=True)
plt.show()
```

The heatmap shows that some variables are highly correlated with each other. If every predictor were put into a multiple regression model to predict the ‘mpg’ value, multicollinearity would affect the model.

## Calculating the VIF values

This can be quickly verified by calculating the variance inflation factor (VIF) values with the statsmodels library.

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor as vif

vif_data = pd.DataFrame()
vif_data["feature"] = df.columns

# calculate the VIF for each feature
vif_data["VIF"] = [vif(df.values, i) for i in range(len(df.columns))]
print(vif_data)
```

Some VIF values are quite high (> 5), which indicates strong correlation with the other predictors. If we put every predictor directly into a multiple regression model, the model can suffer from the multicollinearity problem.
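To see where these numbers come from: the VIF of a predictor is 1/(1 − R²), where R² is obtained by regressing that predictor on all the others. A minimal sketch on synthetic data (the column names and values below are made up for illustration), checked against the statsmodels function:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.stats.outliers_influence import variance_inflation_factor as vif

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
toy = pd.DataFrame({'x1': x1,
                    'x2': x1 + 0.1 * rng.normal(size=200),  # highly correlated with x1
                    'x3': rng.normal(size=200)})            # independent
toy = toy - toy.mean()  # center, so both formulas agree

def manual_vif(frame, i):
    # regress column i on the remaining columns and apply 1 / (1 - R^2)
    y = frame.iloc[:, i]
    X = frame.drop(columns=frame.columns[i])
    r2 = LinearRegression().fit(X, y).score(X, y)
    return 1.0 / (1.0 - r2)

for i, col in enumerate(toy.columns):
    print(col, round(manual_vif(toy, i), 2), round(vif(toy.values, i), 2))
```

The correlated pair gets VIF values far above 5, while the independent column stays near 1.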

Next, we will select only some of the predictors. This article works with two models: one with moderately correlated predictors and another with highly correlated ones.

The first multiple regression model uses ‘cylinders’ and ‘acceleration’ to predict ‘mpg,’ while the second uses ‘cylinders’ and ‘displacement.’ Let’s calculate the VIF values again.

```python
df1 = df[['mpg', 'cylinders', 'acceleration']]
df2 = df[['mpg', 'cylinders', 'displacement']]

# VIF dataframe for model 1
vif_data1 = pd.DataFrame()
vif_data1['feature'] = df1.columns
vif_data1['VIF'] = [vif(df1.values, i) for i in range(len(df1.columns))]
print('model 1')
print(vif_data1)

# VIF dataframe for model 2
vif_data2 = pd.DataFrame()
vif_data2['feature'] = df2.columns
vif_data2['VIF'] = [vif(df2.values, i) for i in range(len(df2.columns))]
print('model 2')
print(vif_data2)
```

The only difference between the two models is swapping the variable ‘acceleration’ for ‘displacement.’ Yet none of the VIF values in the first model exceed 5, while the second model has VIF values above 10.
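One way to quantify why the second pair is riskier is the condition number of the predictors' correlation matrix: the closer two predictors are to collinear, the larger it gets, and the less stable the fitted coefficients become. A hedged sketch on synthetic pairs (the variables below are made up, not the cars data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
z = rng.normal(size=n)  # shared underlying factor

# moderately correlated pair vs highly correlated pair
a1, b1 = z + rng.normal(size=n), z + rng.normal(size=n)              # corr ~0.5
a2, b2 = z + 0.1 * rng.normal(size=n), z + 0.1 * rng.normal(size=n)  # corr ~0.99

def corr_cond(a, b):
    # condition number of the 2x2 correlation matrix of the standardized pair
    X = np.column_stack([a, b])
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.linalg.cond(X.T @ X / len(X))

print('moderate pair:', round(corr_cond(a1, b1), 1))
print('high pair:', round(corr_cond(a2, b2), 1))
```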

## Creating a multiple regression model

We will use the OLS function from the statsmodels library to create a multiple regression model.

```python
import statsmodels.api as sm

y1 = df1[['mpg']]
X1 = df1[['cylinders', 'acceleration']]

# the data is already standardized, so no intercept term is added
lm1 = sm.OLS(y1, X1)
model1 = lm1.fit()
model1.summary()
```

## Plotting the model

Next, define a helper function that builds a mesh grid over the two predictors, which we will use to evaluate the model in the next step.

```python
def run_model(v1, v2, pd_):
    mesh_size = 0.02
    x_min, x_max = pd_[v1].min(), pd_[v1].max()
    y_min, y_max = pd_[v2].min(), pd_[v2].max()
    xrange = np.arange(x_min, x_max, mesh_size)
    yrange = np.arange(y_min, y_max, mesh_size)
    xx, yy = np.meshgrid(xrange, yrange)
    return xx, yy, xrange, yrange
```

Here comes the fun part. Let’s plot the model with the Plotly library, which makes it easy to create interactive plots that we can zoom and rotate.

```python
import plotly.express as px
import plotly.graph_objects as go

# run and apply the model
xx1, yy1, xr1, yr1 = run_model('cylinders', 'acceleration', X1)
pred1 = model1.predict(np.c_[xx1.ravel(), yy1.ravel()])
pred1 = pred1.reshape(xx1.shape)

# plot the result
fig = px.scatter_3d(df1, x='cylinders', y='acceleration', z='mpg')
fig.update_traces(marker=dict(size=5))
fig.add_traces(go.Surface(x=xr1, y=yr1, z=pred1, name='pred1'))
fig.show()
```

We can perform the same process with the second model, which has the multicollinearity problem.

```python
y2 = df2[['mpg']]
X2 = df2[['cylinders', 'displacement']]

lm2 = sm.OLS(y2, X2)
model2 = lm2.fit()
model2.summary()
```

```python
# run and apply the model
xx2, yy2, xr2, yr2 = run_model('cylinders', 'displacement', X2)
pred2 = model2.predict(np.c_[xx2.ravel(), yy2.ravel()])
pred2 = pred2.reshape(xx2.shape)

# plot the result
fig = px.scatter_3d(df2, x='cylinders', y='displacement', z='mpg')
fig.update_traces(marker=dict(size=5))
fig.add_traces(go.Surface(x=xr2, y=yr2, z=pred2, name='pred2'))
fig.show()
```