Multiple Regression using Python
Implement multiple regression with gradient descent using Python
November 15, 2016
Whenever I do any machine learning, I either manually implement models in MATLAB or use Python libraries like scikit-learn, where all of the work is done for me. However, I wanted to learn how to implement some of these things manually in Python, so I figured I’d document this learning process over a series of posts. Let’s start with something simple: ordinary least squares multiple regression.
The goal of multiple regression is to predict the value of some outcome from a series of input variables. Here, I’ll be using the Los Angeles Heart Data.
Setting up the data
Let's import some modules:
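A minimal import, assuming NumPy handles all of the array math in this post (the alias `np` is just the usual convention):

```python
# NumPy provides the arrays, slicing, and matrix products used for the regression.
import numpy as np
```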
Next, we’ll need to load the dataset:
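As a rough sketch of the loading step, assuming the data is stored as comma-separated numbers: the rows below are made-up placeholders so the snippet runs on its own (they are not the real heart data), and `heart_data.csv` is a hypothetical filename.

```python
import io

import numpy as np

# Placeholder rows standing in for the real file. These values are invented
# purely for illustration and are NOT the Los Angeles Heart Data.
sample_csv = io.StringIO(
    "1.0,45.0,150.0\n"
    "2.0,52.0,160.0\n"
    "3.0,61.0,155.0\n"
)

# With the real dataset this would be something like
# np.loadtxt("heart_data.csv", delimiter=",")  (hypothetical filename).
data = np.loadtxt(sample_csv, delimiter=",")

print(data.shape)  # (number of rows, number of columns)
```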
We’ll need to create a design matrix (x) containing our predictor variables, and a vector (y) for the outcome we’re trying to predict. In this case, I’m just going to predict the first column from all of the others for demonstration purposes.
We can use slicing to grab every row and all columns (after the first) to create x.
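The slicing step might look like this, assuming the dataset is already in a 2-D NumPy array called `data` (the toy values here are illustrative only):

```python
import numpy as np

# Toy array standing in for the loaded dataset (illustrative values only).
data = np.array([
    [1.0, 45.0, 150.0],
    [2.0, 52.0, 160.0],
    [3.0, 61.0, 155.0],
])

# ":" keeps every row; "1:" keeps every column after the first one.
x = data[:, 1:]

print(x.shape)
```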
Optionally, we can scale (standardize) the data so gradient descent has an easier time converging later:
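One common way to standardize is to subtract each column's mean and divide by its standard deviation. This is a sketch of that approach, not necessarily the exact scaling used originally:

```python
import numpy as np

# Toy predictors (illustrative values only): 3 rows, 2 variables.
x = np.array([
    [45.0, 150.0],
    [52.0, 160.0],
    [61.0, 155.0],
])

# axis=0 computes the statistic per column, so each variable is scaled
# independently to mean 0 and standard deviation 1.
x_scaled = (x - x.mean(axis=0)) / x.std(axis=0)

print(x_scaled.mean(axis=0))
print(x_scaled.std(axis=0))
```

Doing this before adding the column of 1’s matters: a constant column has a standard deviation of 0, so scaling it would divide by zero.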
We’ll need to add a column of 1’s so we can estimate a bias/intercept:
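A minimal sketch of prepending the intercept column, assuming `x` is a 2-D array of (already scaled) predictors:

```python
import numpy as np

# Toy scaled predictors (illustrative values only).
x = np.array([
    [0.5, -1.2],
    [-0.3, 0.8],
    [1.1, 0.4],
])

# One 1 per row, shaped as a column so it can be stacked next to x.
ones = np.ones((x.shape[0], 1))

# Put the column of 1's first, so the first weight acts as the intercept.
x = np.hstack((ones, x))

print(x)
```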
We can do the same thing to pull out the first column to create y. We want this as a column vector for later so we need to reshape it:
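Pulling out the outcome could look like the following, again assuming a 2-D array called `data` (toy values shown):

```python
import numpy as np

# Toy array standing in for the loaded dataset (illustrative values only).
data = np.array([
    [1.0, 45.0, 150.0],
    [2.0, 52.0, 160.0],
    [3.0, 61.0, 155.0],
])

# data[:, 0] is a flat array of shape (3,); reshape(-1, 1) turns it into a
# column vector of shape (3, 1), where -1 means "infer this dimension".
y = data[:, 0].reshape(-1, 1)

print(y.shape)
```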
Training the model
First we need to initialize some weights. We should also initialize some other variables/parameters that we’ll use during training:
At this stage, our weights (theta) will just be set to some initial values (1’s in this case) that will be updated during training:
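A sketch of the initialization: the names `alpha`, `iterations`, and `cost_history`, and the specific values chosen for them, are my assumptions for illustration.

```python
import numpy as np

# Placeholder design matrix: 5 rows, intercept column plus 2 predictors.
x = np.ones((5, 3))

m, n = x.shape            # m = number of rows, n = number of weights
alpha = 0.01              # learning rate (assumed value)
iterations = 1000         # number of gradient descent steps (assumed value)

# One weight per column of x, all starting at 1, stored as a column vector.
theta = np.ones((n, 1))

# Room to record the cost at every iteration so it can be inspected later.
cost_history = np.zeros(iterations)
```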
The actual training process for multiple regression is pretty straightforward.
For a given set of weight values, we need to calculate the associated cost/loss:
- Evaluate the hypothesis by multiplying each variable by its weight and summing the results
- Calculate the residual and squared error
- Calculate the cost using quadratic loss
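The three steps above can be sketched as follows. I'm assuming the common 1/(2m) scaling for the quadratic loss here; other conventions differ only by a constant factor.

```python
import numpy as np

# Tiny example: intercept column plus one predictor (illustrative values).
x = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])
theta = np.ones((2, 1))
m = len(y)

# 1. Hypothesis: each row's variables times their weights, summed.
hypothesis = x @ theta

# 2. Residual and squared error for every row.
residual = hypothesis - y
squared_error = residual ** 2

# 3. Quadratic cost, assuming the 1/(2m) convention.
cost = squared_error.sum() / (2 * m)

print(cost)
```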
Now that we know how ‘wrong’ our current set of weights is, we need to go back and update those weights to better values. ‘Better’ in this case just means weight values that will lead to a smaller amount of error. We can use (batch) gradient descent to do this.
I won’t go into details about gradient descent here. The general idea is that we are trying to minimize the cost, and to do that we can calculate the partial derivatives of the cost with respect to each weight (the gradient). Once we know the gradient, we can adjust the weights (theta) by stepping against it, in the direction of the minimum. Over many iterations, the weights will converge towards values that will give us the smallest cost value.
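For reference, here is one standard way to write this down, assuming the cost is scaled by 1/(2m), with m rows, hypothesis h, and learning rate α:

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x^{(i)}) = \theta^\top x^{(i)}
$$

$$
\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$

$$
\theta_j := \theta_j - \alpha \, \frac{\partial J}{\partial \theta_j}
$$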
Note: the speed of this update is controlled by the learning rate alpha. Setting this value too large can cause gradient descent to diverge, which is not what we want.
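A single update step might look like this. It's a sketch that assumes the 1/(2m) quadratic cost, with an `alpha` value picked only for this toy example:

```python
import numpy as np

# Same toy setup as a one-predictor regression (illustrative values).
x = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])
theta = np.ones((2, 1))
m = len(y)
alpha = 0.1  # learning rate (assumed value for this toy example)

def compute_cost(x, y, theta):
    """Quadratic cost with the 1/(2m) convention."""
    residual = x @ theta - y
    return (residual ** 2).sum() / (2 * len(y))

cost_before = compute_cost(x, y, theta)

# Gradient of the cost with respect to every weight at once.
gradient = x.T @ (x @ theta - y) / m

# Step against the gradient; alpha controls how large the step is.
theta = theta - alpha * gradient

cost_after = compute_cost(x, y, theta)
print(cost_before, cost_after)
```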
We simply repeat this entire process over many iterations, and we should end up learning weights that give us the smallest error:
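Putting the pieces together, a training loop could be sketched like this. The toy data follows y = 1 + 2x exactly so the result is easy to check, and `alpha` and `iterations` are assumed values:

```python
import numpy as np

# Toy data generated from y = 1 + 2 * x (illustrative, not the heart data).
x = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([[1.0], [3.0], [5.0], [7.0]])

m, n = x.shape
alpha = 0.1          # learning rate (assumed value)
iterations = 2000    # number of steps (assumed value)
theta = np.ones((n, 1))
cost_history = np.zeros(iterations)

for i in range(iterations):
    # Cost for the current weights.
    residual = x @ theta - y
    cost_history[i] = (residual ** 2).sum() / (2 * m)

    # Batch gradient descent update using all rows at once.
    gradient = x.T @ residual / m
    theta = theta - alpha * gradient

print(theta.ravel())
print(cost_history[0], cost_history[-1])
```

On this toy data the weights should approach an intercept of 1 and a slope of 2.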
You can see the cost dropping across each iteration:
Visualization
We can visualize learning with a plot. This can be useful for determining whether gradient descent is converging or diverging:
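A minimal plotting sketch, assuming Matplotlib is available. The `cost_history` values here are a made-up decaying curve so the snippet runs on its own; in practice you would plot the costs recorded during training. The output filename is also just a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder cost values (invented); swap in the costs recorded during training.
cost_history = 5.0 * np.exp(-0.05 * np.arange(100))

fig, ax = plt.subplots()
ax.plot(cost_history)
ax.set_xlabel("Iteration")
ax.set_ylabel("Cost")
ax.set_title("Gradient descent convergence")

# Placeholder filename; in an interactive session plt.show() works as well.
fig.savefig("cost_history.png")
```

A curve that falls and flattens out suggests convergence, while one that climbs suggests the learning rate is too large.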
theta now contains the learned weight values for each variable (including the bias/intercept):
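As a sketch of how the weights can be inspected and used: `theta` below is hard-coded to stand in for the output of training on toy data, and the comparison with NumPy's least-squares solver is an optional sanity check I've added for illustration.

```python
import numpy as np

# Toy data generated from y = 1 + 2 * x (illustrative, not the heart data).
x = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([[1.0], [3.0], [5.0], [7.0]])

# Stand-in for the weights produced by training: intercept first, then slope.
theta = np.array([[1.0], [2.0]])
print(theta.ravel())

# Predictions are the design matrix times the learned weights.
predictions = x @ theta

# Optional sanity check: the closed-form least-squares solution should be
# close to what gradient descent converges to.
theta_exact, *_ = np.linalg.lstsq(x, y, rcond=None)
print(np.allclose(theta, theta_exact))
```

If the predictors were standardized earlier, these weights are expressed in standardized units rather than in the original units of each variable.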
The full code can be found in my GitHub repo here.