💡Linear Regression
Linear regression is a supervised learning algorithm used to predict a real-valued output from the input features.
We model the outcome as $\hat{y} = w^{\top}x + b$, where $x$ denotes the feature vector, $w$ the weight vector, and $b$ the bias. Here, we want to find the weight vector and bias that best fit the model.
In the formula, each component of $w$ is a regression coefficient that quantifies how much the corresponding feature impacts the outcome. The bias $b$ shows how far the regression line is offset from the origin, or equivalently, the predicted outcome when all features are 0. Here, we can use gradient descent to solve for $w$ and $b$.
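As a concrete illustration of the formula, here is a minimal sketch with made-up feature values, weights, and bias:
import numpy as np
# hypothetical feature vector, weight vector, and bias (made up for the example)
x = np.array([2.0, 3.0])   # features
w = np.array([0.5, -1.0])  # regression coefficients
b = 4.0                    # bias
# prediction: w^T x + b = 0.5*2.0 + (-1.0)*3.0 + 4.0 = 2.0
y_hat = w.dot(x) + b
print(y_hat)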
Gradient Descent
The idea of gradient descent is as follows: we initialize values for $w$ and $b$ in the first iteration, then we gradually learn from the cost and reduce the deviation step by step. Below are the building blocks of a linear regressor:
Cost Function: We can use mean squared error (MSE) as the cost function: $J(w, b) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. The cost function measures how “off” the model predictions are. $y_i$ is the observed value of observation $i$, $\hat{y}_i$ is the predicted value for observation $i$, and $n$ is the number of observations in the dataset. The objective is to find values of $w$ and $b$ that minimize $J(w, b)$.
Partial Derivative: A gradient is the partial derivative of a function with respect to a parameter; the explicit gradients of the MSE cost with respect to $w$ and $b$ are written out right after this list.
Stopping Criteria: In each iteration, we calculate the next point using the gradient at the current position, scale it by a learning rate, and subtract the obtained value from the current position (taking a step). We subtract because we want to minimize the cost function. We stop when the gradient approaches zero (a global or local minimum), or after a fixed number of iterations.
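Written out for the MSE cost above (these are the same expressions the from-scratch code further below implements), the gradients and the update rule with learning rate $\alpha$ are:

$$\frac{\partial J}{\partial w} = -\frac{2}{n} X^{\top}(y - \hat{y}), \qquad \frac{\partial J}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)$$

$$w \leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$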
# SGDRegressor is a stochastic gradient descent estimator inside Scikit-Learn.
# Instead of using all observations each time, SGD randomly chooses one observation
# at each step to compute the gradients, thus speeding up the learning process.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
# load the California housing dataset for demonstration
# (the boston housing dataset has been removed from recent scikit-learn releases)
X, y = datasets.fetch_california_housing(return_X_y=True)
# use 80% for training and 20% for test
X_train, X_test, y_train, y_test = train_test_split(
X, y, train_size=0.8, random_state=42)
# create a new model
lr = SGDRegressor(learning_rate="optimal", max_iter=10000)
# fit model to training data
lr.fit(X_train, y_train)
# use fitted model to predict test data
y_pred = lr.predict(X_test)
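In practice SGD is sensitive to the scale of the input features, so the data is usually standardized before fitting. A minimal sketch of standardizing and then evaluating on the split above, using the standard scikit-learn utilities StandardScaler, mean_squared_error, and r2_score:
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
# standardize features so the gradient updates are well-conditioned
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
lr = SGDRegressor(learning_rate="optimal", max_iter=10000)
lr.fit(X_train_std, y_train)
y_pred = lr.predict(X_test_std)
# report test-set error and goodness of fit
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))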
# Gradient descent code from scratch. No use of Scikit-Learn.
import numpy as np

class LinearRegression:
    def __init__(self, lr=0.01, epoch=10):
        self.lr = lr        # set hyperparameter learning rate
        self.epoch = epoch  # set hyperparameter epoch (number of passes)

    def fit(self, X, y):
        self.n_obs, self.n_feature = X.shape
        # weight initialization
        self.W = np.zeros(self.n_feature)
        self.b = 0
        self.X = X
        self.Y = y
        # gradient descent learning step
        for i in range(self.epoch):
            self.update_weights()
        return self

    def update_weights(self):
        Y_pred = self.predict(self.X)
        # calculate gradients of the MSE cost
        partial_W = -(2 * (self.X.T).dot(self.Y - Y_pred)) / self.n_obs
        partial_b = -2 * np.sum(self.Y - Y_pred) / self.n_obs
        # update weights by subtracting partial derivative times learning rate
        self.W = self.W - self.lr * partial_W
        self.b = self.b - self.lr * partial_b
        return self

    def predict(self, X):
        return X.dot(self.W) + self.b
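As a quick illustration of how the class above might be used, here is a sketch on a small synthetic dataset (the generating weights 3, -2 and bias 5 are made up for the example):
import numpy as np
# synthetic data: y = 3*x1 - 2*x2 + 5 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.1, size=200)
# more epochs than the default so full-batch gradient descent converges
model = LinearRegression(lr=0.05, epoch=500)
model.fit(X, y)
print(model.W, model.b)  # should be close to [3, -2] and 5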
Code snippets are adapted from GeeksforGeeks.
Ordinary Least Squares (OLS)
Instead of gradient descent, OLS is another method used in linear regression. The idea is the same: minimize the prediction error.
Formula: $\hat{y} = X\beta$, where $X$ is the design matrix with a leading column of ones and $\beta$ stacks the bias and the weights. Therefore, we want to minimize the residual error $\lVert y - X\beta\rVert^2$. We present the result directly here: $\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y$.
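For reference, the closed form follows from setting the gradient of the squared residual with respect to $\beta$ to zero:

$$\frac{\partial}{\partial \beta}\,\lVert y - X\beta\rVert^2 = -2X^{\top}(y - X\beta) = 0 \;\Rightarrow\; X^{\top}X\,\hat{\beta} = X^{\top}y \;\Rightarrow\; \hat{\beta} = (X^{\top}X)^{-1}X^{\top}y$$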
# OLS with Scikit-Learn
from sklearn.linear_model import LinearRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split
# load the California housing dataset for demonstration
# (the boston housing dataset has been removed from recent scikit-learn releases)
X, y = datasets.fetch_california_housing(return_X_y=True)
# use 80% for training and 20% for test
X_train, X_test, y_train, y_test = train_test_split(
X, y, train_size=0.8, random_state=42)
# LinearRegression uses the OLS method by default
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
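The fitted parameters and goodness of fit can then be inspected; coef_, intercept_, and score are standard attributes/methods of scikit-learn's LinearRegression:
# regression coefficients (w) and bias (b) found by OLS
print(model.coef_)
print(model.intercept_)
# R^2 on the held-out test set
print(model.score(X_test, y_test))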
import numpy as np
import copy

class LinearRegression:
    def __init__(self):
        # no hyperparameter
        self.w = None
        self.b = None

    def fit(self, X, y):
        self.X = X
        self.y = y
        X = copy.deepcopy(X)
        # prepend a column of ones so the first coefficient acts as the bias
        X = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)
        # implement the normal-equation formula: betas = (X^T X)^-1 X^T y
        betas = np.linalg.inv(X.transpose().dot(X)).dot(X.transpose()).dot(y)
        self.b = betas[0]
        self.w = betas[1:]
        return self

    def predict(self, X):
        return X.dot(self.w) + self.b
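As a quick check, the from-scratch class above can be fit on the same train/test split; since both solve the same least-squares problem, its coefficients should match scikit-learn's LinearRegression up to numerical precision:
# fit the from-scratch OLS model (the class defined just above) on the same split
ols = LinearRegression()
ols.fit(X_train, y_train)
y_pred_scratch = ols.predict(X_test)
# learned weights and bias
print(ols.w, ols.b)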
Code snippets are adapted from IBM Developer.