💡Logistic Regression

Overview

Logistic regression still uses real-valued features to first predict a real-valued output. It then transforms that output into a probability in [0, 1] and gives a binary output based on the probability (i.e. if p > 0.5 the prediction is true, otherwise it is false).

  1. Linear score: $z=\mathbf{w}^T\mathbf{x}+b$

  2. Logistic function transforms the score into a probability: $p=\frac{1}{1+\exp(-z)}$

  3. Decision using a threshold: $\hat{y}=1$ if $p \geq t$, otherwise $\hat{y}=0$

  • Cost Function: In logistic regression, we use cross-entropy as the cost function: $J(\mathbf{w}, b) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\right]$

  • Partial Derivatives:

    • $\frac{\partial J(\mathbf{w}, b)}{\partial \mathbf{w}} = -\frac{\mathbf{X}^T (\mathbf{y}-\hat{\mathbf{y}})}{n}$

    • $\frac{\partial J(\mathbf{w}, b)}{\partial b} = -\frac{\sum_{i=1}^{n} (y_i-\hat{y}_i)}{n}$

  • Stopping Criteria: In each iteration, we compute the gradient at the current position, scale it by a learning rate, and subtract the obtained value from the current position (taking a step). We subtract because we want to minimise the cost function. Thus, we stop when the gradient approaches zero (a global or local minimum), or after a fixed number of iterations. A worked numeric sketch of these steps, the cost, and one gradient update follows this list.
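
Below is a minimal numeric sketch of the steps, cost, and gradient update above, written with NumPy. The data, starting weights, and learning rate here are illustrative assumptions, not values from the examples further down.

# Worked example: forward pass, cross-entropy cost, and one gradient-descent step
import numpy as np

X = np.array([[0.5], [1.5], [2.5], [3.5]])   # 4 observations, 1 feature (illustrative values)
y = np.array([0, 0, 1, 1])
w = np.array([0.1])                          # illustrative starting weight
b = 0.0
lr = 0.1                                     # learning rate

z = X.dot(w) + b                             # step 1: z = w^T x + b
p = 1 / (1 + np.exp(-z))                     # step 2: logistic function -> probability in [0, 1]
y_hat = (p >= 0.5).astype(int)               # step 3: threshold at t = 0.5

# cross-entropy cost
J = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# gradients from the formulas above, then one gradient-descent step
grad_w = -X.T.dot(y - p) / len(y)
grad_b = -np.sum(y - p) / len(y)
w = w - lr * grad_w
b = b - lr * grad_b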

Code Implementation

# Logistic regression with Scikit-Learn
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([3.58, 2.34, 2.09, 1.14, 0.22, 1.65, 4.92, 2.35, 3.01, 5.23, 8.69, 4.85]).reshape(-1,1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = LogisticRegression()
logr.fit(X,y)

predicted = logr.predict(np.array([3.46]).reshape(-1,1))  # predicted 0/1 class label
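# Optionally, the underlying class probabilities can be inspected with predict_proba
# (columns are ordered according to logr.classes_)
probabilities = logr.predict_proba(np.array([3.46]).reshape(-1,1))
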
# Logistic regression from scratch (uses NumPy only, no Scikit-Learn)
import numpy as np

class LogisticRegression:
    def __init__(self, lr=0.01, epoch=10):
        self.lr=lr # set hyperparameter learning rate
        self.epoch=epoch # set hyperparameter epoch
    
    def fit(self, X, y):
        self.n_obs, self.n_feature=X.shape
        # weight initialization
        self.W=np.zeros(self.n_feature)
        self.b=0
        self.X=X
        self.Y=y
        
        # gradient descent learning step
        for i in range(self.epoch):
            self.update_weights()
        return self
    
    def update_weights(self):
        # use predicted probabilities (sigmoid of the linear score) in the gradients
        Y_pred=self.transform(self.X.dot(self.W)+self.b)
        #calculate gradients
        partial_W=-((self.X.T).dot(self.Y-Y_pred))/self.n_obs
        partial_b=-np.sum(self.Y-Y_pred)/self.n_obs
        #update weights by subtracting partial derivative times learning rate
        self.W=self.W-self.lr*partial_W
        self.b=self.b-self.lr*partial_b
        return self
        
    def transform(self, z):
        # logistic (sigmoid) function: maps a real-valued score to (0, 1)
        return 1/(1+np.exp(-z))
        
    def predict(self, X, threshold=0.5):
        z = X.dot(self.W)+self.b
        prob = self.transform(z)
        # threshold the probabilities element-wise to get 0/1 class labels
        return np.where(prob >= threshold, 1, 0)
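
As a quick usage check, the from-scratch class can be called the same way as the Scikit-Learn version. The sketch below reuses the X and y arrays defined in the Scikit-Learn example; the hyperparameter values are illustrative assumptions. Note that defining this class shadows the LogisticRegression name imported from Scikit-Learn above.

# Usage example for the from-scratch class (reuses X and y from the Scikit-Learn example)
scratch_model = LogisticRegression(lr=0.01, epoch=1000)  # illustrative hyperparameters
scratch_model.fit(X, y)
scratch_pred = scratch_model.predict(np.array([[3.46]]))  # array of 0/1 class labels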
