Gradient Descent
Resources:
Download: Linear Regression Cheat Sheet
GitHub: Linear Regression Model
Gradient Descent Motivation
Machine Learning has 2 phases: Learning and Prediction.
Learning: The learning algorithm's job is to create a hypothesis function.
Prediction: The hypothesis function's job is to predict outputs.
The cost function represents the error between predicted outputs and actual outputs. We can tune the parameters of the learning algorithm to minimize this error, and gradient descent is a popular way of doing so.
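For linear regression, a common choice of cost function is the squared error between predictions and actual outputs. A minimal sketch in NumPy (the function name and the 1/2 convention are mine, not from the cheat sheet; it assumes each row of x already includes the dummy input x_0 = 1):

import numpy as np

def squared_error_cost(parameters, x, y):
    """Half the average squared error between predicted and actual outputs."""
    predictions = x @ parameters  # hypothesis: dot product of each input row with the parameters
    return 0.5 * np.mean((predictions - y) ** 2)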
Gradient Descent Algorithm
Start with any values for the learning parameters.
You can pick random values or initialize them to 0; it doesn't matter much for linear regression.
Example,
parameters = np.zeros(dimensions + 1)
Update the parameters to go down fast (on the surface of the cost function).
parameters += learning_rate * (y_i - predict(x_i)) * x_i
Repeat until you get to a local optimum.
A strong sign of a local optimum is when the cost function is no longer significantly decreasing, given that the learning rate is small.
Gradient Descent Example
A linear regression model with a gradient-descent-based learning algorithm:
"""Machine Learning Models"""
import numpy as np
class LinearRegression:
"""Linear Regression"""
def __init__(self, dimensions: int = 1):
"""init
Parameters
----------
dimensions
number of dimensions for linear regression, by default 1
"""
# Initilize parameters
self.parameters = np.zeros(dimensions + 1)
def learn(self, x: np.ndarray, y: np.ndarray, learning_rate: float, max_iter: int):
"""learn maps input-output training dataset to a hypothesis function
Parameters
----------
x
input data
y
output data
learning_rate
learning rate
max_iter
maximum number of iterations, currently equal to total iterations
"""
# Gradient descent
for _ in range(max_iter):
for xy in zip(x, y):
x_i, y_i = self.reconstruct_input_vec(xy[0]), xy[1]
self.parameters += learning_rate * (y_i - self.predict(x_i)) * x_i
def reconstruct_input_vec(self, x: np.ndarray | float):
"""Create a dummy input, x_0, always defined as 1
Parameters
----------
x
input
Returns
-------
(x_0,...x_n), x_0 = 1
"""
return np.concatenate(([1], np.atleast_1d(x)))
def predict(self, x: np.ndarray):
"""Predict using linear regression hypothesis function
Parameters
----------
x
input
Returns
-------
prediction
"""
return np.dot(self.parameters, x)
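As a usage sketch of the class above (the toy data and hyperparameters are illustrative, not from the post):

import numpy as np

model = LinearRegression(dimensions=1)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1  # underlying relationship: y = 2x + 1
model.learn(x, y, learning_rate=0.01, max_iter=1000)
print(model.parameters)  # should approach [1, 2]
print(model.predict(model.reconstruct_input_vec(4.0)))  # should approach 9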
Batch Vs Stochastic
Batch gradient descent scans through the entire training set before taking a single step, whereas stochastic gradient descent takes a step for each example. Stochastic gradient descent is often preferred over batch gradient descent when the dataset is large.
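A rough comparison of the two update loops, written in the same notation as the learn method above (it assumes each x_i already includes the dummy input x_0 = 1):

# Stochastic gradient descent: one parameter update per training example
for _ in range(max_iter):
    for x_i, y_i in zip(x, y):
        parameters += learning_rate * (y_i - np.dot(parameters, x_i)) * x_i

# Batch gradient descent: one parameter update per full pass over the training set
for _ in range(max_iter):
    gradient = sum((y_i - np.dot(parameters, x_i)) * x_i for x_i, y_i in zip(x, y))
    parameters += learning_rate * gradient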