# Gradient Descent vs Gradient Boosting

They're two different algorithms, but there is a connection between them:

**Gradient descent** is an algorithm for finding a set of parameters that minimizes a loss function. Given a loss function $f(x, \theta)$, where $x$ is an $n$-dimensional input vector and $\theta$ is a set of parameters, gradient descent operates by computing the gradient of $f$ with respect to $\theta$. It then "descends" the gradient by nudging the parameters in the opposite direction of the gradient. This update is repeated, recomputing the gradient at the new parameters (and, in the stochastic variant, at different input samples $x$), until a minimum of $f$ is found.

**Gradient boosting** is a technique for building an [ensemble](https://blog.statsbot.co/ensemble-learning-d1dcd548e936) of weak models such that the predictions of the ensemble minimize a loss function. I think the Wikipedia article on gradient boosting explains the connection to gradient descent really well:

> . . . boosting algorithms [are] iterative functional gradient descent algorithms. That is, algorithms that optimize a cost function over function space by iteratively choosing a function (weak hypothesis) that points in the negative gradient direction.

So the connection is this: both algorithms descend the gradient of a differentiable loss function. Gradient descent descends the gradient by changing parameters, whereas gradient boosting descends the gradient by adding new models. A minimal code sketch contrasting the two is included at the end of this note.

---
Date: 20211220
Links to: [Gradient Boosting](Gradient%20Boosting.md)
Tags:
References:
* [Fantastic overview/answer](https://datascience.stackexchange.com/questions/61501/what-is-the-difference-between-gradient-descent-and-gradient-boosting-are-they)
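
A minimal sketch of the contrast, assuming squared loss, 1-D synthetic data, and one-split "stumps" as the weak learners. The data, learning rates, and the `fit_stump` helper are illustrative choices, not taken from the linked sources; the point is just that the first loop steps in parameter space while the second loop "steps" in function space by adding a model fit to the negative gradient (the residuals, for squared loss).

```python
import numpy as np

# --- Gradient descent: descend the gradient by nudging parameters ---
# Fit y ~ theta * x by minimizing f(theta) = mean((y - theta * x)^2).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

theta = 0.0
lr = 0.1
for _ in range(200):
    grad = -2.0 * np.mean((y - theta * x) * x)  # df/dtheta at the current theta
    theta -= lr * grad                          # step opposite the gradient
print("gradient descent estimate of theta:", theta)

# --- Gradient boosting: descend the gradient by adding new models ---
# For squared loss, the negative gradient w.r.t. the current prediction F(x_i)
# is the residual y_i - F(x_i). Each round fits a weak learner (a one-split
# stump here) to those residuals and adds it to the ensemble.

def fit_stump(x, residuals):
    """Find the threshold split that best predicts the residuals."""
    best = None
    for t in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left, right = residuals[x <= t], residuals[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        sse = np.sum((residuals - pred) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lval, rval = best
    return lambda x_new: np.where(x_new <= t, lval, rval)

F = np.zeros_like(y)   # current ensemble prediction
stumps = []
nu = 0.3               # shrinkage (learning rate in function space)
for _ in range(100):
    residuals = y - F              # negative gradient of squared loss
    stump = fit_stump(x, residuals)
    stumps.append(stump)
    F += nu * stump(x)             # "step" by adding a new model
print("boosting training MSE:", np.mean((y - F) ** 2))
```

In both loops the update has the same shape, current solution plus a step against the gradient; only the object being updated differs (a parameter vector vs. a growing sum of weak models).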