# Decision Trees

**Decision Trees** are an algorithm that recursively builds a tree data structure atop a dataset. This structure has leaves that contain specific rows ($x$ data points) of the dataset, and the *union* of these leaves is equal to the original dataset. We can say that the leaves are [collectively exhaustive](https://en.wikipedia.org/wiki/Collectively_exhaustive_events).

The key idea of decision trees is:

> As the tree data structure is built out atop the dataset, each successive node should become more and more **pure**.

To do this, the algorithm decides how to split the data at each step based on the **information gain** (which is defined in terms of **entropy**) of each candidate split. Concretely, $IG = H(\text{parent}) - \sum_k \frac{n_k}{n} H(\text{child}_k)$, where $H$ is the entropy of a node and $n_k / n$ is the fraction of the parent's points that land in child $k$. Because the parent's entropy is fixed, selecting the split that minimizes the weighted *entropy* of the children necessarily maximizes the *information gain*.

### Information Gain

As talked about in [Information Theory](Information-Theory%201.md), *gaining information* is the same thing as *reducing entropy*. So, when building a decision tree we wish to find the splits that maximally reduce the entropy of the quantity we wish to predict.

![](Information%20theory%202.png)

We can see how this works when making a discrete prediction, say choosing between six possible outcomes (a sketch of this computation appears below).

How would this work in the case of a regression tree? To be concrete, consider if we were trying to predict price. How might we go about that? Well, we can take the same approach! We wish to find splits that reduce the entropy of our distribution.

![](Information%20theory%203.png)

In practice, the way that this is done is (see the second sketch below):
* For a given group/split/set of data (i.e. some data points/rows), the prediction for *each point* is simply the mean of the group's $y$ values.
* We then calculate the [RSS](https://en.wikipedia.org/wiki/Residual_sum_of_squares) across all points. We attempt to minimize this value by picking good splits.
* In this case, picking good splits amounts to picking splits where there is *low variance* in the dependent variable ($y$), meaning our RSS will be low.
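To make the discrete (classification) case concrete, here is a minimal Python sketch of entropy and information gain for a candidate binary split. The function names and the toy labels are illustrative assumptions, not code from any particular library:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent_labels)
    weighted_child_entropy = sum(
        (len(group) / n) * entropy(group) for group in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Toy example: a binary split of ten labels into two purer groups.
parent = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c"]
left   = ["a", "a", "a", "a", "b"]   # mostly pure
right  = ["b", "b", "c", "c", "c"]
print(information_gain(parent, [left, right]))  # positive: the split reduces entropy
```

A tree-building algorithm would evaluate this quantity for every candidate split and greedily keep the one with the highest gain.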
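And here is a minimal sketch of the regression procedure from the bullet list above: predict each group by its mean, score candidate splits by total RSS, and keep the split with the lowest score. The single-feature threshold scan and the square-footage data are hypothetical illustrations:

```python
def rss(ys):
    """Residual sum of squares when predicting every point with the group mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys):
    """Scan thresholds on a single feature and return the one minimizing
    the combined RSS of the two resulting groups."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(xs)):
        left  = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = rss(left) + rss(right)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Toy example: price roughly jumps once square footage exceeds 30,
# so the low-variance groups sit on either side of that value.
sqft  = [10, 20, 25, 30, 40, 50, 60]
price = [100, 110, 105, 115, 300, 310, 305]
print(best_split(sqft, price))  # picks the threshold at 30
```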
---
Date: 20211220
Links to: [Machine-Learning-Models](Machine-Learning-Models.md)
Tags:
References:
* [Josh Starmer Intro](https://www.youtube.com/watch?v=7VeUPuFGJHk)
* [Good set of slides](https://www.cc.gatech.edu/~bboots3/CS4641-Fall2018/Lecture2/02_DecisionTrees.pdf)