# Perceptrons vs MultiLayerPerceptrons
A perceptron can be nicely visualized below:

And here is an MLP:

The big idea to keep in mind here is that in a simple Perceptron each input, $x_i$, has a single associated weight, $w_i$. Each input is multiplied by its weight, and the products are summed up (a simple [Linear Combination](Linear%20Combination.md)). This sum is then passed through some sort of activation function. We can see that there is *very little* **communication** between inputs with this architecture (the only communication occurs at the summation).
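A minimal NumPy sketch of this forward pass (the step activation and the optional bias term are assumptions for illustration, not something fixed by the note):

```python
import numpy as np

def perceptron(x, w, b=0.0):
    """Forward pass of a single perceptron: a linear combination
    of the inputs followed by an activation (a step function here)."""
    z = np.dot(w, x) + b          # sum_i w_i * x_i (+ optional bias)
    return 1.0 if z > 0 else 0.0  # step activation

x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
print(perceptron(x, w))  # a single scalar output
```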
Now consider the MLP. We see that in order to get to $h_1$ (the first node of the hidden layer), $x_1$ is multiplied by $w_{11}$, $x_2$ is multiplied by $w_{12}$, and so on. These products are then summed up and passed through an activation function to yield $h_1$:
$h_1 = \text{activation}\left(\sum_i w_{1i} x_i\right)$
There are many ways to interpret what this hidden unit $h_1$ represents, but one is that it is an *updated representation of $x_1$*: a version of $x_1$ that has been updated to include information about the other input values.
It is useful to realize that the computation of $h_1$ is in fact done via a **perceptron**! So, if we have a hidden layer consisting of $n$ nodes, there are $n$ internal (and independent) perceptrons used to compute these nodes.
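To make the "$n$ independent perceptrons" picture concrete, here is a small sketch (the ReLU activation and the shapes of `x` and `W` are illustrative assumptions): each row of the weight matrix `W` is the weight vector of the perceptron that computes one hidden node.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def hidden_layer(x, W):
    """Compute all hidden nodes at once.
    Row j of W holds the weights of the perceptron that computes h_j,
    so h_j = activation(sum_i W[j, i] * x[i])."""
    return relu(W @ x)

x = np.array([0.5, -1.2, 3.0])  # 3 inputs
W = np.random.randn(4, 3)       # 4 hidden nodes -> 4 independent perceptrons
h = hidden_layer(x, W)
print(h.shape)  # (4,) -- one value per hidden node
```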
### Jeremy Howard: Deep Learning is all about adding ReLUs
Consider our hidden nodes, $h_1, \dots, h_n$. Each of these nodes is the result of a linear combination being passed through a ReLU. The linear combination being passed into the ReLU is a function of our weights, so each ReLU's output will change as our weights are learned.
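A tiny illustrative sketch of that dependence (the specific numbers are made up): the same ReLU unit produces a different output once its weights change, which is exactly what happens over the course of learning.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([1.0, 2.0])

# Two different weight settings for the same hidden node:
w_before = np.array([0.1, -0.3])
w_after  = np.array([0.4,  0.2])   # e.g. after some gradient steps

print(relu(w_before @ x))  # the ReLU's output under the initial weights
print(relu(w_after @ x))   # a different output once the weights have changed
```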
What do we then do with the results of these ReLUs (our $h_1, \dots, h_n$)?