# Graphs and Distance Metrics
At unsupervised we are frequently dealing with the problem of have a BOP that is made up of patterns (nodes) and trying to determine how *similar* two nodes are to each other. There are many ways that we can go about this, namely:
* What is the rowset overlap ([jaccard distance](https://en.wikipedia.org/wiki/Jaccard_index)) between our two patterns?
* What is the filterset distance between our two patterns?
* What is the attribute distance between our two patterns?
* What is the difference in KPI distribution (via KS score) between our two patterns?
If we calculate the distance matrix (pairwise) of our BOP (points) we get:
$D_{ij} = \big\{ d(i, j) \mid \; \; i \in B, j \in B \big\} $
Where $B$ is our BOP. The key idea to keep in mind here is that in machine learning/data science we are often trying to determine *similarity* between points. In general, we think about things living in a euclidean space, where distance is clearly understood. But, the *information* that we can extract about the similarities between any two points in euclidean space *will be represented via a distance matrix built from a set containing those points*. A distance matrix allows us to capture the *structure* of a set, whether that set be made up of euclidean points, nodes in a BOP, friends in a graph, etc.
In general, graphs represent *connections* between objects. So here is a question: if a graph does not have any connection between nodes $A$ and $B$, what would the distance between them be? Well, if we are very interested in capturing the relations contained via the edges, we could look for the *minimum path* from $A$ to $B$. For instance, if both $A$ and $B$ are connected to $C$, the path length from $A$ to $B$ would be $2$ (i.e. two edges separate $A$ from $B$).
A great video on graph theory and dual graphs can be found [here](https://www.youtube.com/watch?v=-9OUyo8NFZg).
#mathematics #graph-theory #distance #metric