# Graphs and Distance Metrics At unsupervised we are frequently dealing with the problem of have a BOP that is made up of patterns (nodes) and trying to determine how *similar* two nodes are to each other. There are many ways that we can go about this, namely: * What is the rowset overlap ([jaccard distance](https://en.wikipedia.org/wiki/Jaccard_index)) between our two patterns? * What is the filterset distance between our two patterns? * What is the attribute distance between our two patterns? * What is the difference in KPI distribution (via KS score) between our two patterns? If we calculate the distance matrix (pairwise) of our BOP (points) we get: $D_{ij} = \big\{ d(i, j) \mid \; \; i \in B, j \in B \big\} $ Where $B$ is our BOP. The key idea to keep in mind here is that in machine learning/data science we are often trying to determine *similarity* between points. In general, we think about things living in a euclidean space, where distance is clearly understood. But, the *information* that we can extract about the similarities between any two points in euclidean space *will be represented via a distance matrix built from a set containing those points*. A distance matrix allows us to capture the *structure* of a set, whether that set be made up of euclidean points, nodes in a BOP, friends in a graph, etc. In general, graphs represent *connections* between objects. So here is a question: if a graph does not have any connection between nodes $A$ and $B$, what would the distance between them be? Well, if we are very interested in capturing the relations contained via the edges, we could look for the *minimum path* from $A$ to $B$. For instance, if both $A$ and $B$ are connected to $C$, the path length from $A$ to $B$ would be $2$ (i.e. two edges separate $A$ from $B$). A great video on graph theory and dual graphs can be found [here](https://www.youtube.com/watch?v=-9OUyo8NFZg). #mathematics #graph-theory #distance #metric