# How Does Unsupervised AI Work?
## Outline
* Why are we even trying to use data to learn anyway?
* There exists a fundamental reality of some situation in the world
* Our goal is to understand that reality as effectively as possible (filter the signal from the noise)
* Biases and complexity make it hard to have a great grasp of that underlying reality
* Data is a way to get *closer* to the reality of the situation
* Johannes Kepler example (from David Deutsch's book)
* But data is messy, and the nuggets you are looking for rarely present themselves readily :)
* Initial problem
* We have a data set and want to return an interesting set of rows. How do we do that?
* Well, there are a lot of possible combinations: the candidate answers form the power set of the rows, so with 1000 rows there are $2^{1000}$ possible row sets that could be returned (a quick sanity check of that number follows below)
* A row = a transaction in my example
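To make that count concrete, here is a quick sanity check in plain Python (nothing product-specific, just the arithmetic from the 1000-row example above):

```python
# The number of possible row subsets of an n-row table is 2**n: each row is either
# in or out of the returned set. For 1000 rows that number is astronomically large.
n_rows = 1000
n_subsets = 2 ** n_rows
print(len(str(n_subsets)))  # 302 -- the count has roughly 302 decimal digits
```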
* We could use a state-of-the-art AI algorithm!
* Use UMAP, return a group of rows, and then ask: how would you go about explaining this group? (a small sketch of this dead end follows below)
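Here is a minimal sketch of that route, assuming the `umap-learn` and `scikit-learn` packages and a purely synthetic numeric table: we do get a group of rows back, but nothing in the output says which columns, with which values, define it.

```python
# A hedged sketch: embed the rows with UMAP, cluster the embedding, and note that
# the resulting cluster label carries no column-level explanation on its own.
import numpy as np
import umap                      # pip install umap-learn
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))   # stand-in for 1000 transactions with 8 numeric columns

embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)

group = X[labels == 0]           # "an interesting set of rows"... but why these rows?
print(group.shape)               # we get a group, not an explanation
```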
* Okay, so what is a good solution?
* How can we define things in a human-understandable way?
* Filters! (a small example follows below)
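As a small illustration of what a filter buys us (pandas, with entirely hypothetical column names): the pattern reads like a sentence and, at the same time, selects a precise set of rows.

```python
# A filter is human-readable ("German web transactions over 100") and, at the same
# time, a precise way to select a subset of rows.
import pandas as pd

df = pd.DataFrame({
    "country": ["DE", "DE", "US", "FR"],
    "channel": ["web", "store", "web", "web"],
    "amount": [150.0, 90.0, 200.0, 40.0],
})

pattern = (df["country"] == "DE") & (df["channel"] == "web") & (df["amount"] > 100)
print(df[pattern])
```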
* Great!
* But there are a lot of combinations of filters too, now driven by your *columns*: with 20 columns and 5 candidate values per column, conjunctive filters alone already number $(5+1)^{20} \approx 3.7 \times 10^{15}$
* How can we select specific filters (patterns)? Remember, each filter in turn corresponds to selecting a certain set of rows
* We can start by imposing a *structure on these filters* by laying them out in a graph
* Now we can search this graph by imposing a KPI and comparing nodes. Which nodes to compare?
* A good place to start would be the parent and the child!
* When a child is interesting wrt the parent, we want to keep it
* How do we define interesting? A difference in the KPI, the metric our customer has deemed interesting! (a sketch of this parent/child search follows below)
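Below is a minimal sketch of that parent/child search, under stated assumptions rather than as the production algorithm: pandas and scipy are available, the table has two hypothetical categorical columns (`country`, `channel`) and a numeric `kpi`, each node is a conjunction of `column == value` filters, a node's children add one more filter, and a child is kept when the KS statistic between its KPI values and its parent's clears a threshold.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

# Synthetic transactions with a planted pattern: German web transactions have a
# shifted KPI, so the node {country == DE, channel == web} should surface.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "country": rng.choice(["DE", "US", "FR"], n),
    "channel": rng.choice(["web", "store"], n),
    "kpi": rng.normal(100, 15, n),
})
df.loc[(df["country"] == "DE") & (df["channel"] == "web"), "kpi"] += 30

FEATURES = ["country", "channel"]

def select(filters):
    """Rows matched by a pattern: a conjunction of (column, value) filters."""
    mask = pd.Series(True, index=df.index)
    for col, val in filters:
        mask &= df[col] == val
    return df[mask]

def children(parent):
    """Child patterns: the parent's filters plus one extra (column == value) filter."""
    used = {col for col, _ in parent}
    for col in FEATURES:
        if col in used:
            continue
        for val in df[col].unique():
            yield parent + [(col, val)]

def pattern_find(max_depth=2, min_rows=30, ks_threshold=0.2):
    frontier = [[]]                      # the root node is the empty filter (all rows)
    interesting = []
    for _ in range(max_depth):
        next_frontier = []
        for parent in frontier:
            parent_kpi = select(parent)["kpi"]
            for child in children(parent):
                child_kpi = select(child)["kpi"]
                if len(child_kpi) < min_rows:
                    continue
                ks = ks_2samp(child_kpi, parent_kpi).statistic
                if ks >= ks_threshold:   # child differs enough from its parent: keep it
                    interesting.append((child, ks))
                    next_frontier.append(child)
        frontier = next_frontier
    # NB: this sketch does not deduplicate patterns reached via different parents.
    return sorted(interesting, key=lambda item: -item[1])

for pattern, score in pattern_find()[:5]:
    print(round(score, 2), pattern)
```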
* Awesome! By following this approach we arrive at human-understandable patterns that define rows that are interesting wrt the KPI the customer cares about!
* Color nodes based on KS score with parent
* Note: We can think of pattern find as a weighted graph search for interesting (KS score) patterns
* Explain (in a footnote) that the KS score was chosen because it captures differences in the mean as well as differences in higher-order moments (a small demo follows below)
* Want to know more about these patterns? That is what Dressing room and PDP are for!
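A tiny demo of the point in that footnote, using synthetic numbers only: the two samples below have essentially the same mean, so a difference-in-means check sees nothing, while the KS statistic still flags them as clearly different distributions.

```python
# Same mean, different spread: a plain difference-in-means misses this,
# while the KS statistic does not.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
a = rng.normal(loc=100, scale=5, size=5000)
b = rng.normal(loc=100, scale=25, size=5000)

print(a.mean() - b.mean())        # close to 0: the means barely differ
print(ks_2samp(a, b).statistic)   # clearly non-zero: the distributions differ
```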
* Adding features
* What if the features (filters) aren't interesting to the customer? What if they are expected or not actionable?
* We can bring in more features!
* What does that do? It increases our pattern space to search! Show visual here (a rough numeric sketch follows below). PF's job is a bit harder now!
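A rough numeric sketch of that growth, under a simplifying assumption (every column contributes a fixed number of candidate filter values, and a conjunctive pattern either leaves a column unfiltered or pins it to exactly one value):

```python
# With d filterable columns and k candidate values per column, the number of
# conjunctive patterns is (k + 1) ** d: each column is either unfiltered or pinned
# to one of its k values. Adding features multiplies the space to be searched.
def pattern_space_size(num_columns: int, values_per_column: int) -> int:
    return (values_per_column + 1) ** num_columns

for d in (5, 10, 20, 40):
    print(d, pattern_space_size(d, values_per_column=5))
```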
* Feature generation
* What does this mean? We create a new feature from existing ones, which may provide new ways of describing rows that are interesting wrt the KPI (a small example follows below)
* In other words, it may help us describe row sets from that power set of $2^{1000}$ candidates that are interesting wrt the KPI! In a sense we are encoding information here
* Again this will increase our space of patterns to search
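A small, hedged example of feature generation in pandas, with hypothetical column names: each derived column becomes a new axis along which filters, and therefore patterns, can be expressed.

```python
# New columns derived purely from existing ones; each one is a new candidate filter.
import pandas as pd

df = pd.DataFrame({
    "amount": [120.0, 80.0, 45.0],
    "items": [3, 2, 1],
    "timestamp": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-08"]),
})

df["amount_per_item"] = df["amount"] / df["items"]     # ratio of two existing columns
df["day_of_week"] = df["timestamp"].dt.day_name()      # calendar feature from a timestamp
df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5   # boolean flag from the same column
print(df)
```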
* With a larger space to search, we can tune pattern find more finely:
* Hyperparameters (how deep to drill into bop, how wide and how deep to search, how many rows are required)
* column weights
* hidden gems
* Perspective change
* Whoa, now our number of rows has changed altogether and we are working with a different entity (a small sketch follows below)
* Our entire pattern space has changed!
* meaning we are trying to find ways of describing entirely different interesting rows
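A minimal sketch of a perspective change, assuming the transaction table carries a hypothetical `customer_id` column: aggregating to one row per customer changes the entity we describe, so every filter, pattern, and KPI now refers to customers rather than transactions.

```python
# From a transaction-level table to a customer-level table: the rows, and therefore
# the whole pattern space, are now about a different entity.
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [20.0, 35.0, 10.0, 50.0, 5.0, 15.0],
})

customers = transactions.groupby("customer_id").agg(
    n_transactions=("amount", "size"),
    total_spend=("amount", "sum"),
    avg_spend=("amount", "mean"),
).reset_index()

print(customers)   # one row per customer instead of one row per transaction
```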
* Fabric
* Now what if we had a fast way to move across perspectives, giving us the ability to quickly search multiple pattern spaces?
* That is the promise of fabric