# How Does Unsupervised AI Work

## Outline

* Why are we even trying to use data to learn anyway?
  * There exists a fundamental reality of some situation in the world.
  * Our goal is to understand that reality as effectively as possible (filter signal from noise).
  * Biases and complexity make it hard to get a good grasp of that underlying reality.
  * Data is a way to get *closer* to the reality of the situation.
    * Johannes Kepler example (from the David Deutsch book).
  * But data is messy, and the nuggets you are looking for rarely present themselves readily. :)
* Initial problem
  * We have a data set and want to return an interesting set of rows. How do we do that?
  * Well, there are a lot of possible combinations (see the size of the power set: with 1,000 rows, there are $2^{1000}$ possible row sets that could be returned). (A back-of-the-envelope comparison appears after the outline.)
    * A row = a transaction in my example.
  * We could use a state-of-the-art AI algorithm!
    * Use UMAP, return a group, and then ask: how would you go about explaining this? (See the clustering sketch after the outline.)
* Okay, so what is a good solution?
  * How can we define things in a human-understandable way?
    * Filters!
  * Great!
  * But there are a lot of combinations of filters (now based on your *columns*). (Show this.)
  * How can we select specific filters (patterns)? (Again, these in turn correspond to selecting certain rows.)
  * We can start by imposing a *structure on these filters* by laying them out in a graph.
  * Now we can search this graph by imposing a KPI and comparing nodes. Which nodes do we compare?
    * A good place to start is the parent and the child!
    * When a child is interesting with respect to its parent, we want to keep it.
    * How do we define "interesting"? A difference in a KPI, the metric our customer has deemed interesting.
  * Awesome! By following this approach we should arrive at human-understandable patterns that define rows that are interesting with respect to the KPI the customer cares about!
    * Color nodes based on their KS score against the parent.
    * Note: we can think of pattern find as a weighted graph search for interesting (high KS score) patterns. (See the search sketch after the outline.)
    * Explain in a footnote that the KS score was chosen because it captures differences in the mean as well as differences in higher-order moments.
  * Want to know more about these patterns? That is what Dressing room and PDP are for!
* Adding features
  * What if the features (filters) aren't interesting to the customer? What if they are expected or not actionable?
  * We can bring in more features!
  * What does that do? It increases our pattern space to search! (Show a visual here.) PF's job is a bit harder now!
* Feature generation
  * What does this mean? We create a new feature based on existing ones. This may provide new ways of describing rows that are interesting with respect to the KPI. (See the feature-generation sketch after the outline.)
  * In other words, it may help us find ways of describing rows in the power set of $2^{1000}$ that are interesting with respect to the KPI! In a sense we are encoding information here.
  * Again, this will increase our space of patterns to search.
  * With a larger space to search, we can configure pattern find to be more finely tuned:
    * Hyperparameters (how deep to drill into bop, how wide, how deep, how many hows required)
    * Column weights
    * Hidden gems
* Perspective change
  * Whoa, now our number of rows has changed altogether and we are working with a different entity.
  * Our entire pattern space has changed!
    * Meaning we are trying to find ways of describing entirely different interesting rows.
* Fabric
  * Now what if we had a fast way to move across perspectives, giving us the ability to quickly search multiple pattern spaces?
  * That is the promise of Fabric.
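
The combinatorics in the "Initial problem" section are easy to make concrete. The numbers below are purely illustrative (a hypothetical 1,000-row transaction table with made-up categorical columns and cardinalities), but they show why searching over raw row subsets is hopeless while searching over column filters stays tractable and human-readable.

```python
from math import prod

n_rows = 1000

# Any subset of rows could in principle be "the interesting set of rows",
# so the raw search space is the power set: 2^1000 possibilities.
n_row_subsets = 2 ** n_rows
print(f"row subsets: 2^{n_rows} (~10^{len(str(n_row_subsets)) - 1})")

# Hypothetical categorical columns and their cardinalities.
column_cardinalities = {"region": 5, "channel": 3, "product_line": 8, "tier": 4}

# A conjunctive filter either ignores a column or pins it to one of its
# values, so each column contributes (cardinality + 1) choices.
n_filter_patterns = prod(c + 1 for c in column_cardinalities.values())
print(f"conjunctive filter patterns: {n_filter_patterns}")  # 6 * 4 * 9 * 5 = 1080
```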
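
For the "state-of-the-art AI algorithm" detour, here is a minimal sketch of what that route looks like, assuming the `umap-learn` and `scikit-learn` packages and a numeric feature matrix; the data is random stand-in data, not a real transaction table. The point is that the output is a bag of row indices with no human-readable description attached.

```python
import numpy as np
import umap
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))          # stand-in for the real transaction table

# Embed the rows in 2D with UMAP, then pull out clusters of the embedding.
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embedding)

group = np.flatnonzero(labels == 0)      # "here is an interesting group of rows"
print(f"group 0 contains {len(group)} rows")
# ...but how would you explain to a customer *why* these rows belong together?
```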
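
The parent/child search over the filter graph can be sketched as follows. This is an illustrative re-implementation of the idea in the outline, not the product's pattern find code: a pattern is a conjunction of `(column == value)` predicates, a child adds one predicate to its parent, and a child is kept when the two-sample Kolmogorov-Smirnov statistic between its KPI values and its parent's KPI values is large. The thresholds, minimum row count, depth limit, and breadth-first strategy are all assumptions.

```python
import pandas as pd
from scipy.stats import ks_2samp


def children(pattern, df, filter_cols):
    """Yield child patterns that add one extra (column == value) predicate."""
    for col in (c for c in filter_cols if c not in pattern):
        for val in df[col].unique():
            yield {**pattern, col: val}


def select_rows(df, pattern):
    """Return the rows matching every predicate in the pattern."""
    mask = pd.Series(True, index=df.index)
    for col, val in pattern.items():
        mask &= df[col] == val
    return df[mask]


def search(df, kpi, filter_cols, min_ks=0.3, min_rows=30, max_depth=3):
    """Breadth-first search of the filter graph, keeping 'interesting' children.

    A child is interesting when the KS statistic between its KPI values and
    its parent's KPI values is at least `min_ks`.
    """
    found = []
    frontier = [{}]                                  # root pattern: no filters
    for _ in range(max_depth):
        next_frontier = []
        for parent in frontier:
            parent_kpi = select_rows(df, parent)[kpi]
            for child in children(parent, df, filter_cols):
                child_kpi = select_rows(df, child)[kpi]
                if len(child_kpi) < min_rows:        # too few rows to trust
                    continue
                ks_stat, _p_value = ks_2samp(child_kpi, parent_kpi)
                if ks_stat >= min_ks:                # KPI distribution shifted
                    found.append((child, ks_stat))
                    next_frontier.append(child)
        frontier = next_frontier
    return sorted(found, key=lambda item: -item[1])
```

A call like `search(df, kpi="revenue", filter_cols=["region", "channel", "tier"])` (hypothetical names) would return patterns such as `{"region": "EMEA", "channel": "web"}` ranked by KS score. For brevity this sketch does not deduplicate patterns that are reachable through different parents.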
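
Feature generation, in the sense used above, is just deriving new filterable columns from existing ones so the filter graph has more (and ideally more actionable) ways to describe rows. A small pandas sketch with entirely hypothetical column names:

```python
import pandas as pd


def generate_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive new columns that become additional filters for the search."""
    out = df.copy()
    # Ratio of two existing numeric columns.
    out["discount_rate"] = out["discount"] / out["list_price"]
    # Binned version of a numeric column so it can act as a categorical filter.
    out["order_size_band"] = pd.cut(
        out["quantity"],
        bins=[0, 10, 100, float("inf")],
        labels=["small", "medium", "large"],
    )
    # Calendar feature pulled out of a timestamp.
    out["order_weekday"] = pd.to_datetime(out["order_date"]).dt.day_name()
    return out
```

Every derived column multiplies the number of conjunctive filter patterns the search can consider, which is why the larger space calls for more careful tuning (depth and width hyperparameters, column weights, hidden gems).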