# High Dimensional Spaces
[High-Dimensional-Spaces - Pyramid Principles Outline](High-Dimensional-Spaces%20-%20Pyramid%20Principles%20Outline.md)
### Old Outline
Below is the outline for the structure of this series of posts.
* Post 1: The curse of dimensionality
* Intro: Why does this matter? What is the fundamental tenet behind machine learning (both supervised and unsupervised)? In a sense it is assessing similarity.
* This becomes incredibly challenging in high dimensions
* Explain Why this is so bad.
* Why does this happen?
* Show that each time you add a dimension, the distance between the center and a corner of the space grows.
* Examples of why this is so bad
* Combinatorics and coverage
* sphere packing
* volumes
* shapes lose their similarity (spheres become worse approximations of cubes)
* gaussian (in high dims)
* Intuitions
* Each time you add a dimension the distance between the center and a corner increases without bound. Hence, all of the space is "far away from the center". Show this in the case of the orange.
#### Rough Topics and Notes
#### The Curse of Dimensionality
Euclidean Distance does not work well in higher dimensions. This is often referred to as the *curse of dimensionality*. Quoting Pedro Domingos:
> Similarity-based reasoning that machine learning algorithms depend on (explicitly or implicitly) breaks down in high dimensions.

In high dimensions, all examples look alike. As the dimensionality increases, more and more examples become nearest neighbors of a given point. Eventually, the choice of nearest neighbor is effectively random. Clearly, this wreaks havoc on approaches that utilize *similarity*.

In high dimensions, most of the mass of a multivariate Gaussian distribution is not near the mean, but in an increasingly distant "[shell](https://photos.google.com/search/_eAF1QipOQyc3orxK6jfimrO9nd~uN7nyUdy~uuIN8A_whiteboards/photo/AF1QipOEjnJb8krEdgTgSv1tZ6y_QoxgvmReTbD0hy6F)" around it; and most of the volume of a high-dimensional orange is in the skin, not the pulp. If a constant number of examples is distributed uniformly in a high-dimensional hypercube, beyond some dimensionality most examples are closer to a face of the hypercube than to their nearest neighbor. And if we approximate a hypersphere by inscribing it in a hypercube, in high dimensions almost all the volume of the hypercube is outside the hypersphere.

Fortunately, there is an effect that partly counteracts the curse, which might be called the "blessing of non-uniformity." In most applications examples are not spread uniformly throughout the instance space, but are concentrated on or near a lower-dimensional manifold.
One reason that has been suggested for why Euclidean distance does not work well in higher dimensions is that, due to its squared terms, it is particularly sensitive to *noise*.
It is worth keeping in mind the reality that in a real world dataset the [intrinsic dimension](https://en.wikipedia.org/wiki/Intrinsic_dimension) is likely far smaller than the extrinsic dimension.
We must also remember the [*combinatoric problem*](https://en.wikipedia.org/wiki/Curse_of_dimensionality#Combinatorics) that arises as we increase the dimensionality.
The distance between the center of our space, $(0, 0, \dots, 0)$, and a corner, $(1,1,\dots,1)$ [increases without bound](https://en.wikipedia.org/wiki/Curse_of_dimensionality#Distance_functions) as we add dimensions. It is due to this fact that in higher dimensions nearly all of the space is "far away" from the center. Put another way, the dimensional unit hypercube can be said to consist almost entirely of the "corners" of the hypercube, with almost no "middle".
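A minimal numerical check of this growth (the function name `center_to_corner` is just an illustrative choice):

```python
import math

def center_to_corner(d: int) -> float:
    # Euclidean distance between (0, ..., 0) and (1, ..., 1) in d dimensions;
    # each coordinate contributes (1 - 0)^2 = 1, so this equals sqrt(d)
    return math.sqrt(sum((1 - 0) ** 2 for _ in range(d)))

for d in (1, 2, 3, 100, 10_000):
    print(f"d={d:>6}: distance = {center_to_corner(d):.2f}")
```

The distance is exactly $\sqrt{d}$, so it grows without bound as we add dimensions.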
So we can state the fundamental reason that Euclidean distance becomes far less useful:
> As we increase our dimensionality, the *volume* becomes more concentrated in a *shell* close to the exterior of our space. Points in the shell are, by nature, *far apart*. If most of the volume, and consequently most of our points, reside in this outer shell, then the points are likely very far from each other when we measure distance with the Euclidean metric.
The most important thing to keep in mind is that Euclidean distance loses its usefulness due to the *distribution of volume* in high-dimensional space. The fundamental geometry of the space is what makes things far apart.
THOUGHTS LEAVING OFF:
From [this post](https://www.quora.com/Why-cant-we-use-Euclidean-distance-as-a-measure-when-dealing-with-high-dimensional-data):
> In some ways, high dimensional data is intractable to all measures of distance; every data point is an outlier. This is because the volume of a p-dimensional unit hypercube increases a lot faster than the volume of a p-dimensional unit hypersphere, as p increases.
* A great experiment to run would be to compare the L2 norm (Euclidean Distance) with the L1 norm (Manhattan Distance).
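A sketch of that experiment using only the standard library; `relative_contrast` is a hypothetical helper, and 500 uniform points per trial is an arbitrary choice. The "relative contrast" is how distinguishable the farthest point is from the nearest, and it should shrink as the dimension grows:

```python
import random

def relative_contrast(dim: int, n_points: int = 500, p: int = 2, seed: int = 0) -> float:
    """(max_dist - min_dist) / min_dist from the origin to random points
    in the unit hypercube, measured with the L^p norm."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_points):
        x = [rng.random() for _ in range(dim)]
        dists.append(sum(abs(v) ** p for v in x) ** (1 / p))
    return (max(dists) - min(dists)) / min(dists)

# Contrast shrinks as d grows, for both L1 and L2
for d in (2, 10, 100, 1000):
    print(f"d={d:>4}  L1 contrast = {relative_contrast(d, p=1):8.3f}  "
          f"L2 contrast = {relative_contrast(d, p=2):8.3f}")
```

As distances concentrate, nearest and farthest neighbors become nearly indistinguishable, which is exactly the failure mode described above.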
References:
* [Why is Euclidean distance not a good metric in high dimensions?](https://stats.stackexchange.com/questions/99171/why-is-euclidean-distance-not-a-good-metric-in-high-dimensions)
* [A Few Useful Things to Know About Machine Learning](https://sites.astro.caltech.edu/~george/ay122/cacm12.pdf)
* [Euclidean distance is usually not good for sparse data](https://stats.stackexchange.com/questions/29627/euclidean-distance-is-usually-not-good-for-sparse-data-and-more-general-case)
* [How it relates to Nearest Neighbors](https://members.loria.fr/MOBerger/Enseignement/Master2/Exposes/beyer.pdf)
* [University lecture overview](https://www.youtube.com/watch?v=dUEUQe0UKdE)
* [3 blue 1 brown 10 dimensions](https://www.youtube.com/watch?v=zwAD6dRSVyI)
* [Why can't we use euclidean distance in higher dimensions](https://www.quora.com/Why-cant-we-use-Euclidean-distance-as-a-measure-when-dealing-with-high-dimensional-data)
* [On the Surprising Behavior of Distance Metrics in High Dimensional Space](https://bib.dbvis.de/uploadedFiles/155.pdf)
* White board photos
* [General Distance as real estate](https://photos.google.com/search/whiteboards/photo/AF1QipMTt1X622nxNe3kkdXeqaVqcFgyahzcv4gOFRHC)
* [General Distance as real estate](https://photos.google.com/search/whiteboards/photo/AF1QipNJgGQqHwFXobSiUVi38otjIMB1BJu2kiCwH3ml)
* [Distance in 2d](https://photos.google.com/search/whiteboards/photo/AF1QipNFAug7X1RwuIxLEmHstY7ti2UTKso7fcs8L8-1)
* [Distance in 3d](https://photos.google.com/search/whiteboards/photo/AF1QipPs1meim8mg0PZUL7U7RWD8dkASb5w5r5ZBD7J5)
* [ESL](https://photos.google.com/search/_eAF1QipOQyc3orxK6jfimrO9nd~uN7nyUdy~uuIN8A_whiteboards/photo/AF1QipPxTzhpcYudipofWj5CYvbzuh3Bv0qoSurUGHS5)
#### Volume of an N-ball
We see in *An Elementary Introduction to Modern Convex Geometry* that one way of comparing a sphere to another shape is to compare the hypersphere *inscribed* in that shape with the hypersphere that *circumscribes* that shape.
Will want to show the computational perspective of what happens when you "sweep" a sphere into a new dimension, scaling accordingly.
We are going to want to show how shapes become worse approximations of each other:
* hypercube
* [Volume-of-Simplex](Volume-of-Simplex.md)
* [Volume-of-Cone](Volume-of-Cone.md)
* [hyperoctahedron](https://photos.google.com/search/_eAF1QipOQyc3orxK6jfimrO9nd~uN7nyUdy~uuIN8A_whiteboards/photo/AF1QipNcm1895xy3WDE5IIqm62s2jRmZ6J-NNk7AOmN2)
* [Volume-of-Hypersphere](Volume-of-Hypersphere.md)
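The inscribed/circumscribed comparison can be made concrete for the hypercube using the closed-form ball volume ($V_d(r) = \pi^{d/2} r^d / \Gamma(d/2 + 1)$); `ball_volume` is an illustrative helper name:

```python
import math

def ball_volume(d: int, r: float = 1.0) -> float:
    """Volume of a d-dimensional ball of radius r: pi^(d/2) r^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

# For the cube [-1, 1]^d: the inscribed ball has radius 1 (touching the faces),
# the circumscribed ball has radius sqrt(d) (passing through the corners).
for d in (2, 3, 10, 20):
    ratio = ball_volume(d, math.sqrt(d)) / ball_volume(d, 1.0)
    print(f"d={d:>2}  circumscribed/inscribed volume ratio = {ratio:.3e}")
```

The ratio is $d^{d/2}$, so the two spheres that "sandwich" the cube diverge explosively: the cube stops looking anything like a ball.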
#### Sphere Packing
See [Sphere-Packing](Sphere-Packing.md).
#### How does this relate to distance in higher dimensions?
#### High-dimensional Gaussian has mass in a shell
References:
* [Twitter thread](https://twitter.com/andrewm_webb/status/1254897947390742530?lang=en)
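The shell effect is easy to see by sampling: the norm of a standard $d$-dimensional Gaussian concentrates around $\sqrt{d}$ with a roughly constant-width spread. A minimal sketch (the helper name `sample_norms` and the sample size are arbitrary choices):

```python
import math
import random
import statistics

def sample_norms(d: int, n: int = 500, seed: int = 0) -> list[float]:
    """Euclidean norms of n samples from a standard d-dimensional Gaussian."""
    rng = random.Random(seed)
    return [math.sqrt(sum(rng.gauss(0, 1) ** 2 for _ in range(d))) for _ in range(n)]

for d in (1, 10, 100, 1000):
    norms = sample_norms(d)
    print(f"d={d:>4}  mean norm = {statistics.mean(norms):7.2f}  "
          f"(sqrt(d) = {math.sqrt(d):7.2f})  stdev = {statistics.stdev(norms):.2f}")
```

The mean norm tracks $\sqrt{d}$ while the standard deviation stays roughly constant, so relative to its distance from the mean the shell gets thinner and thinner: almost no mass sits near the mean itself.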
#### Measure
We can show that the volume of the hypersphere shrinking so quickly compared to that of the hypercube is *not* a consequence of the *geometry of the hypersphere*. This is *incredibly important*. We can do this by *inscribing* a hypercube inside our unit hypersphere: the inscribed hypercube's volume vanishes as well. This looks like:

The reason it is so important to consider this is that it means the problem isn't really about the *geometry* of a hypersphere. Rather, it has to do with *measure*.
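A quick numerical check of this argument: the largest hypercube that fits inside the unit ball has its main diagonal equal to the diameter, so its side is $2/\sqrt{d}$, and its share of the ball's volume also collapses to zero. A minimal sketch (helper names are illustrative):

```python
import math

def ball_volume(d: int, r: float = 1.0) -> float:
    """Volume of a d-dimensional ball of radius r: pi^(d/2) r^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

def inscribed_cube_volume(d: int, r: float = 1.0) -> float:
    """Volume of the largest hypercube inside a ball of radius r:
    its main diagonal equals the diameter 2r, so the side is 2r / sqrt(d)."""
    return (2 * r / math.sqrt(d)) ** d

for d in (2, 3, 10, 50):
    frac = inscribed_cube_volume(d) / ball_volume(d)
    print(f"d={d:>2}  cube-inside-ball volume fraction = {frac:.3e}")
```

The vanishing act happens in both directions (ball inside cube, cube inside ball), which is why the culprit is the measure of the space itself, not the shape of the sphere.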
References:
* [Reasoning in higher dimensions](https://ontopo.wordpress.com/2009/03/03/reasoning-in-higher-dimensions-hyperspheres/)
* [Reasoning in higher dimensions - measure](https://ontopo.wordpress.com/2009/03/10/reasoning-in-higher-dimensions-measure/)
* [gif visual](https://mathani.tumblr.com/post/128187087775/sphere-is-circumscribed-by-a-cube-and-has-a)
Whiteboard photos:
* [Measure](https://photos.google.com/search/whiteboards/photo/AF1QipP-5Wb2azhZUS_SQC0nxaGYWW2NRpkJFaWDg7XD)
#### Manifold
#### What should we do?
* Dimensionality Reduction
* [Prediction Quality vs Amount of Information in Horse Racing](Prediction%20Quality%20vs%20Amount%20of%20Information%20in%20Horse%20Racing.md)
* Consider the above study. We want to pick out the *key* variables that are appropriate based on the underlying reality of the situation at hand. A machine learning model may be able to handle more variables than the human brain can, but not incredibly many more. Imagine trying to predict the location of a planet based on thousands of observation points. Now imagine trying to do it via Newton's model of gravity. If we can surmise a model based on the key variables related to the situation at hand, we have drastically reduced the complexity we are dealing with. We have increased our signal and decreased our noise.
* Imagine problem solving to be composed of trying to get a 360 view of the problem. Bringing in data is a way to get additional vantage points and reduce bias. But, it is most effective if we have a preexisting model of the world in mind and use this new data to *rule it out* if it does not conform.
* There are many ways in which we perform dimensionality reduction
* focus on the KPI and use distance metrics (e.g., the KS score) that determine how far two points are from each other with respect to the KPI
* PCA, SVD, UMAP, etc
* Reason from first principles
* Identify the variables at hand that you *know* have an effect on the final outcome.
* Causal Model?
#### Recursive process
* Have built out a process that calculates the volume of a hypersphere via recursive sweeping in cartesian coordinates, shown [here](Volume-of-Hypersphere.md#Recursive%20process).
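A minimal sketch of that sweeping idea, assuming the standard recurrence: a cross-section of the unit $n$-ball at height $x$ is an $(n{-}1)$-ball of radius $\sqrt{1-x^2}$, and an $(n{-}1)$-ball of radius $\rho$ has volume $\rho^{\,n-1} V_{n-1}(1)$, so $V_n = V_{n-1} \int_{-1}^{1} (1-x^2)^{(n-1)/2}\,dx$ (the function name and step count are arbitrary choices, not the note's own implementation):

```python
import math

def unit_ball_volume(n: int, steps: int = 50_000) -> float:
    """Volume of the unit n-ball by recursively 'sweeping' an (n-1)-ball
    along a new axis:
        V_n = V_{n-1} * integral_{-1}^{1} (1 - x^2)^((n-1)/2) dx
    Base case: the 0-ball (a point) has volume 1."""
    if n == 0:
        return 1.0
    h = 2.0 / steps
    # midpoint rule for the cross-section integral over x in [-1, 1]
    integral = h * sum(
        (1.0 - (-1.0 + (i + 0.5) * h) ** 2) ** ((n - 1) / 2)
        for i in range(steps)
    )
    return integral * unit_ball_volume(n - 1, steps)

# Compare the swept volume against the closed form pi^(n/2) / Gamma(n/2 + 1)
for n in range(1, 6):
    exact = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    print(f"n={n}: swept = {unit_ball_volume(n):.5f}, closed form = {exact:.5f}")
```

Each added dimension multiplies by a cross-section integral that shrinks toward zero, which is another way to see the unit ball's volume peaking around $n = 5$ and then collapsing.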
### Special Functions
#### TODO
Sections to flesh out:
* High dimensions and how they relate to probability and statistics (High dimensional stats, chapter 1)
---
Date: 20211006
Links to: [Blog Post MOC](Blog%20Post%20MOC.md)
Tags: