# Things to Think About Deeply
1. You have a great new view of matrices as linear transformations. How can you think about softmax and probability in that context? Think about them as FUNCTIONS, not just tools.
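One way to make the function view concrete: softmax is not a linear map at all. A minimal sketch (plain Python, no libraries) of softmax as a function from R^n onto the probability simplex, with a shift-invariance the linear-transformation view would never predict:

```python
import math

def softmax(z):
    """Map a vector in R^n to a point on the probability simplex.

    Not a linear transformation: softmax(a + b) != softmax(a) + softmax(b),
    and adding a constant to every component leaves the output unchanged.
    """
    m = max(z)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.0, 2.0, 3.0]))          # components are positive and sum to 1
print(softmax([11.0, 12.0, 13.0]))       # shifted input, identical output
```

The shift-invariance falls straight out of subtracting the max, which is exactly the kind of behavior you only see once you treat softmax as a function rather than a tool.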
2. Hot take: the reason statistics makes it easy to spin the truth (or lie) is that we aren't good with ratios, percents, and division. Think about doubling a percent increase.
1. https://www.youtube.com/watch?v=bVG2OQp6jEQ
2. https://www.youtube.com/watch?v=xHjQhliXUB0
3. https://www.youtube.com/watch?v=FUknTs9AzYA
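A tiny worked example of the "doubling a percent" trap, with hypothetical numbers: a risk that goes from 1% to 2% can be framed as "+100%" (relative) or "+1 percentage point" (absolute), and the two framings feel wildly different:

```python
base_rate = 0.01   # hypothetical baseline risk: 1%
new_rate = 0.02    # risk after some exposure: 2%

relative_increase = (new_rate - base_rate) / base_rate   # "risk has DOUBLED!"
absolute_increase = new_rate - base_rate                 # one percentage point

print(f"relative: +{relative_increase:.0%}")   # headline framing
print(f"absolute: +{absolute_increase:.1%}")   # everyday framing
```

Same underlying reality, two divisions, two very different stories — which is the point of the hot take.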
3. The difference between AI Engineering R&D and Data Science is the set of constraints you operate within on the AI Eng team.
4. "The essence of learning is simplification of the joint probability distributions, achieved by exploiting certain regularities" (Quantifying and Visualizing Attribute Interaction Paper)
5. How do you move between a topological view of neural networks and a linear-transformation view? (May be worth looking at colah's blog.)
6. Deep learning is really deep representation learning. See [here](https://youtu.be/lXrFX3vjtjQ?list=PL3pGy4HtqwD2kwldm81pszxZDJANK3uGV&t=977). This comes back to one of your key ideas about *representation*. One of the biggest things you have seen over time is that the problem *representation* you select is critical to being able to solve the problem. In a sense this incorporates *language*, *information*, *algorithm*, *structure*, *constraints*, *exploitation*, and more.
7. Frequently, when trying to solve problems you are thinking about:
1. What is the underlying reality of the situation you are dealing with?
2. Are there any constraints at play?
3. What data structure would be most effective to capture this reality? How much space does it take to store? How easy is it to traverse?
4. What algorithm/model/technique can most effectively interact with/scan/exploit this data structure (our data structure could be a BOP, R^n, some sort of custom data structure, fabric, etc)? What is its time complexity? Can the data structure be changed, altered, manipulated to allow an algorithm to operate more effectively?
	5. What information can we "inject" into the system (either the algorithm or the DS) to make it more efficient? Consider how polar coordinates make certain integrals far easier to solve. Can we approximate anything as linear to allow for easier storage or faster compute? Will that hurt our results?
	6. Again, how can we encode any sort of information about the reality of the problem into our data structure/algorithm/model/etc.?
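As one concrete instance of questions 3–5 above, a sketch (hypothetical example) of restructuring the data so the algorithm operates more effectively: the same membership question answered by a linear scan over a list versus an average-case O(1) lookup after paying a one-time cost to rebuild the data as a hash set:

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)          # one-time restructuring cost

# Same question about the same underlying reality, two representations:
in_list = 99_999 in data_list      # O(n): scans the whole list
in_set = 99_999 in data_set        # O(1) average: hash lookup

assert in_list == in_set == True

# The structural change shows up directly as wall-clock time:
t_list = timeit.timeit("99_999 in data_list", globals=globals(), number=100)
t_set = timeit.timeit("99_999 in data_set", globals=globals(), number=100)
print(f"list scan: {t_list:.4f}s, set lookup: {t_set:.6f}s")
```

The information being "injected" here is the hash function: it encodes where each element must live, so the algorithm never has to search.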
8. Infinity is a *property*: always being able to add one more thing.
9. HOT TAKE: calculus *fundamentally* is a way to change the representation of the same, invariant thing: moving from something curved to a linear version we can deal with, and back again…
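The "curved → linear → back" move in one line: near a point, replace the curve with its tangent line and compute there. A quick numeric check, using sqrt as a stand-in for "something curved":

```python
import math

# Local linearization of f(x) = sqrt(x) at a = 4:
#   f(x) ≈ f(a) + f'(a) * (x - a)
a = 4.0
f_a = math.sqrt(a)                     # 2.0
fprime_a = 1.0 / (2.0 * math.sqrt(a))  # 0.25

x = 4.1
linear_estimate = f_a + fprime_a * (x - a)   # 2 + 0.25 * 0.1 = 2.025
print(linear_estimate, math.sqrt(x))         # 2.025 vs 2.0248...
```

The curved thing and its linear stand-in agree to about four decimal places here, and the error shrinks quadratically as x approaches a — that is the representation change paying off.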
10. More on representation.
11. Think of a neural network as a graph, or as a succession of linear transformations. Side note: how would this actually function in the brain?
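A minimal sketch of the "succession of linear transformations" view (plain Python, hypothetical toy sizes, bias terms omitted): each layer is a linear map plus a pointwise nonlinearity, and if you drop the nonlinearity the whole stack collapses into a single linear map — which is where the expressive power must come from:

```python
import random

def matvec(W, x):
    """Apply the linear map W to vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def matmul(A, B):
    """Compose two linear maps as a matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def layer(W, x, nonlinear=True):
    """One layer: linear map, then (optionally) a pointwise ReLU."""
    z = matvec(W, x)
    return [max(0.0, v) for v in z] if nonlinear else z

random.seed(0)
W1 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]  # R^3 -> R^4
W2 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(2)]  # R^4 -> R^2
x = [1.0, -2.0, 0.5]

# A two-layer "network" is just a composition of transformations:
y = layer(W2, layer(W1, x))

# Without the nonlinearity, two layers collapse into one linear map:
two_linear_layers = layer(W2, layer(W1, x, nonlinear=False), nonlinear=False)
one_collapsed_map = matvec(matmul(W2, W1), x)
# two_linear_layers and one_collapsed_map agree up to float error
```

The graph view and the linear-transformation view are the same object seen at different resolutions: edges carry the matrix entries, nodes apply the nonlinearity.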
12. The idea of multiple variables, orthogonal x and y, the xy-plane, taking a partial derivative while treating the other variables as constants. How does this all work? Why are three dimensions enough to model reality?
13. Why are the x and y axes made to be orthogonal? So much of univariate calculus, such as area under a curve, is entirely based on the orthogonality of the x and y dimensions. Why is that specific representation necessary? What do functions from 1D to 1D look like? (See 3b1b!) How would the integral (area under a curve) be visualized without the xy-plane? What about the area under a PDF?
14. Can we view derivatives as a change in density? How does that relate to probability? And calculus in general? [3b1b beyond calculus video (how to think about derivatives and areas without Cartesian coordinate system?)](https://www.youtube.com/watch?v=CfW845LNObM&list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr&index=12&t=402s)
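One concrete handle on "derivative as a change in density": a PDF is the derivative of a CDF, i.e., density is the local rate at which probability mass accumulates. A quick numerical sketch for the standard normal (using `math.erf` for the CDF):

```python
import math

def normal_cdf(x):
    """Standard normal CDF: total probability mass to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Density as a rate of change: a central difference of the CDF
# recovers the PDF, i.e. d/dx P(X <= x) = p(x).
x, h = 0.7, 1e-6
numeric_density = (normal_cdf(x + h) - normal_cdf(x - h)) / (2 * h)
print(numeric_density, normal_pdf(x))  # agree to many decimal places
```

This is the bridge between the two questions in the bullet: the derivative measures how densely probability is packed near x, with no Cartesian picture required.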
15. Wilczek with Lex Fridman, 1:30:00: what does it mean for things to exist?
16. [Dual-Use-of-Information > Things to think about](Dual%20Use%20of%20Information.md#Things%20to%20think%20about)
17. Randomness is used in many interesting ways: random walks, different CS algorithms (e.g. hashing and sketching). How does randomness function? What is the structure that it is exploiting?
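One place to watch randomness do real algorithmic work: reservoir sampling, a classic streaming algorithm where a single random draw per item maintains an exactly uniform sample without ever storing the stream. A minimal sketch:

```python
import random

def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of size k from a stream in one pass,
    without knowing the stream's length in advance (Algorithm R)."""
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randrange(n)         # randomness does the bookkeeping:
            if j < k:                    # item n survives with probability k/n
                sample[j] = item
    return sample

rng = random.Random(0)
print(sorted(reservoir_sample(range(1_000), 10, rng)))  # 10 uniform draws from 0..999
```

The structure being exploited is symmetry: each item is treated identically relative to its position, so the probabilities telescope to exactly k/n for every item — randomness substitutes for memory.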
18. Gambler's Fallacy vs. Regression to the Mean: in the coin-toss example we know a string of heads does not make tails more likely. But what about regression to the mean? That seems to imply dependence.
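A simulation sketch (hypothetical parameters) that holds both ideas at once: condition on runs of five heads in a fair-coin sequence; the very next flip is still ~50% heads (no gambler's fallacy), and yet the average of the next ten flips sits near 0.5, far below the streak's 100% — regression to the mean with zero dependence between flips:

```python
import random

random.seed(42)
flips = [random.randint(0, 1) for _ in range(1_000_000)]  # 1 = heads, fair coin

next_flips = []     # the flip immediately after a 5-heads streak
window_means = []   # mean of the 10 flips after the streak
for i in range(5, len(flips) - 10):
    if all(flips[i - 5:i]):                      # just observed HHHHH
        next_flips.append(flips[i])
        window_means.append(sum(flips[i:i + 10]) / 10)

p_next = sum(next_flips) / len(next_flips)
avg_window = sum(window_means) / len(window_means)
print(f"P(heads | 5 heads in a row) ~ {p_next:.3f}")     # stays near 0.5
print(f"mean of the next 10 flips  ~ {avg_window:.3f}")  # far below the streak's 1.0
```

The resolution of the apparent paradox: regression to the mean is a statement about the *extremity of the conditioning event*, not about the coin changing behavior — extreme streaks are followed by typical behavior simply because typical behavior is typical.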
### Docs
This page contains a list of thoughts and ideas that deserve deeper thought at some point. Ideally, each bullet should include enough information that, given a minute, you could reload the idea into memory and then think about it on a walk. This list can grow and should be periodically trimmed, thought about, moved into Obsidian notes, turned into blog posts, etc.
---
Date: 20210921
Links to:
Tags:
References:
* []()