# State of AI 2025

## Reasoning

* The model is a reflection of the user ([Is o1-preview reasoning? - YouTube](https://www.youtube.com/watch?v=nO6sDk6vO0g)). If you play smart chess moves, the model will play like a smart chess player. If you play dumb chess moves, the model will play like a dumb chess player.
* We don't know how to train algorithms that know how to use an expandable, potentially infinite amount of memory.
* Reasoning
    * Reasoning is a process that applies [Logic](Logic.md). The most general process that applies logic is an effective [Computation](Computation.md) - the computations that [Turing Machines](Turing%20Machine.md) can perform. Thus reasoning is an effective computation in pursuit of a goal or the inference of [Knowledge](Knowledge.md).
    * Neural networks can perform a subclass of these effective computations.
    * A drink vending machine could also be regarded as satisfying this criterion. But is that reasoning?
* Reasoning is *knowledge acquisition*. The new OpenAI models don't reason; they simply memorise reasoning trajectories gifted from humans. Now is the best time to spot this, as over time it will become harder to distinguish as the gaps shrink. For example, a clever human might know that a particular mathematical problem requires the use of symmetry to solve. The OpenAI model might not yet know, because it hasn't seen it before in that situation. When a human hints the model and tells it the answer, its CoT model will be updated, and next time in a similar situation it will "know" what strategy to take. This will rinse and repeat as they sponge reasoning data from users until many of the "holes in the Swiss cheese" are filled up. But at the end of the day, this isn't reasoning. It's still cool though. [x.com](https://x.com/MLStreetTalk/status/1834609042230009869)
* All observations and thinking are theory-laden. They require *context*. They require that you understand and have an explanation of what matters. It may require that you understand your problem. When you prompt an LLM, *you* are providing this information. The LLM has no concept of it and it *understands* nothing. It requires you to ground it, and that is often the hardest thing to do.
* Reasoning is: this is what I know about the world; to do the thing I need to do, I have to build a new model - I compose the models I have into a new one - and then perform an effective computation with it. For instance, say we have A, B, C. In order to do C, we must know about A and B. Otherwise, we cannot do C.
* So the drink machine is not performing reasoning.
* A dictionary is not reasoning. If I look up my problem in a massive dictionary and it has the answer, that is not reasoning. It is not performing the computation that leads from the problem to the result.
* The way that o1 was trained is incredibly important to keep in mind ([Is o1-preview reasoning? - YouTube](https://youtu.be/nO6sDk6vO0g?t=1145)). They ran many passes in creative mode (temperature 1), then selected those that had the correct answer. This produced many reasoning trajectories - many of which were nonsense and still led to the correct answer. They selected those that led to the right answers and used them to retrain the model, so it was more likely to produce patterns of "reasoning" that gave right answers. So it is basically building up a dictionary of specific "rationales" (programs) - a massive database of rationales (a minimal sketch of this select-and-retrain loop follows below).
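A minimal sketch, in Python, of the select-and-retrain loop described in the bullet above. This is not OpenAI's actual pipeline; `sample_cot`, `check_answer`, and `fine_tune` are hypothetical caller-supplied placeholders (a high-temperature sampler, an answer verifier, and a supervised fine-tuning step). The point it illustrates is only that trajectories are kept because their final answer checks out, not because their steps constitute reasoning.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch only: `sample_cot`, `check_answer`, and `fine_tune` are
# caller-supplied placeholders, not any real OpenAI API.

Trajectory = Tuple[str, str, str]  # (problem, chain_of_thought, final_answer)


def harvest_rationales(
    problems: List[str],
    sample_cot: Callable[[str, float], Tuple[str, str]],
    check_answer: Callable[[str, str], bool],
    samples_per_problem: int = 64,
) -> List[Trajectory]:
    """Keep only the chain-of-thought trajectories whose final answer checks out."""
    kept: List[Trajectory] = []
    for problem in problems:
        for _ in range(samples_per_problem):
            # High temperature ("creative mode") so trajectories vary widely.
            chain_of_thought, answer = sample_cot(problem, 1.0)
            # Only the final answer is verified; the intermediate steps may be nonsense.
            if check_answer(problem, answer):
                kept.append((problem, chain_of_thought, answer))
    return kept


def retrain_on_rationales(model, problems, sample_cot, check_answer, fine_tune):
    """Fine-tune on the surviving trajectories, making their patterns more likely next time."""
    rationales = harvest_rationales(problems, sample_cot, check_answer)
    # In effect this grows the "massive database of rationales" described above.
    return fine_tune(model, rationales)
```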
* When you give it a new problem, it does a kind of context-sensitive hashing, matches the stored rationale that is closest to it, and then fills in the blanks (a toy retrieval sketch appears after these notes). This is not reasoning. What we are talking about is the application of a set of first principles to a problem, in a process of applying logic to derive an answer.
* In the long term this is not the path to AGI.
* Because explanationless prediction is impossible, the methodology of excluding explanation from a science is just a way of holding one's explanations immune from criticism.
* The reason why language models work so well for generating code is that it is a didactic exchange of knowledge discovery.
* The reason why tight supervision is important is that the model will diverge. So a reasoning system that goes many steps (and goes "what about this? Or this? Or this?") is not that useful! It will end up doing things that you didn't even want it to do (which you are paying for) in the first place.
* When we think about reasoning in AI, the current SOTA is Chain of Thought. Now, in any scenario where it is very easy to check whether you have the right solution, you can generate random solutions and, because you can check each one, eventually you'll find a right one (a generate-and-verify sketch appears after these notes). But that is *not* what we are talking about when we are talking about reasoning.
* Now people say that humans don't reason either! Once you have a good set of cached "rationales" you aren't reasoning at that point! You are just reusing these rationales and applying a quick computation.
* My thought is that this is related to the idea of proof in mathematics. You can perform a proof, a computation, without any understanding of the problem, just by blindly following the rules. But *coming up with the proof* always requires an understanding of the problem.
* Duggar would say that "if something is following a deterministic process, it can still be reasoning".
* But I would respond: can an AI *create a new reasoning trajectory on its own*?
* Chollet would say the *efficiency* of the reasoning task is [Intelligence](Intelligence.md).
* Duggar may say that there is a shallow depth of reasoning. I would say that another thing that prevents these models from reasoning is that they have no concept of whether they lack a proper reasoning motif for the situation!
* In a sense, we will only achieve AGI if these systems can lead to true, genuine progress. But I would argue the only way we can achieve that is via systems that understand the world - and in particular understand when they do not have any proper current mode of reasoning! Even *if* we could use current LLMs to fully replace all humans *today*, we would be stuck in a *static* society because they have no way to make progress. They don't have their own problems.
* Part of *my* issue with current AI is that they keep trying to *confirm* that they have a system that is intelligent.
    * First and foremost, a challenge here is that they don't have a theory (it is a prediction-based approach), so what would we even be confirming?
    * Second, you don't *confirm* theories, you *dis*confirm them! So why would we not be striving to constantly *move the goal posts*? That is the nature of science. We find flaws in our theories, then find some other class of phenomena that we must account for, then move forward. Why does AI not want to fall into that class? Why does it want some all-encompassing metric? I wonder - is this at all related to the fact that it is an explanationless field?
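A toy sketch of the "context-sensitive hashing" picture in the first bullet above: store (problem, rationale) pairs, retrieve the rationale whose problem embedding is nearest to the new problem, and reuse it as a template. The `embed` function and the `{problem}` placeholder are illustrative assumptions; this is a caricature of the dictionary-of-rationales claim, not a description of how any real model works internally.

```python
from typing import Callable, List, Tuple

import numpy as np

# Toy illustration of "match the closest stored rationale and fill in the blanks".
# `embed` is a hypothetical text-embedding function supplied by the caller, and
# "{problem}" is an invented template convention for this sketch.


def nearest_rationale(
    problem: str,
    rationale_db: List[Tuple[str, str]],  # (stored_problem, stored_rationale_template)
    embed: Callable[[str], np.ndarray],
) -> str:
    """Return the stored rationale whose problem is most similar to the new problem."""
    query = embed(problem)
    best_template, best_score = "", float("-inf")
    for stored_problem, template in rationale_db:
        key = embed(stored_problem)
        # Cosine similarity plays the role of the "context-sensitive hash" match.
        score = float(query @ key / (np.linalg.norm(query) * np.linalg.norm(key)))
        if score > best_score:
            best_template, best_score = template, score
    # "Fill in the blanks": reuse the retrieved template on the new problem.
    return best_template.replace("{problem}", problem)
```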
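And a minimal sketch of the generate-and-verify point about Chain of Thought above: when checking a solution is cheap, blind sampling plus a verifier eventually finds a correct answer without anything resembling reasoning. The integer-square-root example and the `propose`/`verify` split are illustrative assumptions only.

```python
import random
from typing import Callable, Optional


def generate_and_verify(
    propose: Callable[[], int],
    verify: Callable[[int], bool],
    max_tries: int = 1_000_000,
) -> Optional[int]:
    """Blindly propose candidates and return the first one the checker accepts."""
    for _ in range(max_tries):
        candidate = propose()
        if verify(candidate):  # checking is easy...
            return candidate   # ...even though nothing was "reasoned"
    return None


# Example: "solve" x * x == 361 by pure guessing - no understanding required.
answer = generate_and_verify(
    propose=lambda: random.randint(0, 1000),
    verify=lambda x: x * x == 361,
)
print(answer)  # 19, with overwhelming probability given this many tries
```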
* And because of that, they feel their main way to achieve their goal is to cross some benchmark finish line and then scream "we made it - no take backsies".
* Epistemic foraging and memetically sharing it.
* There is a convex hull around practical reasoning. AI will be able to solve many novel practical problems that are OOD. You could say that it is reasoning. The million-dollar question is: how far does that convex hull get you?
* Imagine that the boundary contains everything we care about! But what about the inner part of the ball? Will it basically be like Swiss cheese?
* That is the disadvantage of reasoning that is very spatially (in a compute sense) *wide*. It has very, very many templates, but they are all very stupid and shallow. It then tries to find the right one and apply it.
* Compare that to a small set of first-principles templates. This is far more parsimonious.
* Science finds parsimonious principles that, when applied in sequence, reason out an answer.
* Moravec's paradox - easy problems are hard, hard problems are easy.
    * Because failure _reasons_ matter. For example, a human can get tired, bored, sleepy, distracted, etc., which has nothing whatsoever to do with intelligence and AGI. The _reason_ LLMs fail to multiply large numbers is that they are using an insufficient algorithm class, not because they get bored.
* [Did OpenAI Just Solve Abstract Reasoning?](https://aiguide.substack.com/p/did-openai-just-solve-abstract-reasoning)
* [I think people are overindexing on the o3 ARC-AGI results. There’s a long history in AI of people holding up a benchmark as requiring superintelligence, the benchmark being beaten, and people being underwhelmed with the model that beat it.](https://x.com/polynoamial/status/1872383436880859547?s=46)

---
Date: 20250104
Links to:
Tags:
References:
* []()