# Doing Good Science

See [Science at Startups](Science%20at%20Startups.md).

Hypotheses should be *explicit*. Measurement is a critical tool - it allows us to compress incredibly high-dimensional spaces down to something that humans can understand and reason about. This is *hard*, needs to be done with great care, and often requires creativity.

### Generate Falsifiable Hypotheses

Jonathan Frankle, who came up with the [Lottery Ticket Hypothesis](Lottery%20Ticket%20Hypothesis.md), remarks that he doesn't actually think he was doing his best science when coming up with that hypothesis. He specifically says:

> It was a hypothesis that there exist these smaller networks that can be trained. The biggest flaw is that it is not a **falsifiable hypothesis**. In order to falsify it we would have to look at every single network and show that it lacks this property, so in practice it is not falsifiable.
>
> I then went and tried to find **existential evidence** (there exists at least one case) to support this hypothesis. It is really an existential statement, not a **universal statement** (something is true no matter what), so the way that you justify it is with more existential evidence. But it is *non-falsifiable*. I cannot practically show that a network lacks these things, and I cannot show that all networks contain them. So this does not follow the best practices that Popper describes.
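To make the existential vs. universal distinction concrete, the claim can be sketched with quantifiers. Here $P(n, s)$ is a placeholder predicate (not notation from the paper), roughly "subnetwork $s$ of network $n$ trains to comparable accuracy":

$$\text{The claim as justified (existential): } \exists\, n \;\exists\, s \subseteq n \;:\; P(n, s)$$

$$\text{What falsifying it would require: } \forall\, n \;\forall\, s \subseteq n \;:\; \neg P(n, s)$$

Every winning ticket found adds existential support, but no feasible set of experiments can establish the universal negation, which is why the hypothesis is not falsifiable in practice (see the quantifiers reference at the end of this note).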
### Asking Two-Sided Questions

> The other advice I'd give is to *ask two-sided questions* - that is, questions where no matter what result you get, you learn something about the world.
>
> For instance, consider neural network training and the optimum a network ends up in. We can ask the question: "At what point in training is the optimum determined?" Now, this is a cool question because one hypothesis is that maybe it is determined from initialization - that would be interesting. Maybe it is undetermined until the very last step of training...but that last step of training probably isn't going to bring you to a different optimum. So the answer probably has to be somewhere in the middle, and I *love* that.
>
> At this point it is a **measurement question**. There is a phenomenon I want to *measure*. No matter what I find, I am going to gain knowledge and learn something. Those are my favorite kinds of questions to ask.
>
> Most questions can be refactored into a two-sided question. The original lottery ticket paper was not a two-sided question - I got lucky that it worked. It could have been that I couldn't find these things: maybe they existed, maybe they didn't; maybe I had a method to find them, maybe I didn't. I could have just gone home with nothing.
>
> At this point I don't even think I would ask the question I posed in the original paper. If I were to ask it, I would want to ask it very cheaply and very quickly. There are lots of questions like this - "wouldn't it be cool if $x$ worked?" They are great questions to ask. But I see lots of PhD students getting caught up in them, trying to answer each question and taking two months for each, and then ending up with nothing. So if you are going to ask those kinds of questions, it's about efficiency. It is about how quickly you can possibly get through all of these experiments.

### What are you Measuring?

[Good Science is about Measurement](Good%20Science%20is%20about%20Measurement.md)

> I like the shift in measurement - how do we measure SGD noise? What is the quantity/thing that we are interested in in the first place? Instead of things like gradient magnitudes, which are values that could be a proxy for something, you say: why not just jump to the thing we are interested in in the first place - the actual outcome of the optimization at the end of the day? That may take a little bit more time.
>
> [Ask the questions](https://open.substack.com/pub/thegradientpub/p/jonathan-frankle-lottery-tickets-llms-policy?utm_campaign=post&utm_medium=web&timestamp=1730.0):
> * **What are we measuring?**
> * **How are we going to measure it?**
>
> A great example would be as follows. Take a network and try training it multiple times. Fix a network (it could be at the beginning of training or at some point during training), make a bunch of copies of it, and train those copies with different random seeds. Where do these networks then land? Do they land in the same convex region or in different convex regions? The way you figure that out is to take two copies of the network and look at the loss landscape between them. If it's flat, they are in the same convex region. If there is a spike, there is a barrier between them.
>
> We can think of this as effectively a 1-dimensional slice of the loss landscape. The [loss landscape](Loss%20Function.md) is this ultra-high-dimensional thing, and I don't think any attempt to visualize it is very productive: you are taking this high-dimensional thing and trying to compress it down to something much smaller, and I don't know if that really works - you are losing a lot of information. What I like here is that these 1-dimensional slices are high fidelity. You are still looking at the loss landscape, just through very tiny, narrow slices of a very high-dimensional space. But they happen to be very important slices because they relate two different networks to each other.
>
> You then need to ask: why is this a good metric? Why does this matter? The answer is that it seems to have descriptive power that allows us to distinguish networks that have different properties in other ways. The way that you measure things is still artisanal (it still requires *creativity*). If you want to really dig into my papers and rip them apart, then look at how I measure things and tell me that is a dumb way of measuring things.
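The 1-dimensional slice described above can be computed directly by linearly interpolating between the weights of the two copies and evaluating the loss at each point along the line. Below is a minimal PyTorch-style sketch; `model_a`, `model_b`, and `evaluate_loss` are hypothetical names you would supply (two trained copies sharing an architecture, plus a helper that loads a state dict and returns the loss on a fixed evaluation set), and the barrier summary at the end is one common convention rather than the exact quantity from any particular paper.

```python
# Sketch: 1-D loss-landscape slice between two trained copies of a network.
# Assumes `model_a` / `model_b` share an architecture and `evaluate_loss(state_dict)`
# is a user-supplied helper that loads the weights and returns loss on a fixed set.
import torch


def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Blend two state dicts: (1 - alpha) * sd_a + alpha * sd_b, per parameter."""
    out = {}
    for key in sd_a:
        if torch.is_floating_point(sd_a[key]):
            out[key] = (1 - alpha) * sd_a[key] + alpha * sd_b[key]
        else:
            # Integer buffers (e.g. BatchNorm's num_batches_tracked) are copied as-is.
            out[key] = sd_a[key]
    return out


def loss_slice(model_a, model_b, evaluate_loss, num_points=11):
    """Evaluate the loss along the straight line between two sets of weights.

    A roughly flat curve suggests the copies landed in the same linearly
    connected region; a spike indicates a barrier between them.
    """
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        losses.append(evaluate_loss(interpolate_state_dicts(sd_a, sd_b, alpha)))
    # One common summary: how far the worst interpolated loss rises above
    # the average of the two endpoint losses.
    barrier = max(losses) - 0.5 * (losses[0] + losses[-1])
    return losses, barrier
```

Running this for copies spawned at different points in training is one way to operationalize the two-sided question above: whatever step the barrier disappears at, you have measured something.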
### Misc Notes

* Don't worry about efficiency; just worry about getting a result.
* Anchor any choices that you make to something that happened in prior work. Otherwise, it may be that your new idea only works because of other changes you made (no matter how subtle).

### State Your Assumptions

You must clearly state your assumptions and be able to recall them at any time. If you cannot, you may have forgotten them, and if you've forgotten them you likely won't know whether you are breaking any of them. An entire chain of reasoning can be invalidated by an incorrect assumption.

### Reduce to the simplest thing that contains the essential difficulty

This is articulated well in [Simplify the Problem](Simplify%20the%20Problem.md). It is related to [Understand Systems via What They Are Not Doing](Understand%20Systems%20via%20What%20They%20Are%20Not%20Doing.md). Always compare against [Simple Baselines](Simple%20Baselines.md).

### Science is really about...

[Explanationless Prediction Is Impossible](Explanationless%20Prediction%20Is%20Impossible.md). [Scientific Measurements Use Chains of Proxies](Scientific%20Measurements%20Use%20Chains%20of%20Proxies.md).

You need to have *an explanation* of the phenomenon at hand, otherwise your prediction is contentless.

---
Date: 20231103
Links to: [Good Science is about Measurement](Good%20Science%20is%20about%20Measurement.md)
Tags:
References:
* [Jonathan Frankle: From Lottery Tickets to LLMs](https://thegradientpub.substack.com/p/jonathan-frankle-lottery-tickets-llms-policy#details) (best conversation starts [here](https://open.substack.com/pub/thegradientpub/p/jonathan-frankle-lottery-tickets-llms-policy?utm_campaign=post&utm_medium=web&timestamp=1730.0))
* [1.2 Quantifiers](https://www.whitman.edu/mathematics/higher_math_online/section01.02.html#:~:text=The%20phrase%20%22for%20every%20x,is%20denoted%20by%20%E2%88%83x.)
* Forecast evaluation for data scientists: common pitfalls and best practices (paper on time series forecasting)