# Lean Startup Methodology
The Lean Startup method addresses product/market fit challenges via validated learning: quick iterations on a series of minimal experiments, each designed to test a hypothesis about the market.
These iterations proceed along the **build-measure-learn** cycle. You start with the _value hypothesis_, figure out metrics to measure it, and design an experiment to test it in the market. Then you _build_ what’s needed to run the experiment, _measure_ the results, and _learn_ by adjusting or refining the value hypothesis.
### Challenges with data products
In a data product — a product that relies on large amounts of data or involves machine learning and AI — you run into the additional challenge of product/data fit. It arises from the uncertainty about the ability of data to satisfy product needs:
- Machine learning models make predictions that aren’t always correct. You don’t know when they’re right and when they’re wrong.
- You can’t guarantee in advance the level of performance of a model. Let’s say you can afford up to 10% wrong predictions. You may not be able to build a model that’s 90% accurate.
- It’s entirely possible that you don’t have enough data or that there’s not enough signal, but you don’t know that in advance.
- Even if the model performs well on training data, its performance in production may vary significantly.
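To make the last point concrete, here is a minimal, purely synthetic sketch (the distributions and numbers are assumptions for illustration, not from any real product): a fixed decision rule that looks fine on training-like data degrades once the relationship between features and labels drifts in production.

```python
# Toy illustration (hypothetical setup): a classifier that is accurate on
# training-like data can degrade in production under concept drift.
import math
import random

random.seed(0)

def sample(n, label_shift):
    """Draw (x, y) pairs where P(y=1 | x) = sigmoid(x - label_shift)."""
    data = []
    for _ in range(n):
        x = random.gauss(0, 1)
        p = 1 / (1 + math.exp(-(x - label_shift)))
        y = 1 if random.random() < p else 0
        data.append((x, y))
    return data

def accuracy(data, threshold=0.0):
    """Fixed rule learned offline: predict y=1 when x > threshold."""
    hits = sum((x > threshold) == (y == 1) for x, y in data)
    return hits / len(data)

train = sample(10_000, label_shift=0.0)   # what the model was built on
prod = sample(10_000, label_shift=1.5)    # drifted data in production

print(f"training-like accuracy: {accuracy(train):.2f}")
print(f"production accuracy:    {accuracy(prod):.2f}")
```

The decision rule never changes; only the world it runs in does, and the accuracy you validated offline quietly stops holding.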
So when building a data product, you are tackling at least two sources of uncertainty _simultaneously_: the market and the data. This product/data fit problem adds challenges and risks to every step of the build-measure-learn cycle:
- The _build_ step is _contingent_
- The _measure_ step is _confounded_
- And the biggest risk: the _learn_ step can be _complacent_
#### Contingent Build
As we just saw, data creates a fit problem: you may not be able to build the model that you need (not accurate enough, not enough signal, and so on).
The _build_ step is non-deterministic, or _contingent_.
This isn’t an issue in traditional software products. Building great software is not easy. It takes hard work. But generally speaking, it’s a deterministic process: engineers know what they can and cannot build.
By contrast, data science is non-deterministic: you have little control over the end result. This is one of the reasons for the distinction between software _engineering_ and data _science_.
#### Confounded Measure
This situation leads to _confounded measure_.
You go and measure the results of the experiment. They are not what you were hoping for. Is it because the models are not accurate enough, something that might improve with more data and better models? Or is it because the value hypothesis actually doesn’t pan out in the market?
The measurement is confounded, and you can’t untangle data uncertainty from market uncertainty.
Can’t you just A/B test that? No. You can A/B test two models against each other, or a model against no model. But you can’t A/B test the model that you have against the model that you want but don’t have. You can’t measure a hypothetical model.
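The part you _can_ do is sketched below: comparing two shippable variants with a standard two-proportion z-test (the variant names and conversion counts are hypothetical). Note what’s missing: the 90%-accurate model you wish you had never appears as an arm of the test, because there is nothing to randomize users into.

```python
# Minimal A/B-test sketch (all numbers hypothetical): compare two variants
# you can actually ship, e.g. "no model" vs "current model".
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for H0: both variants convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: arm A = no model, arm B = the current model.
z = two_proportion_z(conv_a=480, n_a=5000, conv_b=540, n_b=5000)
print(f"z = {z:.2f}")  # |z| > 1.96 → significant at the 5% level
```

With these made-up counts the test lands right at the edge of 5% significance (z ≈ 1.98) — and even a clean win here only tells you the current model beats no model, not whether a better model would validate the value hypothesis.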
#### Complacent Learning
So how do you go from the measure step to the learn step? The biggest risk here is _complacent learning_: instead of recognizing the confounded measurement, you attribute all the uncertainty to the data. Then, you decide that the value hypothesis holds and that the models will improve as you collect more data.
But in reality, instead of _learning_ from the market, you are hinging your market hopes on data dreams.
Blindly putting your faith in the Gods of AI is one of the biggest risks in data product development.
---
Date: 20211230
Links to: [Running Lean](Running%20Lean.md)
Tags:
References:
* [The Challenge of Product/Data Fit](https://hackernoon.com/the-challenge-of-product-data-fit-92543078551b)