# Building Machine Learning Powered Applications

## Main Themes

This still needs to be organized and summarized.

* Start by identifying the general (product-level) problem and the corresponding product goal
* Start simple (build a straw man baseline, then start with the simplest ML model)
* At the start, *be the algorithm* to build intuition
* Always be looking for the *impact bottleneck*

## Chapter Summaries

### 0. Preface

**The entire process of ML**

To successfully serve an ML model to users, you need to:

* **Translate** your product needs into an ML problem
* **Gather** adequate data
* **Iterate** efficiently between models
* **Validate** your results
* **Deploy** them in a robust manner

We can home in on these specific stages:

1. **Identifying the right ML approach**: Set the initial success criteria and identify an adequate initial dataset and model choice.
2. **Building an initial prototype**: Start building an end-to-end prototype before working on a model. This prototype should aim to tackle the product goal with no ML.
3. **Iterating on models**: Alternate between error analysis and implementation.
4. **Deployment and monitoring**: Once a model shows good performance, pick an adequate deployment option. Once deployed, models often fail in unexpected ways, so have multiple approaches in place to monitor and mitigate model errors.

### 1. Find the Correct ML Approach

* Generally we think of the goal of ML as training a model that performs well in isolation. That is *not* the goal. The goal can be defined as:

> Take a problem, estimate how best to solve it, build a plan to tackle it with ML, and confidently execute on said plan.

This is a skill that often has to be learned through experience, after multiple overly ambitious projects and missed deadlines.

* Identify which parts of a product would benefit from ML and how to frame a learning goal in a way that minimizes the risk of users having a poor experience.
* You never want to use ML when you can solve your problem with a manageable set of deterministic rules (manageable meaning you could confidently write them and they would not be too complex to maintain). Put another way, ML should be considered in cases where you are not able to find a *heuristic solution* (see the sketch after the constraints list below).
* Always start with the product goal, then decide how best to solve it. At this stage, be open to any approach, whether it requires ML or not. This can best be accomplished in two steps:
    1. **Framing a product goal in an ML paradigm**: When we build a product, we start by thinking about what service we want to deliver to users. An ML problem concerns itself with *learning a function from data*. We must bridge that gap.
    2. **Evaluating ML feasibility**: To efficiently build ML applications, it is important to consider multiple potential ML framings and start with the one(s) we judge simplest.
* Datasets are iterative. Allow yourself to progressively iterate on the way you formulate the problem.
* Some real-world constraints to think about when identifying potential solutions:
    * Data - How will you acquire a dataset? Keep track of labels? Keep data organized?
    * Model - How often does the model need to be retrained? The more frequent, the more time and energy this requires.
    * Latency - How long does inference take?
    * Ease of implementation - Training complex end-to-end models is a very delicate and error-prone process, as they have many moving parts. Consider the tradeoff between a model's potential performance and the complexity it adds to a pipeline. This complexity will slow us down when building the pipeline, but it also introduces a maintenance burden.
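To make the "rules first, simplest model second" idea concrete, here is a minimal sketch (not from the book) assuming a generic text-classification framing with scikit-learn: a deterministic heuristic serves as the straw man baseline, and a bag-of-words logistic regression is the simplest ML model measured against it. The example data, the flagged-word rule, and the `heuristic_predict` helper are hypothetical placeholders.

```python
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical placeholder data: 1 = "needs editing", 0 = "reads fine".
texts = [
    "this is realy bad writing",
    "i dont know what to say here",
    "teh results were good i think",
    "writting this report was hard",
    "a clear and well formed sentence",
    "the report summarizes the results concisely",
    "this paragraph reads smoothly",
    "the argument is easy to follow",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

train_texts, test_texts, train_y, test_y = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels
)

# 1) Straw man baseline: a manageable deterministic rule (flag obvious misspellings).
FLAGGED_WORDS = {"realy", "dont", "teh", "writting"}  # stand-in for real domain heuristics

def heuristic_predict(text):
    return int(any(word in FLAGGED_WORDS for word in text.lower().split()))

baseline_preds = [heuristic_predict(t) for t in test_texts]
print("heuristic baseline accuracy:", accuracy_score(test_y, baseline_preds))

# 2) Simplest ML model: bag-of-words features + logistic regression.
vectorizer = TfidfVectorizer()
train_X = vectorizer.fit_transform(train_texts)
test_X = vectorizer.transform(test_texts)

model = LogisticRegression()
model.fit(train_X, train_y)
print("simple model accuracy:", accuracy_score(test_y, model.predict(test_X)))

# If the simple model does not clearly beat the heuristic, question whether ML is
# worth the extra pipeline complexity and maintenance burden it introduces.
```

The point is not the numbers on this toy data but the workflow: the heuristic gives a worst-case yardstick, and the simplest model only earns its added complexity if it clearly beats that yardstick.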
* **Be the algorithm**. Being the algorithm is a great way to build intuition before you actually get to implementation.
    * There is no ML involved in this phase, but it is crucial: it provides a baseline that is very quick to implement and will serve as a yardstick to measure models against.
* Remember, you are trying to use the best tools to solve a problem, and should only use ML if it makes sense. Start by combining discussions of modeling *and* product. Among other things, this includes designing the product to handle ML failures gracefully.
* When deciding what to focus on, find the *impact [bottleneck](Bottlenecks.md)*: the piece of your pipeline that could provide the most value if you improved it.
* Start with a simple model, because a key goal at this stage is to *derisk* our model. The best way to do this is to start with a straw man baseline to evaluate worst-case performance. Assuming our model is not much better than this baseline, would our product still be valuable?
* How do you identify the impact bottleneck?
    * A: Start by *imagining that the impact bottleneck is solved*, and ask yourself whether it *was worth the effort you estimated it would take*. I encourage data scientists to compose a tweet and companies to write a press release before they even start on a project. That helps them avoid working on something just because they thought it was cool, and it puts the impact of the results into context based on the effort. *The ideal case is that you can pitch the results regardless of the outcome: if you do not get the best outcome, is this still impactful?* Have you learned something or validated some assumptions? A way to help with this is to build infrastructure that lowers the required effort for deployment.

### 3. Build First End-to-End Pipeline

* Frequently, we will not have defined success clearly enough at this stage.

---
Date: 20211230
Links to: [Book Reviews MOC](Book%20Reviews%20MOC.md)
Tags:
References:
* []()