Even with a perfect stack of tools, teams still struggle to deliver ML products. So if tools aren’t the only piece of the puzzle, what does that leave? In my last post, Another tool won’t fix your MLOps problems, I argued the remaining pieces are:
- Culture
- Process
Let’s dive into the MLOps process — in particular, what I think most teams get wrong. What does a successful MLOps process look like, and how can individual ML practitioners help to build that process?
TL;DR
- Start with a product, not a model
- Survey the data in production, not in your warehouse
- Start simple — with data and models
- Partner with engineers
Start with an ML product
Maybe the most important practice that enables successful ML projects is to design a product, not a model. One of the biggest pitfalls I’ve seen across dozens of companies is to hand “projects” to data teams, instead of involving them in the product design phase.
To build a successful ML product, three stakeholders need to be involved in designing the product:
- PM / Business Stakeholder: What does success look like?
- ML Person: What is (likely) possible with ML?
- Product Engineer: What is feasible, what are the constraints?
Most important: these three personas need to be highly aligned throughout the development of an ML product. When ML projects fail, it’s typically due to an alignment problem!
Some examples of poorly aligned teams:
- ML person optimizes for model accuracy (instead of business outcomes!)
- Projects are kicked off that may not be feasible to solve with ML
- The model doesn’t meet performance constraints in production
- The features are challenging or impossible to compute in production
Some examples of good alignment:
- ML person understands the tradeoffs of accuracy vs. time to market
- Monitoring built on day 0 to ensure business outcomes are consistently measured (a rough sketch follows this list)
- Engineer helps ML person understand the production data landscape
- Model SLAs are clearly defined and measured
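To make “monitoring built on day 0” and “SLAs are measured” a little more concrete, here’s a minimal sketch in Python. The 100 ms latency SLA, the stand-in model, and the logging setup are all hypothetical; the point is just that latency and predictions get logged from the first deployment rather than bolted on later.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model-monitoring")

# Hypothetical SLA agreed on by the PM, ML person, and product engineer.
LATENCY_SLA_MS = 100


class StandInModel:
    """Placeholder for whatever model actually ships."""

    def predict(self, features: list) -> int:
        return int(sum(features) > 1.0)


def predict_with_monitoring(model: StandInModel, features: list) -> int:
    """Serve a prediction and log latency and output from day 0."""
    start = time.perf_counter()
    prediction = model.predict(features)
    latency_ms = (time.perf_counter() - start) * 1000

    logger.info("prediction=%s latency_ms=%.2f", prediction, latency_ms)
    if latency_ms > LATENCY_SLA_MS:
        logger.warning(
            "Latency SLA violated: %.2f ms > %d ms", latency_ms, LATENCY_SLA_MS
        )
    return prediction


if __name__ == "__main__":
    predict_with_monitoring(StandInModel(), [0.4, 0.8])
```

In a real system the logs would feed a dashboard or alerting pipeline, but even this much gives the three personas a shared, measurable definition of “working.”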
Survey the data in production
This is an example of “good alignment” in the last section, but it deserves its own section. Almost all ML builders I’ve seen start their ML project with a survey of the available data. The problem? They typically survey the data that is available for training, not the data that will be available in production.
Shouldn’t all of the data available for training be available in a production system?
Most of the time the answer is yes, but with a bunch of asterisks. How quickly is that data available? How fresh is that data? How much preprocessing needs to be done on the prod data to make it consumable? Who owns that data?
So many ML projects stall out because of issues with production data. I’ve seen over and over again that there is a huge disconnect between the ML person and the product engineer. Consider two innocuous-looking features with dramatically different requirements:
- A user’s home zip code: probably dirt simple to use in production. Query a database.
- A user’s average location in the last five minutes: probably a PITA! Streaming? How fresh does it need to be? Streaming aggregations are hard!
Spend extra time understanding what data is available in production, and what constraints apply to that data. It’ll save you months delivering a functional solution.
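To make the contrast concrete, here’s a rough sketch. The `users` table and the toy rolling-window class are hypothetical stand-ins, but they illustrate the gap: the zip code is one indexed query against data that already sits at rest, while the five-minute average implies an event stream, a rolling window, and a freshness guarantee that the in-memory deque below only pretends to provide.

```python
import sqlite3
from collections import deque
from datetime import datetime, timedelta


# Feature 1: a user's home zip code.
# In production this is typically a single indexed lookup against a table
# that already exists and rarely changes.
def home_zip_code(conn: sqlite3.Connection, user_id: int):
    row = conn.execute(
        "SELECT zip_code FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


# Feature 2: a user's average location over the last five minutes.
# Easy to compute offline, but in production it implies a stream of location
# events, a rolling window, and a freshness guarantee. This deque is a toy
# stand-in for that streaming infrastructure.
class RollingLocation:
    def __init__(self, window: timedelta = timedelta(minutes=5)):
        self.window = window
        self.events = deque()  # (timestamp, lat, lon)

    def record(self, ts: datetime, lat: float, lon: float) -> None:
        self.events.append((ts, lat, lon))
        cutoff = ts - self.window
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def average(self):
        if not self.events:
            return None
        lats = [lat for _, lat, _ in self.events]
        lons = [lon for _, _, lon in self.events]
        return sum(lats) / len(lats), sum(lons) / len(lons)
```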
Start simple
This is probably the most common ML advice, but it’s good advice. Start with a simple solution.
My contribution — most folks will tell you to start with a simple model, but it’s equally important to start with simple data! To play off of the example above:
- A user’s average location in the last five minutes: hard
- A user’s most recent location: probably much easier!
You may discover while building that “a user’s average location in the last five minutes” yields a more accurate model, but a model built with “a user’s most recent location” might ship to production two weeks earlier.
You probably take that trade every time. You can always build a V2 with the fancier feature, and it will be a lot easier to build incremental improvements than to ship something complicated the first time.
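As a sketch of what “simple model, simple data” can look like in practice, here’s a toy baseline using scikit-learn and synthetic data. The feature (most recent latitude/longitude) and the label are made up for illustration; the payoff isn’t the accuracy number, it’s that a model this small can be shipped, monitored, and improved incrementally.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: one simple feature per user (most recent
# latitude/longitude) and a binary label we'd like to predict.
rng = np.random.default_rng(0)
X = rng.uniform(low=[37.0, -123.0], high=[38.0, -122.0], size=(1000, 2))
y = (X[:, 0] > 37.5).astype(int)  # toy label loosely tied to latitude

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Simple model + simple feature: the goal is a measurable baseline in
# production quickly, with fancier features saved for a V2.
baseline = LogisticRegression().fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
```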
Partner with engineers
Odds are, if you’re a data scientist, not all of the questions posed above are obvious to answer. When I was last hands-on building ML models, I had no idea what streaming data was (let alone how to think about it).
The solution is to become friends with engineers and work with them throughout the development of an ML model. No software project should be built in a silo, and building ML in a silo is even worse. Engineers can help answer “what data is available,” “what constraints should I know about,” “what SLAs are feasible,” and more.
Work with an engineer early and often. You’ll build projects way faster.
Conclusion
These steps aren’t a comprehensive view of an MLOps process — there are a lot of moving pieces that lead to success (code reviews, CI/CD, monitoring, …). This is a starting point. As I mentioned above, most of the ML failures I’ve seen are alignment issues. These process guidelines are primarily meant to help you align your team for success.
You need a strong foundation to build an exceptional MLOps practice.
David Hershey is an investor at Unusual Ventures, where he invests in machine learning and data infrastructure. David started his career at Ford Motor Company, where he started their ML infrastructure team. Recently, he worked at Tecton and Determined AI, helping MLOps teams adopt those technologies. If you’re building a data or ML infrastructure company, reach out to David on LinkedIn.