Q&A: Expert tips for running machine learning in production

Building a machine learning model in an academic research context is already challenging. But the messiness and fluidity of real-world data and business objectives add a layer of complexity that is not always addressed in data science and ML programs.

In Designing Machine Learning Systems, published by O’Reilly Media, author Chip Huyen offers an accessible yet comprehensive overview of what goes into designing flexible, reliable ML systems. Huyen, a Stanford-trained computer scientist and founder of ML platform startup Claypot AI, shares advice for building, deploying and maintaining ML systems that can adapt to the changing nature of production environments.


In this Q&A with TechTarget Editorial, Huyen discusses her writing process, common challenges when running ML in production and how the generative AI boom has affected her thinking about the book. For a taste of Designing Machine Learning Systems, read an excerpt from Chapter 1, “Overview of Machine Learning Systems,” where Huyen explains the nuances that differentiate ML from traditional software.
Editor’s note: This Q&A has been edited for clarity and conciseness.
What motivated you to write Designing Machine Learning Systems?
Chip Huyen: I don’t think it was a conscious process. I didn’t set out to write a book; it was more of an evolution.
Maybe because I come from a writing background, I really love taking notes. And so during my first job after college at Nvidia, I started making a ton of notes. I was talking to a lot of people about all the problems that I saw, our customers and what challenges they faced. I actually first published an open source note on GitHub, like 8,000 words on the very basics of the frameworks that I saw for deploying ML models, and it somehow got pretty popular.
Then, I went back to Stanford and started teaching a course on [ML systems design]. I wanted to make sure that my students understood what I said, so I made a lot of lecture notes for them as well. Through the two iterations of the course, I got a lot of student feedback, a lot of reviewers. My professors gave feedback, other industry people who I know gave feedback. And eventually I said, ‘Oh, wow, I actually now have a pretty comprehensive set of notes on this topic. Why not turn it into a book?’ So it was more of a natural four-year process until I got to the book.
One key theme of your book is the importance of focusing on ML systems as a whole, not just models. You write, ‘The algorithm is only a small part of the actual system in production.’ Could you expand on why that holistic view is so important?
Huyen: It’s different for people who want to use a model in production versus people who want to make core algorithmic or research progress. A lot of courses I took in college, I think they’re really great courses, but I do think there’s a very sharp focus on the algorithm part, the modeling part. It was very useful to learn that. But then I realized, when we started helping companies deploy these models, this is not enough.
One thing, for example, is organizational structure. So to deploy a model, you create a model and hand it to another team. And that team has no idea what to do, right? They treat it the same way they treat traditional software, whereas there are a lot of differences between ML models and traditional software. And because of that, I think we need to put more guardrails in the process to make sure that ML models can perform well in production.
What are some of the challenges that come up when operationalizing and maintaining ML systems in production over the long term versus, as you said, working in an academic context?
Huyen: One thing is the question of model performance. For example, in college, when you’re doing research, what you care about is this kind of leaderboard-style competition. But there’s a very big gap between a model that can do well on the leaderboard-style competitions versus a model that does well in production.
For one thing, when you do a standardized task, a lot of things are well understood. Data is pretty much cleaned. You know exactly what it’s like. It doesn’t change over time. You can probably even find some script somebody wrote for you to deal with that. So the data part is pretty much done for you, whereas the data is a huge challenge in production.
Another challenge is latency. When you just care about model performance, accuracy or F1 score, you don’t care about how long it’s going to take to generate predictions, whereas in production, you don’t want users to wait.
Also, the metrics. Usually, you have a concrete ground-truth label to compare the model predictions with in these leaderboard-style competitions or in research. But in production, the vast majority of the time, we don’t have all of these ground-truth labels. So how do you monitor the model in production?
And, of course, the world changes. So many things are happening nowadays, things change very fast, and any AI needs to be able to change with the times. How do we keep updating the model effectively? These are all questions that you don’t get to answer in just a research environment or in school.
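One way to approach the monitoring question Huyen raises, when ground-truth labels are not available, is to compare the distribution of recent model outputs against a reference window. The sketch below uses the population stability index (PSI), a technique that is not named in the interview and is shown here only as an illustration. The function name, the assumption that scores lie in [0, 1], the bin count and the alert threshold are all hypothetical choices, not anything described by Huyen.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """PSI between two samples of model scores assumed to lie in [0, 1]."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")

    eps = 1e-6  # floor for empty bins, avoids log(0) and division by zero

    def bin_fractions(scores: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for s in scores:
            # Clamp the index so a score of exactly 1.0 lands in the last bin.
            idx = min(max(int(s * bins), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(scores), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    # Each bin contributes (current - reference) * ln(current / reference).
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical usage: scores logged last month vs. scores logged today.
last_month = [i / 100 for i in range(100)]        # spread across [0, 1)
today = [0.5 + i / 200 for i in range(100)]       # concentrated in [0.5, 1)

ALERT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff, tune per use case
if population_stability_index(last_month, today) > ALERT_THRESHOLD:
    print("Prediction distribution has shifted; investigate or retrain.")
```

A check like this only signals that the model's outputs look different from before; it does not say whether the predictions are still correct, so it complements rather than replaces evaluation against labels when they eventually arrive.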
Are there common mistakes or misconceptions that you see come up a lot when designing ML systems for production?
Huyen: One key thing is to figure out a way to consistently evaluate your model and let people compare a new iteration to the last iteration. And it’s very hard. In a lot of businesses, an ML project needs to be tied to a business objective. For a lot of companies, the objective is revenue or profit: business metrics. But if you use only business metrics, it’s going to make it impossible, because a company may have a thousand projects at the same time.
Say I want to try to boost revenue. If the revenue goes up, you don’t know if it’s the [current] model or another model. Or if the model’s F1 score or accuracy goes up, you’re not sure if that will lead to the business revenue going up.
We have teams saying, ‘We can predict 96% accuracy now versus 95.8% accuracy before. We should absolutely deploy [the new model].’ And then when you talk to the business people, they’re like, ‘Actually, most users don’t notice a difference between 96% and 95.8%.’ So what’s the point of doing it, you know? It’s very hard, finding the right tradeoff.
Or say you run an e-commerce website, and you want to improve the purchase rate. You have this hypothesis that if people see the thing that they like, they’ll buy it more often, so you want the recommender system to recommend items that users might like. The assumption here is that the more users click on [a product], the more people will buy it. And so you use this metric of click-through rate.
Now, imagine that the model is doing very well through that lens of click-through rate. But what if the business metric still doesn’t go up? There’s a chance that the ML model metrics and the business metrics don’t line up. Then what do you do about that?
Another is setting up data infrastructure. I think data infrastructure is glossed over in a lot of classes. Learning to operate, build or maintain data systems takes a lot of time. Maybe it doesn’t fit into 10 weeks or 16 weeks of coursework. You also want students to enjoy it, and a lot of the time, data infrastructure is just not pleasant. It’s not fun, right? It’s painful.
So I see it being glossed over a lot, but it’s a huge thing. AI and ML today depend on data. It doesn’t matter how fancy somebody’s prototype running in a Jupyter notebook is going to be. If it can’t access the data fast enough or in the right way, it’s not going to be useful in production. AI strategy has to start from data strategy.
It’s funny, people keep asking me, ‘So the book was published before ChatGPT came out. Did generative AI change a lot? Change everything?’ And, actually, the fundamentals stay the same with generative AI. Even though ChatGPT is new, it’s built on existing technology. It’s not something that just came out of the blue, you know? Language models have been around since the 1950s. Embeddings have been around since the early 2000s. Vector databases have been around for a few years. So a lot of this has remained pretty much the same.
Yeah, it seems like a lot of what’s changed has been the scale or the level of attention, rather than the underlying technologies. How have you seen the generative AI boom affecting the industry and ML roles?
Huyen: Actually, I feel like generative AI made a lot of the focus points of the book more relevant. Generative AI turned models into an API. You don’t really build models from scratch anymore. It’s good to know algorithms, for sure, and I do encourage the people doing research to definitely learn these skills, but a lot of people just use these models as an API. They don’t care about how it’s being run under the hood.
So I do think that it actually highlighted AI production, because the rest of it is about the rest of the system. Like data infrastructure, evaluating AI for business use cases, focusing on user experience, organizational structure, building platforms so that people can automate a lot of their work and evaluations. A lot of this is actually more important now that the model part is being commercialized away [through APIs].
Another thing is, for a long time, we said that ML should be more interactive. In the past, a lot of people followed this kind of batch prediction [approach]. Say, for Uber Eats or DoorDash, when you log in to the app, you see a lot of recommendations for what restaurants you might enjoy. In the past, a lot of this was pre-computed for you every day. So if this company has 100 million users, they’re generating recommendations for 100 million users.
But then these predictions can get stale. Maybe yesterday I enjoyed Italian, but now today I want Vietnamese food. And it costs money. Of 100 million users, not all of them are going to log in every day. If only 2% of them are going to log in each day, and you generate [predictions] for 100 million users, 98 million predictions are going to be wasted.
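The arithmetic behind that 98 million figure can be sketched in a few lines of Python. This is an illustration only: the function name and parameters are hypothetical, and it assumes exactly one precomputed prediction per user per day, with the user count and login rate taken from Huyen's example.

```python
def wasted_batch_predictions(total_users: int, daily_login_rate: float) -> int:
    """Precomputed predictions that no user sees on a given day.

    Assumes one batch prediction per user per day (an assumption made
    for illustration, following the example in the interview).
    """
    if not 0.0 <= daily_login_rate <= 1.0:
        raise ValueError("daily_login_rate must be between 0 and 1")
    users_who_log_in = round(total_users * daily_login_rate)
    return total_users - users_who_log_in


# Figures from the interview: 100 million users, 2% log in each day.
wasted = wasted_batch_predictions(100_000_000, 0.02)
print(f"{wasted:,} of 100,000,000 predictions go unused")
```

Under the same assumptions, generating predictions on request would compute only the 2 million that are actually viewed, which is the cost argument for moving from batch to online prediction.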

And then, of course, generative AI came out. With ChatGPT, everything is basically real-time prediction, right? You send in a request, you get back a prediction that has the impression of being generated on the fly. And now people just take it for granted. We don’t need to convince people of that anymore.
Or another aspect is model drift. Things change over time. So sometimes you ask ChatGPT, ‘Hey, tell me about this or that,’ and it’s like, ‘Oh, as of my knowledge cutoff of September 2021, I can’t do that.’ How do we keep models up to date with the changing world? That’s why it’s important to monitor and continually update the model over time.
So I think generative AI actually highlighted a lot of challenges with ML in production.
It sounds like some of the things that you’re drawing out there are broader trends that we’re seeing across software and IT: the overall trend toward abstracting away some of that underlying infrastructure, the shift toward platforms. It’s interesting to see, in some ways, ML and MLOps becoming much more similar to traditional software and DevOps.
Huyen: Yeah, the new thing that generative AI brought is that it’s so easy to use. A lot of people traditionally haven’t been able to build applications because they lack engineering knowledge or ML knowledge. Now they can. We see a lot of tools coming out to help these people build very low-code [applications]. I think it’s exciting. I feel like now, people with less AI knowledge but more domain expertise can bring AI to the vast majority of industries and many, many use cases. It’s very exciting.
At the same time, we can’t expect people to know everything, right? People who are really good at image editing or movie editing might not be the best at engineering practices, like putting in guardrails and making sure the model performs, stays up to date, and is reliable and robust. So I think we should see a lot of movement of people building tools to make it very hard for people without engineering knowledge to make mistakes in developing these applications.

