Model, Data, Action! A Beginner’s guide to the wacky world of MLOps

Hamzah Abdulla
12 min read · Feb 16, 2023


Hello and welcome to the wondrous and wacky world of *Ops buzzword bingo. Featured this week we will be exploring the not-so-surprisingly practical pairing of DevOps and Machine Learning (ML), a la Machine Learning Operation (MLOps).

MLOps brings with it processes to manage the lifecycle of ML models from inception all the way to production deployment. It does this by using the tried and tested DevOps principles to enable the deployment of ML models in a fast, secure, and scalable manner.

Of course, deploying a run-of-the-mill application is not exactly the same as deploying an ML model. For the most part the tools are similar, but there are key differences separating the two; we'll discuss these in more detail later on.

DevOps vs MLOps meme.

But before we dive in any further, let’s cover some fundamentals :)

What is an ML model?

Great question me! To understand what an ML model is we will first look at Mr Hulk. Mr Hulk sells houses for a living, he’s been doing this for the last 20 years. He is at the top of his field — just don’t get him angry, you wouldn’t like him once he’s angry. Mr Hulk has sold more houses than he can remember (and not because he has a terrible memory), he could tell you almost the exact price of any house in any neighborhood just by asking a few key facts about it, incredible! He must have a gift! Well, no…

What he’s done is rely on his years and years of experience selling houses. An immense amount of house pricing information collected, internalised, and recorded. Near a school? Price increases by x amount. Near a noisy highway? Price decreases by x amount. High bedroom-to-bathroom ratio… well, you get my point. Given certain features of the house he knows how to adjust the expected estimate.

Simple depiction of model training. Made by me.

This is pretty much how an ML model works, you feed it plenty of training data (features of a house and their known prices), and the more the merrier. And then, when you want an estimate, give it the features of the new house and it’ll spit out how much it thinks it costs. And this can work on all sorts of datasets. Interesting? Scary? Worrying? Hotel? Trivago… (sorry, bad British TV advertising-based joke I couldn’t help making).

But ask Mr Hulk about car prices… error 404!! Information not available. Turns out he isn't a genius after all. And the same happens for ML models: ask one to make a prediction on irrelevant data it hasn't been trained on, and it's essentially useless. You don't know what you don't know. 🤷🏻‍♂️

In summary: A model is trained by providing it with data and using an ML algorithm that allows it to reason over the data we give it and recognise patterns within it. Once trained, we can use the model to help us make predictions about data it hasn't seen before.

Note: ML Algorithms change based on the type of problem we are trying to solve, but don’t worry too much about that, we just need to know they’re there for now.
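To make that concrete, here's a toy sketch of the training step in Python: a one-feature linear model fitted by least squares to made-up house prices. It's not any particular library's algorithm, just the idea of "learn parameters from data, then predict on unseen data".

```python
# Toy "training": fit price = slope * size + intercept by least squares.
# Training data: (house size in square metres, sale price) pairs -- all made up.
data = [(50, 150_000), (70, 190_000), (90, 230_000), (120, 290_000)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Ordinary least squares for a single feature.
slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
        sum((x - mean_x) ** 2 for x, _ in data)
intercept = mean_y - slope * mean_x

def predict(size):
    """Use the 'trained' model to estimate a price for an unseen house."""
    return slope * size + intercept

print(round(predict(100)))  # estimate for a 100 m^2 house -> 250000
```

Real training swaps the hand-rolled maths for an ML framework and many more features, but the shape is the same: data in, learned parameters out, predictions on demand.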

What is MLOps?

Now you've got your model all trained and ready to destroy the world (kidding! … or am I?), how do you get it from your shiny MacBook 💻 (I'm not sponsored by Apple or anything, but I think it's a safe bet to assume y'all have a Mac) to a production environment where others can use it and scale their destructions… I mean predictions?

This is a big question and really depends on a number of variables, the team size, the model complexity etc. etc., but let’s think about what goes into a standard ML model deployment…

  • ❓Who is in charge of the deployment? The underlying infrastructure? The networking? API access and serving of the model?
  • ❓How do you make sure the predictions stay accurate, and if they lose their accuracy how will you know? And what actions will you take?
  • ❓If you have to retrain where will the data come from? Who’s in charge of cleaning it? Can the processed data be made available to anyone else?
  • ❓How will you control access to the data? What if there is Personally Identifiable Information (PII) in there?
  • ❓How are you going to manage dependencies for ML libraries? Are your pipelines standardised? Are your artefacts (data, containers, code) versioned?
  • ❓How are you tracking your experiments, parameters, data, hyperparameters? Can you reproduce your model if you’re being audited?
Funny questions gif from Giphy.com

I know… lots of questions, but I hope you now see (he said, hopefully) the sense behind why MLOps is needed.

MLOps is here to help Data Scientists (DS) focus on solving business problems using their models, and not be distracted or expected to manage infrastructure, configuration, pipelines, and the systems and processes surrounding them.

The MLOps workflow

Coming from the magical land of DevOps (think Narnia but more pipelines), many of the principles covered below should feel very familiar. I believe there is a lot of overlap between the two Ops’s. You could even say there is significant overlOps… too much? 😬

ML Pipeline overview.

Let us start with a concept we are all too familiar with:

Code — in ML we have code the same as any other application, code that can help us pre-process the data, define how we want to carry out the model training, or even middleware to control how we serve the model (AKA inferencing). This code goes through the same linting, testing, and packaging as any other.

Inference — this is just the process of sending queries with data to the model and receiving a prediction on that data in return.

ML real-time inference requests and response.
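As a sketch, inference is just "features in, prediction out". Assuming a hypothetical handler sitting behind an API endpoint (the payload shape and model weights below are made up for illustration):

```python
import json

# Hypothetical pre-trained model weights -- illustrative only.
WEIGHTS = {"bedrooms": 25_000, "near_school": 10_000}
BASE_PRICE = 100_000

def predict_handler(request_body: str) -> str:
    """Take a JSON request with house features, return a JSON prediction."""
    features = json.loads(request_body)
    price = BASE_PRICE + sum(WEIGHTS[k] * v for k, v in features.items())
    return json.dumps({"prediction": price})

response = predict_handler('{"bedrooms": 3, "near_school": 1}')
print(response)  # {"prediction": 185000}
```

In production this handler would sit behind an HTTP endpoint and load a real serialised model, but the request/response shape is the whole story.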

Related to code, we need somewhere for the code to run, and there’s no better place for it to run than…

Containerisation 📦 — it can’t be a *Ops blog without mentioning containers, and you bet MLOps is riddled with containers. They can’t be… contained… but you know what can? That’s right, your training, processing and inferencing environments along with their code and dependencies can be defined in a nice Dockerfile, packaged up and ready to be deployed wherever your heart desires. Containerisation has really transformed the way we work, especially in MLOps.

For convenience, we can use pre-existing containers with ML frameworks already installed, inject our own code at runtime, and deploy those. Simples. Or we can leverage the flexibility of building our own containers with exactly what we need and make these available for reuse across our organisation #efficiency.
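As a rough sketch, a build-your-own inference container definition might look something like this (the base image tag, file names, and `serve.py` entrypoint are all hypothetical):

```dockerfile
# Hypothetical inference image -- base tag and file names are illustrative.
FROM python:3.11-slim

WORKDIR /app

# Install pinned ML dependencies for reproducible builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Bake in the inference code and (optionally) the model artefact.
COPY serve.py model.pkl ./

# Start the model server when the container runs.
CMD ["python", "serve.py"]
```

Version that image alongside your code and data artefacts and you can redeploy the exact same environment anywhere.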

Where are the script injection and container building done? I’m glad you asked…

Pipelines — a word synonymous with DevOps (… and CI/CD). In MLOps we orchestrate and automate many different processes using pipelines. In AWS, for example, Amazon SageMaker Pipelines allow MLOps engineers to define repeatable model training and model deployment processes for quick and easy lifecycle management. A further selection of CI/CD tools can be used to orchestrate the overall process, kicking off the Amazon SageMaker Pipelines and managing deployment.

Since pipelines allow us to standardise processes, it becomes easier to trigger retraining pipelines based on monitoring the ML model, injecting speed and robustness into the process.

Retraining ML models is a common pattern in MLOps and one major distinction from regular DevOps — it's the response to model drift, where models begin to lose accuracy in their predictions.

Model Deployment — I put this as a separate section but it is part of the pipelines too. It’s just that there are a few things I wanted to point out. The first is that models are generally deployed in two main ways:

  1. Batch inferencing: this is also known as offline inferencing. This is used when you have a large dataset sat waiting to be served to the model to give you predictions or insights about the data. Batch inferencing is normally served on a scheduled basis (e.g. every morning at 9am) or event-based (only triggered when new data is added to the dataset).
  2. Realtime inferencing: this happens, you guessed it, in real-time, on-demand. A model is normally stood up behind an API endpoint. This endpoint is then queried, sending it data and returning a prediction value.
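The batch pattern above can be pictured as a loop that scores a stored dataset in chunks and collects the results. A minimal sketch, with a stand-in model and invented records:

```python
# Stand-in model: any callable that maps a record to a prediction.
def model(record):
    return record["size"] * 2000 + 50_000

def batch_inference(dataset, chunk_size=2):
    """Score a whole dataset in chunks, as a scheduled batch job would."""
    results = []
    for i in range(0, len(dataset), chunk_size):
        chunk = dataset[i:i + chunk_size]
        results.extend({**rec, "prediction": model(rec)} for rec in chunk)
    return results

houses = [{"id": 1, "size": 50}, {"id": 2, "size": 70}, {"id": 3, "size": 90}]
for row in batch_inference(houses):
    print(row["id"], row["prediction"])
```

A managed batch transform job does the same thing at scale, reading from and writing back to object storage on a schedule or on new-data events.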

One thing to note: with real-time inferencing you can use tried and tested deployment strategies like canary or blue/green for the endpoint, or even utilise some A/B testing with the production model to test your results. It's super useful!

Note: there are two more inferencing options: Serverless and Asynchronous. The former is great at handling unpredictable traffic patterns, and the latter is great at queuing requests with larger payloads that need longer processing times.

So, as you can see, the model deployment process and the resources you will need to serve the inference can vary depending on your use case. And even more so depending on the accessibility to your data, speaking of...

Data — where MLOps and DevOps begin to really differ is in the importance and reliance on data. Quality data is the foundation of ML, and quick and easy access to it is therefore essential for any MLOps design. This could be a book on its own (and it is) so I will do my best to butcher the summary in less than 500 words.

Data is needed everywhere, models being trained need data, and models that have become subject to drift also need new and more recent data. Data Scientists need data (I mean… it’s in their title). Good quality data is the golden egg of any model training process, your model is only as good as the data it trains on (essentially, you are what you eat) and we want to feed it only the best!

However, data doesn’t just arrive at our doorstep nicely packaged up. A lot of data is just noise, and often it is in need of enrichment or manipulation to make it useful. Modifying the format, transforming values, normalising and standardising the dataset etc. are all ways of preparing your dataset.
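As a small illustration, two common preparation steps, min-max normalisation and z-score standardisation, sketched on a toy feature column:

```python
import statistics

sizes = [50.0, 70.0, 90.0, 120.0]  # toy feature column, made-up values

def min_max_normalise(values):
    """Rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardise(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(v - mean) / stdev for v in values]

print(min_max_normalise(sizes))  # [0.0, 0.2857..., 0.5714..., 1.0]
print(standardise(sizes))
```

Real preparation pipelines chain many such transforms (plus deduplication, imputation, encoding, and so on), usually with a dedicated library rather than by hand.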

Furthermore, many industries, like finance or healthcare, handle sensitive Personally Identifiable Information (PII) that can only be accessed from within certain regulated environments (normally Production), making your job of training and evaluating the model more interesting. This means when it comes to model training you generally have two main options for promotion:

  1. Promote the model: it does what it says on the tin. In this promotion process you train the model on the real data by bringing that data into the DEV environment. Once the model is created, evaluated, and approved for deployment, it is moved to a staging environment for some integration testing, and finally it is promoted to the production account, ready to serve.
  2. Promote the code: in this process, you’re essentially promoting the whole training pipeline through the different environments. Since you can’t bring all the production data to the lower DEV environment, possibly due to regulations, you can only train on what is most likely tokenised data (randomised version of the real data, not as good as the real stuff and will lead to a drop in accuracy). Once the training is done, the pipeline and code are promoted to higher environments until they finally reach the production environment. Here is where the magic lies, you can now train the model using the actual data! If you can’t bring them to the party, take the party to them!
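To illustrate what tokenised data means here, a crude sketch: swap each PII value for an opaque token so the dataset keeps its shape but loses the real identities (the records and the tokenisation scheme are invented for the example):

```python
import secrets

def tokenise(records, pii_fields):
    """Swap PII values for opaque tokens; the same value maps to the same token."""
    token_map = {}
    out = []
    for rec in records:
        clean = dict(rec)
        for field in pii_fields:
            value = rec[field]
            if value not in token_map:
                token_map[value] = f"tok_{secrets.token_hex(4)}"
            clean[field] = token_map[value]
        out.append(clean)
    return out

patients = [
    {"name": "Bruce Banner", "age": 49},
    {"name": "Bruce Banner", "age": 49},
]
tokenised = tokenise(patients, pii_fields=["name"])
print(tokenised[0]["name"] == tokenised[1]["name"])  # True: consistent tokens
```

Production tokenisation is far more careful than this (format preservation, secure vaults for the mapping), but the trade-off is the same: structure survives, fidelity doesn't.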

Now you may be thinking option #1 sounds just terrrrrrific, no need to faff around with promoting pipelines, let's do that always. But it's actually option #2 that's the most common scenario and the one most often recommended, as it enables something we've not covered yet: automatically retraining the model when model drift is experienced, since the training pipeline is already sitting in the target environment. Also, some industries will require a certain level of accuracy from your model, and tokenised data has its limits.

I think we’ve spoken enough about the data. Too much data on data…

*deep sigh* I knew I’d struggle to keep this section short… but I hope that was useful and exciting, data is fun!

Data is a common theme in MLOps and really has entire fields dedicated to it (DataOps, Data Engineering, Data Analytics, et al.); I'm only scratching the surface.

Developing ML models is a highly iterative process. Many small changes and lots of outputs are created per model training attempt. How do we store, categorise and track these? Let’s find out!

Experimentation — when developing a model, you run multiple experiments and trials: changing hyperparameter values, algorithms, weights etc., altering the data you're using or how you're slicing it up or manipulating it. After all this experimentation you pick your best-performing model and deploy that. But best-performing doesn't always mean the latest trained model.

Experiment tracker meme.

Keeping track of all these changes, metadata and performance results is challenging, but fear not: tools like MLflow and Amazon SageMaker Experiments are your friends. These tools allow you to experiment to your heart's content whilst providing model experimentation tracking, lovely visual comparisons between training runs, and metadata and tagging.
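Under the hood, trackers like these record roughly the same things for every run: parameters, metrics, and metadata. A toy in-memory version, purely illustrative (MLflow and SageMaker Experiments do far more, and persist everything):

```python
experiments = []  # in-memory store; real trackers persist this per experiment

def log_run(params, metrics, tags=None):
    """Record one training run's hyperparameters, results, and metadata."""
    experiments.append({"params": params, "metrics": metrics, "tags": tags or {}})

# Two trial runs with different hyperparameters -- the numbers are made up.
log_run({"learning_rate": 0.1, "epochs": 10}, {"accuracy": 0.87})
log_run({"learning_rate": 0.01, "epochs": 20}, {"accuracy": 0.91}, {"note": "best"})

# Pick the best-performing run, just as you would in the tracker's UI.
best = max(experiments, key=lambda run: run["metrics"]["accuracy"])
print(best["params"])  # {'learning_rate': 0.01, 'epochs': 20}
```

The point is reproducibility: given the logged parameters, data version, and code version, you should be able to recreate any run on demand, auditors included.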

But where is the intermediate and final model stored?

Model Registry — this is a collection of ML models, all packaged and stored in a registry to ease the process of deployment and sharing, like code that's been built and is essentially “ready to go”. It's closely linked to the Experiments console so that you can view the details of the different models. Amazon SageMaker Model Registry is an example of a model registry: it is built upon Amazon S3 (because… like… what isn't? 😂), integrates well with Amazon SageMaker Experiments, and provides great auditability by recording who approved which model for deployment.

Model training and Model Experimentation tracking.

And last but not least…

Monitoring — I’ll be damned if you think I’m deploying anything anywhere without monitoring it. But what are you actually monitoring?

Model monitoring process with re-training.

In MLOps we mainly care about the following two things:

  1. the underlying hardware — model training needs a lot of computing power, many cloud providers will allow you to provision specialised hardware for model training and utilise really cool parallel training jobs etc. Either way, when it comes to inferencing, whether offline or real-time, we have to ensure we monitor the usage of the underlying compute (includes CPU, Memory, GPU etc. etc.).
  2. the model performance — due to possible changes in the real-world environment, data, context or something else entirely, the model will, over time, lose accuracy (model drift) in its predictions and this is something we definitely want to know about.

If you are operating in the cloud ☁️, which for your sake and the sake of everyone on this good green earth I hope you are, then monitoring underlying hardware is easy to handle. Depending on the specific service, managed cloud-based options handle scaling pretty well, or if you want to manage your own solution that’s an easy few clicks to deploy instances to your taste that tick your CPU, memory, and GPU requirements. Also, how easy is it to select different instance flavours to keep up with your experimentation needs in the cloud ☁️? The answer is very, it is very easy.

For monitoring model performance, we need to make sure we are utilising model monitoring services to constantly track the accuracy of the model predictions (#3 in the diagram above). This is normally done by using our evaluation dataset before deploying our model to establish a baseline accuracy (#1 in the diagram above), then tracking this against the real-time accuracy results for a delta (#2 in the diagram above). Should the accuracy drop below a certain threshold, a re-training pipeline should be triggered (#4 in the diagram above), and all our hard work setting up this MLOps workflow will come to full fruition, producing a new model ready to pick up where the old one left off with sustained accuracy 🔥
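That loop can be sketched in a few lines: compare live accuracy against the pre-deployment baseline and fire the retraining pipeline when the delta crosses a threshold (the numbers and the trigger function are hypothetical stand-ins):

```python
BASELINE_ACCURACY = 0.92   # measured on the evaluation dataset pre-deployment
DRIFT_THRESHOLD = 0.05     # how much accuracy loss we tolerate

def trigger_retraining_pipeline():
    # Stand-in for kicking off the real retraining pipeline.
    print("Retraining pipeline triggered!")

def check_for_drift(live_accuracy):
    """Compare live accuracy to the baseline; retrain if the drop is too big."""
    delta = BASELINE_ACCURACY - live_accuracy
    if delta > DRIFT_THRESHOLD:
        trigger_retraining_pipeline()
        return True
    return False

check_for_drift(0.90)  # small dip: no action
check_for_drift(0.84)  # drifted: fires the retraining trigger
```

Managed monitors run this comparison continuously against live traffic and emit alerts or pipeline triggers, but the decision logic is exactly this simple delta check.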

Conclusion

Wow… well, thanks so much for sticking with me this far. MLOps is huge; there are so many topics I didn't even get to touch on, like security and governance. Maybe next time! 😬

This has been a marathon, mostly because it is hard to shut up about MLOps. I have really struggled to not dive deep into each aspect of the journey and turn this into a novel… well, more of a novel. You’re awesome if you’ve stuck this far, pat on the back to you :)

MLOps looks to streamline the journey of the model through its different stages. It borrows many of the tried and tested DevOps practices and adapts the rest to enable efficient model development and deployment. MLOps, in my opinion, is more than just training a model quickly, it is about enablement at every single stage of the model development and absolutely surpasses the definition “just DevOps for ML” as a deserving recipient of the *Ops status.

Until next time… Adios!

Written by Hamzah Abdulla

The thoughts expressed on this platform are wholly my own and in no way reflect those of my employer
