Advanced MLOps

CONTENTS:
1. Principles of MLOps
2. Basic MLOps stack template
3. References

WHAT ARE SOME OF THE KEY PRINCIPLES OF MLOps?

Automation:
The maturity of an ML process is determined by the level of automation of the data, ML model, and code pipelines. As maturity increases, so does the rate at which new models are trained. An MLOps team's goal is to automate the integration of ML models into the core software system, or their deployment as a service component.
This means automating the entire ML workflow without the need for manual intervention. Calendar events, messages, monitoring events, and changes to data, model training code, and application code can all serve as triggers for automated model training and deployment.
Automated testing helps detect issues quickly and early, which allows for fast error correction and learning from mistakes. To implement MLOps, we see three levels of automation, ranging from manual model training and deployment to fully automated ML and CI/CD pipelines.
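The trigger types listed above can be combined into a single "should we retrain?" decision. The sketch below is a minimal illustration, assuming a scheduler that already knows when the model was last trained, fingerprints of the training data and code, and whether monitoring has raised an alert; `TriggerState`, `retrain_reasons`, and the 7-day schedule are hypothetical names and values, not part of any particular MLOps tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TriggerState:
    """Hypothetical snapshot of what a scheduler knows when deciding to retrain."""
    last_trained_at: datetime
    trained_data_hash: str     # fingerprint of the data the current model saw
    current_data_hash: str     # fingerprint of the latest available data
    trained_code_version: str  # e.g. the Git commit of the training code
    current_code_version: str
    monitoring_alert: bool     # set by the monitoring system (e.g. a drift alarm)


def retrain_reasons(state: TriggerState, now: datetime,
                    schedule: timedelta = timedelta(days=7)) -> list[str]:
    """Return every trigger that fired; an empty list means no retraining is needed."""
    reasons = []
    if now - state.last_trained_at >= schedule:  # calendar trigger
        reasons.append("calendar")
    if state.current_data_hash != state.trained_data_hash:  # data changed
        reasons.append("data_changed")
    if state.current_code_version != state.trained_code_version:  # code changed
        reasons.append("code_changed")
    if state.monitoring_alert:  # monitoring event
        reasons.append("monitoring_alert")
    return reasons
```

Returning the list of reasons rather than a single boolean keeps a record of why each automated run was started, which is useful later when auditing the pipeline.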

Manual process – This is a common data science procedure carried out at the start of an ML implementation. It is experimental and iterative in nature. Each pipeline step, including data preparation and validation, model training, and testing, is carried out manually. Rapid Application Development (RAD) tools such as Jupyter Notebooks are commonly used to process data.

Source: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

ML pipeline automation – At the next level, model training is carried out automatically. Here we introduce continuous training of the model: retraining is triggered whenever fresh data becomes available. This level of automation also includes data and model validation steps.

Source: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

CI/CD pipeline automation – In the last stage, we introduce a CI/CD system to perform rapid and reliable ML model deployments in production. The main difference from the previous stage is that the data, ML model, and ML training pipeline components are now built, tested, and deployed automatically.

Source: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
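The data and model validation steps of the automated levels can be pictured as gates around the training step. This is a minimal sketch, assuming rows arrive as dictionaries, a higher evaluation score is better, and the currently served model's score is known; `validate_data`, `run_pipeline`, and the trivial mean-predictor "model" are hypothetical stand-ins for real pipeline components.

```python
from typing import Any, Callable, Optional, Sequence

Row = dict[str, float]


def validate_data(rows: Sequence[Row], required: Sequence[str]) -> None:
    """Data validation gate: fail fast on empty input or missing columns."""
    if not rows:
        raise ValueError("no data")
    for i, row in enumerate(rows):
        missing = [c for c in required if c not in row]
        if missing:
            raise ValueError(f"row {i} is missing columns {missing}")


def run_pipeline(
    train_rows: Sequence[Row],
    holdout_rows: Sequence[Row],
    required: Sequence[str],
    train_fn: Callable[[Sequence[Row]], Any],
    eval_fn: Callable[[Any, Sequence[Row]], float],
    production_score: float,
) -> Optional[Any]:
    """Validate data, train, validate the model, and return it only if it
    beats the model currently in production (higher score is assumed better)."""
    validate_data(train_rows, required)
    validate_data(holdout_rows, required)
    model = train_fn(train_rows)
    score = eval_fn(model, holdout_rows)
    # Model validation gate: promote only a candidate that improves on production.
    return model if score > production_score else None


def train_mean(rows: Sequence[Row]) -> float:
    """Hypothetical trivial model: always predict the mean target 'y'."""
    return sum(r["y"] for r in rows) / len(rows)


def neg_mae(model: float, rows: Sequence[Row]) -> float:
    """Negative mean absolute error, so that higher is better."""
    return -sum(abs(r["y"] - model) for r in rows) / len(rows)
```

In a CI/CD setup, a returned model would be handed to the deployment stage, while `None` leaves the current production model in place.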

Continuous X:
Continuous Integration (CI) extends the testing and validation of code and components to include the testing and validation of data and models.
Continuous Delivery (CD) is concerned with delivering an ML training pipeline that automatically deploys an ML model prediction service.
Continuous Training (CT) is unique to ML systems: it automatically retrains ML models for redeployment.
Continuous Monitoring (CM) is concerned with continuously monitoring production data and model performance metrics that are tied to business metrics.
Versioning:

The purpose of versioning is to treat ML training scripts, ML models, and training data sets as first-class citizens in DevOps processes by managing them with version control systems. According to SIG MLOps, the most common reasons for changes to ML models and data are:

New training data can be used to retrain ML models.
Models can be retrained using new training methods.
Models may be self-learning.
Models may degrade over time.
Models may be deployed in new applications.
Models may be subject to attacks and need to be revised.
Models can be rolled back to a previous serving version within seconds.
Corporate or government compliance may require audit or investigation of both the ML model and the data, so we need access to all versions of the productionised ML model.
Data may be spread across several systems.
Data may be restricted to being stored in certain jurisdictions.
Data storage may not be immutable.
Data ownership may be a factor.
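One lightweight way to make data a versioned artifact alongside the model is to record a content hash of every training file next to the model version and code commit. The sketch below is a minimal illustration, assuming local files and a JSON Lines file as a hypothetical "registry"; it is not how DVC or any specific model registry stores lineage, and data spread across several systems would need a fingerprint per system.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Content hash of a data file; any change to its bytes changes the hash."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_model_version(registry: Path, model_name: str, version: int,
                         data_files: list[Path], code_commit: str) -> dict:
    """Append one lineage entry (model version -> code commit + data hashes)."""
    entry = {
        "model": model_name,
        "version": version,
        "code_commit": code_commit,
        "data": {str(p): file_sha256(p) for p in sorted(data_files)},
    }
    with registry.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def lookup_version(registry: Path, model_name: str, version: int) -> Optional[dict]:
    """Find the lineage of an earlier version, e.g. for an audit or a rollback."""
    with registry.open(encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["model"] == model_name and entry["version"] == version:
                return entry
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "train.csv"
        registry = Path(tmp) / "registry.jsonl"
        data.write_text("x,y\n1,2\n")
        v1 = record_model_version(registry, "demo-model", 1, [data], "commit-a")
        data.write_text("x,y\n1,2\n3,4\n")  # the training data changes
        v2 = record_model_version(registry, "demo-model", 2, [data], "commit-a")
        print(v1["data"] != v2["data"])  # new data, new hash
        print(lookup_version(registry, "demo-model", 1) == v1)
```

Because the registry is append-only, earlier entries stay available for compliance audits even when the underlying data storage itself is not immutable.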

Experiments Tracking:

Machine learning is a research-driven, iterative process. In contrast to traditional software development, ML development can run many model training experiments in parallel before deciding which model will be promoted to production.
One way to keep track of several experiments is to use separate (Git) branches, each dedicated to a single experiment. Each branch produces a trained model as its output. The trained models are then compared against each other on the chosen metric, and the best model is selected.
DVC, an open-source version control system for machine learning projects that extends Git, fully supports this kind of low-friction branching. The Weights and Biases (wandb) library is another popular tool for recording ML experiments; it tracks the hyperparameters and metrics of each run.
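The final step of the branch-per-experiment workflow, comparing trained models on a chosen metric and picking the best, can be sketched in a few lines. This is an illustration only, assuming each experiment has already reported its metrics as a dictionary; `Experiment` and `best_experiment` are hypothetical names and do not use the DVC or wandb APIs, which store this information in their own formats.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    branch: str  # e.g. the Git branch dedicated to this experiment
    params: dict
    metrics: dict[str, float] = field(default_factory=dict)


def best_experiment(experiments: list[Experiment], metric: str,
                    higher_is_better: bool = True) -> Experiment:
    """Pick the experiment with the best value of `metric`.

    Experiments that never reported the metric (e.g. failed runs) are skipped.
    """
    candidates = [e for e in experiments if metric in e.metrics]
    if not candidates:
        raise ValueError(f"no experiment reported {metric!r}")
    chooser = max if higher_is_better else min
    return chooser(candidates, key=lambda e: e.metrics[metric])
```

The `higher_is_better` flag matters because some metrics (accuracy, AUC) should be maximised while others (log-loss, MSE) should be minimised.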
Testing:

Source: https://ml-ops.org/content/mlops-principles#monitoring

The data pipeline, machine learning model pipeline, and application pipeline make up the entire development pipeline. Based on this separation, we identify three types of testing in ML systems: tests for features and data, tests for model development, and tests for ML infrastructure.

Features and Data-tests:

Data and feature schemas/domains are validated automatically.
A feature importance test determines whether new features improve predictive power.
Features and data pipelines must be policy-compliant (e.g. GDPR), and these requirements should be verified in both the development and production environments.
Unit tests should cover the code that creates features (to catch bugs in features).
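Two of these checks, automatic schema/domain validation and unit tests for feature code, can be written with plain Python. The sketch below assumes a hypothetical schema mapping each column to a type and an allowed range, and a hypothetical `log_income` feature; in practice a dedicated data validation library may replace the hand-written checker.

```python
import math
from typing import Any

# Hypothetical schema: column -> (expected type, minimum, maximum).
SCHEMA = {
    "age": (int, 0, 120),
    "income": (float, 0.0, math.inf),
}


def schema_violations(row: dict[str, Any], schema=SCHEMA) -> list[str]:
    """Return human-readable schema/domain violations for one input row."""
    errors = []
    for col, (typ, lo, hi) in schema.items():
        if col not in row:
            errors.append(f"{col}: missing")
            continue
        value = row[col]
        if not isinstance(value, typ):
            errors.append(f"{col}: expected {typ.__name__}, got {type(value).__name__}")
        elif not lo <= value <= hi:
            errors.append(f"{col}: {value} outside [{lo}, {hi}]")
    return errors


def log_income(row: dict[str, Any]) -> float:
    """Feature code under test: log(1 + income), a common transform for skewed values."""
    return math.log1p(row["income"])


def test_log_income() -> None:
    """Unit test for the feature code, in the style a test runner such as pytest collects."""
    assert log_income({"income": 0.0}) == 0.0
    assert math.isclose(log_income({"income": math.e - 1}), 1.0)
```

Running `schema_violations` on both training and serving inputs applies the same domain rules in development and production.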

Tests for reliable model development:

ML training tests should include routines that verify that algorithms make decisions aligned with business objectives. This means that ML loss metrics (MSE, log-loss, etc.) should correlate with business impact metrics (revenue, user engagement, etc.).
Test for model staleness. A trained model is considered stale if it does not incorporate up-to-date data and/or does not meet the business impact requirements. In intelligent software, stale models can degrade prediction quality.
Assess the cost of more sophisticated ML models.
Validate the model's performance.
Test the ML model's performance for fairness, bias, and inclusion.
Apply conventional unit testing to the ML model definition code (training) and testing code, as well as to all feature creation code.

ML Infrastructure test:

Training of ML models should be reproducible, which means that training the model on the same data should yield identical results.
Test the usage of the ML API, including stress testing.
Verify the correctness of the algorithm.
Integration testing: the entire ML pipeline should be tested end to end.
The ML model must be validated before it is served.
ML models are canaried (served to a small fraction of traffic first) before full serving.
Testing the model in the training environment should yield the same result as testing it in the serving environment.
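The reproducibility requirement can be turned into an automated test: train twice with the same data and seed, and require identical parameters. The sketch below uses a hypothetical toy SGD linear regression in which all randomness (initialisation and shuffling) comes from one seeded generator; real frameworks, especially on GPUs, can have additional sources of nondeterminism that a seed alone does not control.

```python
import random


def train_linear(xs: list[float], ys: list[float], seed: int,
                 epochs: int = 20, lr: float = 0.01) -> tuple[float, float]:
    """Toy SGD for y ~ w*x + b; every random choice uses the seeded generator."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # seeded initialisation
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # seeded shuffling of the training examples
        for i in order:
            err = (w * xs[i] + b) - ys[i]
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b


def test_training_is_reproducible() -> None:
    xs = [float(i) for i in range(10)]
    ys = [2.0 * x + 1.0 for x in xs]
    # Same data + same seed must give bit-identical parameters.
    assert train_linear(xs, ys, seed=42) == train_linear(xs, ys, seed=42)
```

Comparing with `==` rather than an approximate tolerance is deliberate here: the requirement is identical results, not merely similar ones.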

Monitoring:

Once the machine learning model has been deployed, it must be monitored to ensure that it performs as expected. The following checklist for model monitoring in production was adapted from E. Breck et al. (2017), “The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction”:

Dependency changes anywhere in the pipeline result in a notification.
Monitor data invariants in training and serving inputs.
Check whether training and serving features compute the same values.
Monitor the numerical stability of the ML model.
Monitor the computational performance of the ML system.
Monitor how stale the system in production is.
Track the model's age; older ML models tend to perform worse over time.
Monitor the feature generation processes, as they affect the model.
Monitor degradation of the ML model's predictive quality on served data. Both dramatic and slow-leak regressions in prediction quality should be reported.
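Comparing serving inputs against training inputs is one way to act on several of these checks. The sketch below uses the population stability index (PSI), a commonly used drift statistic that the checklist itself does not name, so treat it as one possible choice. The bin edges are assumed to be fixed in advance (e.g. from training data), and the 0.25 alert threshold is a widely quoted rule of thumb rather than a standard.

```python
import math
from bisect import bisect_right
from typing import Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float],
                  eps: float = 1e-6) -> list[float]:
    """Fraction of values per bin: (-inf, e0), [e0, e1), ..., [ek, inf).

    eps keeps empty bins from producing log(0) in the PSI formula.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               edges: Sequence[float]) -> float:
    """PSI = sum over bins of (a - e) * ln(a / e); 0 means identical binned distributions."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def drift_alert(training_values: Sequence[float], serving_values: Sequence[float],
                edges: Sequence[float], threshold: float = 0.25) -> bool:
    """Raise an alert when serving inputs drift too far from training inputs."""
    return population_stability_index(training_values, serving_values, edges) > threshold
```

Computed over a sliding window of served data, the same statistic can surface slow-leak changes that no single day's traffic would reveal.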

Source: https://ml-ops.org/content/mlops-principles#ml-test-score-system

Reproducibility:

In a machine learning workflow, reproducibility means that given the same input, every phase of data processing, ML model training, and ML model deployment should generate identical results.
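For the data processing phase, one common way to get identical results from the same input is to assign records to train/holdout sets by hashing a stable record ID instead of drawing random numbers. The split then does not depend on row order, on a random seed, or on which machine runs it. This is a minimal sketch of that technique, assuming every record has a stable string ID; the function name and the `salt` parameter are hypothetical.

```python
import hashlib


def split_bucket(record_id: str, holdout_fraction: float = 0.2, salt: str = "v1") -> str:
    """Deterministically assign a record to 'train' or 'holdout'.

    The same ID (and salt) always lands in the same bucket, regardless of row
    order or process. Changing the salt deliberately produces a new split.
    """
    digest = hashlib.sha256(f"{salt}:{record_id}".encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # pseudo-uniform value in [0, 1)
    return "holdout" if u < holdout_fraction else "train"
```

Because assignment depends only on the ID, records added later never move existing records between sets, which keeps earlier evaluation results comparable.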

ML Test score systems:

The “ML Test Score” measures the overall production readiness of an ML system. The final ML Test Score is calculated as follows:

For each test, half a point is awarded for running the test manually and documenting and sharing the results.
A full point is awarded if there is a system in place that runs the test automatically on a regular basis.
The scores are summed separately within each of the four sections: Data Tests, Model Tests, ML Infrastructure Tests, and Monitoring.
The final ML Test Score is the minimum of these four section scores.
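The scoring rules above translate directly into code. This sketch assumes each test's status is recorded as not done, run manually (half a point), or automated (one point); `CheckStatus` and `ml_test_score` are hypothetical names chosen for illustration.

```python
from enum import Enum

SECTIONS = ("data", "model", "infrastructure", "monitoring")


class CheckStatus(Enum):
    NOT_DONE = 0.0
    MANUAL = 0.5     # run manually, results documented and shared
    AUTOMATED = 1.0  # executed automatically on a regular basis


def ml_test_score(results: dict[str, list[CheckStatus]]) -> float:
    """Sum points per section, then take the minimum over the four sections."""
    missing = [s for s in SECTIONS if s not in results]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    section_scores = [sum(t.value for t in results[s]) for s in SECTIONS]
    return min(section_scores)
```

Taking the minimum rather than the total means the weakest section determines the score, so strong model tests cannot hide, for example, missing monitoring.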

Modularity:

Ensuring loose coupling between machine learning components can be more challenging in ML-based software systems than between traditional software components. Component boundaries in ML systems are weak in several ways. For example, the outputs of one ML model may be used as inputs to another, and such interleaved dependencies can affect training and testing. Basic modularity can be achieved by structuring the machine learning project accordingly; using a specific project template to set up a typical project structure is suggested.
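Within the code itself, one way to keep ML components loosely coupled is to make each stage depend only on a small interface rather than on a concrete class. The sketch below is illustrative, assuming a pipeline of one feature step and one model; `FeatureStep`, `Model`, `ScaleFeatures`, `LinearModel`, and `Pipeline` are hypothetical names, and `typing.Protocol` (Python 3.8+) expresses the interfaces structurally.

```python
from typing import Protocol, Sequence


class FeatureStep(Protocol):
    def transform(self, raw: dict[str, float]) -> list[float]: ...


class Model(Protocol):
    def predict(self, features: Sequence[float]) -> float: ...


class ScaleFeatures:
    """Hypothetical feature step: selects named columns and scales them."""

    def __init__(self, columns: Sequence[str], scale: float) -> None:
        self.columns = list(columns)
        self.scale = scale

    def transform(self, raw: dict[str, float]) -> list[float]:
        return [raw[c] * self.scale for c in self.columns]


class LinearModel:
    """Hypothetical model: weighted sum of features plus a bias."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0) -> None:
        self.weights = list(weights)
        self.bias = bias

    def predict(self, features: Sequence[float]) -> float:
        return sum(w * f for w, f in zip(self.weights, features)) + self.bias


class Pipeline:
    """Depends only on the two interfaces, so either side can be swapped or
    replaced by a test double without touching the other."""

    def __init__(self, features: FeatureStep, model: Model) -> None:
        self.features = features
        self.model = model

    def predict(self, raw: dict[str, float]) -> float:
        return self.model.predict(self.features.transform(raw))
```

Explicit interfaces do not remove the data dependency when one model's output feeds another, but they make that dependency visible and allow each component to be tested in isolation.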

WHAT DOES THE MLOps STACK TEMPLATE LOOK LIKE?

The MLOps technology landscape is currently in constant flux. Because each tool's scope may cover different aspects of the MLOps process, every use case requires careful consideration of the MLOps tech stack. For example, the requirements for model versioning and monitoring may vary by use case: the monitoring mechanisms used in regulated industries such as finance and medicine will be more complex than those in non-regulated industries. The MLOps Stack Template provides a methodical approach to choosing an MLOps tech stack. Following the MLOps Principles, this template breaks a machine learning workflow down into nine components. The requirements for each component should be collected and analysed before selecting tools or frameworks. Finally, the tools should be chosen in accordance with that analysis.

Source: https://ml-ops.org/content/state-of-mlops

REFERENCES:

Article Credit:

Name: Sagnik Mukherjee

Designation: I am an aspiring Data Scientist. My current research domain is Machine Learning Operations (MLOps), and as a parallel project of my own I am developing an AutoML toolkit, for which I have had research paper(s) accepted at ASIANCON as well as a university-level thesis/dissertation accepted.

Research area: MLOps