Basic Introduction to MLOps


What exactly is MLOps?

Machine learning operations (MLOps) is a set of practices for streamlining the machine learning life cycle from end to end. Its goal is to connect design, model development, and operations. In traditional ML development, model development and operations are frequently kept separate, connected only by a manual handover, resulting in long turnaround times. MLOps combines data collection, preprocessing, model training, evaluation, deployment, and retraining into a single process that teams maintain together. System administrators, data science teams, and other departments across the company collaborate and communicate to build a shared understanding of how production models are produced and maintained.
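The life-cycle stages listed above can be sketched as a single pipeline. The following is a minimal, illustrative sketch in plain Python: the function names and the toy least-squares "model" are stand-ins for a real framework, not part of any particular MLOps product.

```python
# A minimal sketch of the MLOps loop: collect -> preprocess -> train ->
# evaluate -> (deploy/retrain). All names here are illustrative stubs.

def collect_data():
    # In practice: pull from a warehouse, a stream, or labelled logs.
    return [(x, 2 * x + 1) for x in range(100)]

def preprocess(rows):
    # e.g. cleaning, feature engineering, and a train/test split.
    split = int(len(rows) * 0.8)
    return rows[:split], rows[split:]

def train(train_rows):
    # Stand-in for model fitting: estimate slope/intercept by least squares.
    n = len(train_rows)
    mean_x = sum(x for x, _ in train_rows) / n
    mean_y = sum(y for _, y in train_rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train_rows) /
             sum((x - mean_x) ** 2 for x, _ in train_rows))
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def evaluate(model, test_rows):
    # Mean absolute error on held-out data.
    errs = [abs(model["slope"] * x + model["intercept"] - y)
            for x, y in test_rows]
    return sum(errs) / len(errs)

def run_pipeline():
    train_rows, test_rows = preprocess(collect_data())
    model = train(train_rows)
    mae = evaluate(model, test_rows)
    # Deployment and retraining decisions would be gated on metrics like this.
    return model, mae

model, mae = run_pipeline()
print(f"MAE: {mae:.4f}")
```

The point of MLOps is that every one of these stages, plus the handoffs between them, is automated, versioned, and monitored rather than run by hand.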

The evolution of MLOps

When organisations first adopted machine learning solutions in the early 2000s, they used vendor-licensed software such as SAS, SPSS, and FICO. As open-source software and data became more widely available, more practitioners began training ML models with Python or R libraries. Using the models in production, however, remained a challenge. As containerization technology matured, Docker containers and Kubernetes solved the problem of deploying models in a scalable manner. These systems have since evolved into machine learning deployment platforms that cover the entire cycle of model experimentation, training, deployment, and monitoring. The MLOps evolution is depicted in the diagram below.


How is it similar to / different from DevOps?

DevOps is a term that refers to the integration of software development, testing, and operations. The purpose of DevOps is to transform these segmented processes into a unified set of procedures within a business. The automation of processes, continuous delivery, and feedback loops are all key DevOps ideas. These concepts rely on cross-departmental communication and a set of technologies (such as CI/CD systems) that consolidate and facilitate these processes in a visible way.

DevOps and MLOps have certain parallels because MLOps adopted many of DevOps’ principles. DevOps and MLOps both encourage and enable collaboration among developers (software engineers and data scientists), infrastructure managers, and other stakeholders. Both prioritise process automation in ongoing development in order to maximise speed and efficiency.

Although the two share many principles, there are some fundamental differences between them, as described below:

  1. Versioning for ML: Code version control is used in DevOps to ensure that any changes or revisions to the product being created are documented clearly. The code, on the other hand, isn’t the only variable in machine learning. Data, as well as parameters, metadata, logs, and the model, are all crucial inputs that must be managed.


  2. Hardware required: Machine learning models, especially deep learning models, require a lot of computing power to train. For most software projects, build time is unimportant, as is the hardware on which the build runs. Larger models, on the other hand, can take hours to weeks to train, even on the fastest GPU workstations available from cloud vendors, which means an MLOps system must be much more sophisticated in terms of the machines it can handle.


  3. Continuous monitoring: Monitoring is also an important aspect of good DevOps practice. Site reliability engineering (SRE) has been all the rage in recent years, emphasising the necessity of monitoring in software development. The distinction between DevOps and MLOps monitoring is that software does not deteriorate, whereas machine learning models do. Once a model is deployed into production, it begins generating predictions on new data it receives from the real world. That data continues to change as the business environment does, resulting in model degradation. MLOps provides procedures for continuous monitoring and retraining so that models can remain useful in production.
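Drift of this kind can be detected numerically. One common (though not the only) metric is the Population Stability Index (PSI), which compares the distribution a model was trained on with the distribution it currently sees. The sketch below is illustrative; the 0.2 threshold is an informal rule of thumb, not a standard.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference (training-time)
    sample and a live (production) sample. PSI > 0.2 is a common,
    informal signal of significant drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def frac(sample, b):
        left = lo + b * width
        right = left + width if b < bins - 1 else hi + 1e-9
        count = sum(left <= v < right for v in sample)
        return max(count / len(sample), 1e-6)  # avoid log(0)

    return sum((frac(actual, b) - frac(expected, b)) *
               math.log(frac(actual, b) / frac(expected, b))
               for b in range(bins))

# Training-time feature values vs. a shifted production sample:
reference = [i / 100 for i in range(1000)]       # roughly uniform on [0, 10)
production = [i / 100 + 3 for i in range(1000)]  # same shape, shifted by 3

score = psi(reference, production)
if score > 0.2:
    print(f"PSI={score:.2f}: drift detected, schedule retraining")
```

In an MLOps setup, a check like this would run on a schedule against live traffic and trigger an alert or a retraining job instead of a print statement.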


MLOps is a set of practices that combines machine learning, DevOps, and data engineering, aiming to deploy and maintain ML systems in production reliably and efficiently.


Why do we need MLOps?

According to recent research by NewVantage Partners, only 15% of 70 prominent enterprise businesses have put AI capabilities into widespread production. AI that isn't used to create value is just a very expensive experiment: technically challenging, yet yielding no return on investment. MLOps makes it simple for businesses to install, monitor, and update models in production, opening the road to AI that pays off. The need for MLOps spans a broad range of issues, as discussed below:

  1. Deployment: Because models aren't deployed, businesses aren't reaping the full benefits of AI. Or, if they are deployed, it is not at the speed or scale required to meet the business's requirements.

MLOps addresses deployment problems such as the following:

• Models are built using a variety of languages and teams

• Models are handed to IT but never make it into production

• Models must be rewritten in other languages before deployment

• A large number of models are awaiting deployment

• Data scientists spend a lot of time troubleshooting models during the deployment phase

• There is no standardised (or only a poorly standardised) process for moving models from development to production

• Putting a model into production is a complicated process that requires updating multiple systems

  2. Monitoring: Manually assessing the health of machine learning models is time-consuming and diverts resources away from model development.

MLOps helps tackle monitoring issues such as the following:

• Models are put into production but never monitored

• Models are deployed across the enterprise and in diverse systems without a uniform way to monitor them

• Models have been in production for a long time and have never been updated

• Model performance must be determined manually by a data scientist

  3. Life-cycle management: Building numerous machine learning models and deploying them in production are relatively new activities for most traditional enterprises. Until recently, the number of models may have been manageable at a small scale, or there was simply less interest in understanding these models and their interdependencies at a corporate level. As decision automation (that is, decision making that occurs without human participation) becomes more common, models become more vital, and managing model risk becomes more important at the top level. In an organisational setting, the reality of the machine learning life cycle is far more complex in terms of demands and tooling. The main reasons scaling machine learning life cycles is difficult are:
    1. There are many interdependencies. Not only does data change over time, but so do business requirements. Results must be communicated back to the business on a regular basis to ensure that the model in production still reflects reality.
    2. Most data scientists are skilled in model construction and evaluation but not in application development. Although this may change as some data scientists specialise in deployment or operations, many data scientists today juggle many duties, making it difficult to execute them all successfully. The problem worsens as the number of models grows, and when data teams have high turnover, complexity skyrockets: data scientists suddenly find themselves in charge of models they didn't create.

        Source: oreilly-ml-ops

In this scenario, MLOps helps when:

• Models in production are not being updated

• Data scientists do not hear about model deterioration after the initial deployment

• Production model upgrades depend heavily on a few data scientists

• Only a small portion of new project demand is met, because old models have such high maintenance demands

  4. Model governance: Because of the many deployment techniques and modelling languages in use, and the lack of a centralised view of AI in production across the company, businesses require time-consuming and costly audit processes to verify compliance.

MLOps aids model governance through production access control, traceable model results, model audit trails, and model-upgrade approval workflows.
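The audit-trail idea can be made concrete with a tiny model-registry sketch. Everything here (the field names, the registry being a plain list, the approver string) is illustrative; real registries such as MLflow track similar lineage metadata in a proper store.

```python
import datetime
import hashlib
import json

def register_model(registry, name, version, params, train_data, approver=None):
    """Append an audit record for a model release: what was trained,
    on which data (by hash), with which parameters, and who approved it.
    All field names are illustrative, not a standard schema."""
    record = {
        "model": name,
        "version": version,
        "params": params,
        # Hashing the training data gives a traceable data fingerprint.
        "data_sha256": hashlib.sha256(
            json.dumps(train_data, sort_keys=True).encode()).hexdigest(),
        "registered_at": datetime.datetime.now(
            datetime.timezone.utc).isoformat(),
        "approved_by": approver,  # upgrade-approval workflow
    }
    registry.append(record)
    return record

registry = []
register_model(registry, "churn-model", "1.0.0",
               {"max_depth": 6}, [[0, 1], [1, 0]], approver="risk-team")
```

With records like this, an auditor can answer "which data and parameters produced the model that made this decision, and who signed off?" without chasing down individual data scientists.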


1. Risk Mitigation: Any team with even one model in production should use MLOps, since, depending on the model, continuous performance monitoring and adjustment are required. MLOps is critical in limiting the risks posed by ML models by enabling safe and dependable operations. MLOps practices do come at a price, however, so each use case should undergo a thorough cost-benefit analysis.

Risk Assessment – Machine learning models come with a wide range of risks. For example, the stakes for a recommendation engine used once a month to choose which marketing offer to send a customer are substantially lower than for a travel site whose pricing and revenue depend on a machine learning model. When considering MLOps as a risk mitigation strategy, consider the following:

• The risk that the model will be unavailable for an extended period of time 

• The risk that the model will make a poor forecast for a particular sample

• The risk that the model’s accuracy or fairness will deteriorate over time 

• The risk that the skills required to maintain the model (e.g., data science ability) will be lost

      Source: oreilly-ml-ops

The risks are frequently higher for models that are widely deployed and used outside of the company. Risk is usually measured by the probability and impact of the unfavourable event, and mitigation strategies are based on a combination of the two, i.e., the severity of the risk. Each project should begin with a risk assessment that is updated on a regular basis.
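The probability-times-impact scoring described above can be sketched in a few lines. The 1-5 scales, the example scores, and the threshold of 12 are all illustrative choices, not a standard risk framework; the four risks are the ones listed earlier.

```python
# Hedged sketch of probability x impact risk scoring. Scales (1-5) and
# the mitigation threshold are illustrative assumptions, not a standard.

RISKS = [
    # (description,                                 probability, impact)
    ("model unavailable for an extended period",              2, 4),
    ("poor forecast for a particular sample",                 4, 2),
    ("accuracy or fairness deteriorates over time",           3, 4),
    ("loss of the skills needed to maintain the model",       2, 5),
]

def assess(risks, threshold=12):
    """Score each risk as probability * impact and flag the ones
    above the (illustrative) mitigation threshold."""
    return [(desc, p * i, "mitigate" if p * i >= threshold else "accept")
            for desc, p, i in risks]

for desc, score, action in assess(RISKS):
    print(f"{score:>2}  {action:8}  {desc}")
```

Revisiting a table like this at regular intervals is one lightweight way to implement the "risk assessment that is updated on a regular basis" mentioned above.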

2. Responsible utilisation of Artificial Intelligence: Intentionality and responsibility are the two primary characteristics of responsible machine learning (also known as Responsible AI).

These concepts may seem self-evident, but it's worth remembering that machine learning models lack the transparency of imperative code. In other words, it is considerably harder to figure out which attributes are used to produce a prediction, which can make it difficult to show that models meet regulatory or internal governance standards.

The truth is that introducing automation through machine learning models shifts the fundamental onus of accountability from the bottom of the hierarchy to the top. Decisions that were previously made by individual contributors operating within a margin of error (for example, what a specific product's price should be) are now made by a model. The person responsible for the model's automated decisions is almost certainly a data team manager or even an executive, which brings the concept of Responsible AI even closer to the fore.

Given the concerns outlined previously, along with these specific difficulties and concepts, it is easy to see how MLOps and Responsible AI connect: teams must follow solid MLOps practices to achieve Responsible AI, and Responsible AI needs MLOps tactics.

3. Scaling: MLOps is crucial not just because it helps to reduce the risk of machine learning models in production, but it’s also a necessary component of massively deploying machine learning efforts (and in turn benefiting from the corresponding economies of scale). MLOps discipline is required to get from one or a few models in production to tens, hundreds, or thousands that have a beneficial business impact.

At the very least, good MLOps practices will help teams:

• Keep track of versioning, especially throughout the design process with experiments.

• Determine whether retrained models outperform prior versions (and promote better-performing models to production)

• Ensure that model performance does not deteriorate in production (at predetermined intervals—daily, monthly, etc.)
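The second bullet, often called a champion/challenger comparison, can be sketched as a simple promotion gate. The function name, the single accuracy metric, and the `min_gain` guard are illustrative assumptions; real systems also weigh fairness, latency, and cost before promoting.

```python
def promote_if_better(champion_metrics, challenger_metrics, min_gain=0.0):
    """Compare a retrained ('challenger') model against the production
    ('champion') model and decide whether to promote it. Uses a single
    metric for illustration; names and the min_gain guard are assumptions."""
    gain = challenger_metrics["accuracy"] - champion_metrics["accuracy"]
    return gain > min_gain

champion = {"version": "1.4.0", "accuracy": 0.87}    # currently in production
challenger = {"version": "1.5.0", "accuracy": 0.89}  # freshly retrained

if promote_if_better(champion, challenger, min_gain=0.005):
    print(f"promoting {challenger['version']} to production")
```

The `min_gain` margin exists so that a challenger is only promoted when it is meaningfully better, not merely different within evaluation noise.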


Who is involved in MLOps?

  1. Subject matter experts (SMEs):
    1. The subject matter experts (SMEs) are the first profile to consider as part of MLOps efforts; after all, the ML model life cycle begins and ends with them. While data-oriented roles (data scientist, engineer, architect, etc.) have a broad range of skills, they sometimes lack a thorough understanding of the business and the problems or questions that machine learning may solve.
    2. Starting the machine learning model life cycle with a crisply defined business issue isn't always required, or even desirable, in businesses with good procedures. Working with a less defined business goal can give subject matter experts an early opportunity to collaborate directly with data scientists to better frame the problem and brainstorm potential solutions.
    3. Subject matter experts are important not only at the start but also at the end (post-production) of the ML model life cycle. Because traditional metrics (accuracy, precision, recall, etc.) aren't always enough to determine whether an ML model is working well or as expected, data scientists frequently rely on subject matter experts to close the feedback loop. For subject matter experts who are also concerned about the compliance of machine learning models with internal or external requirements, MLOps is an additional way to bring transparency and insight to these processes.
  2. Data scientists:
    1. When developing an MLOps strategy, the needs of data scientists are the most important to consider. They stand to gain a lot: most data scientists today work with segregated data, processes, and tools, making it difficult to scale their efforts successfully. MLOps is in a good position to change that.
    2. Though most people think of data scientists' role in the ML model life cycle as limited to model construction, it is, or should be, far broader. Data scientists must collaborate with subject matter experts from the start, understanding and helping to frame business problems in order to develop a viable machine learning solution.
  3. Data engineers:
    1. Data pipelines are at the heart of the machine learning model development process, and data engineers are at the heart of data pipelines. Because data pipelines can be abstract and sophisticated, data engineers can gain a lot of efficiency from MLOps.
    2. Given data engineers' essential involvement in the ML model life cycle, underpinning both the construction and monitoring stages, MLOps can deliver considerable efficiency savings. Data engineers need not only insight into the performance of all models in production but also the ability to drill down into particular data pipelines to resolve underlying problems.
    3. For the data engineer profile (and others, including data scientists), MLOps should ideally provide a bridge to the underlying systems for investigating and changing ML models, rather than simply monitoring them.
  4. Software engineers:
    1. Although software engineers rarely build ML models, most firms produce not only ML models but also traditional software and applications. It's critical that software engineers and data scientists collaborate to guarantee that the larger system runs smoothly. After all, machine learning models aren't just stand-alone experiments; the machine learning code, training, testing, and deployment must all fit into the CI/CD pipelines used by the rest of the software.
    2. MLOps provides model performance details to software engineers as part of a comprehensive view of enterprise software application performance. It gives data scientists and software engineers a common language and a shared understanding of how the models deployed across the enterprise's silos interact in production.
    3. Other key aspects for software engineers include versioning, so they know exactly what they're working on; automated tests, to confirm that what they're working on actually works; and the ability to work on multiple applications at once (thanks to a system of branches and merges, as in Git).
  5. DevOps:
    1. MLOps is built on DevOps principles, but that doesn't mean the two can run in parallel as independent, siloed systems. DevOps teams play two key roles in the life cycle of a machine learning model.
    2. First, they build and operate the systems and tests that assure the security, performance, and availability of machine learning models.
    3. Second, they are in charge of managing the CI/CD pipeline. Both of these roles require close cooperation with data scientists, engineers, and architects. Tight collaboration is easier said than done, of course, but that is where MLOps can help.
    4. MLOps must be integrated into the enterprise's overall DevOps strategy, bridging the gap between classic CI/CD and modern ML for DevOps teams. That implies tools that are fundamentally complementary, allowing DevOps teams to automate ML tests in the same way they automate traditional software tests.
  6. Model risk managers / auditors (MRMs):
    1. Model risk managers are key players in the ML model life cycle, examining not only model output but also the initial goal and business questions; their aim is to minimise the overall risk that machine learning models pose to the company. They should be involved early in the life cycle, together with subject matter experts, to ensure that an automated system does not pose a risk in and of itself.
    2. MRM specialists and teams can benefit greatly from MLOps because their work is often tediously manual. Because MRM and the teams they collaborate with frequently use different tools, standardisation can significantly improve the speed at which auditing and risk management can be completed.
    3. The key MLOps requirement here is comprehensive reporting tools for all models, whether currently in production or retired. This reporting should cover not only performance details but also data lineage. Automated reporting in MLOps systems and processes adds an extra layer of efficiency for MRM and audit teams.
  7. Machine learning architects:
    1. Machine learning architects play a crucial role in the life cycle of ML models, ensuring that model pipelines are scalable and flexible. Furthermore, data teams need their skills to introduce new technologies (where suitable) that improve ML model performance in production.
    2. This is why the title of data architect is insufficient: to play this critical role in the ML model life cycle, they must have a thorough understanding of machine learning, not just enterprise architecture.
    3. The role requires collaboration across the company, from data scientists and engineers to DevOps and software engineers. Without a comprehensive grasp of the needs of each of these people and teams, machine learning architects cannot correctly allocate resources to ensure optimal performance of ML models in production.
    4. In MLOps, the machine learning architect's job is to provide a consolidated view of resource allocation. Because they have a strategic, tactical role, they need an overview of the situation to detect bottlenecks and use that information to identify long-term improvements.
    5. Their job is to identify new technologies or infrastructure worth investing in, not to provide operational quick fixes that don't address the system's scalability.
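Several of the roles above (software engineers, DevOps) converge on one practice: running automated ML tests in CI alongside ordinary software tests. The sketch below shows the shape of such tests; the `predict` stub and the thresholds are illustrative stand-ins for a real packaged model.

```python
# Sketch of automated model tests a CI pipeline can run next to ordinary
# software tests. The model stub and thresholds are illustrative.

def predict(features):
    # Placeholder for loading the packaged model and scoring one row.
    return 1 if sum(features) > 1.0 else 0

def test_known_cases():
    # Behavioural check: the model must get obvious cases right.
    assert predict([0.9, 0.9]) == 1
    assert predict([0.0, 0.1]) == 0

def test_minimum_accuracy():
    # Gate the build on a minimum score over a small golden dataset.
    golden = [([0.8, 0.7], 1), ([0.1, 0.2], 0),
              ([0.9, 0.9], 1), ([0.0, 0.0], 0)]
    acc = sum(predict(x) == y for x, y in golden) / len(golden)
    assert acc >= 0.75, f"accuracy {acc} below release threshold"

test_known_cases()
test_minimum_accuracy()
```

In a real pipeline these would live in a test runner such as pytest, so a model that regresses on the golden dataset fails the build exactly like a broken unit test would.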

                 ML Architecture Design


Credit:

Name: Sagnik Mukherjee
Designation: Aspiring data scientist. My current research domain is Machine Learning Operations (MLOps), and I am building an auto-ML toolkit as a personal side project, for which I have had research paper(s) accepted at ASIANCON and a university-level thesis/dissertation accepted.
Research area: MLOps
