MLOps: What is Operational Tempo? | by Matt Blasa | Jun, 2023


There are several factors that can affect the ops tempo:

  1. Strategic goals
  2. Data quality and availability
  3. Team size, organization, and experience
  4. Infrastructure, resources, and tools
  5. Model complexity
  6. Regulatory requirements

It’s important to remember that operational tempo isn’t only about tools or metrics. While these factors influence ops tempo, they don’t solely define it.

Photo by Jason Goodman On Unsplash

Strategic goals play a significant role in shaping the operations tempo of an organization. They provide intent, focus, direction and a framework for the model building process. Always ask the users why they need an ML solution, rather than the how. Focus on the functionality of the project’s goal before building.

Focusing on this “why” can significantly impact the operations tempo in MLOps. By understanding the reasons and intent of a project, it helps identify the direction of what you want to build. Lack of communication or clarity on the “why” is a common reason why projects don’t make it to production. This creates bottlenecks where development may stop — or even need to be restarted.

Author’s Creation

Good questions to ask before you build a model:

  • Why are we building this?
  • What do we want it to do?
  • What are potential risks?
  • When do we want this be completed by?

The best way to answer these questions? Always start with a business use case, and focus on the end product in mind. Data products need to be the focus first, rather than looking at the model. You need to start with a “why ” first, then move to the “how”.

A good example of this is an ecommerce client I worked with overseas, who needed at a recommendation model. They believed this approach would enhance customer engagement and increase sales.

When we were brought on, our consulting team lead didn’t dive into to how to build the model. Instead, they focused on listening and asking critical questions:

  • Why are we building this model, what current process does it replace?
  • What is the end user’s goal for the model, and its outputs?
  • What are the potential risks that slow this down?
  • How much time do we have, and what is feasible?”

Our team lead’s focus was on understanding the “why.” It was on the end result — the data product. The ultimate goal was to personalize shopping experiences, to driving customer engagement and remarket products to them.

This clarity of purpose — product vs. model mentality —streamlined the entire development process. It tied the model development process to the larger business strategic objectives. And it fostered improved communication and trust between our consulting team and business teams, increasing the MLOps tempo.

Based on our recommendations, the client’s team was then able to efficiently allocate resources, adapt to unexpected changes, and focus their efforts on achieving the desired outcome: an improved recommendation system.

By focusing on the “why,” they ensured that every stage of the project was aligned with the strategic goal of providing a personalized shopping experience. As a result, the ML solution was successfully implemented, significantly improving the product recommendation system and leading to a noticeable uptick in sales and customer satisfaction.

All other factors in MLOPs tempo are affected by clear understanding of the why — the strategic goals and end use cases.

Photo by Luke Chesser on Unsplash

Data and efficient data operations are critical for MLOps success. Good processes, experienced workers, and tools are not effective without high-quality data. Data is the foundation of machine learning model development — it plays a decisive role the operations tempo.

A solid DataOps foundation is crucial for maintaining a good MLOps operational tempo. If the DataOps supporting the MLOps process is immature or incomplete, it can lead to a backlog. A robust DataOps process and maturity ensures that data used in machine learning models is high quality, consistent, timely, and accurate.

Data Ops and MLOps Process, Author’s Creation

Challenges in data operations for MLOps:

  • Inadequate data hindering the speed of operations.
  • Lack of clear strategic goals and communication, diminishing the impact of good data.
  • Data unavailability or difficulty in obtaining and transforming data for specific use cases

Access to high-quality, up-to-date, and scalable data can speed up the process of model development by providing accurate and relevant information to train the model.

Example

Let’s illustrate this example, with a financial client trying to improve its credit scoring model. The goal? Provide better risk assessments, leading to better loan decisions.

Our client had a competent and knowledgeable MLOps team, good processes and templates, and the best tools. However, they ran into a data issue- without high quality, relevant data, these resources were not enought to build an effective machine learning model. It was like a hammer without nails or a wood.

Every machine learning (ML) model needs data to work. Data is the building blocks for creating all ML models. We noticed didn’t have a good system (DataOps) in place to ensure the data was of the right quality and available when needed. This issue slowed down their work speed and development, dragging their MLOps rhythm.

Our consulting team assisted the client in improving their DataOps foundation —by working closely with their data engineering teams. The goal? To ensuring that data used in machine learning models was high quality, consistent, timely, and accurate. Throughout the contract, our team lead emphasized establishing clear strategic goals and improving communication regarding data needs and usage.

Access to high-quality, up-to-date, and scalable data sped up the model development process. They were able to provide accurate and relevant information to train the credit scoring model, which significantly impacted the speed and efficiency of model development.

Data is the backbone of any machine learning project. So its quality and availability, helped by DataOps, can significantly impact the speed and efficiency of model development.

Photo by MAHDI on Unsplash

Team size, organization, and experience are crucial elements that greatly affect the execution of operations. A well-rounded team of data scientists, engineers, domain experts, and project managers, can effectively collaborate to create, deploy, and maintain machine learning models.

Factors that affect operations tempo include:

  • Lack of time frames (time boxing), vague scope, and undefined responsibilities.
  • Inter-team communication issues and coordination challenges, slowing down project progress.
  • Disorganized teams encountering delays and inefficiencies due to overlapping tasks or role confusion.
  • Less experienced teams needing more time for learning and experimentation, affecting the project’s overall pace.
Team Factors, Author’s Creation

Larger teams can achieve a higher operational tempo. Much of this comes down to both good delegation, project scoping, and communication. It helps if teams work in parallel by divide experiments and model development. Their tempo hinges on communication between individual teams, time boxed tasks, and clearly defined roles.

Smaller teams may have a lower operational tempo due to limited resources and the need to perform multiple tasks with limited manpower. However they may also have more streamlined communication and coordination, which can enable them to move faster and iterate more efficiently.

Example

Let’s use another example. This time I’ll use a marketing client. The client sturggled with efficient MLOps tempo in developing ML models to personalize marketing pricing. The project workloads had outstripped their ability to build quickly and flexibility. It was so bad that they were struggling to maintain an efficient operational tempo.

The team was small and lacked a balance of skills and experience to handle these new projects. The scope of the projects were often vague, responsibilities undefined, and there were several communication issues. The smaller team size meant they had limited resources, affecting their ability to work on multiple aspects of the models simultaneously.

To fix these challenges, our consulting team lead suggested a reorganization. The marketing client hired additional data engineers, machine learning experts, and project managers with experience in managing AI-based projects. They then suggested each role and responsibility was clearly defined, and also that each project had a well-defined scope and timeline.

The change made a big difference. With a bigger and more diverse team, working on different parts of the machine learning projects became a breeze for our client. I noticed clearly defined roles, improved how they tested and benchmarked models, and improved their communication. It really reduced confusion and boosted teamwork, speeding up their MLOps tempo and development time.

Striking a balance between team size, organization, and experience, coupled with effective project management, is vital for maintaining an efficient MLOps pace and ensuring the success of machine learning projects.

Photo by Tyler Lee on Unsplash

Infrastructure and tools are two key factors that can significantly impact the speed and agility of machine learning development and deployment. Infrastructure makes sure predictive outputs are delivered in a timely manner. While tools help make sure you can automate repetitive processes and enhance insights gained from data.

Scaling Infrastructure, Author’s Creation

Infrastructure must have decent computing, good data storage, and reliable networking infrastructure. This enables faster development and retraining cycles. It also makes iteration and deployments quicker.

ML models as they scale, require larger amount of computing power. They also need data storage to save experiments, data, etc. Without sufficient or correct computing and storage resources, ops tempo is slowed down. Which limits the number models, that can be developed, deployed, or scaled.

Data is the core of machine learning model development, with infrastructure to support that data the most critical. Before starting MLOps (or even ML modeling for that matter) focus on building a robust data storage, data pipelines, and data versioning processes. The last is especially critical if you are building models.

Tools also play a role in ops tempo. They’re used not only automate repetitive tasks and complex processes, but to improve reproducibility, model management, data management, monitoring and security. Tools automate these processes but can slow them down — especially if they are incompatible or vendor lock-in occurs.

Tools and MLOps Tempo, Author’s Creation

Some tooling issues that commonly slow down MLOps operation tempo:

  • 3rd party tools that are incompatible with each other
  • Redundant tools that duplicate similar processes
  • Different tool versions between teams
  • Too many tools to solve problems that could’ve been done manually

Good audits and assessments of these tools need to be done regularly. This helps eliminate any inefficiencies that tool conflicts, duplication, different formats, other factors create.

Each may be small, but without an occasional audit, can significantly slow down MLOps processes and model development.

Example — Infrastructure and Tools

Let’s use another example. This time I’ll use a client with a recommendation system. Their goal was to boost sales by suggesting products similar to what customers had already interacted with. Initially, their tech setup and pace of work in machine learning operations, or MLOps, was just fine.

However, as they expanded their machine learning models to handle more data and complexity, they ran into hurdles. Their computer power and data storage became inadequate, slowing their work pace. This limitation also reduced the number of new models they could create, use, or grow. They realized they needed to improve their data handling processes and keep better track of different versions of data when building models.

Tools also became a headache. Some didn’t work well together, others were doing the same job, and different teams used different versions of the same tool.

To tackle these issues, our consulting team improved their tech setup, working with their data engineers to increase data storage, and streamline data handling processes. Using this foundation, the clients also set up better processes to keep track of data versions, which helped keep data intact and made it easier to retrain models.

We also suggested regular maintenance audits. The client began to regularly check their tools, noting any that were unnecessary, standardizing the versions used across teams, and swapping out those that didn’t work well together. This helped improve their MLOps pace.

This experience I saw, highlights the importance of having proper technology and tools that complement each other, especially as your operations grow. The pace of machine learning operations (MLOps) can change. What works well at one stage with your current tools and technology may not be enough as you scale up.

Photo by Donny Jiang on Unsplash

Modeling complexity affects ops tempo in MLOps in three ways: training, technical, and execution. MLOps often runs into slowdowns at these three points.

Training intricate models can be challenging. Data scientists and engineers may need to spend more time experimenting and validating data. For data engineers, complex models require extra level of validation and data quality checks. For data scientists, complex models have higher difficulty interpreting, maintenance, debugging, and optimizing time. High complexity means more scoping, development, and testing time.

Modeling Complexity, Author’s Creation

Technical complexity also increases in proportion to a model. The more complex, the more resources, people, and time need to build models, engineer a pipeline, demo, and perform user acceptance testing. There’s greater time needed to retrain and rebuild if that model fails in production. Even when it does succeed, the testing and validation is more extensive than a simple model.

Time is also important factor. Especially for the business units, who you have to keep informed. Team members may need to devote additional time and effort to plan for these models and demonstrate value to the business. Setting clear time limits for model experimentation and development is critical.

Balancing model complexity with operational efficiency is important for maintaining a manageable MLOps tempo.

Example —Modeling Complexity

Let’s take an example of an overseas retailer who set out to build a sophisticated machine learning model. Their goal was to personalize product recommendations for customers. However, the model was too ambitious and complex, leading to its eventual failure.

Training this complex model demanded substantial time and resources. Data scientists and engineers had to put extra effort into experiments, research, and data validation. The complexity not only added to the project time but also required additional layers of validation and quality checks. Tasks such as interpreting, maintaining, debugging, and optimizing the model became much more challenging and time-intensive.

The model’s technical complexity increased operational costs. The client needed more resources and staff, and extra time to build the model pipeline, demonstrate the model, and conduct user acceptance tests. When the model failed in a live environment, the cost and time for retraining and rebuilding increased significantly.

The project scope was too ambitious and didn’t effectively address the business problem. They had made the model first, before considering the end use case. Our team spent most of our time either refactoring code or working with engineering teams to make it work to answer the problem.

The extensive time spent on training the complex model, combined with rising costs, started impacting the pace of operations. Business stakeholders became frustrated with the delays in actionable results and escalating operational costs. Eventually, these issues led to the project’s being cut.

Balancing model complexity with operational efficiency and cost is crucial for the successful implementation of machine learning projects.

Photo by Growtika on Unsplash

Regulatory requirements also change the ops tempo. Complying with different internal and external rules can speed or slow down development.

It gets more complicated especially if you are working with international clients or stakeholders. Data available in some geographic regions cannot be removed from one region and used in another. Some models in some geographic regions require more documentation.

It also extends to the data used to build the models, as well as the storage. GDPR and other regulations may limit the use of features used to build models. Teams need to implement proper data management practices and potentially adjust their models to maintain privacy, which can affect the overall operations tempo.

Data Regulations, Author’s Creation

Certain industries regulations may also require additional model validation or third-party audits. Model governance and documentation for these audits adds to the development time. It is critical that these needs are scoped prior to working with a business unit or client. These regulations may even ban the use of certain models.

With regulatory factors, data science teams often need to build custom solutions or compliant models, which adds extra development time and cost.

Example — Data Regulations

Let’s consider a financial client, working on a machine learning model to predict loan defaults. They had business in both North America and in the EU. The problem lay in the difference in data regulations.

In North America, the regulations were less strict. In the EU, with GDPR they were requried to have a stricter audit process — from the data to the models they used.

Their work was slowed down because they had to follow various international data protection laws. Transferring data was limited, and they had to create more paperwork. The General Data Protection Regulation (GDPR), a European law, required them to modify their models and strictly manage data to ensure user privacy.

To comply with these rules, I watched our team help the client created separate cloud environments. For European data, they built machine learning models in a GDPR-compliant cloud. Meanwhile, they stored data for North American customers in another cloud and used it for models targeting those customers.

Industry-specific regulations add more complexity to ML projects. They often require additional model validation, audits, and comprehensive documentation. Some these regulations even limited the types of predictive models our client could use. This required our team to develop custom, compliant solutions — which required extensive research and constant model compliance reviews.

This example illustrates how cross national regulatory compliance can add time, cost, and complexity to machine learning projects, significantly impacting the MLOps tempo.

Operations tempo, especially MLOps is not always tech. Tech is a factor that drives it. To get speed up your model development time, you need to

In summary:

  • Start with clear data strategy — have clear goals
  • Ensure good enough data quality and availability
  • Assess organizational factors: size, organization, and experience
  • Take into account: Infrastructure, resources, and tools
  • Ensure that model complexity isn’t overly complex — and still answers the business question
  • Check regulations — especially if you’re dealing with international data.

I write frequently about Data Strategy, MLOps, and machine learning in the cloud. Connect with me on Linkedin, YouTube, and Twitter





Source link

Leave a Comment