MLOps and data observability: What should you know?
How do MLOps and data observability interact and support each other? Here are nine things you should know about this partnership.
Best practices from data engineering are increasingly being incorporated into machine learning frameworks. After all, ML systems are similar to other consumers of data.
As ML teams set up processes, tooling, and infrastructure, they need access to healthy, reliable data pipelines. In this post, we cover nine things you should know about how data observability and MLOps interact and support your organization's entire data ecosystem.
1. Understand where observability sits in the ML lifecycle
Data observability monitors the internal workings of each table and pipeline in a data stack. Unlike traditional data quality assessment, which is based on pass/fail conditions, data observability continuously collects signals from datasets.
Similarly, when ML modeling is involved, it isn't enough to judge data quality at the point of ingestion. Data can break anywhere along the path from ingestion through testing and validation, compromising the accuracy of model outputs. For businesses, this can filter into bad decision-making and lost revenue. Ideally, you'll set up data observability across your entire MLOps workflow to safeguard against the snowballing consequences of bad ML models built with bad data.
2. Figure out which observability activities apply to your MLOps
Your MLOps framework is unique, so not all of the typical data observability activities apply to your organization. Data observability helps manage ML models in several ways. Which ones apply to you?
1. Input data monitoring: ML models depend heavily on input data. If there are issues in the data, such as missing values, outliers, or incorrect entries, these can significantly affect the model's performance. Data observability helps ensure the quality of the data by tracking and alerting about such anomalies.
2. Data drift signs: Over time, the statistical properties of input data to ML models may change, a phenomenon known as "data drift." This can degrade the model's performance, even if the model itself has not changed. Data observability tools can monitor the data for signs of drift, allowing teams to address the issue before it significantly impacts the model's performance.
3. Model performance monitoring: Even if an ML model was accurate at deployment, its performance can degrade over time due to changes in the data or the environment in which it operates. Data observability allows for the continuous monitoring of model performance and provides early warnings when model accuracy begins to drop.
4. Operational efficiencies: In many cases, data pipelines can break or fail silently, causing delays in data processing and ingestion. Data observability's real-time visibility into pipelines means data stays available when and where it is needed.
5. Upstream issue identification: Data observability catches problems as far upstream as possible in your model training pipelines. For example, suppose you discover 10,000 unusable images in your dataset only after retraining your model, wasting significant compute and time. Data observability prevents these situations through continuous monitoring and alerting (a minimal sketch of input-data and drift monitoring follows this list).
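As an example of what input-data monitoring and drift detection can look like in code, here's a minimal sketch using pandas and SciPy. The "age" feature, the synthetic data, and the thresholds are hypothetical placeholders, not a prescribed setup.

```python
# Minimal sketch: input-data quality checks plus a simple drift test.
# Feature names, thresholds, and the synthetic data are illustrative only.
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def check_input_quality(df: pd.DataFrame, max_null_rate: float = 0.01) -> list[str]:
    """Flag columns whose null rate exceeds the allowed threshold."""
    issues = []
    for col in df.columns:
        null_rate = df[col].isna().mean()
        if null_rate > max_null_rate:
            issues.append(f"{col}: null rate {null_rate:.2%} exceeds {max_null_rate:.2%}")
    return issues

def check_drift(train: pd.Series, live: pd.Series, alpha: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test: True if the live distribution
    differs significantly from the training distribution."""
    statistic, p_value = ks_2samp(train.dropna(), live.dropna())
    return p_value < alpha

# Synthetic batches standing in for the training set and today's live data.
rng = np.random.default_rng(0)
train_df = pd.DataFrame({"age": rng.normal(35, 10, 5000)})
live_df = pd.DataFrame({"age": rng.normal(42, 10, 5000)})  # shifted mean: simulated drift

for issue in check_input_quality(live_df):
    print("ALERT:", issue)
if check_drift(train_df["age"], live_df["age"]):
    print("ALERT: possible drift detected in feature 'age'")
```

In production, the two DataFrames would be the training snapshot and the latest scoring batch, and the print statements would be replaced by your alerting channel.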
3. Focus on a single SLI
What metrics should you monitor when you have a machine learning model in production? There are numerous statistics, such as KL divergence on individual features, or precision and recall, that are frequently discussed in ML research but don't make sense as primary metrics in an observability context.
Our suggestion is to start with a single “metric to rule them all” that is understandable to all stakeholders and directly tied to business value; for example, revenue. This is your service level indicator (SLI).
Once you’ve picked an SLI, the second question is: when that SLI changes, why is it changing? Now you can add additional metrics measuring data volume, freshness, correctness, model drift, or feature drift, to help you figure out why.
To summarize, changes in your SLI should trigger alerts, and other metrics are then used to debug what happened. Don't get overwhelmed by monitoring hundreds of irrelevant metrics across your system.
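Here's a minimal sketch of that pattern: alert when the SLI moves, and only then pull the supporting metrics to explain why. The revenue SLI, the 10% tolerance, the metric names, and the get_metric()/send_alert() helpers are all hypothetical.

```python
# Minimal sketch: alert on a single SLI, then gather debugging metrics.
# All metric names and helpers below are hypothetical placeholders.
def get_metric(name: str) -> float:
    """Hypothetical helper: read the latest value of a metric from your metrics store."""
    raise NotImplementedError("wire this up to your metrics backend")

def send_alert(message: str) -> None:
    """Hypothetical helper: notify the on-call channel (Slack, PagerDuty, etc.)."""
    print("ALERT:", message)

def check_sli(tolerance: float = 0.10) -> None:
    revenue_today = get_metric("model_attributed_revenue_today")
    revenue_baseline = get_metric("model_attributed_revenue_28d_avg")

    if abs(revenue_today - revenue_baseline) / revenue_baseline <= tolerance:
        return  # SLI is healthy; no need to look at the supporting metrics

    # SLI moved: pull volume, freshness, and drift metrics to explain why.
    context = {
        "rows_ingested_today": get_metric("feature_table_row_count"),
        "feature_table_lag_minutes": get_metric("feature_table_freshness"),
        "feature_drift_score": get_metric("feature_drift_psi"),
    }
    send_alert(f"Revenue SLI off baseline by more than {tolerance:.0%}: {context}")
```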
4. Don't only monitor models
ML models are not static systems. Data constantly flows in for both training and inference, and flows out via predictions, recommendations, and determinations. Outputs are labeled (by users or by an internal team), and used to improve the model.
While it's tempting to monitor only model outputs like accuracy or recall, it makes more sense to monitor the whole data pipeline. That way, you catch issues upstream, before they ever affect the model.
Some concrete examples of what this looks like in real life include:
- Monitoring your production database and raw tables as they land in your data warehouse, prior to any transformations, for volume and freshness, to ensure that the "load" part of ELT occurred correctly (see the sketch after this list)
- Monitoring transformed tables in data warehouses to ensure that initial cleanup was performed correctly
- Monitoring “feature” tables to ensure that feature calculations are correct
- Logging all model outputs into the data warehouse and monitoring them to detect model errors and drift
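As an illustration of the first point, here's a minimal sketch of volume and freshness checks on raw tables after each load. The table names, the loaded_at column, the thresholds, and the run_query()/send_alert() helpers are hypothetical; you'd swap in your own warehouse client.

```python
# Minimal sketch: volume and freshness checks on raw warehouse tables.
# Table names, thresholds, and helpers are hypothetical placeholders.
from datetime import datetime, timedelta, timezone

RAW_TABLES = {
    # table name -> (minimum expected daily rows, maximum allowed staleness)
    "raw.orders": (10_000, timedelta(hours=2)),
    "raw.clickstream": (1_000_000, timedelta(minutes=30)),
}

def run_query(sql: str):
    """Hypothetical helper: execute SQL against the warehouse and return one row."""
    raise NotImplementedError("wire this up to your warehouse client")

def send_alert(message: str) -> None:
    """Hypothetical helper: notify the data team."""
    print("ALERT:", message)

def check_raw_tables() -> None:
    now = datetime.now(timezone.utc)
    for table, (min_rows, max_staleness) in RAW_TABLES.items():
        row_count, last_loaded_at = run_query(
            f"SELECT count(*), max(loaded_at) FROM {table} WHERE loaded_at >= current_date"
        )
        if row_count < min_rows:
            send_alert(f"{table}: only {row_count} rows loaded today (expected >= {min_rows})")
        if last_loaded_at is None or now - last_loaded_at > max_staleness:
            send_alert(f"{table}: data is stale (last load at {last_loaded_at})")
```

The same pattern extends to transformed tables, feature tables, and logged model outputs; only the queries and thresholds change.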
5. Standardize metrics across models
If you have multiple models/pipelines in production, standardize the metrics you monitor for each one. Try not to vary your metrics across each unique pipeline. Standardized metrics enable comparison and benchmarking between models.
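One way to enforce this is to define a single, shared schema that every model reports against. The sketch below assumes a handful of illustrative fields; the exact metrics you standardize on will depend on your pipelines.

```python
# Minimal sketch: one standardized health report emitted by every model in production,
# so models can be compared and benchmarked on the same fields. Field names are illustrative.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ModelHealthReport:
    model_name: str
    captured_at: datetime
    input_row_count: int             # volume of the latest scoring batch
    input_freshness_minutes: float   # lag between data landing and scoring
    null_rate: float                 # share of missing values across features
    drift_score: float               # e.g., population stability index on key features
    prediction_latency_p95_ms: float

def rank_by_drift(reports: list[ModelHealthReport]) -> None:
    """Benchmark models against each other on the same standardized metrics."""
    for r in sorted(reports, key=lambda r: r.drift_score, reverse=True):
        print(f"{r.model_name}: drift={r.drift_score:.3f}, nulls={r.null_rate:.2%}")
```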
6. Tag and filter metrics for targeted insights
When you are monitoring a number of models, pipelines, and data sources, the quantity of data generated can be overwhelming. Tag metrics based on attributes such as model type, data source, pipeline stage, and team, so you can organize and filter the data in a way that delivers targeted insights.
For example, if you notice a dip in the performance of a particular model, filter all metrics by that model's tag. You'll quickly identify whether the problem is specific to that model, or part of a larger issue. Such a system also allows you to customize alerts based on specific tags, ensuring that the right people are notified about relevant issues.
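Conceptually, tagging and filtering can be as simple as the sketch below. The in-memory store and the tag keys are illustrative; in practice, tags would be labels or dimensions in your metrics backend.

```python
# Minimal sketch: emit metric points with tags, then filter by tag to investigate.
# The in-memory list and tag keys are illustrative placeholders.
METRICS: list[dict] = []

def emit(name: str, value: float, **tags: str) -> None:
    """Record a metric point with arbitrary tags."""
    METRICS.append({"name": name, "value": value, "tags": tags})

def filter_metrics(**wanted_tags: str) -> list[dict]:
    """Return only the points whose tags match every requested key/value."""
    return [
        m for m in METRICS
        if all(m["tags"].get(k) == v for k, v in wanted_tags.items())
    ]

# Emit metrics tagged by model, pipeline stage, and owning team.
emit("null_rate", 0.002, model="churn_v3", stage="feature_table", team="growth")
emit("null_rate", 0.041, model="ranking_v7", stage="feature_table", team="search")

# A dip in ranking_v7's performance? Pull everything carrying its tag.
for point in filter_metrics(model="ranking_v7"):
    print(point)
```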
7. Share metrics dashboards across teams
Data observability is a team effort. While a single SLI may be aimed at high-level stakeholders, share more detailed metrics and dashboards across teams - data scientists, data engineers, or product managers. You'll give all stakeholders a view of what's happening in real-time, fostering quicker decision-making and resolution of issues.
8. Consider commercial tools
While open-source tools like Prometheus and Grafana are great for getting started with data observability, as your system grows in complexity, you may find that you need more sophisticated tools.
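As an illustration of the open-source route, here's a minimal sketch that exposes a couple of pipeline-health metrics with the prometheus_client Python library, ready to be scraped by Prometheus and charted in Grafana. The metric names, labels, and compute_* helpers are placeholders, not a prescribed setup.

```python
# Minimal sketch: expose pipeline-health metrics for Prometheus to scrape.
# Metric names, labels, and the compute_* helpers are hypothetical placeholders.
import time
from prometheus_client import Gauge, start_http_server

TABLE_FRESHNESS = Gauge(
    "table_freshness_minutes",
    "Minutes since the last successful load of a table",
    ["table"],
)
FEATURE_NULL_RATE = Gauge(
    "feature_null_rate",
    "Share of missing values in a feature table",
    ["model", "table"],
)

def compute_freshness_minutes(table: str) -> float:
    """Hypothetical helper: query the warehouse for minutes since the last load."""
    return 0.0  # placeholder; replace with a real warehouse query

def compute_null_rate(table: str) -> float:
    """Hypothetical helper: compute the null rate of a feature table."""
    return 0.0  # placeholder; replace with a real warehouse query

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for Prometheus to scrape
    while True:
        TABLE_FRESHNESS.labels(table="raw.orders").set(compute_freshness_minutes("raw.orders"))
        FEATURE_NULL_RATE.labels(model="churn_v3", table="features.churn").set(
            compute_null_rate("features.churn")
        )
        time.sleep(60)
```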
Commercial vendors like Bigeye, Arize, and Weights & Biases offer powerful, purpose-built solutions for monitoring machine learning models, data, and systems. These tools come with features like automated anomaly detection, alerting, root cause analysis, and integration with popular data stack technologies, which can save you significant time and effort. They also provide advanced visualizations and dashboards that make it easier to understand and communicate about your data and models.
Finally, they often provide support and resources to help you get the most out of your observability efforts. There is a cost associated with these tools, but the benefits in terms of saved time, improved performance, and reduced risk can make them a worthwhile investment.
9. Foster a culture of continuous improvement
A prerequisite to both MLOps and data observability is ultimately a culture of continuous improvement. Use the insights gained from data observability to identify areas for improvement, whether it's a data pipeline that's consistently slow, an ML model whose performance is degrading, or a specific type of data issue that keeps recurring. Incorporate these insights into your planning and prioritization process, and encourage teams to focus on resolving these issues and improving the system's reliability and performance.
Neither data observability nor MLOps are one-time efforts, but instead are ongoing practices. Regularly review your metrics and alerts, refine your SLIs, and adjust your monitoring as your data pipelines and ML models evolve.