August 29, 2024

Monitoring vs. Lineage: Why You Need Both For Data Observability Success

Why can't I just pick monitoring or lineage for my data strategy? Isn’t one enough?

Kyle Kirwan

For those of us responsible for delivering analytics to business users, the story is all too familiar. You spend countless hours perfecting a dashboard, ensuring every metric is just right. But all it takes is one data issue for trust to evaporate: after you ship the report, something goes wrong with the data, and your stakeholders spot the issue before you do. According to Bigeye’s 2023 State of Data Quality Report, 70% of business leaders admit they lack trust in analytics dashboards due to regular data quality incidents.

Despite the heavy investments enterprise data teams make in monitoring tools and processes, they still struggle to answer the age-old question from data consumers: “Is the data in my dashboard reliable?” 

That’s where the combination of monitoring and lineage comes in.

Monitoring can tell you when something goes wrong, but it’s lineage that helps you understand where the issue originated and how it impacts the entire data ecosystem. Together, they form the backbone of data observability—ensuring you can trust your data and deliver reliable insights every time. In this article, we'll explore why you need both to achieve true data observability and avoid those dreaded "that data doesn't look right" moments.

Monitoring: What's Happening With Your Data?

Monitoring is your first line of defense when it comes to data health. It's like having a 24/7 health tracker for your data, constantly checking for signs of trouble—missing values, unexpected spikes, or schema changes. Monitoring alerts you to what’s going wrong, giving you the chance to address issues before they snowball into bigger problems.

Unlike traditional data quality rules or tests, monitoring can catch anomalies you might not have anticipated. While data quality rules are essential for specific checks, monitoring provides a broader safety net, identifying potential issues whether you've seen them before or not.
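To make the difference between a rule and a monitor concrete, here is a minimal sketch in Python. It assumes a hypothetical table whose daily row counts you already track; the function names, the numbers, and the three-standard-deviation threshold are illustrative assumptions, not how any particular observability product works.

```python
from statistics import mean, stdev


def rule_check(row_count: int, minimum: int = 1) -> bool:
    """A hand-written rule: passes as long as the table is not empty."""
    return row_count >= minimum


def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` standard deviations
    away from the mean of the historical values."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for the past week, then today's load.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_075, 10_130]
todays_count = 4_300

print("rule passes:", rule_check(todays_count))
print("monitor flags:", is_anomalous(daily_row_counts, todays_count))
```

The rule only knows about the failure someone thought to write down (an empty table), so a load that arrives with less than half its usual volume still passes. The monitor compares today against recent history and flags it.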

But here's the catch: monitoring can only tell you so much. It might flag that something’s wrong, but it doesn’t tell you where the problem originated, how to fix it, or what it affects downstream. That’s where lineage comes in.

Lineage: Mapping the Issue

Data lineage provides the context that monitoring alone can’t. It shows you the complete journey of your data—where it comes from, where it’s going, and how it’s being transformed along the way. Imagine finding out that a critical data feed has gone haywire. With lineage, you can quickly trace its path and see exactly which dashboards, reports, and models will be affected.
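One way to picture that impact analysis is as a walk over a graph. The sketch below assumes lineage is available as a simple mapping from each asset to the assets it feeds; the asset names are made up for illustration, and real lineage tools typically capture far more detail (columns, transformations, owners).

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed in its value.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.daily_sales", "ml.churn_features"],
    "marts.daily_sales": ["dashboard.revenue", "report.finance_monthly"],
    "ml.churn_features": ["model.churn"],
    "raw.customers": ["ml.churn_features"],
}


def downstream(graph: dict[str, list[str]], start: str) -> list[str]:
    """Return every asset reachable from `start`, in breadth-first order.
    The starting asset itself is not included."""
    seen = set()
    order = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


# If the raw.orders feed goes haywire, these are the assets at risk.
for asset in downstream(LINEAGE, "raw.orders"):
    print(asset)
```

Breadth-first order lists the closest dependents first, which is usually the order you want to check them in.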

Let's consider a concrete example. Suppose your finance team is analyzing sales data, but during an ETL process, a column of prices gets stored as integers instead of decimals. Suddenly, amounts like $1.37 get truncated to $1.00, leading to millions of missing pennies across transactions. This small mistake snowballs into inaccurate sales totals and potential financial reporting issues. With data observability, monitoring could catch the anomaly and lineage could trace it back to the ETL step and show which reports depend on the affected column, before it impacts your revenue numbers and decision-making.
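The arithmetic behind that example is easy to reproduce. This sketch uses four made-up prices and Python's `decimal` module; the `share_with_cents` check is a hypothetical metric chosen for illustration, one of many signals a monitor could track to notice the change.

```python
from decimal import Decimal

# Hypothetical source prices, held as exact decimals.
prices = [Decimal("1.37"), Decimal("12.99"), Decimal("4.50"), Decimal("20.00")]

# The faulty ETL step: casting to an integer type drops the cents.
loaded = [int(p) for p in prices]

true_total = sum(prices)
loaded_total = sum(loaded)
lost = true_total - loaded_total

print("true total:  ", true_total)
print("loaded total:", loaded_total)
print("lost:        ", lost)


def share_with_cents(values) -> float:
    """Fraction of values that have a non-zero fractional part."""
    values = list(values)
    if not values:
        return 0.0
    return sum(1 for v in values if v % 1 != 0) / len(values)


# A metric like this falls to zero the moment prices are stored as integers.
print("share with cents before:", share_with_cents(prices))
print("share with cents after: ", share_with_cents(loaded))
```

No individual row looks wrong after the cast, since $1 is a perfectly valid price. The problem only shows up in aggregate, which is why a distribution-style check catches it where a per-row rule might not.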

Without lineage, you're stuck playing detective, trying to piece together the impact of any anomaly. Lineage is like a roadmap for your data, showing you how everything is interconnected. And the beauty of lineage is its ability to trace issues back to their source, allowing you to pinpoint exactly where things went wrong.
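Tracing back to the source can be sketched the same way, this time walking upstream. The example assumes you have two things: a mapping from each asset to its direct parents, and the set of assets that currently have an open alert. Both are hypothetical here, and the heuristic (an alerting asset with no alerting parents is a likely origin) is a simplification that ignores gaps in monitoring coverage.

```python
# Hypothetical lineage, stored as "asset -> its direct upstream parents".
PARENTS = {
    "dashboard.revenue": ["marts.daily_sales"],
    "marts.daily_sales": ["staging.orders"],
    "staging.orders": ["raw.orders"],
    "raw.orders": [],
}

# Hypothetical set of assets with an open monitoring alert.
ALERTING = {"dashboard.revenue", "marts.daily_sales", "staging.orders"}


def likely_sources(asset: str, parents: dict[str, list[str]], alerting: set[str]) -> list[str]:
    """Walk upstream from `asset` through alerting parents and return the
    alerting assets that have no alerting parent of their own."""
    sources = []
    seen = set()
    stack = [asset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        bad_parents = [p for p in parents.get(node, []) if p in alerting]
        if node in alerting and not bad_parents:
            sources.append(node)
        stack.extend(bad_parents)
    return sources


print(likely_sources("dashboard.revenue", PARENTS, ALERTING))
```

Here the walk stops at `staging.orders`: it is alerting, but its parent `raw.orders` is healthy, so the step that loads staging is the first place to look.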

The Dynamic Duo: Monitoring and Lineage

When it comes to data observability, monitoring and lineage are like peanut butter and jelly—they're good on their own, but together, they're unbeatable. You need both to truly understand your data's health and minimize the impact of issues that do arise. Together, monitoring and lineage give you a complete picture of your data environment, helping you ensure the business is making informed decisions with confidence.

Sure, there might be times when one takes the spotlight. In compliance projects, for example, lineage is crucial for tracing data origins and transformations. Your main goal there is to show exactly where your data comes from and how it’s been handled, so lineage is your go-to.

But if the same data pipeline that feeds your compliance reporting also powers analytics dashboards, you’ll need monitoring to ensure that data is accurate and up to date. And if there’s an anomaly in that data, monitoring is what will catch it, which keeps your compliance reports trustworthy too.

So next time you think about data observability, remember: it’s not about choosing between monitoring and lineage. By combining the two, you can shift how your business sees its data, from skeptical to confident.

| Resource | Monthly cost ($) | Number of resources | Time (months) | Total cost ($) |
|---|---|---|---|---|
| Software/Data engineer | $15,000 | 3 | 12 | $540,000 |
| Data analyst | $12,000 | 2 | 6 | $144,000 |
| Business analyst | $10,000 | 1 | 3 | $30,000 |
| Data/product manager | $20,000 | 2 | 6 | $240,000 |
| Total cost | | | | $954,000 |

| Role | Goals | Common needs |
|---|---|---|
| Data engineers | Overall data flow. Data is fresh and operating at full volume. Jobs are always running, so data outages don't impact downstream systems. | Freshness + volume monitoring; Schema change detection; Lineage monitoring |
| Data scientists | Specific datasets in great detail. Looking for outliers, duplication, and other (sometimes subtle) issues that could affect their analysis or machine learning models. | Freshness monitoring; Completeness monitoring; Duplicate detection; Outlier detection; Distribution shift detection; Dimensional slicing and dicing |
| Analytics engineers | Rapidly testing the changes they’re making within the data model. Move fast and not break things, without spending hours writing tons of pipeline tests. | Lineage monitoring; ETL blue/green testing |
| Business intelligence analysts | The business impact of data. Understand where they should spend their time digging in, and when they have a red herring caused by a data pipeline problem. | Integration with analytics tools; Anomaly detection; Custom business metrics; Dimensional slicing and dicing |
| Other stakeholders | Data reliability. Customers and stakeholders don’t want data issues to bog them down, delay deadlines, or provide inaccurate information. | Integration with analytics tools; Reporting and insights |
