Product
February 2, 2023

Intro to Bigeye Collections

How do you manage data quality metrics at enterprise scale? In this post, we'll walk through how Bigeye Collections can help.

Jon Hsieh

As your data observability operation grows in size and sophistication, manually managing individual checks stops being practical, or even feasible. Bigeye is an enterprise-ready data observability platform with proven ways to scale up the scope of monitoring, and it lets you grow the number of teammates who can contribute to your data engineering practice. In this post, I want to highlight Bigeye Collections (formerly SLAs): an efficient and powerful way to manage data quality metrics at enterprise scale.

Bigeye Collections provide three main capabilities:

1. Organize related metrics

2. Get a quick summary of collection performance and health

3. Consolidate and route notifications to the right people

Let’s dive deeper into those specific benefits:

Organizing related metrics

You might be a “full-stack data engineer” or part of a small team responsible for a handful of data pipelines. Whatever the case, when you initially deploy Bigeye, you’ll probably find that every deployed metric is relevant to you and the default global view is sufficient.

But as your team and use cases grow, you could be responsible for monitoring hundreds, thousands, or even tens of thousands of tables and metrics. At this scale, trying to manage and make sense of everything is impractical and overwhelming.  

To solve this conundrum, Bigeye Collections let you gather and organize related metrics—a simple act that produces powerful results by creating context and helping your team focus on the job at hand, including triaging relevant issues.
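
To make this concrete, here’s a rough sketch of what grouping existing metrics into a Collection could look like through Bigeye’s REST API. The endpoint path, payload fields, and metric IDs below are illustrative assumptions rather than the documented schema; you can also do the same grouping directly in the UI.

```python
import os
import requests

# Illustrative sketch only: the endpoint path and payload fields are assumptions,
# not Bigeye's documented API schema.
BIGEYE_URL = os.environ["BIGEYE_URL"]          # e.g. your Bigeye workspace URL
API_KEY = os.environ["BIGEYE_API_KEY"]
headers = {"Authorization": f"Bearer {API_KEY}"}

# Create a collection for the metrics that guard one pipeline.
resp = requests.post(
    f"{BIGEYE_URL}/api/v1/collections",
    headers=headers,
    json={
        "name": "orders-pipeline",
        "description": "Freshness and volume checks for the orders pipeline",
        "metricIds": [101, 102, 103],  # hypothetical IDs of existing metrics
    },
)
resp.raise_for_status()
collection_id = resp.json()["id"]
print(f"Created collection {collection_id}")
```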

Presenting a summary

The Collection list view gives you a quick visual and descriptive summary of the status of your selected metrics. From there, you can quickly see which metrics are alerting (which gives you an upper bound on the number of open issues to expect) and easily determine whether a Collection is healthy or needs attention.

As a data producer responsible for operations, you can get detailed per-metric notifications. For consumers who just need a traffic-light status, you can send summary notifications instead. You can also drill down to view issues on the individual metrics and take action on them.
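
As a sketch of how a consumer might pull that traffic-light status programmatically, the idea is to read the Collection’s metric statuses and count how many are currently alerting. Again, the endpoint and response fields here are assumptions for illustration only.

```python
import os
import requests

# Illustrative sketch: the endpoint and response fields are assumptions.
BIGEYE_URL = os.environ["BIGEYE_URL"]
headers = {"Authorization": f"Bearer {os.environ['BIGEYE_API_KEY']}"}

collection_id = 42  # hypothetical collection ID
resp = requests.get(
    f"{BIGEYE_URL}/api/v1/collections/{collection_id}/metrics",
    headers=headers,
)
resp.raise_for_status()

metrics = resp.json()["metrics"]
alerting = [m for m in metrics if m.get("status") == "ALERTING"]

# A simple traffic-light summary for consumers who don't need per-metric detail.
if not alerting:
    print("Collection healthy: no metrics alerting")
else:
    print(f"Needs attention: {len(alerting)} of {len(metrics)} metrics alerting")
```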

Routing notifications

Cut down on toil dramatically by setting up notifications on your Collections. Instead of configuring a notification on each metric, you can create a Collection and have every metric associated with it send notifications to the same Slack channel, email address, or webhook.
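
For illustration, attaching notification channels at the Collection level might look roughly like the sketch below. The endpoint and payload fields are assumptions, and the same idea applies whether the destination is a Slack channel, an email address, or a webhook URL.

```python
import os
import requests

# Illustrative sketch: the endpoint and payload fields are assumptions.
BIGEYE_URL = os.environ["BIGEYE_URL"]
headers = {"Authorization": f"Bearer {os.environ['BIGEYE_API_KEY']}"}

collection_id = 42  # hypothetical collection ID
resp = requests.put(
    f"{BIGEYE_URL}/api/v1/collections/{collection_id}/notifications",
    headers=headers,
    json={
        # Every metric in the collection alerts to the same destinations.
        "channels": [
            {"type": "SLACK", "target": "#data-quality-orders"},
            {"type": "EMAIL", "target": "data-eng-oncall@example.com"},
        ]
    },
)
resp.raise_for_status()
print("Notification routing updated for the collection")
```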

With this simple capability, you can control which groups of metrics send information to which channels. This routing helps you manage different teams, different priorities, and different facets of a data pipeline. We’ll discuss the details in future blog posts.

Summary

Bigeye Collections are a key feature that helps you organize and scale as your data, your data team, and your data consumers grow. Customers using Collections can handle one to two orders of magnitude more data, metrics, and teams.

Resource | Monthly cost ($) | Number of resources | Time (months) | Total cost ($)
Software/Data engineer | $15,000 | 3 | 12 | $540,000
Data analyst | $12,000 | 2 | 6 | $144,000
Business analyst | $10,000 | 1 | 3 | $30,000
Data/product manager | $20,000 | 2 | 6 | $240,000
Total cost | | | | $954,000
Role | Goals | Common needs
Data engineers | Overall data flow: data is fresh and operating at full volume, and jobs are always running so data outages don't impact downstream systems. | Freshness and volume monitoring, schema change detection, lineage monitoring
Data scientists | Specific datasets in great detail: looking for outliers, duplication, and other, sometimes subtle, issues that could affect their analysis or machine learning models. | Freshness monitoring, completeness monitoring, duplicate detection, outlier detection, distribution shift detection, dimensional slicing and dicing
Analytics engineers | Rapidly testing the changes they're making within the data model: move fast without breaking things, and without spending hours writing pipeline tests. | Lineage monitoring, ETL blue/green testing
Business intelligence analysts | The business impact of data: understand where to spend their time digging in, and when they have a red herring caused by a data pipeline problem. | Integration with analytics tools, anomaly detection, custom business metrics, dimensional slicing and dicing
Other stakeholders | Data reliability: customers and stakeholders don't want data issues to bog them down, delay deadlines, or provide inaccurate information. | Integration with analytics tools, reporting and insights
