Product
-
December 7, 2022

Alation + Bigeye: Enabling Everyone to Understand the Health of Their Data

Bigeye enables data teams to operate increasingly complex data ecosystems with the assurance that if something goes wrong, they will know before the business is affected.

Egor Gryaznov

Bigeye enables data teams to operate increasingly complex data ecosystems with the assurance that if something goes wrong, they will know before the business is affected. Data teams at companies like Clubhouse, Instacart, and Udacity leverage Bigeye’s data observability platform to automatically detect freshness and quality issues in real time and to gain a deep understanding of the health of the company’s data. Another crucial part of data quality and ensuring reliable data pipelines, however, is communicating the health of that data to the stakeholders who depend on it (check out my presentation on reliable data pipelines for more).

With our integration with Alation, we now offer a way to let everyone in the organization know the health of the data they depend on at any time – a great complement to the data quality SLAs already in the Bigeye platform. Thousands of analysts and stewards use Alation to find data and get the context they need to understand how to use it – and now the health of their data is always at their fingertips, easily viewable directly within their data catalog.

Integrating Bigeye and Alation brings huge advantages to data teams and their stakeholders:

  • The ability to see the health of their data surfaced in Bigeye, right inside Alation’s data catalog
  • Alation lineage carries Bigeye warnings downstream, showing every impacted dataset and alerting potentially impacted users
  • Bigeye warnings appear directly inside Alation Compose to keep queries high quality

Communicating Data Quality

Bigeye automatically monitors the data, looking at nine categories of potential issues, encompassing more than 60 data quality metrics that measure freshness, volume, distributions, formats, and more. Data teams use Bigeye to prevent data quality and pipeline issues from reaching their stakeholders, and ultimately, from impacting their business.

For example, let’s say there is a weekly sales forecast dashboard that the sales team reviews every Friday. Normally, the new sales data is loaded into Snowflake on Thursday and updated in the dashboard before the Friday meeting. This week, however, there was an unexpected data pipeline issue, and the datasets feeding the dashboard aren’t receiving any new records. Without monitoring in place, the sales team wouldn’t notice the problem until Friday, which is too late to take any action. With Bigeye, the data team is automatically alerted as soon as the issue starts occurring, and with that early warning, they have time to get the pipeline back up and running before the dashboard is impacted. That keeps the sales team moving fast and preserves trust in the data.
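The freshness scenario above can be sketched in a few lines of Python. This is a minimal illustration, not Bigeye’s implementation: the function name `is_stale`, the timestamps, and the seven-day-plus-six-hour allowance are all assumptions chosen to match the weekly Thursday load described in the example.

```python
from datetime import datetime, timedelta


def is_stale(last_loaded_at, now, max_age):
    """Return True when the newest record is older than the allowed age."""
    return now - last_loaded_at > max_age


# Hypothetical weekly load: new sales data is expected every Thursday morning.
# The allowance is one week plus a few hours of grace (an assumed threshold).
max_age = timedelta(days=7, hours=6)

now = datetime(2022, 12, 8, 18, 0)               # Thursday evening
missed_load = datetime(2022, 12, 1, 9, 0)        # last load was a week ago
on_time_load = datetime(2022, 12, 8, 9, 0)       # load arrived this morning

print(is_stale(missed_load, now, max_age))       # True: alert the data team
print(is_stale(on_time_load, now, max_age))      # False: data is fresh
```

A check like this running on Thursday evening gives the data team most of a day to repair the pipeline before the Friday review.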

In this example, only the sales team is affected by the pipeline issue, and there’s plenty of time to fix the issue before the meeting. In many organizations, however, an important dataset will have numerous stakeholders. The same data that feeds the sales team’s dashboard might also be used by the marketing team for planning campaigns, by the data science team for forecasting experiments, by the product team in internal tools, and in any number of ad hoc queries.

While the data team is fixing the pipeline problem, any number of analytics projects may be using problematic data. It’s often impossible for the data team to know everyone who needs to be updated when there is an issue, and this is where the power of Bigeye + Alation comes in.

With the integration between Bigeye and Alation, the data team can stay laser-focused on resolving the issue without needing to scramble to figure out how to communicate status to everyone who could be affected. Because Bigeye automatically and instantly flags all impacted data in Alation, every data steward, analyst, and consumer of the data has a real-time view of the health of their data, without needing to ask the data team.

From an Alation catalog page, data consumers get a table-level view that highlights the warning. From there, they can drill down to the column level to see exactly what has gone wrong, what thresholds were crossed, and what the status of remediation is.

If a data consumer is writing a query with Compose, Alation’s query tool, they will also be able to see in real time whether there is an issue. If they query a data source with an issue detected by Bigeye, a warning clearly indicates that an issue has occurred.

Going further, the integration leverages Alation’s impact analysis capabilities to flag any data or analytics affected by an issue detected with Bigeye. Stewards and owners of the downstream data will be notified, and anyone looking to leverage downstream data and analytics will find a warning, ensuring that decisions aren’t made on affected data while an issue is being addressed. Once the issue is fixed, everyone stewarding or subscribed to the data or any downstream analytics will be notified that the issue has been resolved.
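To illustrate the idea of carrying a warning downstream through lineage, here is a minimal Python sketch. It is not Alation’s impact analysis or any real API: the `lineage` mapping, the asset names, and the `downstream_of` helper are hypothetical, and the traversal is a plain breadth-first search over a dictionary of parent-to-children edges.

```python
from collections import deque


def downstream_of(lineage, source):
    """Return every asset reachable from `source`, in breadth-first order."""
    seen = {source}
    queue = deque([source])
    impacted = []
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:      # skip assets already flagged
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Hypothetical lineage: each key feeds the assets listed as its value.
lineage = {
    "raw.sales": ["analytics.sales_weekly"],
    "analytics.sales_weekly": ["dash.sales_forecast", "ml.campaign_features"],
    "ml.campaign_features": ["dash.marketing_plan"],
}

# An issue detected on raw.sales should flag everything built on top of it.
impacted = downstream_of(lineage, "raw.sales")
print(impacted)
```

With a list like this in hand, each impacted asset can show a warning and its stewards can be notified, then cleared again once the upstream issue is resolved.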

Creating Trust in Data

With Bigeye, the data team can focus on what’s important, knowing that if there is an issue, they will be alerted. With Bigeye and Alation, the data team’s stakeholders, the consumers of the data, now have the same level of assurance. That level of assurance builds trust in the data, helping companies make data-driven decisions with confidence. Interested in seeing how Alation and Bigeye work together? Schedule a meeting with us to get a demo.

| Resource | Monthly cost ($) | Number of resources | Time (months) | Total cost ($) |
|---|---|---|---|---|
| Software/Data engineer | $15,000 | 3 | 12 | $540,000 |
| Data analyst | $12,000 | 2 | 6 | $144,000 |
| Business analyst | $10,000 | 1 | 3 | $30,000 |
| Data/product manager | $20,000 | 2 | 6 | $240,000 |
| Total cost | | | | $954,000 |

| Role | Goals | Common needs |
|---|---|---|
| Data engineers | Overall data flow. Data is fresh and operating at full volume. Jobs are always running, so data outages don't impact downstream systems. | Freshness + volume monitoring; Schema change detection; Lineage monitoring |
| Data scientists | Specific datasets in great detail. Looking for outliers, duplication, and other (sometimes subtle) issues that could affect their analysis or machine learning models. | Freshness monitoring; Completeness monitoring; Duplicate detection; Outlier detection; Distribution shift detection; Dimensional slicing and dicing |
| Analytics engineers | Rapidly testing the changes they’re making within the data model. Move fast and not break things, without spending hours writing tons of pipeline tests. | Lineage monitoring; ETL blue/green testing |
| Business intelligence analysts | The business impact of data. Understand where they should spend their time digging in, and when they have a red herring caused by a data pipeline problem. | Integration with analytics tools; Anomaly detection; Custom business metrics; Dimensional slicing and dicing |
| Other stakeholders | Data reliability. Customers and stakeholders don’t want data issues to bog them down, delay deadlines, or provide inaccurate information. | Integration with analytics tools; Reporting and insights |
