Thought leadership
July 10, 2023

Snowflake Summit 2023 and data observability: 5 key takeaways

This post dissects Snowflake Summit 2023 from a data observability perspective. What do the announcements and new features mean for data observability?

Liz Elfman

The desert city of Las Vegas might not be the first place you think of when the word “Snowflake” hits your ears, but it was indeed the epicenter of all things data last week. Snowflake Summit 2023, a confluence of data professionals, industry leaders, and technophiles alike, was the biggest annual gathering yet: thought-provoking discussions, illuminating keynotes, and countless networking opportunities.

From the unveiling of groundbreaking technologies to visions for the future of data-driven businesses, the summit made waves that are sure to ripple through the industry for years to come. Whether you were able to experience the magic firsthand or, like many, were keeping up from a distance, there’s a lot to unpack. We’ve done the heavy lifting to distill some key takeaways from this incredible event.

1. Data observability: What a difference a year makes

Even from our niche vantage point, we’ve seen incredible growth in the recognition and mainstreaming of data observability. Last year, our conversations at Snowflake centered around “What is data observability?” This year, it was a different story.

At our booth, most visitors already knew about data observability. Their questions were not about education on the space, but about how Bigeye fits into the data quality monitoring universe.

That said, the majority of folks we spoke to were still doing manual data quality monitoring.
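
If you haven’t seen it up close, manual monitoring usually means hand-written checks run on a schedule and eyeballed by whoever owns the pipeline. Here’s a minimal sketch of that pattern in Python using the snowflake-connector-python library; the table (orders), column (updated_at), and thresholds are all hypothetical.

```python
import snowflake.connector

# Hand-rolled freshness and volume checks: the kind of manual data
# quality monitoring described above. Connection details, table names,
# and thresholds are illustrative placeholders.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_warehouse",
    database="analytics",
    schema="public",
)

with conn.cursor() as cur:
    # Freshness: hours since the newest row landed in ORDERS.
    hours_stale = cur.execute(
        "SELECT DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP()) "
        "FROM orders"
    ).fetchone()[0]
    if hours_stale > 24:
        print(f"ALERT: orders is {hours_stale}h stale (threshold: 24h)")

    # Volume: rows loaded over the trailing day.
    rows_last_day = cur.execute(
        "SELECT COUNT(*) FROM orders "
        "WHERE updated_at >= DATEADD('day', -1, CURRENT_TIMESTAMP())"
    ).fetchone()[0]
    if rows_last_day < 1000:
        print(f"ALERT: only {rows_last_day} rows loaded in the last day")

conn.close()
```

Multiply checks like these across hundreds of tables and a dozen metric types, and it’s easy to see why visitors were asking where automated tooling fits in.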

2. AI: Not just the buzzword of the hour

In his keynote speech, Snowflake CEO Frank Slootman underscored the importance of a data strategy as the foundation for any AI strategy. In the era of “generative AI,” organizations need to fundamentally prioritize their data strategy, grounded in robust, reliable data.

Snowflake announced “Document AI,” which will leverage Snowflake’s LLM to unlock deeper insights from documents faster, more simply, and more accurately. Snowflake also expanded support for Iceberg tables, which will extend Snowflake’s performance and governance to data stored in open formats.

So while AI and generative AI could be seen as buzzwords (a fact that Slootman even acknowledged during his keynote), there is real innovation to be seen when AI strategy is powered by solid, reliable data.

3. An industry-wide focus on simplifying data management

Snowflake’s new Snowpark Container Services (in preview) aims to bring fully managed Kubernetes to customers. Even in the most complex data environments, there’s an industry-wide drive to bring apps closer to the data layer and simplify data management. Automation and ML will help, and data observability will likely be a critical safeguard around this streamlining work.

4. The Open Data Quality Framework from Alation

At the conference, Alation announced the launch of the Open Data Quality Framework to bring best-of-breed data observability capabilities into the Snowflake Data Cloud. This move formalizes data observability’s place in the ecosystem, further entrenching it as a core piece of the data quality puzzle (editor’s note: Bigeye is one of the launch partners for this initiative).

The Open Data Quality Framework aims to strengthen data governance for Snowflake by elevating data quality information. It gives data teams a way to enforce data quality policies and rules, assign data quality stewards, and manage the entire data quality lifecycle from one platform.

5. The Snowflake Model Registry

Today, most Snowflake customers develop models on Snowflake data using external tools. Snowflake is working toward a comprehensive machine learning workflow, with smooth integration of model administration and observability. Many customers source LLMs from a variety of providers and fine-tune them with data from Snowflake to fit their specific needs. With the Snowflake Model Registry (in preview), those customers can now deploy their models in Snowflake for inference, add or remove model versions, and tag models with metadata. Snowflake also intends to offer a Feature Store in the future to streamline feature management.
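
To make that concrete, here is a rough sketch of the register-tag-infer loop using the Registry interface from the snowflake-ml-python package. The Model Registry was in preview at the time, so the exact API may differ, and the model, names, and data below are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from snowflake.snowpark import Session
from snowflake.ml.registry import Registry

# Hypothetical connection parameters; substitute real credentials.
session = Session.builder.configs({
    "account": "my_account",
    "user": "my_user",
    "password": "my_password",
    "database": "ml_db",
    "schema": "registry",
    "warehouse": "ml_wh",
}).create()

# A toy model trained outside Snowflake, mirroring the external-tool
# workflow most customers use today.
train = pd.DataFrame({"feature": [0.0, 0.2, 0.8, 1.0], "label": [0, 0, 1, 1]})
model = LogisticRegression().fit(train[["feature"]], train["label"])

# Log it to the registry as a named, versioned model.
registry = Registry(session=session)
mv = registry.log_model(
    model,
    model_name="churn_classifier",         # illustrative name
    version_name="v1",
    sample_input_data=train[["feature"]],  # lets Snowflake infer the signature
)

# Tag the model with metadata (the TAG object must exist first),
# then run inference inside Snowflake.
session.sql("CREATE TAG IF NOT EXISTS stage").collect()
registry.get_model("churn_classifier").set_tag("stage", "staging")
scored = mv.run(pd.DataFrame({"feature": [0.6]}), function_name="predict")
print(scored)

# Versions can later be retired with:
# registry.get_model("churn_classifier").delete_version("v1")
```

The specific calls matter less than the shape of the workflow: train wherever you like, then version, tag, and serve the model next to the data it was trained on.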

Final thoughts

Snowflake Summit 2023 didn’t just set the bar high; it helped redefine the parameters of data observability. New developments point toward a future where data isn’t just passively managed but actively observed and controlled. The data landscape is clearly set for a seismic shift. As innovation spreads, we stand on the brink of a new era in data observability, and it’s an exciting place to be. The future of data is not just about accumulation, but about smart, proactive observation, and the tools to do so are no longer a distant dream but a tangible reality. We’re eager to see how these developments shape the industry in the coming years. Until then, let’s keep pushing the boundaries of what’s possible with data.

