Conference Recap: Key Trends from Snowflake and Databricks
A recap of key trends from the 2024 SnowBricks conference season.
The 2024 Snowflake and Databricks conferences, each attended by thousands in San Francisco, brought significant announcements from both companies that will shape the future of data management, AI, and analytics.
Here, we'll recap the key highlights from both conferences.
Snowflake's Big Announcements: Polaris, Cortex, and More
Polaris Catalog Integration with Iceberg
One of Snowflake's major announcements was the Polaris catalog, which integrates with Apache Iceberg. Polaris lets users access Iceberg data directly from Snowflake and from other technologies. Snowflake's stated goal is to open-source the Polaris catalog within 90 days, promoting a more accessible and collaborative approach to working with Iceberg tables. This interoperability matters because data stored once in Iceberg format can be accessed by various compute engines, such as Spark, Snowflake, and Trino.
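As a rough sketch of what this interoperability can look like from the Spark side: Polaris implements the Iceberg REST catalog API, so a Spark session can be pointed at it with the standard Iceberg catalog properties. The catalog name, URI, and warehouse below are hypothetical placeholders, and authentication settings are deliberately left out because they depend on the deployment.

```python
def iceberg_rest_catalog_conf(catalog_name: str, uri: str, warehouse: str) -> dict[str, str]:
    """Build Spark properties that register an Iceberg REST catalog.

    The property keys follow Iceberg's documented Spark configuration.
    The values passed in are placeholders, not real endpoints.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg's SQL extensions in Spark.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers the catalog and tells Iceberg to talk to it over REST.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
    }


# Hypothetical values for illustration only.
conf = iceberg_rest_catalog_conf(
    catalog_name="polaris",
    uri="https://example.invalid/api/catalog",
    warehouse="demo_warehouse",
)

# Print the properties in the form accepted by spark-submit or spark-sql.
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

With properties like these in place, tables in the catalog would be addressed from Spark SQL as `polaris.<namespace>.<table>`, while other engines would reach the same tables through their own Iceberg connectors.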
Cortex AI Suite
AI was a central theme at the Snowflake conference, with the introduction of Cortex, a suite of services designed to simplify AI and ML operations. Cortex includes features like chatbot creation through Snowflake Services, providing a seamless interface for AI tasks. The live demo showcased the ease of building AI applications with minimal SQL commands, highlighting Snowflake's commitment to making AI accessible within its ecosystem.
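The demo's exact statements are not reproduced here, but a minimal sketch can show what "minimal SQL" means in practice. Cortex exposes functions such as `SNOWFLAKE.CORTEX.SENTIMENT` that are called like any other SQL function. The helper below only builds the query text; the table and column names are hypothetical, and running the query would require a Snowflake connection that is not shown.

```python
import re

# Conservative pattern for unquoted SQL identifiers, used to keep
# arbitrary text out of the generated statement.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def cortex_sentiment_query(table: str, text_column: str) -> str:
    """Return a SELECT that scores each row's text with Cortex SENTIMENT."""
    for name in (table, text_column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a simple SQL identifier: {name!r}")
    return (
        f"SELECT {text_column}, "
        f"SNOWFLAKE.CORTEX.SENTIMENT({text_column}) AS sentiment "
        f"FROM {table}"
    )


# Hypothetical table and column, for illustration only.
query = cortex_sentiment_query("product_reviews", "review_text")
print(query)
```

The point of the sketch is that the AI call is a single function inside an ordinary query, so no separate model-serving setup appears in user code.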
Governance and Observability with Horizon
Snowflake also emphasized data governance and observability through its Horizon suite. This set of tools includes features for labeling, lineage, privacy, and security, all integrated with the Polaris catalog. Because tagging can be automated and governance policies enforced directly within Snowflake, data can be governed where it lives rather than in a separate system. Interoperability with external tools such as Jira, Slack, and email for alerting further extends Snowflake's observability capabilities.
NVIDIA Integration with NeMo Retriever
Snowflake also announced an integration with NVIDIA's NeMo Retriever framework to strengthen its AI capabilities. The integration helps with efficient tokenization and embedding of unstructured data, making it easier to build AI applications. Through the collaboration with NVIDIA, Snowflake users can use NVIDIA's AI models and tools directly within the Snowflake environment.
Databricks' Innovations: Unity Catalog, Mosaic AI, and More
Unity Catalog Goes Open Source
Databricks made a bold statement by open-sourcing Unity Catalog during their keynote. This move aims to eliminate fragmented governance and promote interoperability. Unity Catalog serves as a multimodal governance layer, supporting data, ML models, and AI within a single catalog. This comprehensive approach ensures seamless integration and management of diverse data assets.
Mosaic AI: Democratizing AI
Mosaic AI was another highlight, showcasing Databricks' commitment to making AI accessible across organizations. Mosaic includes an agent framework for building AI applications, a model training toolkit, and governance features. The live demo featuring a Shutterstock image model built on Mosaic illustrated the platform's potential to leverage proprietary data effectively.
Lakeflow: Simplifying Data Pipelines
Databricks introduced Lakeflow, a graphical interface for creating data pipelines natively within Databricks. The tool simplifies building and managing data pipelines, from extraction and loading (via point-and-click CDC connectors) to transformation and orchestration. AI-powered suggestions are integrated to make pipeline creation faster and more accurate.
AI and BI with AI/BI
Databricks also announced AI/BI, a low-code data visualization tool with AI agents for building visualizations. It includes Genie, an AI-driven natural language querying tool that learns and adapts to user queries. The focus on making data visualization and analysis more intuitive and accessible aligns with the broader trend of democratizing data and AI tools.
Key Takeaways and Trends
AI and ML Integration
Both Snowflake and Databricks are heavily investing in AI and ML integration, emphasizing the importance of making these technologies accessible and efficient within their platforms. The focus on simplifying AI operations and integrating AI tools directly into data platforms is a clear indication of where the industry is headed. The era of enterprise AI is here, and organizations should leverage these tools to stay competitive.
Data Governance and Interoperability
Data governance remains a critical theme, with both companies enhancing their governance features. The move towards open-source catalogs and interoperability ensures that organizations can manage their data more effectively while avoiding vendor lock-in. This trend towards comprehensive, integrated governance solutions is likely to continue.
Vertical Integration and Ecosystem Expansion
Both Snowflake and Databricks are expanding their ecosystems, integrating more features and tools to provide a comprehensive data management solution. This vertical integration strategy aims to make their platforms a one-stop-shop for all data needs, from storage and compute to AI and governance.
The Rise of Small Language Models
An emerging trend highlighted at the Databricks conference was the focus on small language models for specific tasks. This approach contrasts with the pursuit of AGI and emphasizes the practical application of AI for targeted use cases. This trend is likely to gain traction as organizations seek efficient and effective AI solutions. Small language models can provide significant value with lower computational costs and faster deployment times.
Conclusion
The announcements at the Snowflake and Databricks conferences have set the stage for the next wave of data technology. With significant advancements in AI integration, data governance, and ecosystem expansion, both companies are pushing the boundaries of what is possible.