Databricks to acquire storage platform maker Tabular

By Anirban Ghoshal

Databricks has agreed to acquire Tabular, the storage platform vendor led by the creators of Apache Iceberg, in order to promote data interoperability in lakehouses.

Tabular founders Ryan Blue and Daniel Weeks started developing Iceberg at Netflix in 2017 and donated it to the Apache Software Foundation in 2018, around the same time that Databricks was developing Delta Lake, an open-source table format for data that can be used for ACID transactions or OLTP processing. In contrast, Apache Iceberg is mostly used for OLAP queries, as it has challenges around concurrent writes.

In June 2022, Databricks open sourced all Delta Lake APIs as part of its Delta Lake 2.0 release and said that it would contribute all enhancements of Delta Lake to The Linux Foundation.

Prior to the open sourcing of Delta Lake, competitors such as Cloudera, Dremio, Google (BigLake), Microsoft, Oracle, SAP, AWS, Snowflake, HPE (Ezmeral), and Vertica had criticized the company, casting doubt on whether Delta Lake was open source or proprietary and thereby taking away a share of prospective customers.

With the acquisition of Tabular, Databricks said that it will support the two leading open source table formats for lakehouses, and also expand support for its UniForm Tables.

“Databricks intends to work closely with the Delta Lake and Iceberg communities to bring format compatibility to the lakehouse; in the short term, inside Delta Lake UniForm and in the long term, by evolving toward a single, open, and common standard of interoperability,” the company said in a statement.

UniForm (Universal Format) is a new table format released in June 2023 that provides interoperability across Delta Lake, Iceberg, and Hudi, and supports the Iceberg REST catalog interface.

Snowflake and Iceberg Tables versus Databricks and Delta Live Tables

Analysts, too, see the Tabular acquisition as a means for Databricks to support more robust interoperability.

“We’ve seen before, companies often acquire the talent behind important open source projects as a means of gaining a strong voice among the project’s open source community of developers,” said Bradley Shimmin, chief analyst at Omdia.

“The founders of Tabular joining Databricks may translate into improved compatibility between Delta Lake and the Iceberg standard, which will give Databricks an advantage over Snowflake in supporting customers with a heavy reliance upon data external to the Snowflake platform,” Shimmin explained.

However, the chief analyst pointed out that the acquisition is unlikely to hinder Snowflake’s use of Iceberg as Blue and Weeks had long since open-sourced the project and donated it to the Apache Software Foundation.

Constellation Research’s principal analyst Doug Henschen also believes that Apache Iceberg has already eclipsed all other standards, and that Databricks’ foray into creating interoperability for the table format will push it even further towards becoming the dominant table standard.

Further, analysts pointed out that the rivalry is not simply between the two open table formats but encompasses Snowflake and Databricks.

“The timing of this deal is obviously intended to grab some of the Snowflake Summit limelight, and to try to outdo its competitor on openness messaging with the suggestion that it will have huge influence over the future of the Iceberg standard as well as Delta Lake,” Henschen said.

Snowflake, too, this week showcased its Polaris Catalog and said that it was going to open source the data catalog in the next 90 days.

Polaris Catalog is a data catalog built atop Iceberg in order to address enterprises’ need to access a vendor-neutral offering that comes with data governance capabilities and supports interoperable query engines.

According to analysts, the launch of Polaris Catalog, which is similar to Databricks’ Unity Catalog, was a strategy employed by Snowflake to lure data catalog users away from rival Databricks while bolstering the attractiveness of its own offering.

Amalgam Insights’ chief analyst Hyoun Park seconded Henschen and said that both data lakehouse providers are trying to show that they are better suited to support the enterprise data environment across a variety of data formats and types.

“Databricks gains from this acquisition as it shows that it can support Iceberg, which arguably is the most supported table format,” Park explained, adding that though Databricks has traditionally been a good open source contributor for its self-developed projects, Iceberg’s contributor community is now much larger than Tabular, given the commitments that exist from many large vendors.

However, Henschen pointed out that there are too many interested parties for any one company to dominate Iceberg, although Tabular’s acquisition might give Databricks an edge on the Iceberg front.

Databricks versus Snowflake: A competition in acquisitions

Databricks has been on an acquisition spree lately. Earlier, in March, it acquired Boston-based Lilac AI to help enterprises explore and use their unstructured data for building generative AI-based applications.

Prior to that, around June 2023, Databricks acquired LLM and model-training software provider MosaicML for $1.3 billion to boost its generative AI offerings.

Before the Lilac AI and MosaicML acquisitions, the company had acquired AI-centric data governance platform provider Okera for an undisclosed sum in May last year.

The Okera acquisition was expected to boost Databricks’ data governance capabilities while training and managing large language models (LLMs), such as its own open source Dolly 2.0 LLM.

Snowflake, too, has been acquiring companies that not only boost its generative AI offerings but also bolster its capabilities around data management.

Its latest acquisition was the purchase of assets from TruEra, an observability platform startup that also specializes in providing lifecycle management capabilities for machine learning and LLMs.

In May last year, the cloud-based data warehouse company acquired Neeva, a startup based in Mountain View, California, for an undisclosed sum in an effort to add generative AI-based search to its Data Cloud platform.

In February 2023, Snowflake acquired LeapYear to boost its data clean room abilities.

The LeapYear acquisition came just a month after Snowflake agreed to buy artificial intelligence-based time series forecasting platform provider Myst AI, taking the company’s acquisition count to seven companies in three years.

© InfoWorld