Jeffrey Erickson | Senior Writer | November 17, 2025
Iceberg tables were created by Netflix engineers who had pushed the streaming service’s real-time analytics beyond what its Apache Hive-based data warehouses could handle. Their solution? Develop a table format that lived in the bulk data storage layer, alongside the data itself. They dubbed the format Iceberg tables and found that the new system did indeed provide a more scalable and predictable platform for dealing with massive, frequently updated data stores.
Seeing that the Iceberg technology solved an issue commonly faced by users of large, complex data lakehouses, the company open sourced and donated the project to the Apache Software Foundation in 2018. Iceberg quickly found an enthusiastic user base and in 2020 became a top-level Apache project. It’s now used by a community of developers and is embraced by hyperscale cloud data management systems, such as Oracle Autonomous AI Database. Here’s a look at what Iceberg can do and how to use it.
Apache Iceberg is an open source table format designed to simplify the management of vast data lakes and data lakehouses while improving query performance.
Iceberg tables differ from traditional relational tables found in databases such as Postgres, MySQL, or Oracle. Relational tables store both metadata and data in the database where the data is processed and are well suited for structured application data. Moreover, strict relationships between tables in the database can be enforced. Iceberg tables, on the other hand, store both data and metadata in a file system or object storage layer, such as your local file system, Amazon S3, Google Cloud Storage, or Oracle Object Storage. This separation of storage, both data and metadata, from compute decouples data processing from the data itself and gives end users the flexibility to choose the processing engine that’s right for their specific needs.
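To make the idea concrete, here is a toy sketch, in Python, of how a table in object storage might keep data files and a metadata layer side by side. The paths, field names, and structure are illustrative only, not the actual Iceberg specification: the point is that the metadata, not a database, records which data files make up the table's current state.

```python
# Hypothetical sketch of an Iceberg-style table living in a storage layer.
# Field names and paths are made up for illustration, not the real Iceberg spec.
table = {
    "metadata": {  # metadata layer: schema plus a history of snapshots
        "schema": {"id": "long", "event_time": "timestamp", "payload": "string"},
        "current-snapshot-id": 2,
        "snapshots": [
            {"snapshot-id": 1, "data-files": ["data/part-0001.parquet"]},
            {"snapshot-id": 2, "data-files": ["data/part-0001.parquet",
                                              "data/part-0002.parquet"]},
        ],
    },
}

def current_data_files(tbl):
    """Resolve which data files make up the table's current state."""
    meta = tbl["metadata"]
    current = meta["current-snapshot-id"]
    for snap in meta["snapshots"]:
        if snap["snapshot-id"] == current:
            return snap["data-files"]
    return []

print(current_data_files(table))
```

Because any engine that can read the metadata can resolve the same file list, the table itself stays engine-agnostic.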
Iceberg tables facilitate data analytics by combining the scalability and flexibility of data lakes with the reliability and performance of traditional data warehouses, making Iceberg tables a popular format for data lakehouses. For example, they support both real-time analytics and batch processing workloads on vast amounts of data while also providing ACID-compliant transactions. ACID transactions help ensure data integrity by providing atomicity, consistency, isolation, and durability to database transactions and are fundamental to relational databases and data warehouses. Iceberg tables extend this integrity to data lakehouse transactions.
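Conceptually, Iceberg gets atomicity by treating each commit as a swap of a single pointer to the table's current metadata, so a commit either lands in full or not at all. The sketch below is a simplified illustration of that optimistic compare-and-swap idea, not Iceberg's actual implementation:

```python
class TableCatalog:
    """Toy catalog: tracks one pointer per table to its current metadata version.
    A commit succeeds only if no other writer has moved the pointer first,
    which is the optimistic-concurrency idea behind Iceberg-style commits."""

    def __init__(self):
        self.current_version = 0

    def commit(self, expected_version, new_version):
        # Atomic compare-and-swap: the whole commit lands, or nothing does.
        if self.current_version != expected_version:
            return False  # another writer won; caller must retry on fresh state
        self.current_version = new_version
        return True

catalog = TableCatalog()
assert catalog.commit(expected_version=0, new_version=1)      # first writer wins
assert not catalog.commit(expected_version=0, new_version=2)  # stale writer fails
```

A failed writer never leaves the table half-updated; it simply re-reads the current state and tries again.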
Iceberg tables also provide flexibility, allowing you to choose different data file formats for different workloads. For batch analytics, Iceberg supports columnar formats, such as Parquet and ORC, as well as the row-based Avro format. To use Parquet files with Iceberg, for example, you would create an Iceberg table and configure Parquet as its data file format using the tools Iceberg provides. Iceberg tables also support features for efficient querying, such as data partitioning and pruning, which reduce the amount of data scanned per query.
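Partition pruning is easy to picture with a toy example: if the table's metadata records each file's partition values, the planner can skip any file whose partition cannot match the query predicate. The file names and partition scheme below are invented for illustration:

```python
# Toy partition pruning: skip data files whose partition value can't match
# the query predicate. Paths and the "dt" partition column are made up.
files = [
    {"path": "data/dt=2025-01-01/f1.parquet", "partition": {"dt": "2025-01-01"}},
    {"path": "data/dt=2025-01-02/f2.parquet", "partition": {"dt": "2025-01-02"}},
    {"path": "data/dt=2025-01-03/f3.parquet", "partition": {"dt": "2025-01-03"}},
]

def prune(files, column, value):
    """Return only the files whose partition satisfies column == value."""
    return [f["path"] for f in files if f["partition"].get(column) == value]

print(prune(files, "dt", "2025-01-02"))  # only one of three files is scanned
```

For a query filtered to a single day, two of the three files are never read, and the savings scale with the size of the table.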
Another strength of the Iceberg table format is its advanced data versioning features, such as snapshots and time travel. With time travel, each change to the table is recorded as a new snapshot, allowing users to query the table at any point in its history. This feature is highly useful for auditing, debugging, and compliance purposes—and, of course, rolling back changes if necessary.
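Time travel falls out naturally from keeping every snapshot: querying "as of" a timestamp just means finding the latest snapshot committed at or before that time. A toy sketch of the lookup, with invented timestamps and table states:

```python
import bisect

# Toy snapshot history: each commit records a timestamp and the table's state.
snapshots = [
    {"ts": 100, "rows": {"a": 1}},
    {"ts": 200, "rows": {"a": 1, "b": 2}},
    {"ts": 300, "rows": {"b": 2}},  # row "a" was deleted in this commit
]

def as_of(snapshots, ts):
    """Return the table state at time ts: the latest snapshot with snapshot ts <= ts."""
    idx = bisect.bisect_right([s["ts"] for s in snapshots], ts) - 1
    if idx < 0:
        return None  # the table did not exist yet at that time
    return snapshots[idx]["rows"]

print(as_of(snapshots, 250))  # the state as of the second commit
```

Rolling back a bad change is then just a matter of making an older snapshot current again, with no data files rewritten.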
Iceberg tables are an open source table format designed for large-scale analytics in data lakes. Iceberg tables are self-describing, meaning they contain metadata about their schemas and data, which can simplify querying and management.
Key Takeaways
Iceberg tables are an open source table format that’s managed by the Apache Software Foundation and designed for large-scale data analytics in data lakehouses. Apache Iceberg tables address the challenges of managing and querying large data sets in environments where multiple users or processes are writing to the same data set.
To accomplish this, Iceberg tables provide several key architectural features and management benefits. First, Iceberg tables are self-describing, meaning they contain their own metadata about their schemas and data. They reside in a cloud storage layer, such as Amazon S3, Oracle Object Storage, Google Cloud Storage, or Hadoop Distributed File System (HDFS), rather than being part of a database or other query engine. This provides many advantages. For example, because they are held separate from the compute needed to process queries, Iceberg tables can be used with your data processing engine of choice, such as Apache Spark, Apache Flink, Trino, or an enterprise data management system such as Oracle Autonomous AI Database.
Iceberg tables provide a range of helpful features for managing and querying large, complex data lakehouses, such as data versioning and ACID transactions. Moreover, Iceberg's schema evolution capabilities let organizations adapt their data schemas over time without the need for complex and time-consuming table rewrites. Robust data versioning features include time travel capabilities, which help maintain a clear and traceable history of data changes and allow users to query data as it existed at a previous point in time.
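One reason schema evolution doesn't require table rewrites is that Iceberg tracks columns by stable IDs rather than by name, so a rename touches only metadata. The following is a simplified sketch of that idea, with hypothetical column names, not Iceberg's actual data structures:

```python
# Toy schema evolution: columns carry stable IDs, so renames only touch metadata.
schema_v1 = [{"id": 1, "name": "user"}, {"id": 2, "name": "amt"}]

# Data files record values keyed by column ID, never by name.
data_file_row = {1: "alice", 2: 9.99}

def rename_column(schema, old, new):
    """Rename a column in the schema metadata only; data files are untouched."""
    return [{**c, "name": new} if c["name"] == old else c for c in schema]

schema_v2 = rename_column(schema_v1, "amt", "amount")

def read_row(schema, row):
    """Project a raw ID-keyed row through the current schema's column names."""
    return {c["name"]: row[c["id"]] for c in schema}

print(read_row(schema_v2, data_file_row))  # {'user': 'alice', 'amount': 9.99}
```

Old data files keep working under the new schema because the IDs they were written with never change.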
Iceberg tables offer flexibility in terms of data formats and processing engines, so engineers can select the data storage format best suited for each workload, such as batch analytics or transactions.
Data security in Iceberg table architectures can be achieved through a combination of the native features delivered by the Apache Iceberg project and those provided by cloud storage vendors and the data management systems accessing and querying the Iceberg table. The result will be a mix of data encryption, access controls, data masking, and other data governance features, as well as replication and synchronization capabilities for recovery from failures or attacks.
Apache Iceberg is an open table format designed to help manage and query petabyte-scale data sets efficiently. It was developed to address the limitations of existing file formats for data lakes, which didn’t provide a way to manage data at the table level.
Unlike traditional tables, Iceberg tables were built to handle the complexities of big data sets that incur frequent updates, deletions, and schema evolution. Iceberg has proven to be a robust and efficient way to manage data, whether you’re performing batch processing, real-time streaming, or interactive queries. Here’s how.
The many benefits of Apache Iceberg make it a compelling choice for organizations looking to manage and process large data sets efficiently and reliably.
Apache Iceberg is a high performance table format for big data workloads that supports a range of data types and multiple query engines as well as ACID transactions. Here are five common use cases for Apache Iceberg.
Iceberg tables were created by engineers at Netflix to bring more speed and flexibility to streaming analytics and their real-time recommendation engines, and the format excels at that sort of use case. Now an open source project run by the Apache Software Foundation, Iceberg is still used by Netflix, as well as a growing number of organizations, including the following:
Oracle Autonomous AI Database supports Apache Iceberg tables. If your data sets are already in Iceberg format on a different cloud, they can be easily read by Oracle Autonomous AI Database, reducing data duplication and enhancing the flexibility of your operations. Now you can take advantage of Iceberg tables for managing large, complex data sets and optimizing query performance—while also enjoying cross-platform data accessibility with your Oracle data management platform, where you can build scalable AI-powered apps using your choice of large language model (LLM) and deploy in the cloud or your data center.
Apache Iceberg has forever transformed the data management landscape. It arrived as companies like Netflix were struggling to manage and query massive data sets and mixed data types. As these conditions have become more common, Iceberg tables have become a staple of the modern data ecosystem. Look for the trend to continue as more businesses look to streamline their data pipelines and derive insights from big data analytics.
Is your data infrastructure set up to handle similarity search and other AI initiatives? Our ebook lays out a plan to build a data foundation robust enough to support AI success.
Is Apache Iceberg better than Delta Lake?
Neither Delta Lake nor Apache Iceberg can be considered generally superior. Both are table formats for managing data sets for analysis. A simple way to think about the differences is that Delta Lake does best within the Spark ecosystem, while Iceberg offers wider compatibility and works well with a range of data management engines.
What is Apache Iceberg used for?
Apache Iceberg is used to simplify the management and querying of very large, complex data sets, presenting data stored across platforms as a single, unified table layer for analytics.