Apache Iceberg: The Definitive Guide
Data Lakehouse Functionality, Performance, and Scalability on the Data Lake
Book Details
| Authors | Tomer Shiran, Jason Hughes, Alex Merced |
| Publisher | O'Reilly Media |
| Published | 2024 |
| Edition | 1st |
| Paperback | 344 pages |
| Language | English |
| ISBN-13 | 9781098148614, 9781098148621 |
| ISBN-10 | 1098148614, 1098148622 |
| License | Compliments of Dremio |
Book Description
Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool - a cost-prohibitive process for making warehouse features available to all of your data. The lack of flexibility with these patterns requires you to lock into a set of priority tools and formats, which creates data silos and data drift. This practical book shows you a better way.
Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this high-performance open source format. Authors Tomer Shiran, Jason Hughes, and Alex Merced from Dremio show you how to get started with Iceberg.
With this open book, you'll learn:
- The architecture of Apache Iceberg tables
- What happens under the hood when you perform operations on Iceberg tables
- How to further optimize Iceberg tables for maximum performance
- How to use Iceberg with popular data engines such as Apache Spark, Apache Flink, and Dremio
Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse.
This book is published as open-access, which means it is freely available to read, download, and share without restrictions.
If you enjoyed the book and would like to support the authors, you can purchase a printed copy (hardcover or paperback) from official retailers.
Similar Books
Delta Lake: The Definitive Guide
Ready to simplify the process of building data lakehouses and data pipelines at scale? In this practical guide, learn how Delta Lake is helping data engineers, data scientists, and data analysts overcome key data reliability challenges with modern data engineering and management techniques. Authors Denny Lee, Tristen Wentling, Scott Haines, and Pra
Kafka: The Definitive Guide, 2nd Edition
Every enterprise application creates data, whether it consists of log messages, metrics, user activity, or outgoing messages. Moving all this data is just as important as the data itself. With this updated edition, application architects, developers, and production engineers new to the Kafka streaming platform will learn how to handle data in motio
Cassandra: The Definitive Guide, 3rd Edition
Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This third edition - updated for Cassandra 4.0 - provides the technical details and practical examples you
Jenkins: The Definitive Guide
Streamline software development with Jenkins, the popular Java-based open source tool that has revolutionized the way teams think about Continuous Integration (CI). This complete guide shows you how to automate your build, integration, release, and deployment processes with Jenkins - and demonstrates how CI can save you time, money, and many headac
The Definitive Guide to Graph Databases
First and foremost, the authors did not write this book to criticize relational databases or undermine a still-valuable technology. Without relational databases, many of today's most mission-critical applications would not function, and without the early innovations of RDBMS pioneers, modern database technology would not have advanced as far as it
Critical Data Literacy
A short course for students to increase their proficiency in analyzing and interpreting data visualizations. By completing this short course, students will be able to explain the importance of data literacy and identify data visualization issues in order to improve their own skills in data storytelling. The intended outcome of this course is to help s