Author | Matt Palmer |
Publisher | O'Reilly Media |
Published | 2024 |
Edition | 1 |
Paperback | 107 pages |
Language | English |
ISBN-13 | 9781098159238, 9781098159252 |
ISBN-10 | 1098159233, 109815925X |
License | Compliments of Databricks |
Extract, transform, load (ETL) is at the center of every application of data, from business intelligence to AI. Constant shifts in the data landscape, including the implementation of lakehouse architectures and the importance of high-scale real-time data, mean that today's data practitioners must approach ETL a bit differently.
This updated technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic patterns that will help you overcome them. You'll come away equipped to make informed decisions when implementing ETL and confident about choosing the technology stack that will help you succeed.
- Discover what ETL looks like in the new world of data lakehouses
- Learn how to deal with real-time data
- Explore low-code ETL tools
- Understand how to best achieve scale, performance, and observability
This book is published as open access, which means it is freely available to read, download, and share without restrictions.
If you enjoyed the book and would like to support the author, you can purchase a printed copy (hardcover or paperback) from official retailers.
So someone has heard about graph databases and wants to understand what all the buzz is about. Are they just a passing trend - here today and gone tomorrow - or are they a rising tide that businesses and development teams can't afford to ignore? Whether they're a business executive or a seasoned developer, something - perhaps a pressing business ch
Modern C focuses on the new and unique features of modern C programming. The book is based on the latest C standards and offers an up-to-date perspective on this tried-and-true language. C is extraordinarily modern for a 50-year-old programming language. Whether you're writing embedded code, low-level system routines, or high-performance applicatio
Ready to simplify the process of building data lakehouses and data pipelines at scale? In this practical guide, learn how Delta Lake is helping data engineers, data scientists, and data analysts overcome key data reliability challenges with modern data engineering and management techniques. Authors Denny Lee, Tristen Wentling, Scott Haines, and Pra
This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed wit
Web development is an evolving amalgamation of languages that work in concert to receive, modify, and deliver information between parties using the Internet as a mechanism of delivery. While it is easy to describe conceptually, implementation is accompanied by an overwhelming variety of languages, platforms, templates, frameworks, guidelines, and s
Is Kubernetes ready for stateful workloads? This open source system has become the primary platform for deploying and managing cloud native applications. But because it was originally designed for stateless workloads, working with data on Kubernetes has been challenging. If you want to avoid the inefficiencies and duplicative costs of having separa