Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines.
Summary
A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. Its easy-to-use UI, plug-and-play options, and flexible Python scripting make Airflow perfect for any data management task.
About the book
Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You’ll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline’s needs.
What's inside
Build, test, and deploy Airflow pipelines as DAGs
Automate moving and transforming data
Analyze historical datasets using backfilling
Develop custom components
Set up Airflow in production environments
About the reader
For DevOps, data engineers, machine learning engineers, and sysadmins with intermediate Python skills.
About the author
Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies. Bas is also an Airflow committer.
Table of Contents
PART 1 - GETTING STARTED
1 Meet Apache Airflow
2 Anatomy of an Airflow DAG
3 Scheduling in Airflow
4 Templating tasks using the Airflow context
5 Defining dependencies between tasks
PART 2 - BEYOND THE BASICS
6 Triggering workflows
7 Communicating with external systems
8 Building custom components
9 Testing
10 Running tasks in containers
PART 3 - AIRFLOW IN PRACTICE
11 Best practices
12 Operating Airflow in production
13 Securing Airflow
14 Project: Finding the fastest way to get around NYC
PART 4 - IN THE CLOUDS
15 Airflow in the clouds
16 Airflow on AWS
17 Airflow on Azure
18 Airflow in GCP
About the Author
Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies including Heineken, Unilever, and Booking.com. Bas is a committer, and both Bas and Julian are active contributors to Apache Airflow.
Ebook License
End-User Warranty And License Agreement
1. Grant Of License
Manning Has Authorized The Download By You Of An Unrestricted Number Of Copies Of The Electronic Book (Ebook) In Any Of The Available Formats. Manning Grants You A Nonexclusive, Nontransferable License To Use The Ebook According To The Terms And Conditions Herein. This License Agreement Permits You To Install The Ebook On Any And All Your Devices For Your Personal Use Only.
2. Restrictions
You Shall Not: (1) Share, Resell, Rent, Assign, Timeshare, Distribute, Or Transfer All Or Part Of The Ebook Or Any Rights Granted Hereunder To Any Other Person; (2) Duplicate The Ebook, Except For A Single Backup Or Archival Copy; (3) Remove Any Proprietary Notices, Labels, Or Marks From The Ebook; (4) Transfer Or Sublicense Title To The Ebook To Any Other Party.
3. Intellectual Property Protection
The Ebook Is Owned By Manning And Is Protected By United States And International Copyright And Other Intellectual Property Laws. Manning Reserves All Rights In The Ebook Not Expressly Granted Herein. This License And Your Right To Use The Ebook Terminate Automatically If You Violate Any Part Of This Agreement. In The Event Of Termination, You Must Remove The Original And Any Copies Of The Ebook From All Your Devices.
4. Source Code Supplementary Material
Any Source Code Files Provided As A Supplement To The Book Are Freely Available To The Public For Download. Reuse Of The Code Is Permitted, In Whole Or In Part, Including The Creation Of Derivative Works, Provided That You Acknowledge That You Are Using It And Identify The Source: Title, Publisher And Year.
5. Limited Warranty
Manning Warrants That The Ebook Files, A Copy Of Which You Are Authorized To Download, Are Free From Defects In The Operational Sense That They Can Be Read By A Pdf Reader Or Epub Reader, Or Other. Except For This Express Limited Warranty, Manning Makes And You Receive No Warranties, Express, Implied, Statutory Or In Any Communication With You, And Manning Specifically Disclaims Any Other Warranty Including The Implied Warranty Of Merchantability Or Fitness Or A Particular Purpose. Manning Does Not Warrant That The Operation Of The Ebook Will Be Uninterrupted Or Error Free. If The Ebook Was Purchased In The United States, The Above Exclusions May Not Apply To You As Some States Do Not Allow The Exclusion Of Implied Warranties. In Addition To The Above Warranty Rights, You May Also Have Other Rights That Vary From State To State.
6. Limitation Of Liability
In No Event Will Manning Be Liable For Any Damages, Whether Arising For Tort Or Contract, Including Loss Of Data, Lost Profits, Or Other Special, Incidental, Consequential, Or Indirect Damages Arising Out Of The Use Or Inability To Use The Ebook.
7. General
This Agreement Constitutes The Entire Agreement Between You And Manning And Supersedes Any Prior Agreement Concerning The Ebook. This Agreement Is Governed By The Laws Of The State Of New York.