Skip to content

Multi-Tenant Data Warehouse ETL

MWAA Airflow pipeline moving 40+ tables across 10+ tenants into Aurora PostgreSQL - 85% failure reduction, 99% data accuracy

85%Failure Reduction

Problem

Data from 10+ enterprise tenants was siloed across S3 CSV exports and Pendo analytics with inconsistent schemas. Cross-tenant analytics was impossible. Large CSVs (100MB+) caused memory failures. Pipeline failures had no isolation - one bad table took down the whole run.

Solution

  • Built Apache Airflow DAGs on AWS MWAA with config-driven transformations parameterized per tenant.
  • Sub-DAG triggers via TriggerDagRunOperator chain dependent workflows across tables and tenants.
  • AWK splits 100MB+ CSVs into ~80MB chunks for parallel streams — 30 concurrent batch operations with Pandas ETL.
  • Per-table failure isolation ensures one bad table never takes down the entire pipeline; retry logic with dead-letter handling eliminates cascading failures.
  • Row validation with S3-based audit logging and email alerts on failure ensure 99% data accuracy.
  • Containerized Airflow workers on K8s for elastic scaling during peak ETL windows; Aurora PostgreSQL with staging and reporting schemas as the target warehouse.

System Flow

Sources

S3 CSVs
Pendo API
Config CSVs

Orchestration

MWAA DAGs
Sub-DAG Triggers

Processing

AWK Chunking
Pandas ETL

Quality

Row Validation
S3 Audit Logs

Warehouse

Aurora PostgreSQL
K8s Pods

Architecture

  • 01Apache Airflow DAGs on AWS MWAA - config-driven per-tenant parameterization
  • 02Sub-DAG triggers via TriggerDagRunOperator for dependent workflow chaining
  • 03AWK chunking for 100MB+ CSVs (~80MB chunks) - 30 concurrent parallel batch operations
  • 04Pandas ETL with per-table failure isolation, retry logic, and dead-letter handling
  • 05Row validation with S3-based audit logging and email alerts on failure
  • 06Containerized workers on K8s for elastic peak scaling
  • 07Aurora PostgreSQL with staging + reporting schemas

Impact

  • 40+ production tables across 10+ tenants processed daily
  • 85% pipeline failure reduction via chunking and per-table isolation
  • 99% data accuracy across all customer instances in production
  • Cross-tenant analytics unlocked for the first time

Tech Stack

PythonApache AirflowAWS MWAAAurora PostgreSQLS3PendoDockerKubernetesPandas