Data · Advanced

Data Engineering with Python + PySpark in the Cloud

Handling larger data volumes requires distributed processing and orchestration. In this course you will learn how to structure a modern lakehouse stack with Python, Spark, Airflow, dbt, and core AWS services.

18 lessons · Certificate included · USD 10 (~ARS 10.000)

Course syllabus

1. Modern data architectures (2 lessons)
  • Data Lakehouse
  • The modern Airflow + dbt stack
2. PySpark from scratch (3 lessons)
  • RDD vs DataFrame
  • Joins and aggregations
  • Parquet/Delta
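As a taste of what this module covers, the sketch below joins two small DataFrames, aggregates spend per customer, and writes the result as Parquet. It assumes a local Spark runtime; the table and column names (`orders`, `customers`, `amount`) are illustrative, not part of the course material.

```python
# Minimal PySpark sketch: join two DataFrames, aggregate,
# and write the result as Parquet. Requires a local Spark install.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, "A", 10.0), (2, "B", 25.0), (3, "A", 5.0)],
    ["order_id", "customer_id", "amount"],
)
customers = spark.createDataFrame(
    [("A", "Ana"), ("B", "Bruno")],
    ["customer_id", "name"],
)

# Join on customer_id, then total spend per customer
totals = (
    orders.join(customers, on="customer_id", how="inner")
    .groupBy("customer_id", "name")
    .agg(F.sum("amount").alias("total_amount"))
)

totals.show()
totals.write.mode("overwrite").parquet("/tmp/customer_totals")
spark.stop()
```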
3. Apache Airflow (3 lessons)
  • DAGs and operators
  • TaskFlow API
  • Local orchestration
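The TaskFlow API lets you express a DAG as decorated Python functions, with dependencies inferred from the call graph. A minimal sketch, assuming Airflow 2.x; the DAG and task names are illustrative:

```python
# Sketch of a daily DAG using Airflow's TaskFlow API (Airflow 2.x).
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_sales():
    @task
    def extract():
        # Stand-in for reading from a source system
        return [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 25.0}]

    @task
    def transform(rows):
        return sum(r["amount"] for r in rows)

    @task
    def load(total):
        print(f"daily total: {total}")

    # Dependencies follow from the data flow: extract -> transform -> load
    load(transform(extract()))


daily_sales()
```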
4. Transformations with dbt (3 lessons)
  • dbt Core
  • Models
  • Tests and macros
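In dbt Core, a model is a SQL `select` plus YAML metadata that can declare built-in tests. A minimal sketch of that pairing; the source, model, and column names are illustrative:

```sql
-- models/stg_orders.sql: illustrative staging model
select
    order_id,
    customer_id,
    amount
from {{ source('shop', 'raw_orders') }}
where amount is not null
```

```yaml
# models/schema.yml: built-in dbt tests on the model above
version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests: [unique, not_null]
      - name: amount
        tests: [not_null]
```

Running `dbt test` then checks these declared expectations against the built model.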
5. Cloud data engineering with AWS (3 lessons)
  • Glue
  • Redshift
  • S3 data lakes
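S3 data lakes typically organize files under partitioned prefixes (for example, by ingest date). A small boto3 sketch of browsing one such prefix; the bucket and prefix are illustrative, and running it requires AWS credentials:

```python
# Sketch: list Parquet files under a date-partitioned S3 prefix.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(
    Bucket="my-data-lake",                       # illustrative bucket name
    Prefix="lake/orders/ingest_date=2024-01-01/",  # illustrative partition
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```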
6. Data quality and governance (3 lessons)
  • Great Expectations
  • Data catalog
  • Lineage
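Great Expectations formalizes checks such as "column is never null" or "values fall within a range". To convey the idea, the same style of check can be sketched in plain Python (this is not the Great Expectations API; the function and column names are illustrative):

```python
# Expectation-style data checks sketched in plain Python.

def expect_column_values_not_null(rows, column):
    """True if every row has a non-None value for `column`."""
    return all(row.get(column) is not None for row in rows)


def expect_column_values_between(rows, column, low, high):
    """True if every value of `column` lies in [low, high]."""
    return all(low <= row[column] <= high for row in rows)


orders = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 25.0},
]

print(expect_column_values_not_null(orders, "order_id"))       # True
print(expect_column_values_between(orders, "amount", 0, 100))  # True
```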
7. Final project (1 lesson)
  • Batch ingestion and transformation pipeline
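The shape of the final project — ingest raw records, transform them, produce an output — can be miniaturized in plain Python (the data and field names are illustrative; the real project uses the Spark/Airflow/dbt stack from earlier modules):

```python
# Miniature batch pipeline: ingest CSV text, type the fields, aggregate.
import csv
import io

RAW = """order_id,customer_id,amount
1,A,10.0
2,B,25.0
3,A,5.0
"""


def ingest(text):
    """Parse raw CSV text into typed rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["amount"] = float(row["amount"])
    return rows


def transform(rows):
    """Aggregate total amount per customer."""
    totals = {}
    for row in rows:
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0.0) + row["amount"]
    return totals


print(transform(ingest(RAW)))  # {'A': 15.0, 'B': 25.0}
```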

What you will learn

PySpark · Apache Airflow · dbt · AWS S3 / Glue · Great Expectations

Certificate

Advanced Data Engineer Certificate - CumbreAcademy

Ready to start?

Investment: USD 10 (~ARS 10.000)

Buy access

Want access to every course?

Total Access gives you this course and all the others for USD 20/month.

This course: USD 10 (~ARS 10.000) · Total Access: USD 20/month (all courses)
See Total Access
