Data · Advanced
Data Engineering with Python + PySpark in the Cloud
Handling larger data volumes requires distributed processing and orchestration. In this course you will learn how to structure a modern lakehouse stack with Python, Spark, Airflow, dbt, and core AWS services.
18 lessons · Certificate included · USD 10 (~ARS 10.000)
Course syllabus
1. Modern data architectures (2 lessons)
- Data Lakehouse
- The modern Airflow + dbt stack
2. PySpark from scratch (3 lessons)
- RDD vs DataFrame
- Joins and aggregations
- Parquet/Delta
3. Apache Airflow (3 lessons)
- DAGs and operators
- TaskFlow API
- Local orchestration
4. Transformations with dbt (3 lessons)
- dbt Core
- Models
- Tests and macros
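As a preview of the dbt module: a model is just a SELECT statement in a `.sql` file, and tests are declared alongside it in YAML. The model and column names below are invented for illustration:

```sql
-- models/customer_totals.sql
select
    customer_id,
    sum(amount) as total_amount
from {{ ref('stg_orders') }}
group by customer_id
```

```yaml
# models/schema.yml
version: 2
models:
  - name: customer_totals
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique
```

`dbt run` materializes the model in the warehouse and `dbt test` checks the declared constraints against the built table.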
5. Cloud data engineering with AWS (3 lessons)
- Glue
- Redshift
- S3 data lakes
6. Data quality and governance (3 lessons)
- Great Expectations
- Data catalog
- Lineage
7. Final project (1 lesson)
- Batch ingestion and transformation pipeline
What you will learn
- PySpark
- Apache Airflow
- dbt
- AWS S3 / Glue
- Great Expectations
Certificate
Advanced Data Engineer Certificate - CumbreAcademy
Ready to start?
Investment: USD 10 (~ARS 10.000)
Buy access
Want access to every course?
Total Access gives you this course and all the others for USD 20/month.
This course: USD 10 (~ARS 10.000) · Total Access: USD 20/month (all courses)
See Total Access
What can you do after this course?
These are the recommended next steps for your learning path.