Data Engineering Bootcamp - Series 1
Develop essential programming & development skills with expert instruction and practical examples.
About This Course
Take your first step into the world of data engineering and future-proof your career with this hands-on, project-based bootcamp built on the modern data stack. Taught by a seasoned data architect with over 11 years of industry experience, the course blends theory with practice and is designed for aspiring data engineers, software engineers, analysts, and anyone eager to learn how to build real-world data pipelines. You will learn to design scalable data lakes, build dimensional data models, implement data quality frameworks, and orchestrate pipelines with Apache Airflow, all grounded in a real-life ride-hailing application use case that simulates enterprise-scale systems.
What You'll Learn

Section 1: Context Setup
Build your foundation with the Modern Data Stack, understand OLTP systems, and explore real-world data platform architectures.
- Gain clarity on how data flows in data-driven companies
- Learn using a ride-hailing app scenario
- Get properly onboarded into the bootcamp journey

Section 2: Data Lake Essentials
Learn how to build and manage scalable data lakes on AWS S3.
- S3 architecture, partitioning, layers, and schema evolution
- IAM, encryption, storage classes, event notifications
- Lifecycle management, backup & recovery
- Hands-on with Boto3 S3 APIs (see the sketch below)
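
To preview the hands-on Boto3 work in this section, here is a minimal sketch of writing to and listing a date-partitioned prefix in an S3 data lake. The bucket name, prefix layout, and file name are hypothetical placeholders, not the course's actual lab code.

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical raw-layer location for the ride-hailing example,
    # partitioned by ingestion date (bucket and prefixes are placeholders).
    bucket = "ride-hailing-data-lake"
    key = "raw/trips/ingest_date=2024-01-15/trips_0001.json"

    # Upload a local file into the partitioned prefix.
    s3.upload_file("trips_0001.json", bucket, key)

    # List everything under that partition to verify the write.
    response = s3.list_objects_v2(
        Bucket=bucket, Prefix="raw/trips/ingest_date=2024-01-15/"
    )
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])
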
Section 3: Data Modeling
Master star schema design and implement SCD Type 1 and Type 2 dimensions.
- Dimensional & fact modeling
- ETL development for analytical reporting
- Build end-to-end models and data marts with hands-on labs

Section 4: Data Quality
Ensure trust and integrity in your data pipelines.
- Understand accuracy, completeness, and consistency
- Implement DQ checks using industry best practices
- Use data contracts for accountability

Section 5: AWS Athena
Query massive datasets with serverless power using AWS Athena.
- Learn DDL, Glue Catalog, and workgroup management
- Automate queries using Boto3 APIs (see the sketch below)
- Compare Athena vs Presto vs Trino
- Optimize queries with best practices
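
As a preview of the query-automation labs, here is a minimal sketch of running an Athena query through the Boto3 client and polling until it completes. The database, workgroup, result location, and table are hypothetical placeholders.

    import time
    import boto3

    athena = boto3.client("athena")

    # Hypothetical database, workgroup, and result location (placeholders).
    query = "SELECT pickup_zone, COUNT(*) AS trips FROM trips GROUP BY pickup_zone"
    execution = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "ride_hailing"},
        WorkGroup="primary",
        ResultConfiguration={"OutputLocation": "s3://ride-hailing-athena-results/"},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query finishes, then fetch the result rows.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        for row in rows:
            print([col.get("VarCharValue") for col in row["Data"]])
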
Section 6: Apache Spark
Build production-grade data pipelines with PySpark on AWS EMR.
- Learn Spark architecture and PySpark APIs
- Build data pipelines using the WAP (Write-Audit-Publish) pattern
- Run scalable jobs on AWS EMR
- Apply UDFs and data quality within transformation logic

Section 7: Apache Airflow
Orchestrate workflows using Airflow and build custom plugins:
- Design DAGs, schedule pipelines, manage dependencies (see the DAG sketch below)
- Automate Spark jobs using a custom AWS EMR plugin
- Hands-on labs for ingestion and transformation DAGs
- Build reliable, reusable orchestration solutions

What You'll Build
A production-style data platform for a ride-hailing company, including:
- Data lake on AWS S3
- Dimensional data model with SCD logic
- Spark-based transformation pipelines
- Automated orchestration with Airflow
- Query layer with Athena
- Built-in data quality validations
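
To give a feel for the orchestration section, below is a minimal Airflow 2.x DAG sketch that wires a Write-Audit-Publish flow for the ride-hailing pipeline using plain PythonOperator tasks. The DAG id, task names, and callables are hypothetical placeholders; in the bootcamp's own labs the Spark steps are launched on EMR through a custom plugin.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholder callables standing in for the real logic; in the course
    # these steps would trigger PySpark jobs on EMR via the custom plugin.
    def write_to_staging(**context):
        print("transform raw trips and write the output to a staging path")

    def audit_staging(**context):
        print("run data-quality checks (accuracy, completeness, consistency)")

    def publish(**context):
        print("promote audited data to the published layer")

    with DAG(
        dag_id="ride_hailing_trips_wap",  # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        write = PythonOperator(task_id="write_to_staging", python_callable=write_to_staging)
        audit = PythonOperator(task_id="audit_staging", python_callable=audit_staging)
        publish_task = PythonOperator(task_id="publish", python_callable=publish)

        # Write-Audit-Publish ordering: publish only after the audit passes.
        write >> audit >> publish_task
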