Writing production-ready ETL pipelines in Python / Pandas

Name: Writing production-ready ETL pipelines in Python / Pandas
Price: 104.99 EUR
Availability: InStock

Develop essential data science & AI skills with expert instruction and practical examples.

Online Course

Self-paced learning

Flexible Schedule

Learn at your pace

Expert Instructor

Industry professional

Certificate

Upon completion

What You'll Learn

Master the fundamentals of data science & AI

Apply best practices and industry standards

Build practical projects to demonstrate your skills

Understand advanced concepts and techniques

Skills you'll gain:

Professional Skills

Course Information

About This Course

This course walks through each step of writing an ETL pipeline in Python, from scratch to production, using the necessary tools: Python 3.9, Jupyter Notebook, Git and GitHub, Visual Studio Code, Docker and Docker Hub, and the Python packages Pandas, boto3, pyyaml, awscli, jupyter, pylint, moto, coverage and memory-profiler. Two different approaches to coding in the data engineering field will be introduced and applied: functional and object-oriented programming.
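
To make the contrast between the two approaches concrete, here is a minimal sketch of the same three ETL steps written first as plain functions and then as methods on a class. It is not taken from the course material: the record layout, the filter rule, and the names `run_functional` and `SimpleEtl` are hypothetical, and in-memory lists stand in for real sources and targets.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

Record = Dict[str, object]  # hypothetical record layout, e.g. {"isin": "A", "price": 1.5}


# Functional style: each step is a plain function, and the pipeline composes them.
def extract(source: Iterable[Record]) -> List[Record]:
    return list(source)


def transform(rows: List[Record]) -> List[Record]:
    # Illustrative rule only: keep rows with a positive price.
    return [row for row in rows if row["price"] > 0]


def load(rows: List[Record], target: List[Record]) -> None:
    target.extend(rows)


def run_functional(source: Iterable[Record], target: List[Record]) -> None:
    load(transform(extract(source)), target)


# Object-oriented style: the same steps become methods, and the object
# holds the source and target it works on.
@dataclass
class SimpleEtl:
    source: Iterable[Record]
    target: List[Record]

    def extract(self) -> List[Record]:
        return list(self.source)

    def transform(self, rows: List[Record]) -> List[Record]:
        return [row for row in rows if row["price"] > 0]

    def load(self, rows: List[Record]) -> None:
        self.target.extend(rows)

    def run(self) -> None:
        self.load(self.transform(self.extract()))


rows = [{"isin": "A", "price": 1.5}, {"isin": "B", "price": 0.0}]
functional_target: List[Record] = []
oo_target: List[Record] = []
run_functional(rows, functional_target)
SimpleEtl(rows, oo_target).run()
```

Both variants produce the same result; the difference is where state and configuration live, which tends to matter once the pipeline needs several sources, targets or settings.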

Best practices in developing Python code will be introduced and applied: design principles, clean coding, virtual environments, project/folder setup, configuration, logging, exception handling, linting, dependency management, performance tuning with profiling, unit testing, integration testing and dockerization.

What is the goal of this course?

In the course we are going to use the Xetra dataset. Xetra stands for Exchange Electronic Trading and is the trading platform of the Deutsche Börse Group.

This dataset is derived near-time on a minute-by-minute basis from Deutsche Börse's trading system and saved in an AWS S3 bucket that is available to the public for free. The ETL pipeline we are going to create will extract the Xetra dataset from the AWS S3 source bucket on a scheduled basis, create a report using transformations, and load the transformed data into another AWS S3 target bucket. The pipeline will be written so that it can be deployed easily to almost any production environment that can handle containerized applications.
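
The extract, transform and load steps described above can be sketched with Pandas roughly as follows. This is an illustration under stated assumptions, not the course's implementation: plain dictionaries stand in for the S3 source and target buckets (in the real pipeline boto3 would read and write the objects), and the column names, the sample rows, the object keys and the shape of the report (one row per ISIN and day) are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the S3 buckets: object key -> CSV text.
SOURCE_BUCKET = {
    "2022-01-03/minute_data.csv": (
        "ISIN,Date,Time,StartPrice,EndPrice,MinPrice,MaxPrice,TradedVolume\n"
        "XX0000000001,2022-01-03,08:00,10.0,10.5,9.9,10.6,100\n"
        "XX0000000001,2022-01-03,08:01,10.5,11.0,10.4,11.2,50\n"
        "XX0000000002,2022-01-03,08:00,20.0,19.5,19.4,20.1,30\n"
    ),
}
TARGET_BUCKET = {}


def extract(bucket: dict, prefix: str) -> pd.DataFrame:
    """Read every CSV object whose key starts with `prefix` into one frame."""
    frames = [
        pd.read_csv(io.StringIO(body))
        for key, body in sorted(bucket.items())
        if key.startswith(prefix)
    ]
    return pd.concat(frames, ignore_index=True)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate minute-level rows into one report row per ISIN and day."""
    ordered = df.sort_values(["ISIN", "Date", "Time"])
    return ordered.groupby(["ISIN", "Date"], as_index=False).agg(
        opening_price=("StartPrice", "first"),
        closing_price=("EndPrice", "last"),
        minimum_price=("MinPrice", "min"),
        maximum_price=("MaxPrice", "max"),
        daily_traded_volume=("TradedVolume", "sum"),
    )


def load(report: pd.DataFrame, bucket: dict, key: str) -> None:
    """Write the report as CSV text under `key` in the target bucket."""
    bucket[key] = report.to_csv(index=False)


def run_pipeline(date: str) -> pd.DataFrame:
    report = transform(extract(SOURCE_BUCKET, prefix=date))
    load(report, TARGET_BUCKET, key=f"report/{date}.csv")
    return report


report = run_pipeline("2022-01-03")
print(report)
```

Keeping each step as a separate function that takes and returns plain data makes it straightforward to unit test the transformation without touching S3, and to swap the dictionary stand-ins for real bucket access later.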

The production environment we are going to write the ETL pipeline for consists of a GitHub code repository, a Docker Hub image repository, an execution platform such as Kubernetes, and an orchestration tool such as the container-native Kubernetes workflow engine Argo Workflows or Apache Airflow.

So what can you expect in the course?

You will receive primarily practical, interactive lessons in which you code and implement the pipeline yourself, plus theory lessons when needed.

Provider

Udemy

Estimated Duration

10-20 hours

Language

English