
Writing production-ready ETL pipelines in Python / Pandas

Develop essential data science & AI skills with expert instruction and practical examples.

Online Course
Self-paced learning
Flexible Schedule
Learn at your pace
Expert Instructor
Industry professional
Certificate
Upon completion
What You'll Learn
Master the fundamentals of data science & AI
Apply best practices and industry standards
Build practical projects to demonstrate your skills
Understand advanced concepts and techniques

Skills you'll gain:

Professional Skills, Best Practices, Industry Standards, Python
Prerequisites & Target Audience

Skill Level

Intermediate
Some prior knowledge recommended

Requirements

Basic understanding of data science & AI
Enthusiasm to learn
Access to necessary software/tools
Commitment to practice

Who This Course Is For

Professionals working in data science & AI
Students and career changers
Freelancers and consultants
Anyone looking to improve their skills
Course Information

About This Course

This course shows, step by step, how to write an ETL pipeline in Python from scratch to production, using tools such as Python 3.9, Jupyter Notebook, Git and GitHub, Visual Studio Code, Docker and Docker Hub, and the Python packages Pandas, boto3, pyyaml, awscli, jupyter, pylint, moto, coverage and memory-profiler. Two different approaches to coding in the Data Engineering field will be introduced and applied: functional and object-oriented programming.
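
To give a feel for the contrast between the two styles, here is a minimal, hypothetical sketch; the function and class names and the report logic are illustrative assumptions, not the course's actual code:

import pandas as pd

# Functional style: small, stateless functions that can be composed.
def add_change_percent(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["change_pct"] = (df["end_price"] - df["start_price"]) / df["start_price"] * 100
    return df

# Object-oriented style: configuration and state live on a class.
class XetraETL:
    def __init__(self, src_bucket: str, trg_bucket: str) -> None:
        self.src_bucket = src_bucket
        self.trg_bucket = trg_bucket

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # The same logic, reused from the functional version.
        return add_change_percent(df)

The trade-off is the usual one: stateless functions are easy to test in isolation, while a class keeps bucket names and other configuration together in one place.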

Best practices in developing Python code will be introduced and applied:

design principles
clean coding
virtual environments
project/folder setup
configuration
logging
exception handling
linting
dependency management
performance tuning with profiling
unit testing
integration testing
dockerization
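
To make two of these practices concrete, here is a hedged sketch of loading a YAML configuration with pyyaml and setting up logging; the file path and config keys are assumptions for illustration only:

import logging
import yaml  # provided by the pyyaml package

def load_config(path: str) -> dict:
    # Read pipeline parameters (bucket names, report dates, ...) from a YAML file.
    with open(path, encoding="utf-8") as config_file:
        return yaml.safe_load(config_file)

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)

config = load_config("configs/xetra_report.yml")  # hypothetical path
logger.info("Target bucket: %s", config["target"]["bucket"])  # hypothetical key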

What is the goal of this course? In the course we are going to use the Xetra dataset. Xetra stands for Exchange Electronic Trading; it is the trading platform of the Deutsche Börse Group.

This dataset is derived near-time, on a minute-by-minute basis, from Deutsche Börse's trading system and saved in an AWS S3 bucket that is publicly available free of charge. The ETL pipeline we are going to create extracts the Xetra dataset from the AWS S3 source bucket on a scheduled basis, creates a report through a series of transformations, and loads the transformed data to another AWS S3 target bucket. The pipeline is written so that it can easily be deployed to almost any production environment that can handle containerized applications.
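
A bare-bones version of those three steps might look as follows. The column names and the aggregation are assumptions about the Xetra CSV layout, and the real pipeline adds configuration, logging and error handling around this core:

import io
import boto3
import pandas as pd

s3 = boto3.resource("s3")

def extract(bucket_name: str, keys: list[str]) -> pd.DataFrame:
    # Read the requested CSV objects from the source bucket into one frame.
    bucket = s3.Bucket(bucket_name)
    frames = [
        pd.read_csv(io.BytesIO(bucket.Object(key).get()["Body"].read()))
        for key in keys
    ]
    return pd.concat(frames, ignore_index=True)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Aggregate minute-level rows into one report row per ISIN and day.
    return df.groupby(["ISIN", "Date"], as_index=False).agg(
        opening_price=("StartPrice", "first"),
        closing_price=("EndPrice", "last"),
        daily_traded_volume=("TradedVolume", "sum"),
    )

def load(df: pd.DataFrame, bucket_name: str, key: str) -> None:
    # Write the report to the target bucket as Parquet (requires pyarrow).
    buffer = io.BytesIO()
    df.to_parquet(buffer, index=False)
    s3.Bucket(bucket_name).put_object(Body=buffer.getvalue(), Key=key)

Scheduling is deliberately left out: the job runs once per invocation over the keys it is given, and the orchestrator decides when to invoke it.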

The production environment we are going to write the ETL pipeline for consists of a GitHub code repository, a Docker Hub image repository, an execution platform such as Kubernetes, and an orchestration tool such as the container-native Kubernetes workflow engine Argo Workflows or Apache Airflow. So what can you expect in the course? Primarily practical, interactive lessons in which you code and implement the pipeline yourself, with theory lessons added where needed.
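
In such an environment the container typically exposes a single entrypoint that the orchestrator invokes on a schedule. A self-contained, hypothetical sketch (the run.py name and its argument are illustrative, not the course's actual files):

import argparse
import logging

def run_pipeline(config_path: str) -> None:
    # Placeholder for the extract -> transform -> load steps sketched above.
    logging.getLogger(__name__).info("Running Xetra ETL with %s", config_path)

def main() -> None:
    parser = argparse.ArgumentParser(description="Xetra ETL entrypoint")
    parser.add_argument("config", help="path to the YAML configuration file")
    args = parser.parse_args()
    logging.basicConfig(level=logging.INFO)
    run_pipeline(args.config)

if __name__ == "__main__":
    main()

An Argo Workflows cron workflow or an Airflow task could then simply run python run.py configs/xetra_report.yml inside the container image.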

Provider
Udemy
Estimated Duration
10-20 hours
Language
English
Category
Technology & Programming

Topics Covered

Data Science & AI, Python, Production

Course Details

Format
Online, Self-Paced
Access
Lifetime
Certificate
Upon Completion
Support
Q&A Forum
Ready to get started?

View pricing and read the reviews to see what other learners had to say about the course.

Get started and enroll now
Money-back guarantee might be available
Join thousands of students

This course includes:

Lifetime access to course content
Access on mobile and desktop
Certificate of completion
Downloadable resources
