Creating a Data Pipeline from Scratch

Gustavo Santos
13 min read · Jan 8, 2024

A basic end-to-end Data Engineering project.

Photo by Mike Benna on Unsplash

Introduction

Data Engineering has always been an area of interest to me, but I never really had time to create a project because I need to split my time among many things, like work, family, and everything else that needs my time and attention. So, I gave myself a tough challenge: creating a data pipeline from scratch in just a couple of days.

Wow, that sounds like a lot. But it also sounds doable.

With just an idea in mind and more experience navigating the Data Science space than the Data Engineering one, I knew this was going to be tough, but still: challenge accepted.

In this post, we will go over the following project (GitHub):

Create a data pipeline that:

(1) Gets finance datasets from telecommunication stocks, economic indicators, and a Dow Jones Index for the Telecommunications (Telco) sector;

(2) Gives the data an initial treatment to validate it;

(3) Cleans and organizes the data;

(4) Makes it ready for consumption by analysts and clients in a PostgreSQL database; and

(5) Presents a Power BI report as an end result with some insights.
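To make the shape of steps (1) to (4) concrete, here is a minimal sketch of how the stages could chain together. It is not the project's actual code: the function names, the `prices` table, and the sample rows are all hypothetical placeholders, and an in-memory SQLite database stands in for PostgreSQL so the sketch runs without a server. Step (5), the Power BI report, would simply read from the loaded table.

```python
import sqlite3
from datetime import date

# Hypothetical raw rows standing in for step (1). A real pipeline would fetch
# these from a market-data source. All tickers and values are made-up placeholders.
RAW_ROWS = [
    {"ticker": "AAA", "date": "2024-01-02", "close": "10.5"},
    {"ticker": "AAA", "date": "2024-01-03", "close": ""},      # missing price
    {"ticker": "bbb ", "date": "2024-01-02", "close": "20.0"},  # untidy ticker
    {"ticker": "AAA", "date": "2024-01-02", "close": "10.5"},  # duplicate row
]


def extract():
    """Step 1: get the raw data (here, just the placeholder rows)."""
    return list(RAW_ROWS)


def validate(rows):
    """Step 2: keep only rows whose required fields are present and parseable."""
    valid = []
    for row in rows:
        try:
            date.fromisoformat(row["date"])
            float(row["close"])
        except (KeyError, ValueError):
            continue  # skip rows with a missing or malformed date/price
        if not row.get("ticker", "").strip():
            continue  # skip rows without a ticker
        valid.append(row)
    return valid


def clean(rows):
    """Step 3: normalize tickers and types, drop duplicates on (ticker, date)."""
    seen = {}
    for row in rows:
        key = (row["ticker"].strip().upper(), row["date"])
        seen.setdefault(key, float(row["close"]))  # first occurrence wins
    return [(ticker, day, close) for (ticker, day), close in sorted(seen.items())]


def load(records, conn):
    """Step 4: write the cleaned records to a table that analysts can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "ticker TEXT, date TEXT, close REAL, PRIMARY KEY (ticker, date))"
    )
    conn.executemany("INSERT OR REPLACE INTO prices VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    """Run steps 1 to 4 in order and return what ended up in the table."""
    load(clean(validate(extract())), conn)
    return conn.execute(
        "SELECT ticker, date, close FROM prices ORDER BY ticker, date"
    ).fetchall()


if __name__ == "__main__":
    # SQLite in memory is an assumption made only to keep the sketch runnable.
    connection = sqlite3.connect(":memory:")
    for record in run_pipeline(connection):
        print(record)
    connection.close()
```

Keeping each stage as a small function that takes rows in and hands rows out makes it easy to test a stage in isolation and to swap the SQLite connection for a PostgreSQL one later.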

Gustavo Santos

Data Scientist. I extract insights from data to help people and companies make better, data-driven decisions. | In: https://www.linkedin.com/in/gurezende/