Creating a Data Pipeline from Scratch
A basic data engineering project, end to end.
Introduction
Data Engineering has always been an area of interest to me, but I never really had time to create a project because I need to split my time among many things like work, family, and everything else that needs my time and attention. So, I gave myself a tough challenge: creating a data pipeline from scratch in just a couple of days.
Wow, I mean, that sounds like a lot. But it also sounds doable.
With just an idea in mind and more experience navigating the Data Science space than the Data Engineering one, I knew this was going to be tough, but still: challenge accepted.
In this post, we will go over the following project (GitHub):
Create a data pipeline that:
(1) Gets finance datasets from telecommunication stocks, economic indicators, and a Dow Jones Index for the Telecommunications (Telco) sector;
(2) Gives the data an initial treatment to validate it;
(3) Cleans and organizes the data;
(4) Makes it ready for consumption by analysts and clients in a PostgreSQL database; and
(5) Presents a Power BI report as an end result with some insights.
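To make the shape of steps (1) through (4) concrete, here is a minimal sketch of how the stages could chain together. It is not the project's actual code: the function names, the `stock_prices` table, and the sample rows (with made-up prices) are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs with only the standard library. Step (5), the Power BI report, happens outside of code and is not shown.

```python
import sqlite3
from datetime import date

# Hypothetical sample rows with made-up prices, standing in for real extraction.
RAW_ROWS = [
    {"ticker": "T", "date": "2023-01-03", "close": "18.75"},
    {"ticker": "VZ", "date": "2023-01-03", "close": "40.12"},
    {"ticker": "VZ", "date": "2023-01-03", "close": "40.12"},  # duplicate row
    {"ticker": "TMUS", "date": "2023-01-03", "close": ""},     # missing price
]


def extract():
    """Step 1: get the raw data (here, just a copy of the sample rows)."""
    return list(RAW_ROWS)


def validate(rows):
    """Step 2: keep only rows with all fields present and parseable."""
    required = ("ticker", "date", "close")
    valid = []
    for row in rows:
        if not all(row.get(field) for field in required):
            continue  # drop rows with missing or empty fields
        try:
            float(row["close"])
            date.fromisoformat(row["date"])
        except ValueError:
            continue  # drop rows whose price or date cannot be parsed
        valid.append(row)
    return valid


def clean(rows):
    """Step 3: remove duplicates and convert prices to floats."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["ticker"], row["date"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((row["ticker"], row["date"], float(row["close"])))
    return cleaned


def load(rows, conn):
    """Step 4: write the cleaned rows to a table analysts can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stock_prices ("
        "ticker TEXT, date TEXT, close REAL, PRIMARY KEY (ticker, date))"
    )
    conn.executemany("INSERT OR REPLACE INTO stock_prices VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> validate -> clean -> load and return the row count."""
    rows = clean(validate(extract()))
    load(rows, conn)
    return len(rows)
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` runs the whole chain against a throwaway database. Keeping each stage as a small function that takes rows in and hands rows out makes it easy to test one stage at a time, and to swap the SQLite connection for a real PostgreSQL one later.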