How to Become a Data Engineer 2024

Hello dears, it’s Nataindata, and today let’s talk about how I would learn Data Engineering if I had to start AGAIN.

full video is here

I’m a Senior Data Engineer at TripAdvisor now [disclaimer: views are mine], but I’ve made my share of mistakes in the past and want to show you how to kick off Data Engineering more efficiently.

This article will be useful if you...

  • Aspire to switch from a non-IT career to data engineering
  • Are a student hesitating about which career path to pursue
  • Have prior coding experience and heard of the opportunities in big data

And I’m pretty sure you’ve come across figures like this:

src: https://www.glassdoor.com/Salaries/us-data-engineer-salary-SRCH_IL.0,2_IN1_KO3,16.htm

Figures like these have probably sparked your interest and fuelled your career aspirations.

But you could be overwhelmed with a couple of things:

  1. Data Engineer Tools Landscape

It seems like there are lots of tools to grasp (and I’m pretty sure not all of them are even listed).

full article here - Data Tools Landscape 2024

The thing is, when you go out there searching for top data engineering skills, the results are even more frustrating:

Some websites recommend outdated skills that were never even close to the Top 10, while others, despite having access to the most valuable insights, list generic skills that could apply to any job.

  2. Quantity of Data buzzwords out there

Well, data engineering is a much more ambiguous field than traditional tech roles like software engineering, so new concepts emerge all the time:

🎈LOW CODE / NO CODE

🎈SOURCE OF TRUTH

🎈SELF SERVICE ANALYTICS

🎈MODERN DATA STACK

🎈DATA MESH

🎈LAKEHOUSE

  3. Whether AI is going to take over Data Engineering jobs

(Spoiler: no, it’s just a helping hand here, not a replacement.) I've talked about it here: Can AI replace Data Engineer?

Thankfully, we’ll address all of those concerns below. Let’s go!


In short, my DE career started accidentally, while I was actively looking for Junior Python developer jobs.🤫

Then, all of a sudden, I got an opportunity from PepsiCo:

It was like: Hey, we are looking for Junior Data Engineers, wanna join?

And I’m like: what? Data Engineering? What is that? Is it even a good career path? I've been learning Docker, APIs, Django, etc. How does any of that apply? How can I succeed with that?

But after some research, I understood that DE is a highly, hiiiiighly promising career, the sweet spot between software engineering and data analysis.

So I took the leap of faith...

And now I’m a Senior Data Engineer at TripAdvisor, AWS and GCP certified, wrangling a gazillion bytes of data.


Before jumping into particular skills, let’s look at the data. Many sources recommend obsolete skills without any data to back them up.

What is the actual demand for data engineering positions right now? The best approach is to look through LinkedIn job postings and figure out what exactly the market needs.

For that, I’ll refer to the datanerd.tech website, which analyzed 278K LinkedIn data job postings and shows current market needs per role, level, and country:

If you filter by the Data Engineer position, you get the skills that are actually demanded from Data Engineers right here, right now.

So looking at this list, should you just jump on all of these right away? Well, I do agree with these stats, but with a tweak ;)

The first thing I’d suggest you check is:

  1. Computer Science Fundamentals

Yes! It’s not shown in the list, but it goes without saying. The main difference between a Data Engineer and other data professions is that the role requires a certain level of Computer Science fundamentals.

So if you are a complete newbie, I’d suggest you start gently with the free CS50 course. It will broaden your perspective and strengthen your coding fundamentals.

It covers a range of basic concepts like algorithms, data structures, resource management, security, software engineering, and web development. CS50 is available for free online, on YouTube or edX, allowing self-paced learning with optional certificates. It’s really engaging, with a comprehensive introduction plus a balance of theoretical and practical learning.

Back in my day, it gave me a lot of confidence and a deeper understanding of what computer science is in general.

  2. SQL

After that comes “the bread and butter” of every data professional: SQL.

SQL is one of the oldest tools around and an absolute must. Nothing has managed to beat it in more than 40 years on the market!

Beyond basic selects, you need to learn sub-queries, views, analytical functions, and everything else that goes past standard FROM and WHERE clauses. If you're going to become a data engineer, you need a pretty deep understanding of SQL. You can't get away with just the basics.
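To make those three terms less abstract, here is a minimal sketch that runs a view, a sub-query, and an analytical (window) function against an in-memory SQLite database. The `orders` table and its rows are made up for illustration, and the window function assumes the SQLite bundled with your Python is version 3.25 or newer.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('anna', 50.0), ('anna', 150.0), ('bob', 20.0), ('bob', 80.0), ('cleo', 300.0);

    -- A view: a saved query you can select from like a table.
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer;
""")

# Sub-query: customers whose total is above the average customer total.
above_average = conn.execute("""
    SELECT customer, total
    FROM customer_totals
    WHERE total > (SELECT AVG(total) FROM customer_totals)
    ORDER BY customer
""").fetchall()

# Analytical (window) function: rank each order within its customer by amount.
ranked_orders = conn.execute("""
    SELECT customer, amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rank_in_customer
    FROM orders
    ORDER BY customer, rank_in_customer
""").fetchall()

conn.close()

print(above_average)
print(ranked_orders)
```

Typing queries like these yourself and changing the data is exactly the kind of practice that sticks better than reading theory.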

There are tons of resources out there, but my advice is: don’t rely on purely theoretical resources. Pick one where you can type and practice. Really, you can even use ChatGPT for that:

➡️ SQL Tutorial for beginners with Chatgpt

Or if you want a coding platform for that, have a look at Basic Introduction to SQL via Codecademy.

SQL is a must, both on the job and for passing interviews.

  3. 🥁…

Well, here you expect me to say Python. But, NO. Let me explain:

I think learning Python right after SQL is a mistake.

Before jumping into such a broad tool and robust programming language, you need to understand WHERE to apply it, which parts to use, and in which context.

So you need to learn DATA FUNDAMENTALS first.

My story is that I started with Python without knowing about Data Engineering, so instead of leveraging Python for batch pipelines, I spent my time studying the Django and Flask frameworks, which are cool but not a 100% match. I can’t say it was a complete waste of time, but I would have been better off focusing on something relevant. And for that, I needed to KNOW which concepts and approaches are used in Data Engineering first.

Like, I should have learned pandas, scripting, whatever. But not Django (no offense here).

So:

  3. Data Fundamentals

I mentioned data buzzwords before, and it can be pretty hard to tell which ones are widely used and which ones are just noise. If we are talking about fundamentals, you can kick off with these concepts and dig into each of them:

  • SQL vs NoSQL
  • Structured vs Semi-structured vs Unstructured data
  • Databases evolution
  • Data warehouse vs Data lake vs Data mart
  • OLTP vs OLAP
  • ETL vs ELT vs EL
  • Data Modeling - Kimball vs 3NF vs Data Vault vs Big Table
  • Data formats - csv, parquet, json
  • Batch vs Streaming

It’s not a comprehensive list, but these are the concepts you will stumble upon in Data Engineering interviews, and they will let you speak the same language as other DEs. You will get a better feeling for what Data Engineering is about and how it is applied.

  4. Python
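Two of the concepts above, semi-structured vs structured data and data formats, can be seen in a few lines. This is a small sketch using only the standard library. The event records and column names are invented for illustration, and Parquet is left out because it needs a third-party library.

```python
import csv
import io
import json

# Hypothetical semi-structured input: records do not all share the same keys.
raw_events = """
[
  {"user": "anna", "event": "click", "meta": {"page": "home"}},
  {"user": "bob", "event": "purchase", "meta": {"page": "cart", "amount": 42.5}}
]
"""

def flatten(record):
    """Turn one nested JSON record into a flat row with fixed columns."""
    meta = record.get("meta", {})
    return {
        "user": record["user"],
        "event": record["event"],
        "page": meta.get("page"),
        "amount": meta.get("amount"),  # missing values stay None (empty in CSV)
    }

rows = [flatten(r) for r in json.loads(raw_events)]

# Write the structured result as CSV (to a string buffer here instead of a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["user", "event", "page", "amount"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text)
```

The JSON side tolerates records with different shapes, while the CSV side forces every row into the same columns, which is the trade-off behind the structured vs semi-structured distinction.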

Yes, finally! As the data showed, Python is the most in-demand programming language for data (so you are pretty safe with it these days).

Python is a great place to learn all of the basics: for loops, if statements, variables, and functions. From there you can go to the next level: object-oriented programming (pretty helpful if you are dealing with Airflow, so all those classes and methods are not so scary), functional programming, the pandas library, and other concepts in that space.
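As a rough illustration of that progression, here is a tiny sketch that goes from a plain function with a loop and an if statement, to a functional-style helper, to a class. The `CleanNamesTask` class and its `execute()` method are hypothetical names chosen for this example. They only mimic the general shape of task classes in orchestration tools and are not any real library's API.

```python
from functools import reduce

# Basics: a function, a for loop, and an if statement.
def clean_names(names):
    cleaned = []
    for name in names:
        name = name.strip()
        if name:  # skip empty strings
            cleaned.append(name.title())
    return cleaned

# Functional style: fold a list into a single value with reduce.
def total_length(names):
    return reduce(lambda acc, n: acc + len(n), names, 0)

# Object-oriented style: a hypothetical task class with an execute() method.
class CleanNamesTask:
    def __init__(self, task_id, names):
        self.task_id = task_id
        self.names = names

    def execute(self):
        return clean_names(self.names)

task = CleanNamesTask(task_id="clean_names", names=[" anna ", "", "BOB"])
result = task.execute()

print(result, total_length(result))
```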

By this step, you will know about databases, SQL, and the basics of data engineering, like what ETL is and what to do with data. You will also know the basics of programming, so you can jump into creating a simple ETL pipeline that picks up data from an API, transforms it, and pushes it to a database. Like a pet project.
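Such a pet project can start very small. The sketch below shows the extract, transform, load shape using only the standard library. The weather payload is a hard-coded stand-in for an API response (an assumption so the example runs offline), and the `weather` table and its columns are invented for illustration.

```python
import json
import sqlite3

def extract():
    """Stand-in for an API call; a real pipeline would fetch this over HTTP."""
    payload = '[{"city": "Lisbon", "temp_c": 21.5}, {"city": "Oslo", "temp_c": null}]'
    return json.loads(payload)

def transform(records):
    """Drop rows with missing temperatures and add a Fahrenheit column."""
    return [
        (r["city"], r["temp_c"], round(r["temp_c"] * 9 / 5 + 32, 1))
        for r in records
        if r["temp_c"] is not None
    ]

def load(rows, conn):
    """Create the target table if needed and insert the transformed rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS weather (city TEXT, temp_c REAL, temp_f REAL)")
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
loaded = conn.execute("SELECT city, temp_c, temp_f FROM weather").fetchall()

print(loaded)
```

Keeping the three steps as separate functions makes it easy to later swap the stub for a real HTTP request or the in-memory database for a real one without touching the rest.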

Okay, so now you have a starting point, a north star. And it’s more than enough for you to kick off.

💡
If you want even more details, a step-by-step process, and me gently taking your hand and walking you through it, please check my data engineering roadmap ➡️ It includes 105+ tools & concepts. I’ve drilled down into every point even further and included specific resources that teach each one in the best way. Plus, there are multiple pet projects, my experience passing data certifications, how to land a job, and even an AI section for Data Engineering.

So there you have it, dears! Please tell me if this video was helpful, I really appreciate your feedback.

See you in the next one ;)