
How To Become a Data Engineer

SQL

Comprehensive SQL Tutorial by Mode Analytics
SQL Practice on Leetcode
Modern SQL - a website about modern SQL syntax
Introduction to Window Functions (En, Ru)

Programming

Scala School by Twitter
Fluent Python - an intermediate-level book about Python
Intro to Scala in Russian on Stepik by Tinkoff Bank
The Hitchhiker’s Guide to Python by Kenneth Reitz & Tanya Schlusser
Learn Python 3 The Hard Way by Zed A. Shaw

Databases

Intro to Database Systems by Carnegie Mellon University
Advanced Database Systems by Carnegie Mellon University
On Disk IO
- I. Flavors of IO
- II. More Flavours of IO
- III. LSM Trees
- IV. B-Trees and RUM Conjecture
- V. Access Patterns in LSM Trees

Distributed Systems

Distributed systems for fun and profit by Mikito Takada
Distributed Systems by Maarten van Steen & Andrew S. Tanenbaum
CSE138: Distributed Systems by Lindsey Kuper
CS 436: Distributed Computer Systems by University of Waterloo
MIT 6.824: Distributed Systems by Robert Morris from MIT
Distributed consensus reading list maintained by Heidi Howard from University of Cambridge

Books

Designing Data-Intensive Applications by Martin Kleppmann
Fundamentals of Data Engineering: Plan and Build Robust Data Systems by Joe Reis & Matt Housley
Introduction to Algorithms by Thomas Cormen
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
Star Schema The Complete Reference
Database Internals: A Deep Dive into How Distributed Data Systems Work
Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
A Philosophy of Software Design
Grokking Streaming Systems by Josh Fischer & Ning Wang
Guide to High Performance Distributed Computing by K.G. Srinivasa & Anil Kumar Muppalla
Data Pipelines with Apache Airflow by Bas P. Harenslak and Julian Rutger de Ruiter

Courses

Data Engineering on Google Cloud Platform Specialization by Google
Data Engineer Nanodegree by Udacity
Data Engineering with Python by DataCamp

Blogs

Martin Kleppmann, author of Designing Data-Intensive Applications
BaseDS by Vaidehi Joshi, about distributed systems

Tools

Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python
Apache Spark is a unified analytics engine for large-scale data processing
Apache Kafka is a distributed streaming platform
Luigi is a Python package that helps you build complex pipelines of batch jobs.
Dagster.io is a system for building modern data applications.
Prefect includes everything you need to create and run data applications.
Metaflow - build and manage real-life data science projects with ease
lakeFS - build repeatable, atomic and versioned data lake operations – from complex ETL jobs to data science and analytics.

Cloud Platforms

Communities

Data Engineering - Telegram chat about data engineering
Data Engineering Subreddit - subreddit about data engineering

Data Engineering Jobs

Data Engineering jobs

Other

Data Engineering Podcast

Newsletters & Digests

DataEng Telegram channel - Telegram channel about data engineering (rus/eng)
Data Engineering Weekly
SF Data Weekly - A weekly email of useful links for people interested in building data platforms
Data Elixir - an email newsletter that keeps you on top of the tools and trends in data science
Data Governance, Privacy and Security - DbAdmin News is a newsletter on the technology behind data governance, security and privacy
