gkdevops/python-data-engineer


Python Data Engineer Learning Repository

Welcome to the Python Data Engineer learning repository! This repo contains a structured, practical set of Jupyter notebooks for learning core Python concepts, with a focus on data engineering. Each topic is covered with hands-on examples and explanations, and links to the code are provided for easy reference.

Note: This summary is based on the top-level files; for a full list of all tutorials and scripts, check the GitHub repository contents.


📚 Topics Covered

  • Overview: Introduction to Python, variables, data types, and basic operations.
  • Key Concepts:
    • Printing and string manipulation
    • Variable assignment and naming
    • Numeric, string, and boolean data types
    • Type conversion, built-in functions, and string methods
    • List basics and common list operations
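
As a minimal sketch of the concepts listed above (not taken from the notebooks; the sample values are hypothetical), the following shows type conversion, a few string methods, and basic list operations:

```python
# Hypothetical sample values, chosen only for illustration.
raw_count = "42"                    # a number that arrived as text
count = int(raw_count)              # type conversion: str -> int
is_large = count > 10               # comparison produces a boolean

name = "  data engineer  "
clean_name = name.strip().title()   # string methods: trim, then capitalize each word

tools = ["python", "sql"]
tools.append("spark")               # list operation: add an item at the end
first_tool = tools[0]               # list indexing starts at 0

print(count + 1, is_large, clean_name, tools, len(tools))
```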

  • Overview: Mastering conditional statements for decision making.
  • Key Concepts:
    • if, elif, else statements
    • Comparison and logical operators
    • Nested conditions and practical examples
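
One way the conditional constructs above might be combined is sketched here. The function name `check_temperature` and its thresholds are assumptions made for the example, not part of the repository:

```python
def check_temperature(reading):
    """Hypothetical validator for a sensor reading in degrees Celsius."""
    if reading is None:
        return "missing"
    elif reading < -50 or reading > 60:   # logical operator combining two comparisons
        return "out of range"
    else:
        if reading > 35:                  # nested condition inside the else branch
            return "valid (hot)"
        return "valid"


for value in (None, 100, 40, 20):
    print(value, "->", check_temperature(value))
```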

  • Overview: Using loops to automate repetitive tasks.
  • Key Concepts:
    • for and while loops
    • Loop control (break, continue, pass)
    • Looping through lists, strings, and dictionaries
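
The loop constructs above could look like the following sketch. The `records` list and the retry limit are hypothetical data invented for illustration:

```python
# Hypothetical records; None marks a missing amount, a negative amount marks bad data.
records = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": -1.0},
    {"id": 4, "amount": 5.0},
]

total = 0.0
summary = {"processed": 0, "skipped": 0}
for record in records:                 # looping through a list
    if record["amount"] is None:
        summary["skipped"] += 1
        continue                       # skip this record, keep looping
    if record["amount"] < 0:
        break                          # stop the loop at the first bad record
    total += record["amount"]
    summary["processed"] += 1

for key, value in summary.items():     # looping through a dictionary
    print(key, value)

vowels = 0
for char in "pipeline":                # looping through a string
    if char in "aeiou":
        vowels += 1

attempts = 0
while attempts < 3:                    # while loop with an explicit exit condition
    attempts += 1
```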

  • Overview: Writing reusable blocks of code with functions.
  • Key Concepts:
    • Defining and calling functions
    • Parameters, return values, and scope
    • Lambda functions and higher-order functions
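
A short sketch of these ideas follows. The helper names `apply_to_all` and `to_celsius` are hypothetical and chosen only for this example:

```python
def to_celsius(fahrenheit):
    """Convert a Fahrenheit value to Celsius, rounded to one decimal place."""
    return round((fahrenheit - 32) * 5 / 9, 1)


def apply_to_all(func, values):
    """Higher-order function: takes another function as a parameter."""
    return [func(value) for value in values]


celsius = apply_to_all(to_celsius, [32, 212])
doubled = apply_to_all(lambda x: x * 2, [1, 2, 3])     # lambda: a small anonymous function

# Built-in functions such as sorted() also accept a function through the key parameter.
by_length = sorted(["spark", "sql", "python"], key=lambda word: len(word))
print(celsius, doubled, by_length)
```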

  • Overview: Using operators to manipulate data.
  • Key Concepts:
    • Arithmetic, assignment, comparison, logical, bitwise, and membership operators
    • Precedence and associativity
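
To illustrate how precedence and associativity affect a result, here is a small sketch using arbitrary literal values:

```python
without_parens = 2 + 3 * 4          # multiplication binds tighter than addition
with_parens = (2 + 3) * 4           # parentheses override precedence
power = 2 ** 3 ** 2                 # ** is right-associative: 2 ** (3 ** 2)

quotient, remainder = 7 // 2, 7 % 2 # floor division and modulo
bit_and = 0b0101 & 0b0011           # bitwise AND
bit_or = 0b0101 | 0b0011            # bitwise OR

has_id = "id" in {"id": 7}          # membership test on dictionary keys
mixed = True or False and False     # 'and' binds tighter than 'or'

print(without_parens, with_parens, power, quotient, remainder, bit_and, bit_or, has_id, mixed)
```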

  • Overview: Mastering data structures for efficient storage and retrieval.
  • Key Concepts:
    • Lists, tuples, sets, dictionaries
    • When and how to use each collection
    • Real-world data engineering examples using collections
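
As a rough illustration of when each collection fits, the sketch below uses a tuple for a fixed schema, a list for ordered rows, a set for distinct values, and a dictionary for lookups by key. The column names and rows are hypothetical:

```python
columns = ("id", "name", "country")            # tuple: fixed, immutable schema
rows = [                                       # list: ordered, may contain duplicates
    (1, "Ana", "PT"),
    (2, "Bo", "SE"),
    (3, "Cy", "PT"),
]

countries = {row[2] for row in rows}           # set: distinct values only
by_id = {row[0]: dict(zip(columns, row)) for row in rows}   # dict: fast lookup by key

print(sorted(countries))
print(by_id[2]["name"])
```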

  • Overview: Organizing and reusing code with modules and packages.
  • Key Concepts:
    • The difference between modules, packages, and libraries (with LEGO analogies)
    • Importing and using built-in and external libraries (e.g., Pandas, NumPy, Matplotlib, Requests, Scikit-learn)
    • Creating custom modules and packages
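
The sketch below shows one possible way to see a custom module in action without leaving a file behind: it writes a tiny module to a temporary directory and imports it. The module name `cleaning` and its `normalize` function are hypothetical. In a real project the file would normally live next to your code, and only standard-library modules are used here so the example runs without installing anything.

```python
import importlib
import math                      # importing a built-in module
import sys
import tempfile
from pathlib import Path         # importing one name from a module

MODULE_SOURCE = "def normalize(name):\n    return name.strip().lower()\n"

with tempfile.TemporaryDirectory() as tmp:
    # A module is just a .py file; 'cleaning.py' is a made-up name for this example.
    Path(tmp, "cleaning.py").write_text(MODULE_SOURCE, encoding="utf-8")
    sys.path.insert(0, tmp)      # make the temporary directory importable
    try:
        cleaning = importlib.import_module("cleaning")
        result = cleaning.normalize("  Alice ")
    finally:
        sys.path.remove(tmp)

print(result, math.sqrt(16))
```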

  • Overview: CSV file handling and manipulation for data storage and retrieval.
  • Key Concepts:
    • Reading and writing text and CSV files
    • Using CSV files with the Pandas library
    • File and directory operations using os and shutil
    • Handling file paths and exceptions
    • Data extraction and ingestion from files
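
A minimal sketch of writing and reading a CSV file, with path handling and exception handling, is shown here. It uses the standard-library `csv` module rather than Pandas so that it runs without extra installs. The file name `cities.csv` and the rows are hypothetical:

```python
import csv
import os
import tempfile

rows = [{"id": "1", "city": "Lisbon"}, {"id": "2", "city": "Oslo"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cities.csv")      # build the path portably

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "city"])
        writer.writeheader()
        writer.writerows(rows)

    with open(path, newline="", encoding="utf-8") as handle:
        loaded = list(csv.DictReader(handle))   # every value is read back as a string

    try:
        open(os.path.join(tmp, "missing.csv"), encoding="utf-8")
        missing_handled = False
    except FileNotFoundError:
        missing_handled = True                  # handle the missing file instead of crashing

print(loaded, missing_handled)
```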

  • Overview: Managing JSON data formats for configuration and data exchange.
  • Key Concepts:
    • Reading and writing JSON files with Python’s json module
    • Parsing and serializing complex JSON structures
    • Real-world use cases: configuration files, API responses
    • Data transformation between JSON and Python objects
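
The round trip between JSON text and Python objects can be sketched as follows. The configuration keys are hypothetical and stand in for a real configuration file or API response:

```python
import json

# Hypothetical configuration, inlined as text instead of read from a file.
config_text = '{"source": "orders.csv", "batch_size": 500, "columns": ["id", "total"], "dry_run": false}'

config = json.loads(config_text)       # JSON text -> Python dict, list, int, bool
config["batch_size"] *= 2              # work with it like any other dict

serialized = json.dumps(config, indent=2, sort_keys=True)   # Python objects -> JSON text
print(serialized)
```

For files, `json.load` and `json.dump` do the same conversion using an open file object.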

  • Overview: Working with randomness: generating random numbers and data for testing and simulations.
  • Key Concepts:
    • Using Python’s random module for numbers, choices, and shuffling
    • Generating random data for data engineering tasks
    • Introduction to the faker library for synthetic data creation
    • Practical examples: random sampling, data anonymization
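
Here is a sketch of sampling, shuffling, and a deliberately naive form of pseudonymization using only the standard-library `random` module (the Faker library is not used, so nothing needs to be installed). The names, seed, and `user_` token format are assumptions for illustration:

```python
import random

rng = random.Random(42)                  # a seeded generator makes test data reproducible

user_ids = list(range(1, 101))
sample = rng.sample(user_ids, k=5)       # random sampling without replacement

shuffled = user_ids[:]                   # copy first: shuffle() works in place
rng.shuffle(shuffled)

status = rng.choice(["active", "inactive"])

# Naive pseudonymization: replace each name with a random token.
# Tokens could collide, and random is not cryptographically secure, so this is test-data only.
names = ["Ana", "Bo"]
pseudonyms = {name: f"user_{rng.randrange(10**6):06d}" for name in names}

print(sample, status, pseudonyms)
```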

  • Overview: Code blocks and reusable scripts for modular data engineering workflows.
  • Key Concepts:
    • Encapsulating logic in code blocks (functions, scripts)
    • Organizing reusable code for ETL pipelines
    • Example templates for batch processing and automation
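
One possible shape for a reusable ETL template is sketched below: each stage is a small function, and a single entry point chains them. The function names, the sample rows, and the in-memory `warehouse` list are all hypothetical stand-ins for real sources and targets:

```python
def extract():
    """Hypothetical source: returns raw rows as they might arrive from a file."""
    return [
        {"name": " Ana ", "amount": "10.5"},
        {"name": "Bo", "amount": ""},        # missing amount
        {"name": "cy", "amount": "3"},
    ]


def transform(rows):
    """Drop rows without an amount, then clean names and convert amounts to floats."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append({"name": row["name"].strip().title(), "amount": float(row["amount"])})
    return cleaned


def load(rows, target):
    """Append rows to the target (a plain list standing in for a database table)."""
    target.extend(rows)
    return len(rows)


def run_pipeline(target):
    """Single entry point that a scheduler or batch script could call."""
    return load(transform(extract()), target)


warehouse = []
loaded_count = run_pipeline(warehouse)
print(loaded_count, warehouse)
```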

  • Overview: Logging and monitoring data engineering processes.
  • Key Concepts:
    • Using Python’s logging module for event tracking
    • Setting up log formats, levels, and handlers
    • Best practices for error handling and process monitoring
    • Writing logs to files and integrating with external tools
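
A minimal logging setup might look like the sketch below. The logger name `etl_demo`, the format string, and the `safe_divide` helper are assumptions for the example. Swapping `logging.StreamHandler()` for `logging.FileHandler("etl.log")` would write the same records to a file instead:

```python
import logging

logger = logging.getLogger("etl_demo")       # hypothetical logger name
logger.setLevel(logging.INFO)                # ignore DEBUG, keep INFO and above

handler = logging.StreamHandler()            # writes to stderr by default
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(handler)


def safe_divide(total, count):
    """Return total / count, logging the traceback instead of crashing on zero."""
    try:
        result = total / count
    except ZeroDivisionError:
        logger.exception("count was zero; returning None")
        return None
    logger.info("computed average %.2f", result)
    return result


safe_divide(10, 2)
safe_divide(1, 0)
```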

📎 How to Use This Repo

  1. Browse Notebooks: Start with the Jupyter notebooks in the main directory for a structured learning path.
  2. Explore Directories: Check out the additional folders for more scripts and data.
  3. Try the Code: Run the notebooks locally or in an online Jupyter environment.
  4. Contribute: Pull requests to add new topics or improve examples are welcome!

About

Learn Python language for beginners in Data Analytics and Big Data
