Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

License

Notifications You must be signed in to change notification settings

vim89/datapipelines-essentials-python

Repository files navigation

Datalake ETL Pipeline

Data transformation simplified for any Data platform.

Features: The package has complete ETL process -

  1. Uses metadata, transformation & data model information to design ETL pipeline
  2. Builds target transformation SparkSQL and Spark Dataframes
  3. Builds source & target Hive DDLs
  4. Validates DataFrames, extends core classes, defines DataFrame transformations, and provides UDF SQL functions.
  5. Supports below fundamental transformations for ETL pipeline -
    • Filters on source & target dataframes
    • Grouping and Aggregations on source & target dataframes
    • Heavily nested queries / dataframes
  6. Has complex and heavily nested XML, JSON, Parquet & ORC parser to nth level of nesting
  7. Has Unit test cases designed on function/method level & measures source code coverage
  8. Has information about delpoying to higher environments
  9. Has API documentation for customization & enhancement

Enhancements: In progress -

  1. Integrate Audit and logging - Define Error codes, log process failures, Audit progress & runtime information

About

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

AltStyle によって変換されたページ (->オリジナル) /