The Extract Load Transform (ELT) framework is a metadata-driven orchestration framework designed for modern cloud data platforms. It simplifies ingestion and transformation pipelines, providing a consistent development experience and easing maintenance. The framework supports batch ingestion and has been extensively tested with Microsoft Fabric and Azure managed services such as Azure Databricks and Azure Synapse. It uses an ANSI-compatible control database (controlDB) as its metadata repository.
- Configurable and Extendable: Easily adapt the framework to meet specific needs.
- Data Source Agnostic: Ingest data from various sources such as databases, Delta Lake, REST APIs, flat files, JSON and XML, without storing connection strings as metadata.
- Delta and Full Loads: Support for both incremental and full data loads.
- Re-run and Retry Capability: Automatically handle failures without manual intervention.
- In-built Audit Tracking: Track data processing activities with built-in audit capabilities.
- Extended Audit Capability: Enhance audit tracking with Azure PaaS services like Diagnostic Logging.
- Eliminates Manual Data Patching: Streamline data processing by removing the need for manual interventions.
- Data Lineage Support: Maintain data lineage throughout the data lifecycle.
- Level1 and Level2 Transformations: Support for one-to-many and many-to-many transformations.
- On-demand Pipeline and Transformation Management: Enable or disable pipelines and transformations as needed.
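The core idea behind these features is that pipelines read their behaviour from controlDB rows instead of hard-coding it. The Python sketch below illustrates that pattern under stated assumptions: `PipelineConfig` and its fields (`load_type`, `watermark_column`, `last_watermark`, `enabled`) are hypothetical stand-ins, not the actual controlDB schema, which is described in the Wiki. It shows how one generic routine can switch between full and delta loads and skip disabled pipelines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineConfig:
    """Hypothetical metadata row; the real controlDB schema is documented in the Wiki."""
    pipeline_id: int
    source_object: str
    target_object: str
    load_type: str  # "full" or "delta"
    watermark_column: Optional[str] = None
    last_watermark: Optional[str] = None
    enabled: bool = True


def build_extract_query(cfg: PipelineConfig) -> str:
    """Return the extract query a generic pipeline would run for one metadata row.

    Values are interpolated only for readability; real code should use query parameters.
    """
    if cfg.load_type == "full":
        return f"SELECT * FROM {cfg.source_object}"
    if cfg.load_type == "delta":
        if not cfg.watermark_column:
            raise ValueError(f"pipeline {cfg.pipeline_id}: delta load needs a watermark column")
        if cfg.last_watermark is None:
            # First run of a delta pipeline: nothing has been loaded yet, so take everything.
            return f"SELECT * FROM {cfg.source_object}"
        return (
            f"SELECT * FROM {cfg.source_object} "
            f"WHERE {cfg.watermark_column} > '{cfg.last_watermark}'"
        )
    raise ValueError(f"pipeline {cfg.pipeline_id}: unknown load type {cfg.load_type!r}")


def plan_runs(configs: list[PipelineConfig]) -> dict[int, str]:
    """Map each enabled pipeline to its extract query; disabled pipelines are skipped."""
    return {c.pipeline_id: build_extract_query(c) for c in configs if c.enabled}
```

In a design like this, the watermark would typically be advanced only after a successful load, so a failed run can be re-run without leaving gaps.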
Key concepts and configuration metadata are explained in detail in the Wiki.
To get started, follow these steps:
- Clone or Fork the Repository: Start by cloning or forking the repository from github.com/bennyaustin/elt-framework.
- Deploy ControlDB: The GitHub Actions workflow workflows/ControlDB-deployment.yml deploys the controlDB objects.
Prerequisites
- controlDB has already been provisioned by an IaC process such as 07-IaC-Bicep or iac-synapse-dataplatform.
- Create a new Service Principal or reuse an existing one. Note its Application (client) ID and secret; both are required later.
- Grant db_owner permission to the Service Principal:
CREATE USER [<service_principal>] FROM EXTERNAL PROVIDER
GO
EXEC sp_addrolemember 'db_owner', [<service_principal>]
GO
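If you script this step, for example to provision several environments, the principal name has to be bracket-quoted; T-SQL's `QUOTENAME` escapes a `]` inside a name by doubling it. The Python sketch below renders the same grant batch for a given principal name. `quote_name` and `render_grant_script` are hypothetical helpers, not part of the framework.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a T-SQL identifier the way QUOTENAME does: wrap in [] and double any ']'."""
    return "[" + name.replace("]", "]]") + "]"


def render_grant_script(principal: str) -> str:
    """Render the CREATE USER / db_owner grant batch for one service principal name."""
    quoted = quote_name(principal)
    return "\n".join([
        f"CREATE USER {quoted} FROM EXTERNAL PROVIDER",
        "GO",
        f"EXEC sp_addrolemember 'db_owner', {quoted}",
        "GO",
    ])
```

`ALTER ROLE db_owner ADD MEMBER` is the newer equivalent of `sp_addrolemember`, which Microsoft documents as deprecated; both grant the same role membership.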
This GitHub Actions workflow requires the following repository secrets:
- CLIENT_ID: Client/Application ID of the Service Principal.
- CLIENT_SECRET: Service Principal Secret.
- SUBSCRIPTION_ID: Azure subscription ID of controlDB.
- TENANT_ID: Microsoft Entra tenant ID of controlDB.
- CONTROLDB_CONNECTIONSTRING: controlDB connection string in service principal authentication format.
Server=<SQL Server>;Authentication=Active Directory Service Principal; Encrypt=True;Database=controlDB;User Id=<Service Principal Client/Application ID>;Password=<Service Principal Secret>
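To assemble CONTROLDB_CONNECTIONSTRING from the same values stored in CLIENT_ID and CLIENT_SECRET instead of typing it by hand, a small helper can build it and catch missing values early. This is a hedged Python sketch: the function names are hypothetical, the server name is passed in because it is not one of the listed secrets, and it assumes the secret contains no `;` (drivers differ in how such values must be quoted).

```python
import os
from typing import Mapping, Optional


def build_controldb_connection_string(
    server: str,
    env: Optional[Mapping[str, str]] = None,
    database: str = "controlDB",
) -> str:
    """Assemble the service-principal connection string from CLIENT_ID and CLIENT_SECRET."""
    env = os.environ if env is None else env
    missing = [name for name in ("CLIENT_ID", "CLIENT_SECRET") if not env.get(name)]
    if missing:
        raise KeyError(f"missing required secrets: {', '.join(missing)}")
    secret = env["CLIENT_SECRET"]
    if ";" in secret:
        # A ';' would split the string; quote it according to your driver's rules instead.
        raise ValueError("CLIENT_SECRET contains ';' and must be quoted for your driver")
    parts = [
        f"Server={server}",
        "Authentication=Active Directory Service Principal",
        "Encrypt=True",
        f"Database={database}",
        f"User Id={env['CLIENT_ID']}",
        f"Password={secret}",
    ]
    return ";".join(parts)


def parse_connection_string(conn: str) -> dict[str, str]:
    """Split a key=value;... string into a dict (assumes no ';' inside values)."""
    result: dict[str, str] = {}
    for segment in conn.split(";"):
        if segment.strip():
            key, _, value = segment.partition("=")
            result[key.strip()] = value.strip()
    return result
```

The parser can also be used to sanity-check an existing secret value, for example that it contains `Database`, `User Id` and `Authentication` keys.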
Finally, select Run workflow on the ControlDB-deployment.yml GitHub Actions workflow to deploy the database objects.
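The Run workflow button is available only for workflows that declare a `workflow_dispatch` trigger, and the same run can be started through the GitHub REST API endpoint `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`, where `workflow_id` may be the workflow file name. The Python sketch below builds that request with the standard library. The helper names, the `main` branch default and the token (one allowed to run Actions on your repository or fork) are assumptions.

```python
import json
import urllib.request


def build_dispatch_request(
    owner: str, repo: str, workflow_file: str, token: str, ref: str = "main"
) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


def trigger_workflow(req: urllib.request.Request) -> int:
    """Send the request; GitHub answers 204 No Content when the run has been queued."""
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For a fork, pass your own GitHub account as `owner`; building the request separately from sending it keeps the URL and payload easy to inspect before anything is triggered.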
- Microsoft Fabric data platform using ELT Framework
- Azure Databricks data platform using ELT Framework
- Azure Synapse data platform using ELT Framework
You can collaborate in various ways, including:
- Pull Requests
- Update/Enrich Wiki documentation
- Raise issues when you spot them
- Answer questions in the discussion forum
Please contact me to be added as a contributor.
If you have any questions or need support, please contact the maintainer: