Org-EthereaLogic.ai

Databricks-native Enterprise Data Trust portfolio: data quality, governance, drift detection, and LLM-ready reliability controls. 844+ passing tests.


🧠 About This Organization

Org-EthereaLogic.ai is the open-source home of the Enterprise Data Trust portfolio — a suite of Databricks-native data quality and governance controls built for Medallion Architectures. Every repository runs on Databricks Free Edition, every claim is backed by passing tests, and every control pattern is production-reproducible.

"Is your data pipeline trustworthy — or just running without errors?"


🚀 The Enterprise Data Trust Portfolio

Benchmark headline: the challenger control reached perfect recall (1.00, against an industry-baseline recall of 0.8767), validated across three public datasets totaling 6.6M rows: Census ACS, NYC TLC Taxi, and UCI Adult. Every injected instance of silent data corruption was detected before it could reach executive dashboards. Methodology and preregistered experiments: From Theory to Evidence →
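For readers less familiar with the metrics in the headline, here is a minimal sketch of how precision and recall can be computed against ground truth. The function name `score_control`, the scenario ids, and the sample numbers are all hypothetical; this is not the benchmark's actual code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlScore:
    precision: float
    recall: float


def score_control(flagged: set[str], ground_truth: set[str]) -> ControlScore:
    """Score the scenario ids a control flagged against the known injected failures."""
    true_positives = len(flagged & ground_truth)
    # Assumed convention: an empty denominator scores 1.0
    # (nothing was wrongly flagged / there was nothing to miss).
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / len(ground_truth) if ground_truth else 1.0
    return ControlScore(precision, recall)


# Made-up scenario ids, for illustration only.
injected = {"s1", "s2", "s3", "s4"}
challenger = score_control({"s1", "s2", "s3", "s4"}, injected)
baseline = score_control({"s1", "s2", "s3"}, injected)

print(challenger)
print(baseline)
```

A recall of 1.00 means no injected failure went unflagged; a recall below 1.00 means at least one did.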

| Chapter | Repository | Description | Tests |
| --- | --- | --- | --- |
| Ch 1 | Trusted Source Intake | Certifies every record before downstream consumption. 7 contract checks, replay detection, schema drift handling, and quarantine with explicit reasons. | 56 |
| Ch 2 | Silent Failure Prevention | Detects when business columns collapse despite healthy schema and row counts. Distribution stability scoring, 6 publication gates, blocked Gold refresh on degradation. | 50 |
| Ch 3 | Measurable Control Effectiveness | Scores data controls against known failure scenarios with precision, recall, and ground truth. Perfect recall where industry baselines missed injected drift. | 37 |
| Ch 4 | DriftSentinel | Unified platform: intake certification, drift gating, and control benchmarking in a single governed pipeline. Operator dashboard included. | 397 |
| Ch 5 | AetheriaForge | Coherence-scored transformation engine: entity resolution, temporal reconciliation, and schema enforcement with append-only evidence. Published on PyPI. | 304 |
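To make "business columns collapse despite healthy schema and row counts" concrete, here is a minimal sketch of an entropy-based stability score with a publication gate. It assumes Shannon entropy as the distribution measure and a hypothetical threshold of 0.8; the function names are illustrative and this is not the repositories' actual scoring logic.

```python
import math
from collections import Counter


def shannon_entropy(values) -> float:
    """Shannon entropy (in bits) of the empirical distribution of a column."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def stability_score(baseline, current) -> float:
    """Ratio of current to baseline entropy, capped at 1.0 (hypothetical scoring)."""
    base = shannon_entropy(baseline)
    if base == 0.0:
        return 1.0  # a constant baseline column cannot collapse any further
    return min(shannon_entropy(current) / base, 1.0)


def gate_allows_publish(baseline, current, threshold: float = 0.8) -> bool:
    """Block publication when the stability score falls below the assumed threshold."""
    return stability_score(baseline, current) >= threshold


# Same row count and same column type, but the current batch holds one value.
baseline_col = ["A", "B", "C", "D"] * 25
collapsed_col = ["A"] * 100

print(stability_score(baseline_col, collapsed_col))
print(gate_allows_publish(baseline_col, collapsed_col))
```

Both columns have 100 rows and the same type, so a schema or row-count check would pass; only the distribution comparison reveals the collapse.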

Chapters 4 and 5 are full Databricks-deployable applications with operator dashboards, Asset Bundle deployment, and PyPI packages.

Quick install:

pip install etherealogic-driftsentinel # Ch 4 — Shannon entropy drift detection (355+ tests)
pip install etherealogic-aetheriaforge # Ch 5 — coherence-scored transformation engine (300+ tests)

Both packages are Databricks-deployable via Asset Bundles. See each repo's README for the bootstrap workflow. All five Data Trust chapter repos are MIT-licensed.
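After installing, one way to confirm which of the two distributions are present is to query package metadata from the standard library. This sketch uses only the distribution names given in the install commands above; it makes no assumption about the packages' import names or APIs, and the helper `installed_version` is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the pip commands above.
PACKAGES = ("etherealogic-driftsentinel", "etherealogic-aetheriaforge")


def installed_version(dist_name: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


versions = {name: installed_version(name) for name in PACKAGES}
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```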


🛠 Tech Stack

Core Platform

Databricks, Apache Spark, Python, PySpark

Data Quality & Governance

Unity Catalog, Delta Lake, Medallion Architecture

CI/CD & Code Quality

GitHub Actions, Codacy, Codecov, Snyk, pytest

Infrastructure

Gradio, Docker, Databricks Asset Bundles


🏆 Portfolio Highlights

| 🛡️ Data Trust | 🧪 Test-Driven | 🔓 Open Source | 📊 Production-Ready |
| --- | --- | --- | --- |
| Every control pattern is evidence-backed | 844+ passing tests across 5 repos | All repos run on Databricks Free Edition | Operator dashboards, CI/CD, and security scanning |

📬 Contact


All repositories are open source and reproducible. ⭐ Star them if you find them useful — it helps others in the community find them too.
