JeffWilliams2 /ReadMe.md

Hi, I'm Jeff

Data Engineer with an MS in Advanced Data Analytics and production experience building healthcare and financial data systems. I work across the full modern data stack — ingestion through reporting layer — with a focus on well-tested, documented pipelines and clear output for stakeholders.

Currently at PermianBytes building ELT pipelines, dimensional models, and data APIs for client engagements. Previously at General Genomics architecting a Kubernetes medallion lakehouse for clinical and imaging data (DICOM, FHIR R4). Finance domain background from JPMorgan Chase and Charles Schwab — I understand the compliance rules and data conventions that make financial pipelines hard.


Tech Stack

Languages & Frameworks

Python, SQL, PySpark, Bash

Data Engineering

dbt, Apache Airflow, Apache Spark, Streamlit

Warehouse & Storage

Snowflake, Databricks, PostgreSQL, DuckDB

Cloud & Infrastructure

AWS, Docker, Kubernetes, GitHub Actions


Currently Reading

  • The Data Warehouse Toolkit — Kimball & Ross
  • Fundamentals of Data Engineering — Reis & Housley
  • Designing Data-Intensive Applications — Kleppmann

Pinned

  1. legacy-bank-data-pipeline Public

    Medallion dbt pipeline (bronze→silver→gold) with BSA/AML compliance, FDIC coverage, and Streamlit dashboard. Runs on DuckDB, deploys to Snowflake.

    Python

  2. icd-10-coder Public

    Explainable medical coding with RAG. Every suggested code links back to the source text that justifies it.

  3. ten-k-analyzer Public

    Ask natural-language questions about SEC filings. Get cited answers in seconds.

  4. realtime-banking-cdc-pipeline Public

    Real-time banking CDC pipeline: PostgreSQL → Debezium → Kafka → Snowflake, with dbt transformations

    Python

  5. ehr-data-pipeline Public

    EHR clinical data pipeline: dbt + DuckDB + Streamlit · bronze/silver/gold medallion architecture over synthetic FHIR R4-style data

    Python
