Investigative data journalism: quantifying fixable waste in US healthcare, one issue at a time. Open-source analysis of CMS, OECD, and federal datasets. $428.6B in savings identified so far across 8 published issues.
Updated Jun 14, 2026 (Python)
XGBoost pipeline predicting hospital Excess Readmission Ratios on FY2024 CMS HRRP data. SHAP explainability, 5-fold CV (R2 0.938), and a live Streamlit dashboard for per-hospital risk assessment.
SQL analytics + AI natural language query interface built on 9M+ real CMS Medicare billing records
SQL + Tableau dashboard analyzing U.S. hospital quality performance using CMS data. Includes data cleaning pipeline, KPI development, geographic benchmarking, and interactive measure filtering.
SQL + Power BI + Excel analytics system built on public CMS hospital data. Tracks 30-day readmission rates, HCAHPS patient satisfaction, and clinical quality indicators across 4,800+ U.S. hospitals — with benchmarking, trend analysis, and operational efficiency reporting across three dashboard pages.
Pediatric Growth & Development Database - An R Markdown Project
Exploratory analysis and predictive modeling of patient satisfaction across 4,789 U.S. hospitals using CMS HCAHPS public data — Python, Scikit-learn, Matplotlib
Python analytics project using CMS hospital quality data to clean, summarize, and visualize hospital ratings, reporting patterns, and facility characteristics.
SQL analytics case study on 169,635 real CMS Medicare records — 20 queries across window functions, CTEs, and multi-table JOINs analyzing cost and readmission data across 3,000+ US hospitals.
Medicaid drug spending analysis (2019–2023) using MySQL, Power BI, and Python — built as a healthcare data analyst portfolio project.
Healthcare analytics system identifying hospitals at risk for CMS readmission penalties. Analyzes $5.96B in financial exposure, predicts intervention opportunities, saves $1.46B. Python • scikit-learn • CMS data • Healthcare policy
Azure Medallion lakehouse on 9.66M CMS Medicare provider-service records — PySpark Bronze→Silver→Gold on ADLS Gen2, with Power BI + marimo dashboards surfacing 5 hero billing insights.
Anomaly detection and overcharge prediction in Medicare Part-B claims using unsupervised outlier detection and supervised ML (XGBoost, Random Forest, logit regression). Analyzes 800K+ Illinois claims from CMS 2015 data.
How to request a computing account for a new student.
Healthcare analytics project using official CMS Medicare provider-service data, Python, SQL-style analysis, and Power BI to examine payment drivers and Connecticut benchmarks.
Healthcare KPI reporting project using Databricks SQL, notebook workflows, and Power BI reporting dashboards.
Predictive modeling + dashboard on federal CMS data - Random Forest models hospital excess readmissions.
Pharma commercial analytics on CMS Open Payments + Medicare Part D. dbt + Postgres + Streamlit + Groq LLM. Live demo linked.
Analyzing CMS hospital readmission data — SQL + Python + statistical testing + predictive modeling
Analytics engineering case study using real CMS ACA Marketplace data, dbt, DuckDB, and LookML to model premiums, benefits, plan availability, and insurance market competition.