Cohort Extraction
Manu Murugesan edited this page Mar 13, 2026 · 2 revisions
The cohort extraction module is the primary tool for building patient-level analytic files. It identifies patients matching diagnosis/procedure criteria, applies inclusion/exclusion filters, and exports the resulting claim files.

    from medicaid_utils.filters.patients.cohort_extraction import extract_cohort

    extract_cohort(
        state="AL",
        lst_year=[2016, 2017, 2018],
        dct_diag_proc_codes=dct_codes,
        dct_filters=dct_filters,
        lst_types_to_export=["ip", "ot", "ps"],
        dct_data_paths=dct_paths,
        cms_format="TAF",
    )

Diagnosis codes are given as ICD-9 and/or ICD-10 code lists, each with inclusion and exclusion logic. Codes are matched by prefix: "250" matches "2500", "25000", "25002", and so on.

    dct_codes = {
        "diag_codes": {
            "diabetes_t2": {
                "incl": {
                    9: ["250"],   # ICD-9 prefix
                    10: ["E11"],  # ICD-10 prefix
                },
                "excl": {
                    9: ["25001", "25003", "25011", "25013"],  # Odd 5th digits = Type 1
                    10: ["E10"],  # Exclude Type 1
                },
            },
        },
        "proc_codes": {},
    }

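To make the prefix rule concrete, here is a minimal sketch of how an inclusion/exclusion spec like the one above could be evaluated for a single code. The helper `has_condition` is hypothetical and is not part of `medicaid_utils`; how the library itself combines matches across a patient's claims is not shown here.

```python
def has_condition(code, icd_version, spec):
    """Check one diagnosis code against incl/excl prefix lists (illustration only)."""
    incl = tuple(spec.get("incl", {}).get(icd_version, []))
    excl = tuple(spec.get("excl", {}).get(icd_version, []))
    # str.startswith accepts a tuple of prefixes; an empty tuple never matches
    return code.startswith(incl) and not code.startswith(excl)


diabetes_t2 = {
    "incl": {9: ["250"], 10: ["E11"]},
    "excl": {9: ["25001", "25003", "25011", "25013"], 10: ["E10"]},
}

print(has_condition("25000", 9, diabetes_t2))  # matches inclusion prefix "250"
print(has_condition("25001", 9, diabetes_t2))  # also matches exclusion prefix "25001"
print(has_condition("E119", 10, diabetes_t2))  # matches inclusion prefix "E11"
```

Because exclusions are also prefixes, a short exclusion prefix can remove more codes than intended, so it is worth listing exclusions at the most specific level that applies.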
Procedure codes are keyed by procedure coding system:

    dct_codes = {
        "diag_codes": {},
        "proc_codes": {
            "methadone": {
                7: [  # ICD-10-PCS (system code 7)
                    "HZ81ZZZ",
                    "HZ84ZZZ",
                    "HZ85ZZZ",
                    "HZ86ZZZ",
                ],
            },
        },
    }

Common procedure system codes:
- `1`: CPT/HCPCS
- `6`: ICD-9-CM procedure
- `7`: ICD-10-PCS
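Since the system keys are plain integers, a typo such as `"7"` (a string) is easy to miss. The sketch below is a hypothetical pre-flight check, not a `medicaid_utils` function, that flags any `proc_codes` key outside the three system codes listed above. Other system codes may be valid, so treat a hit as a prompt to double-check rather than as an error.

```python
# The three common system codes listed above; labels are for display only.
COMMON_PROC_SYSTEMS = {1: "CPT/HCPCS", 6: "ICD-9-CM procedure", 7: "ICD-10-PCS"}


def unknown_proc_systems(dct_codes):
    """Return {condition: [keys not in COMMON_PROC_SYSTEMS]} for proc_codes."""
    problems = {}
    for condition, by_system in dct_codes.get("proc_codes", {}).items():
        unexpected = [key for key in by_system if key not in COMMON_PROC_SYSTEMS]
        if unexpected:
            problems[condition] = unexpected
    return problems


good = {"diag_codes": {}, "proc_codes": {"methadone": {7: ["HZ81ZZZ"]}}}
typo = {"diag_codes": {}, "proc_codes": {"methadone": {"7": ["HZ81ZZZ"]}}}

print(unknown_proc_systems(good))
print(unknown_proc_systems(typo))
```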
Filters control which claims and patients are included:

    dct_filters = {
        "cohort": {
            "ip": {
                "missing_dob": 0,  # Exclude missing DOB
                "range_numeric_age_prncpl_proc": (18, 64),  # Age 18-64
            },
            "ot": {
                "missing_dob": 0,
                "range_numeric_age_srvc_bgn": (18, 64),
            },
        },
        "export": {},
    }

| Type | Example | Description |
|---|---|---|
| Column value | `"missing_dob": 0` | Keep rows where column equals value |
| Numeric range | `"range_numeric_age_srvc_bgn": (18, 64)` | Keep rows where column is within range (inclusive) |
| Date range | `"range_date_srvc_bgn_date": ("20160101", "20181231")` | Keep rows where date is within range |
| Exclusion | `"excl_female": 1` | Exclude patients with positive exclusion flag |
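The four filter types can be read as a naming convention on the filter key. The sketch below illustrates those semantics on a single row represented as a dict. It is a hypothetical helper, not the library's implementation, and it assumes that dates are `YYYYMMDD` strings, that date ranges are inclusive, and that the column name is whatever follows the `range_numeric_` or `range_date_` prefix.

```python
def row_passes(row, filters):
    """Return True if `row` (a dict) satisfies every filter (illustration only)."""
    for key, value in filters.items():
        if key.startswith("range_numeric_"):
            column = key[len("range_numeric_"):]
            low, high = value
            if row.get(column) is None or not (low <= row[column] <= high):
                return False
        elif key.startswith("range_date_"):
            column = key[len("range_date_"):]
            start, end = value
            # YYYYMMDD strings sort chronologically, so string comparison works
            if row.get(column) is None or not (start <= row[column] <= end):
                return False
        elif key.startswith("excl_"):
            # A filter value of 1 drops rows whose exclusion flag is set
            if value == 1 and row.get(key) == 1:
                return False
        elif row.get(key) != value:
            return False
    return True


filters = {
    "missing_dob": 0,
    "range_numeric_age_srvc_bgn": (18, 64),
    "range_date_srvc_bgn_date": ("20160101", "20181231"),
    "excl_female": 1,
}
claim = {"missing_dob": 0, "age_srvc_bgn": 64, "srvc_bgn_date": "20170515", "excl_female": 0}

print(row_passes(claim, filters))
print(row_passes({**claim, "age_srvc_bgn": 17}, filters))
```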
After extraction, the export folder contains:
- `cohort_{STATE}.csv`: patient-level file with condition flags, inclusion indicator, and date of birth
- `cohort_{STATE}_{YEAR}.csv`: year-specific patient file
- `cohort_exclusions_{TYPE}_{STATE}_{YEAR}.parquet`: filter statistics
- Exported claim files in the requested format (CSV or Parquet)
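A quick way to confirm a run finished is to build the expected file names from the patterns above and check that they exist. The helper below is a hypothetical sketch. It assumes `{TYPE}` is the claim type exactly as written in the filter dictionary (for example `ip` or `ot`) and that one exclusions file is written per claim type and year, neither of which is confirmed here. It does not cover the exported claim files, whose names are not listed above.

```python
from pathlib import Path


def expected_outputs(export_folder, state, years, claim_types):
    """List cohort and exclusion files implied by the naming patterns (illustration only)."""
    folder = Path(export_folder)
    paths = [folder / f"cohort_{state}.csv"]
    for year in years:
        paths.append(folder / f"cohort_{state}_{year}.csv")
        for claim_type in claim_types:
            paths.append(folder / f"cohort_exclusions_{claim_type}_{state}_{year}.parquet")
    return paths


paths = expected_outputs("/output/cohort/AL", "AL", [2016, 2017], ["ip", "ot"])
missing = [p for p in paths if not p.exists()]
print(f"{len(missing)} of {len(paths)} expected files are missing")
```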

    for state in ["AL", "IL", "CA", "NY", "TX"]:
        extract_cohort(
            state=state,
            lst_year=[2016, 2017, 2018],
            dct_diag_proc_codes=dct_codes,
            dct_filters=dct_filters,
            lst_types_to_export=["ip", "ot", "ps"],
            dct_data_paths={
                "source_root": "/data/cms/",
                "export_folder": f"/output/cohort/{state}/",
            },
            cms_format="TAF",
        )

- Common Recipes — More code examples
- Risk Adjustment Algorithms — Apply after cohort extraction