Cohort Extraction

Manu Murugesan edited this page Mar 13, 2026 · 2 revisions

The cohort extraction module is the primary tool for building patient-level analytic files. It identifies patients matching diagnosis/procedure criteria, applies inclusion/exclusion filters, and exports the resulting claim files.

Basic Usage

from medicaid_utils.filters.patients.cohort_extraction import extract_cohort

# dct_codes and dct_filters are defined in the sections below;
# dct_paths holds the "source_root" and "export_folder" locations.
extract_cohort(
    state="AL",
    lst_year=[2016, 2017, 2018],
    dct_diag_proc_codes=dct_codes,
    dct_filters=dct_filters,
    lst_types_to_export=["ip", "ot", "ps"],
    dct_data_paths=dct_paths,
    cms_format="TAF",
)

Defining Diagnosis Codes

Use ICD-9 and/or ICD-10 codes with inclusion and exclusion logic. Codes are matched by prefix: "250" matches "2500", "25000", "25002", etc.

dct_codes = {
    "diag_codes": {
        "diabetes_t2": {
            "incl": {
                9: ["250"],   # ICD-9 prefix
                10: ["E11"],  # ICD-10 prefix
            },
            "excl": {
                9: ["25001", "25003", "25011", "25013"],  # Odd 5th digits = Type 1
                10: ["E10"],  # Exclude Type 1
            },
        },
    },
    "proc_codes": {},
}
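
The prefix logic described above can be sketched in a few lines. This is an illustration only, not the library's implementation: `matches_condition` and `flag_diabetes_t2` are hypothetical helper names, and the sketch assumes codes are stored without decimal points, as in the example definitions.

```python
def matches_condition(code, incl_prefixes, excl_prefixes=()):
    """Return True if code starts with an inclusion prefix and with no exclusion prefix."""
    code = code.strip().upper()
    if not code:
        return False
    # str.startswith accepts a tuple of prefixes; an empty tuple never matches.
    is_included = code.startswith(tuple(incl_prefixes))
    is_excluded = code.startswith(tuple(excl_prefixes))
    return is_included and not is_excluded


# Same shape as the "diabetes_t2" entry in dct_codes["diag_codes"].
diabetes_t2 = {
    "incl": {9: ["250"], 10: ["E11"]},
    "excl": {9: ["25001", "25003", "25011", "25013"], 10: ["E10"]},
}


def flag_diabetes_t2(code, icd_version):
    """Apply the inclusion/exclusion prefixes for the given ICD version (9 or 10)."""
    return matches_condition(
        code,
        diabetes_t2["incl"].get(icd_version, []),
        diabetes_t2["excl"].get(icd_version, []),
    )
```

Under these assumptions, "25000" and "E119" are flagged, while "25001" is dropped by the exclusion list and "E109" never matches an inclusion prefix.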

Defining Procedure Codes

Procedure codes are keyed by procedure coding system:

dct_codes = {
    "diag_codes": {},
    "proc_codes": {
        "methadone": {
            7: [  # ICD-10-PCS (system code 7)
                "HZ81ZZZ", "HZ84ZZZ", "HZ85ZZZ", "HZ86ZZZ",
            ],
        },
    },
}

Common procedure system codes:

  • 1 — CPT/HCPCS
  • 6 — ICD-9-CM procedure
  • 7 — ICD-10-PCS

Defining Filters

Filters control which claims and patients are included:

dct_filters = {
    "cohort": {
        "ip": {
            "missing_dob": 0,  # Exclude missing DOB
            "range_numeric_age_prncpl_proc": (18, 64),  # Age 18-64
        },
        "ot": {
            "missing_dob": 0,
            "range_numeric_age_srvc_bgn": (18, 64),
        },
    },
    "export": {},
}

Filter Types

| Type | Example | Description |
| --- | --- | --- |
| Column value | "missing_dob": 0 | Keep rows where column equals value |
| Numeric range | "range_numeric_age_srvc_bgn": (18, 64) | Keep rows where column is within range (inclusive) |
| Date range | "range_date_srvc_bgn_date": ("20160101", "20181231") | Keep rows where date is within range |
| Exclusion | "excl_female": 1 | Exclude patients with positive exclusion flag |
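
To make the four filter types concrete, here is a minimal pure-Python sketch of how such keys could be interpreted for a single row. It is not the library's code: `classify_filter` and `row_passes` are hypothetical names, and the sketch assumes that the text after `range_numeric_` or `range_date_` is the column name, that exclusion flags live in a column named like the full key, and that date ranges are inclusive.

```python
from datetime import date, datetime


def classify_filter(key):
    """Map a filter key to (kind, column) based on its prefix."""
    if key.startswith("range_numeric_"):
        return "numeric_range", key[len("range_numeric_"):]
    if key.startswith("range_date_"):
        return "date_range", key[len("range_date_"):]
    if key.startswith("excl_"):
        # Assumption: the flag column carries the full key name.
        return "exclusion", key
    return "column_value", key


def row_passes(row, filters):
    """Return True if a row (a dict of column -> value) satisfies every filter."""
    for key, value in filters.items():
        kind, column = classify_filter(key)
        if kind == "numeric_range":
            low, high = value
            if not (low <= row[column] <= high):
                return False
        elif kind == "date_range":
            # Bounds are given as YYYYMMDD strings, as in the table above.
            start, end = (datetime.strptime(v, "%Y%m%d").date() for v in value)
            if not (start <= row[column] <= end):
                return False
        elif kind == "exclusion":
            # Drop the row when its exclusion flag matches the configured value.
            if row[column] == value:
                return False
        elif row[column] != value:
            return False
    return True


example_row = {
    "missing_dob": 0,
    "age_srvc_bgn": 30,
    "srvc_bgn_date": date(2017, 5, 1),
    "excl_female": 0,
}
example_filters = {
    "missing_dob": 0,
    "range_numeric_age_srvc_bgn": (18, 64),
    "range_date_srvc_bgn_date": ("20160101", "20181231"),
    "excl_female": 1,
}
keep_row = row_passes(example_row, example_filters)
```

In this sketch a row is kept only when every filter passes, so adding a filter can only narrow the cohort.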

Output Files

After extraction, the export folder contains:

  • cohort_{STATE}.csv — patient-level file with condition flags, inclusion indicator, and date of birth
  • cohort_{STATE}_{YEAR}.csv — year-specific patient file
  • cohort_exclusions_{TYPE}_{STATE}_{YEAR}.parquet — filter statistics
  • Exported claim files in the requested format (CSV or Parquet)
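
When checking a finished run, it can help to list the file names these patterns imply. The sketch below is a convenience helper and not part of the package: `expected_outputs` is a hypothetical name, and it assumes that {TYPE} is the claim type as passed in lst_types_to_export and that one exclusions file is written per type and year. The exported claim files are left out because their names are not described here.

```python
from pathlib import Path


def expected_outputs(export_folder, state, years, claim_types):
    """Build the cohort and exclusion-statistics paths implied by the naming patterns."""
    folder = Path(export_folder)
    paths = [folder / f"cohort_{state}.csv"]
    for year in years:
        paths.append(folder / f"cohort_{state}_{year}.csv")
        for claim_type in claim_types:
            # Assumption: {TYPE} is written exactly as given, e.g. "ip".
            paths.append(
                folder / f"cohort_exclusions_{claim_type}_{state}_{year}.parquet"
            )
    return paths


def missing_outputs(export_folder, state, years, claim_types):
    """Return the expected paths that do not exist on disk."""
    return [
        path
        for path in expected_outputs(export_folder, state, years, claim_types)
        if not path.exists()
    ]
```

Calling `missing_outputs("/output/cohort/AL/", "AL", [2016, 2017, 2018], ["ip", "ot", "ps"])` after an extraction would then list whichever of these files were not written.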

Multiple States

for state in ["AL", "IL", "CA", "NY", "TX"]:
    extract_cohort(
        state=state,
        lst_year=[2016, 2017, 2018],
        dct_diag_proc_codes=dct_codes,
        dct_filters=dct_filters,
        lst_types_to_export=["ip", "ot", "ps"],
        dct_data_paths={
            "source_root": "/data/cms/",
            "export_folder": f"/output/cohort/{state}/",
        },
        cms_format="TAF",
    )
