MAX vs TAF

Manu Murugesan edited this page Mar 14, 2026 · 6 revisions

MAX vs TAF: Understanding the Two CMS File Formats

medicaid-utils supports both Medicaid file formats published by CMS. Understanding their differences is essential for working with Medicaid claims data.

Overview

| Feature | MAX (Medicaid Analytic eXtract) | TAF (T-MSIS Analytic Files) |
|---|---|---|
| Years available | 1999–2015 | 2014–present |
| Diagnosis coding | Primarily ICD-9-CM | Primarily ICD-10-CM |
| File structure | Single flat file per claim type | Multiple sub-files per claim type |
| Beneficiary ID | BENE_MSIS, BENE_ID, or MSIS_ID | BENE_MSIS, BENE_ID, or MSIS_ID |
| Raw CMS claim types | IP, OT, RX, PS, CC | IP, OT, LT, RX, DE (person summary) |
| Supported in medicaid-utils | IP, OT, PS, CC | IP, OT, LT, RX, PS |
| Diagnosis columns | DIAG_CD_1–DIAG_CD_9 | DGNS_CD_1–DGNS_CD_12 |
| Procedure columns | PRCDR_CD_1–PRCDR_CD_6 | PRCDR_CD_1–PRCDR_CD_6, LINE_PRCDR_CD |
| Date columns | SRVC_BGN_DT, ADMSN_DT | SRVC_BGN_DT, ADMSN_DT |

Key Differences in Code

Accessing DataFrames

MAX — Single DataFrame accessible via .df:

from medicaid_utils.preprocessing import max_ip
ip = max_ip.MAXIP(year=2012, state="WY", data_root="/data/cms")
df = ip.df # Single Dask DataFrame

TAF — Multiple sub-file DataFrames in .dct_files:

from medicaid_utils.preprocessing import taf_ip
ip = taf_ip.TAFIP(year=2019, state="AL", data_root="/data/cms")
df_base = ip.dct_files["base"] # Header/base records
df_line = ip.dct_files["line"] # Line-level detail
df_dx = ip.dct_files["base_diag_codes"] # Diagnosis codes
df_ndc = ip.dct_files["line_ndc_codes"] # NDC codes

Specifying Format

Most functions accept a cms_format parameter, set to either "MAX" or "TAF":

# MAX (after constructing LST_DIAG_CD from DIAG_CD_* columns)
score(ip.df, lst_diag_col_name="LST_DIAG_CD", cms_format="MAX")
# TAF (after calling ip.gather_bene_level_diag_ndc_codes())
score(ip.dct_files["base_diag_codes"], lst_diag_col_name="LST_DIAG_CD", cms_format="TAF")
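
The MAX comment above mentions constructing LST_DIAG_CD from the DIAG_CD_* columns. A minimal row-level sketch of that step is shown here. The helper name build_lst_diag_cd and the comma-separated output format are assumptions made for illustration, not part of medicaid-utils, so check the format that score() actually expects before relying on it.

```python
from typing import Mapping, Optional

# Column names follow the MAX pattern DIAG_CD_1 through DIAG_CD_9.
MAX_DIAG_COLS = [f"DIAG_CD_{i}" for i in range(1, 10)]


def build_lst_diag_cd(row: Mapping[str, Optional[str]], sep: str = ",") -> str:
    """Join the non-empty DIAG_CD_* values of one claim into a single string.

    ASSUMPTION: a separator-delimited string is used only for illustration.
    """
    codes = []
    for col in MAX_DIAG_COLS:
        value = row.get(col)
        if value is None:
            continue
        value = str(value).strip().upper()
        if value and value not in codes:  # skip blanks and repeated codes
            codes.append(value)
    return sep.join(codes)


# Made-up claim row: one blank column and one repeated code.
claim = {"DIAG_CD_1": "4019", "DIAG_CD_2": " 25000", "DIAG_CD_3": "", "DIAG_CD_4": "4019"}
print(build_lst_diag_cd(claim))  # 4019,25000
```

Blank values and repeated codes are dropped so that each diagnosis appears once per claim.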

Cohort Extraction

The extract_cohort function handles format differences internally:

# Just change cms_format — the rest of the API is the same
extract_cohort(state="WY", lst_year=[2012], cms_format="MAX", ...)
extract_cohort(state="AL", lst_year=[2019], cms_format="TAF", ...)

TAF Sub-File Types

Each TAF claim type is split into sub-files:

| Suffix | Description | Dict Key |
|---|---|---|
| h (e.g., iph) | Header/base | "base" |
| l (e.g., ipl) | Line-level detail | "line" |
| occr (e.g., ipoccr) | Occurrence codes | "occurrence_code" |
| dx (e.g., ipdx) | Diagnosis codes | "base_diag_codes" |
| ndc (e.g., ipndc) | NDC codes | "line_ndc_codes" |
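
The suffix-to-key mapping above can be written as a small lookup. This is only a sketch: dct_key_for_file_type is a hypothetical helper, not a medicaid-utils function, and it assumes file types are named as the claim-type prefix followed by the suffix, as in the examples.

```python
# Suffix -> dct_files key, copied from the sub-file table.
TAF_SUFFIX_TO_KEY = {
    "h": "base",
    "l": "line",
    "occr": "occurrence_code",
    "dx": "base_diag_codes",
    "ndc": "line_ndc_codes",
}


def dct_key_for_file_type(file_type: str, claim_type: str = "ip") -> str:
    """Return the dct_files key for a TAF file type such as "ipdx".

    Hypothetical helper for illustration only.
    """
    name = file_type.lower()
    prefix = claim_type.lower()
    if not name.startswith(prefix):
        raise ValueError(f"{file_type!r} is not a {claim_type!r} file type")
    suffix = name[len(prefix):]
    if suffix not in TAF_SUFFIX_TO_KEY:
        raise ValueError(f"unknown TAF sub-file suffix {suffix!r}")
    return TAF_SUFFIX_TO_KEY[suffix]


for file_type in ["iph", "ipl", "ipoccr", "ipdx", "ipndc"]:
    print(file_type, "->", dct_key_for_file_type(file_type))
```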

Which Format Should I Use?

  • ICD-9 studies (pre-October 2015): Use MAX data
  • ICD-10 studies (post-October 2015): Use TAF data
  • Cross-era studies: Use both, with ICD-9 and ICD-10 code mappings in your dct_diag_proc_codes
  • Pharmacy studies: TAF preferred (medicaid-utils implements TAF RX preprocessing via TAFRX; MAX RX data exists in CMS but is not yet supported in the package)
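
The ICD-9/ICD-10 rule of thumb in this list can be sketched as a date check. The function suggest_cms_formats is a hypothetical helper, not part of medicaid-utils. It looks only at the October 1, 2015 coding cutover and ignores state-level data availability and the 2014–2015 overlap between MAX and TAF.

```python
from datetime import date
from typing import List

# ICD-10-CM replaced ICD-9-CM on US claims starting October 1, 2015.
ICD10_START = date(2015, 10, 1)


def suggest_cms_formats(study_start: date, study_end: date) -> List[str]:
    """Suggest cms_format values for a study window (hypothetical helper)."""
    if study_end < study_start:
        raise ValueError("study_end must not precede study_start")
    formats = []
    if study_start < ICD10_START:
        formats.append("MAX")  # window includes ICD-9 era claims
    if study_end >= ICD10_START:
        formats.append("TAF")  # window includes ICD-10 era claims
    return formats


print(suggest_cms_formats(date(2010, 1, 1), date(2012, 12, 31)))  # ['MAX']
print(suggest_cms_formats(date(2013, 1, 1), date(2018, 12, 31)))  # ['MAX', 'TAF']
print(suggest_cms_formats(date(2017, 1, 1), date(2019, 12, 31)))  # ['TAF']
```

A result containing both formats corresponds to the cross-era case, where both ICD-9 and ICD-10 code mappings are needed.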

Beneficiary ID (BENE_MSIS)

BENE_MSIS is a composite identifier constructed by medicaid-utils (not a raw CMS column). It applies to both MAX and TAF:

BENE_MSIS = STATE_CD + "-" + HAS_BENE + "-" + (BENE_ID or MSIS_ID)
  • BENE_ID — CMS-assigned, intended to be unique across states and years
  • MSIS_ID — State-assigned, unique only within a state and year
  • HAS_BENE — 1 if BENE_ID exists, 0 otherwise (falls back to MSIS_ID)

Example: "AL-1-123456789" (Alabama, has BENE_ID, ID is 123456789)
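
The formula above can be checked with a few lines of plain Python. This is a sketch of the stated rule only: build_bene_msis is a hypothetical name, the MSIS_ID value "A0001" is made up, and the package's own construction may handle edge cases differently.

```python
from typing import Optional


def build_bene_msis(state_cd: str, bene_id: Optional[str], msis_id: Optional[str]) -> str:
    """Compose STATE_CD-HAS_BENE-ID following the BENE_MSIS formula.

    Illustrative only; this is not the medicaid-utils implementation.
    """
    bene_id = (bene_id or "").strip()
    msis_id = (msis_id or "").strip()
    has_bene = 1 if bene_id else 0  # 1 when BENE_ID exists, else 0
    chosen_id = bene_id if has_bene else msis_id  # fall back to MSIS_ID
    if not chosen_id:
        raise ValueError("need at least one of BENE_ID or MSIS_ID")
    return f"{state_cd}-{has_bene}-{chosen_id}"


print(build_bene_msis("AL", "123456789", "A0001"))  # AL-1-123456789
print(build_bene_msis("AL", None, "A0001"))         # AL-0-A0001
```

Because HAS_BENE is embedded in the identifier, a BENE_ID and an MSIS_ID that happen to share the same digits still produce distinct BENE_MSIS values.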

The index_col parameter on all claim classes accepts any of the three IDs: "BENE_MSIS", "BENE_ID", or "MSIS_ID". The default is "BENE_MSIS".

Column Name Reference

See Glossary for the complete column name mapping between MAX and TAF.
