BUG: Challenges with Nested Metadata Extraction Using pandas.json_normalize() #60254

Open
Labels: Bug; IO JSON (read_json, to_json, json_normalize); Needs Info (clarification about behavior needed to assess issue)
@DavidNaizheZhou

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

data = {
    "level1": {
        "rows": [
            {"col1": 1, "col2": 2},
        ]
    },
    "meta1": {
        "meta_sub1": 1,
    },
}

df = pd.json_normalize(data, record_path=["level1", "rows"], meta=["meta1"])
print(df)

df = pd.json_normalize(
    data,
    record_path=["level1", "rows"],
    meta=[["meta1", "meta_sub1"]],  # Trying to access sub-fields within meta1
)

Issue Description

Description of the Issue

This reproducible example demonstrates the challenges and potential pitfalls when using pandas.json_normalize() to extract and flatten hierarchical data structures with nested metadata:

Data Structure

The data dictionary is multi-layered, with nested dictionaries and a list of dictionaries (rows) under level1. Additionally, meta1 is structured as a dictionary containing subfields.

Successful Normalization

The first call to pd.json_normalize() extracts the data from rows under level1 and includes meta1 as a top-level metadata field. This works as intended because meta1 is accessed directly as a single key, so the whole dictionary ends up in one column.

Output:

   col1  col2             meta1
0     1     2  {'meta_sub1': 1}

KeyError with Nested Meta Fields

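The second pd.json_normalize() call attempts to extract subfields from meta1 using a nested path (meta=[["meta1", "meta_sub1"]]). This raises a KeyError. Nested lists are an accepted way to specify meta paths in general (the json_normalize documentation itself uses ["info", "governor"] together with a single-level record_path), so the failure seems tied to combining a nested meta path with the multi-level record_path used here, where the key meta_sub1 is not found.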
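To help narrow this down, the following is a minimal sketch that contrasts the failing call with a reshaped input. It assumes the behavior reported for pandas 2.2.3; the names nested_error and lifted are hypothetical and only used for illustration. The failing call is wrapped in try/except so the script runs to the end whether or not the KeyError occurs on a given pandas version.

```python
import pandas as pd

data = {
    "level1": {"rows": [{"col1": 1, "col2": 2}]},
    "meta1": {"meta_sub1": 1},
}

# Case 1: two-level record_path combined with a nested meta path.
# Reported to raise KeyError on pandas 2.2.3.
nested_error = None
try:
    pd.json_normalize(
        data,
        record_path=["level1", "rows"],
        meta=[["meta1", "meta_sub1"]],
    )
except KeyError as exc:
    nested_error = exc
print("two-level record_path:", repr(nested_error))

# Case 2 (assumption for illustration): lift the rows one level up so that
# record_path has a single level and meta1 sits next to the records.
lifted = {"rows": data["level1"]["rows"], "meta1": data["meta1"]}
flat = pd.json_normalize(
    lifted,
    record_path="rows",
    meta=[["meta1", "meta_sub1"]],
)
print(flat)
```

In the second case the nested meta path is resolved and the value lands in a column named meta1.meta_sub1, which suggests the problem lies in how meta paths are looked up once record_path descends more than one level.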

Expected Behavior

df = pd.json_normalize(
    data,
    record_path=["level1", "rows"],
    meta=[["meta1", "meta_sub1"]],  # Trying to access sub-fields within meta1
)

   col1  col2  meta1
0     1     2      1
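Until the nested path works with a multi-level record_path, one possible workaround is to request meta1 as a whole (the call that already succeeds) and expand the resulting dictionary column afterwards. This is only a sketch under that assumption, not a fix in pandas itself; the names meta_expanded and result are hypothetical, and the resulting column is named meta1.meta_sub1 rather than meta1.

```python
import pandas as pd

data = {
    "level1": {"rows": [{"col1": 1, "col2": 2}]},
    "meta1": {"meta_sub1": 1},
}

# Step 1: the call that works today; meta1 arrives as a column of dicts.
df = pd.json_normalize(data, record_path=["level1", "rows"], meta=["meta1"])

# Step 2: flatten the dict column separately and prefix its keys
# so the origin of each sub-field stays visible.
meta_expanded = pd.json_normalize(df.pop("meta1").tolist()).add_prefix("meta1.")

# Step 3: both frames share the default RangeIndex, so a plain join lines them up.
result = df.join(meta_expanded)
print(result)
```

Expanding the whole dictionary picks up every sub-field of meta1; if only one value is needed, mapping the column with a lookup on "meta_sub1" would be a narrower alternative.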

Installed Versions

INSTALLED VERSIONS

commit : 0691c5c
python : 3.12.1
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : de_DE.cp1252

pandas : 2.2.3
numpy : 1.26.2
pytz : 2024.1
dateutil : 2.8.2
pip : 24.3.1
Cython : None
sphinx : 8.1.3
IPython : 8.17.2
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 202490
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.3
lxml.etree : 5.2.2
matplotlib : 3.8.3
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 15.0.0
pyreadstat : None
pytest : 8.1.1
python-calamine : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.11.4
sqlalchemy : 2.0.28
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
