ENH: Support pd.json_normalize for normalizing only meta fields · Issue #60479 · pandas-dev/pandas

ENH: Support pd.json_normalize for normalizing only meta fields #60479

New issue

Open

Labels

Enhancement IO JSON Needs Discussion

@Ynjxsjmh

Description

@Ynjxsjmh

Ynjxsjmh

opened

on Dec 3, 2024

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

Currently, meta is used with when record_path is not None. The logic is to extract both the record_path and the meta. For example:

data = [
 {
 "state": "Florida",
 "shortname": "FL",
 "info": {"governor": "Rick Scott", "year": 2014},
 "counties": [
 {"name": "Dade", "population": 12345},
 {"name": "Broward", "population": 40000},
 {"name": "Palm Beach", "population": 60000},
 ],
 },
 {
 "state": "Ohio",
 "shortname": "OH",
 "info": {"governor": "John Kasich", "year": 2015},
 "counties": [
 {"name": "Summit", "population": 1234},
 {"name": "Cuyahoga", "population": 1337},
 ],
 },
]
result = pd.json_normalize(
 data, "counties", ["state", "shortname", ["info", "governor"]]
)
result
 name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich

In the above example, pd.json_normalize not only retrieves counties, but also retrieves state, shortname and info.governor.

When record_path is not given, meta is ignored, for example:

result = pd.json_normalize(
 data, meta=["state", "shortname", ["info", "governor"]]
)
result
 state shortname counties info.governor info.year
0 Florida FL [{'name': 'Dade', 'population': 12345}, {'name... Rick Scott 2014
1 Ohio OH [{'name': 'Summit', 'population': 1234}, {'nam... John Kasich 2015

Feature Description

I would expect when record_path is None or an empty list, json_normalize can only extract the meta.

result = pd.json_normalize(
 data, meta=["state", "shortname", ["info", "governor"]]
)
result
 shortname state info.governor
0 FL Florida Rick Scott
1 OH Ohio John Kasich

The behavior can be summarized as:

record_path is None, meta is None: normalize all records.
record_path is not None, meta is None: normalize only record_path.
record_path is not None, meta is not None: normalize record_path and meta.
record_path is None, meta is not None: normalize only meta. [This request]

Alternative Solutions

The alternative solution is to use pd.json_normalize to create the dataframe then select desired columns:

result = pd.json_normalize(data)[["shortname", "state", "info.governor"]]
 shortname state info.governor
0 FL Florida Rick Scott
1 OH Ohio John Kasich

However, this is inconvenient that when you specify sep in pd.json_normalize, the columns name would also change as f"info{sep}governor":

pd.json_normalize(data, sep='-')[["shortname", "state", "info-governor"]]

The most similar feature is DataFrame.from_records columns parameter, but it is not able to parse inner dictionary.

result = pd.DataFrame.from_records(data, columns=["state", "shortname", "info"])
result
 state shortname info
0 Florida FL {'governor': 'Rick Scott'}
1 Ohio OH {'governor': 'John Kasich'}
df = pd.DataFrame.from_records(data, columns=["state", "shortname", ["info", "governor"]])
Traceback (most recent call last):
 File "d:\sourcecode\pandas\test.py", line 20, in <module>
 df = pd.DataFrame.from_records(data, columns=["state", "shortname", ["info", "governor"]])
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "D:\software\Python\Python312\Lib\site-packages\pandas\core\frame.py", line 2491, in from_records
 arrays, arr_columns = to_arrays(data, columns)
 ^^^^^^^^^^^^^^^^^^^^^^^^
 File "D:\software\Python\Python312\Lib\site-packages\pandas\core\internals\construction.py", line 837, in to_arrays
 arr, columns = _list_of_dict_to_arrays(data, columns)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "D:\software\Python\Python312\Lib\site-packages\pandas\core\internals\construction.py", line 924, in _list_of_dict_to_arrays
 content = lib.dicts_to_array(data, list(columns))
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "lib.pyx", line 401, in pandas._libs.lib.dicts_to_array
TypeError: unhashable type: 'list'

Metadata

Assignees

No one assigned

Labels

Enhancement IO JSON Needs Discussion

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: Support pd.json_normalize for normalizing only meta fields #60479

Description

Feature Type

Problem Description

Feature Description

Alternative Solutions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions