Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

ENH: Support pd.json_normalize for normalizing only meta fields #60479

Open
Labels
Enhancement IO JSONread_json, to_json, json_normalize Needs DiscussionRequires discussion from core team before further action
@Ynjxsjmh

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently, meta is used with when record_path is not None. The logic is to extract both the record_path and the meta. For example:

data = [
 {
 "state": "Florida",
 "shortname": "FL",
 "info": {"governor": "Rick Scott", "year": 2014},
 "counties": [
 {"name": "Dade", "population": 12345},
 {"name": "Broward", "population": 40000},
 {"name": "Palm Beach", "population": 60000},
 ],
 },
 {
 "state": "Ohio",
 "shortname": "OH",
 "info": {"governor": "John Kasich", "year": 2015},
 "counties": [
 {"name": "Summit", "population": 1234},
 {"name": "Cuyahoga", "population": 1337},
 ],
 },
]
result = pd.json_normalize(
 data, "counties", ["state", "shortname", ["info", "governor"]]
)
result
 name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich

In the above example, pd.json_normalize not only retrieves counties, but also retrieves state, shortname and info.governor.


When record_path is not given, meta is ignored, for example:

result = pd.json_normalize(
 data, meta=["state", "shortname", ["info", "governor"]]
)
result
 state shortname counties info.governor info.year
0 Florida FL [{'name': 'Dade', 'population': 12345}, {'name... Rick Scott 2014
1 Ohio OH [{'name': 'Summit', 'population': 1234}, {'nam... John Kasich 2015

Feature Description

I would expect when record_path is None or an empty list, json_normalize can only extract the meta.

result = pd.json_normalize(
 data, meta=["state", "shortname", ["info", "governor"]]
)
result
 shortname state info.governor
0 FL Florida Rick Scott
1 OH Ohio John Kasich

The behavior can be summarized as:

  • record_path is None, meta is None: normalize all records.
  • record_path is not None, meta is None: normalize only record_path.
  • record_path is not None, meta is not None: normalize record_path and meta.
  • record_path is None, meta is not None: normalize only meta. [This request]

Alternative Solutions

The alternative solution is to use pd.json_normalize to create the dataframe then select desired columns:

result = pd.json_normalize(data)[["shortname", "state", "info.governor"]]
 shortname state info.governor
0 FL Florida Rick Scott
1 OH Ohio John Kasich

However, this is inconvenient that when you specify sep in pd.json_normalize, the columns name would also change as f"info{sep}governor":

pd.json_normalize(data, sep='-')[["shortname", "state", "info-governor"]]

The most similar feature is DataFrame.from_records columns parameter, but it is not able to parse inner dictionary.

result = pd.DataFrame.from_records(data, columns=["state", "shortname", "info"])
result
 state shortname info
0 Florida FL {'governor': 'Rick Scott'}
1 Ohio OH {'governor': 'John Kasich'}
df = pd.DataFrame.from_records(data, columns=["state", "shortname", ["info", "governor"]])
Traceback (most recent call last):
 File "d:\sourcecode\pandas\test.py", line 20, in <module>
 df = pd.DataFrame.from_records(data, columns=["state", "shortname", ["info", "governor"]])
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "D:\software\Python\Python312\Lib\site-packages\pandas\core\frame.py", line 2491, in from_records
 arrays, arr_columns = to_arrays(data, columns)
 ^^^^^^^^^^^^^^^^^^^^^^^^
 File "D:\software\Python\Python312\Lib\site-packages\pandas\core\internals\construction.py", line 837, in to_arrays
 arr, columns = _list_of_dict_to_arrays(data, columns)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "D:\software\Python\Python312\Lib\site-packages\pandas\core\internals\construction.py", line 924, in _list_of_dict_to_arrays
 content = lib.dicts_to_array(data, list(columns))
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "lib.pyx", line 401, in pandas._libs.lib.dicts_to_array
TypeError: unhashable type: 'list'

Metadata

Metadata

Assignees

No one assigned

    Labels

    Enhancement IO JSONread_json, to_json, json_normalize Needs DiscussionRequires discussion from core team before further action

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

      Relationships

      None yet

      Development

      No branches or pull requests

      Issue actions

        AltStyle によって変換されたページ (->オリジナル) /