ENH: add option to save json without escaping forward slashes #61442

Open
Labels: Enhancement, IO JSON (read_json, to_json, json_normalize), Needs Triage (issue that has not been reviewed by a pandas team member)
@ellisbrown

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I love pandas and use it extensively. One very common use case for me is saving large JSON / JSONL files that describe ML training datasets. Unfortunately, pandas uses ujson under the hood, which automatically escapes forward slashes (`/` is written as `\/`). Forward slashes are very common in my dataset files because they appear in the file paths to images, videos, etc.

The escaped file paths cause issues with some (non-pandas) downstream libraries that ingest my JSON/JSONL dataset files. So instead of using the native pandas `.to_json()` function, I have to import the `json` package and write the file manually, which can be much slower for very large files.

I am OK living with this inconvenience, but it seems to me to be a gap in the pandas API. Perhaps adding an option to prevent the escaping would be a good enhancement.
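For context, `\/` is one of the escapes the JSON specification permits for `/`, so a conforming parser decodes both spellings to the same string; the difference is visible only to code that reads the raw text or uses a non-conforming parser. A minimal check with Python's standard `json` module (no pandas involved; the sample path is made up):

```python
import json

escaped = '{"path": "images\\/cat_01.jpg"}'  # slash written as the \/ escape
plain = '{"path": "images/cat_01.jpg"}'     # slash written verbatim

# A conforming parser decodes both texts to the same Python object...
assert json.loads(escaped) == json.loads(plain) == {"path": "images/cat_01.jpg"}

# ...even though the raw texts differ.
print(escaped == plain)  # False

# Python's own encoder writes "/" verbatim.
print(json.dumps({"path": "images/cat_01.jpg"}))  # {"path": "images/cat_01.jpg"}
```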

Feature Description

Add a new parameter, `escape_forward_slashes`, to `pandas.DataFrame.to_json()`:

```python
def to_json(self, ..., escape_forward_slashes=True) -> str | None:
    ...  # existing parameters elided; True keeps the current behavior
```

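Or, more generally, a `ujson_options` dict:

```python
def to_json(self, ..., ujson_options: dict | None = None) -> str | None:
    ...  # existing parameters elided; None means "use the defaults"
```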
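Until something like this exists, the proposed flag can be approximated by post-processing the string pandas returns. The sketch below uses a hypothetical helper, `to_json_unescaped`, which is not a pandas API. It relies on the assumption that pandas escapes every forward slash (the behavior this issue describes), and it builds the full output as one string in memory before writing it.

```python
import json

import pandas as pd


def to_json_unescaped(df: pd.DataFrame, path=None, *,
                      escape_forward_slashes=True, **kwargs):
    r"""Hypothetical helper emulating the proposed ``escape_forward_slashes`` flag.

    NOT a pandas API. ``kwargs`` are forwarded to ``DataFrame.to_json``
    (do not pass ``path_or_buf``). When the flag is False, every ``\/`` is
    turned back into ``/``. This is safe only if pandas escapes every forward
    slash: a literal backslash is then written as ``\\`` and is never directly
    followed by a bare ``/``, so each ``\/`` pair must be a slash escape.
    """
    text = df.to_json(**kwargs)  # no path_or_buf, so pandas returns a str
    if not escape_forward_slashes:
        text = text.replace("\\/", "/")
    if path is None:
        return text
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return None


# Usage: the second row contains a literal backslash followed by a slash.
df = pd.DataFrame({"path": ["images/cat_01.jpg", "a\\/b"]})
out = to_json_unescaped(df, orient="records", escape_forward_slashes=False)
print(out)
assert json.loads(out) == df.to_dict(orient="records")  # round-trips losslessly
```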

Alternative Solutions

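Instead of

```python
df.to_json(path, orient="records")
```

I have to use the `json` package manually:

```python
import json

# Same layout as orient="records"; the stdlib encoder does not escape "/".
with open(path, "w") as f:
    json.dump(df.to_dict(orient="records"), f)
```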
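Since the problem also affects JSONL files, here is a sketch of the same workaround for JSON Lines output (one object per row), again using only the standard `json` module. The helper name `write_jsonl` is hypothetical, and the usage writes to an in-memory buffer; pass an open file handle instead to write to disk.

```python
import io
import json

import pandas as pd


def write_jsonl(df: pd.DataFrame, f) -> None:
    """Hypothetical helper: write ``df`` to text stream ``f`` as JSON Lines.

    The standard-library encoder never escapes "/", so file paths are written
    verbatim. Caveats: NaN is written as the non-standard token NaN (pandas
    writes null), and values json cannot encode natively, such as Timestamps,
    raise TypeError.
    """
    for record in df.to_dict(orient="records"):
        f.write(json.dumps(record))
        f.write("\n")


# Usage with an in-memory buffer.
df = pd.DataFrame({"path": ["images/cat_01.jpg", "videos/clip_02.mp4"],
                   "label": ["cat", "dog"]})
buf = io.StringIO()
write_jsonl(df, buf)
print(buf.getvalue())
```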

Additional Context

Also note that the ujson project explicitly states:

> this library has been put into a maintenance-only mode... Users are encouraged to migrate to orjson which is both much faster and less likely to introduce a surprise buffer overflow vulnerability in the future.

So it might be worth migrating to orjson as part of this work.
