BUG: always warn when on_bad_lines callable returns extra fields with index_col in read_csv (Python engine) (GH#61837) #62297

Open: skalwaghe-56 wants to merge 2 commits into pandas-dev:main from skalwaghe-56:fix-issue-61837

skalwaghe-56 (Contributor) commented Sep 8, 2025 (edited)


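This PR fixes a regression in the CSV parser (Python engine) when on_bad_lines is a callable: if the callable returns more fields than expected and index_col is set, no ParserWarning is emitted.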

Thanks!

skalwaghe-56 force-pushed the fix-issue-61837 branch 3 times, most recently from f579800 to d77afef (September 10, 2025 10:43)

skalwaghe-56 (Contributor, Author) commented:

@jbrockmendel @rhshadrach Could you please guide me further?

skalwaghe-56 force-pushed the fix-issue-61837 branch 4 times, most recently from 7009e84 to 0729267 (September 12, 2025 12:14)

skalwaghe-56 (Contributor, Author) commented:

@rhshadrach @jorisvandenbossche When I ran the tests locally with these changes, 1 test XPASSed (related to #10153, I think). It's this test:

from io import StringIO

import pytest

from pandas import Categorical, DataFrame
import pandas._testing as tm


# all_parsers is a fixture defined in pandas/tests/io/parser/conftest.py
@pytest.mark.parametrize("dtype", [{"b": "category"}, {1: "category"}])
def test_categorical_dtype_single(all_parsers, dtype, request):
    # see gh-10153
    parser = all_parsers
    data = """a,b,c
1,a,3.4
1,a,3.4
2,b,4.5"""
    expected = DataFrame(
        {"a": [1, 1, 2], "b": Categorical(["a", "a", "b"]), "c": [3.4, 3.4, 4.5]}
    )
    if parser.engine == "pyarrow":
        # Non-strict xfail: an unexpected pass is reported as XPASS, not a failure
        mark = pytest.mark.xfail(
            strict=False,
            reason="Flaky test sometimes gives object dtype instead of Categorical",
        )
        request.applymarker(mark)
    actual = parser.read_csv(StringIO(data), dtype=dtype)
    tm.assert_frame_equal(actual, expected)

Could you please take a look at this, and review the PR as well?

Thanks!

skalwaghe-56 force-pushed the fix-issue-61837 branch 4 times, most recently from 02e9bd2 to 7f303f7 (September 16, 2025 16:40)

skalwaghe-56 (Contributor, Author) left a comment:

I have fixed the tests as well, so CI should pass now.

skalwaghe-56 force-pushed the fix-issue-61837 branch 2 times, most recently from e1f405e to 2fa7f70 (September 20, 2025 09:00)

rhshadrach (Member) left a comment:

Looking good!

- Always emit ParserWarning and drop extra fields when an on_bad_lines
 callable returns more elements than expected, regardless of index_col,
 in PythonParser._rows_to_cols. [GH#61837]
- Ensure non-bad rows are appended in the outer else branch so good lines
 are preserved.
- Add regression test
 pandas/tests/io/parser/test_python_parser_only.py::test_on_bad_lines_callable_warns_and_truncates_with_index_col
 covering index_col in [None, 0].
Closes pandas-dev#61837.
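The first two bullets above can be illustrated with a small, pandas-free sketch. It is not the code in PythonParser._rows_to_cols: the function rows_to_cols_sketch, its col_len argument (the expected number of fields, which pandas derives from the header and any implicit index) and the local ParserWarning class (a stand-in for pandas.errors.ParserWarning) are assumptions made for illustration. It follows the documented on_bad_lines contract: a None return skips the bad line, and a returned list with more elements than expected triggers a ParserWarning and is truncated. Because that rule should no longer depend on index_col, the sketch does not take index_col at all.

```python
import warnings
from typing import Callable, Optional


class ParserWarning(UserWarning):
    """Local stand-in for pandas.errors.ParserWarning (assumption)."""


# Shape of an on_bad_lines callable: receives the split bad line and
# returns a replacement list of fields, or None to skip the line.
BadLineHandler = Callable[[list[str]], Optional[list[str]]]


def rows_to_cols_sketch(
    content: list[list[str]],
    col_len: int,
    on_bad_lines: BadLineHandler,
) -> list[list[str]]:
    """Hypothetical, simplified version of the row-filtering loop.

    index_col is deliberately not a parameter: the warn-and-truncate
    rule below applies the same way whatever index_col is.
    """
    kept: list[list[str]] = []
    for row in content:
        if len(row) > col_len:
            # Bad line: hand it to the user's callable.
            new_row = on_bad_lines(row)
            if new_row is None:
                continue  # None means "skip this line"
            if len(new_row) > col_len:
                # Callable returned extra fields: warn, then drop them.
                # (Message text is illustrative, not pandas' wording.)
                warnings.warn(
                    f"on_bad_lines returned {len(new_row)} fields, expected "
                    f"{col_len}; dropping the extra fields.",
                    ParserWarning,
                    stacklevel=2,
                )
                new_row = new_row[:col_len]
            kept.append(new_row)
        else:
            # Outer else branch: good lines are always kept.
            kept.append(row)
    return kept


# Usage: an identity handler hands the 4-field row back unchanged.
rows = [["1", "2", "3"], ["4", "5", "6", "7"]]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = rows_to_cols_sketch(rows, col_len=3, on_bad_lines=lambda line: line)
print(out)  # [['1', '2', '3'], ['4', '5', '6']]
print(len(caught), caught[0].category.__name__)  # 1 ParserWarning
```

In pandas itself the callable is passed as read_csv(..., engine="python", on_bad_lines=handler); the regression test named above checks the equivalent behaviour for index_col in [None, 0].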
Reviewers: rhshadrach (awaiting requested review)
Requested changes must be addressed to merge this pull request.
Labels: Bug, IO CSV (read_csv, to_csv)

Linked issue that merging this pull request may close:
BUG: read_csv() on_bad_lines callable does not raise ParserWarning when index_col is set
