BUG: always warn when on_bad_lines callable returns extra fields with index_col in read_csv (Python engine) (GH#61837) #62297


Open

skalwaghe-56 wants to merge 2 commits into pandas-dev:main

from skalwaghe-56:fix-issue-61837

+55 −9

skalwaghe-56

Contributor

@skalwaghe-56 skalwaghe-56 commented Sep 8, 2025 (edited)

closes #61837 (BUG: read_csv() on_bad_lines callable does not raise ParserWarning when index_col is set)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

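This PR fixes a regression in the Python-engine CSV parser when on_bad_lines is a callable: if the callable returned more fields than expected and index_col was set, read_csv did not emit a ParserWarning. With this change the warning is always emitted and the extra fields are dropped.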

Thanks!

@skalwaghe-56 skalwaghe-56 force-pushed the fix-issue-61837 branch 3 times, most recently from f579800 to d77afef

September 10, 2025 10:43

@skalwaghe-56

Contributor Author

skalwaghe-56 commented Sep 10, 2025

@jbrockmendel @rhshadrach Could you please guide me further?

@simonjayhawkins simonjayhawkins added Bug IO CSV labels

Sep 10, 2025

@skalwaghe-56 skalwaghe-56 force-pushed the fix-issue-61837 branch 4 times, most recently from 7009e84 to 0729267

September 12, 2025 12:14

@skalwaghe-56

Contributor Author

skalwaghe-56 commented Sep 12, 2025

@rhshadrach @jorisvandenbossche When I ran the tests locally with these changes, one test XPASSed. I think it is related to #10153.
It's this test:

@pytest.mark.parametrize("dtype", [{"b": "category"}, {1: "category"}])
def test_categorical_dtype_single(all_parsers, dtype, request):
    # see gh-10153
    parser = all_parsers
    data = """a,b,c
1,a,3.4
1,a,3.4
2,b,4.5"""
    expected = DataFrame(
        {"a": [1, 1, 2], "b": Categorical(["a", "a", "b"]), "c": [3.4, 3.4, 4.5]}
    )
    if parser.engine == "pyarrow":
        mark = pytest.mark.xfail(
            strict=False,
            reason="Flaky test sometimes gives object dtype instead of Categorical",
        )
        request.applymarker(mark)
    actual = parser.read_csv(StringIO(data), dtype=dtype)
    tm.assert_frame_equal(actual, expected)

Could you take a look at this, and at the PR as well?

Thanks!
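
For readers unfamiliar with the XPASS outcome mentioned above: the quoted test applies a non-strict xfail marker only when the engine is pyarrow, so a run in which the pyarrow case happens to pass is reported as XPASS rather than as a failure. The sketch below shows the same mechanism in isolation. It is a minimal illustration, not part of the pandas test suite, and `flaky_result` and `test_flaky_dtype` are hypothetical names.

```python
import pytest


def flaky_result():
    # Hypothetical stand-in for a parser call that usually succeeds.
    return "category"


@pytest.mark.xfail(strict=False, reason="sometimes gives object dtype")
def test_flaky_dtype():
    # When this assertion holds, pytest reports the test as XPASS.
    # With strict=False an unexpected pass does not fail the run.
    assert flaky_result() == "category"
```

With `strict=True` the same unexpected pass would be reported as a failure, which is why flaky tests are usually marked non-strict.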

@skalwaghe-56 skalwaghe-56 force-pushed the fix-issue-61837 branch 4 times, most recently from 02e9bd2 to 7f303f7

September 16, 2025 16:40

rhshadrach

rhshadrach requested changes

Sep 16, 2025

pandas/io/parsers/python_parser.py (outdated, resolved)

pandas/tests/io/parser/test_python_parser_only.py (outdated, resolved)

pandas/io/parsers/python_parser.py (outdated, resolved)

pandas/io/parsers/base_parser.py (outdated, resolved)

@skalwaghe-56 skalwaghe-56 force-pushed the fix-issue-61837 branch from 7f303f7 to f6887a2

September 17, 2025 17:00

rhshadrach

rhshadrach requested changes

Sep 17, 2025

pandas/io/parsers/python_parser.py (outdated, resolved)

@skalwaghe-56 skalwaghe-56 force-pushed the fix-issue-61837 branch from f6887a2 to 014e05f

September 18, 2025 12:15

@skalwaghe-56 skalwaghe-56 requested a review from rhshadrach

September 18, 2025 12:15

skalwaghe-56

skalwaghe-56 commented

Sep 18, 2025

Contributor Author

@skalwaghe-56 skalwaghe-56 left a comment

I have fixed the tests as well, so CI should pass now.

@skalwaghe-56 skalwaghe-56 force-pushed the fix-issue-61837 branch 2 times, most recently from e1f405e to 2fa7f70

September 20, 2025 09:00

rhshadrach

rhshadrach requested changes

Sep 20, 2025

Member

@rhshadrach rhshadrach left a comment

Looking good!

pandas/io/parsers/python_parser.py (outdated, resolved)

doc/source/whatsnew/v2.3.3.rst (outdated, resolved)

@skalwaghe-56 skalwaghe-56 force-pushed the fix-issue-61837 branch from 2fa7f70 to c103dcc

September 23, 2025 15:34

@skalwaghe-56 skalwaghe-56 requested a review from rhshadrach

September 23, 2025 15:35

skalwaghe-56 added 2 commits

September 24, 2025 15:32

@skalwaghe-56


 BUG: read_csv(on_bad_lines=callable)+index_col should warn; add test

b7d555a

- Always emit ParserWarning and drop extra fields when an on_bad_lines
 callable returns more elements than expected, regardless of index_col,
 in PythonParser._rows_to_cols. [GH#61837]
- Ensure non-bad rows are appended in the outer else branch so good lines
 are preserved.
- Add regression test
 pandas/tests/io/parser/test_python_parser_only.py::test_on_bad_lines_callable_warns_and_truncates_with_index_col
 covering index_col in [None, 0].
Closes pandas-dev#61837.
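
The behaviour described in this commit message can be modelled roughly as follows. This is a simplified, self-contained sketch and not the actual code in PythonParser._rows_to_cols: `ParserWarning` here is a local stand-in class, `handle_row` and `rows_to_keep` are hypothetical helper names, the warning text is invented for illustration, and the expected field count is passed in directly rather than derived from the header and index_col.

```python
import warnings


class ParserWarning(Warning):
    """Local stand-in for pandas.errors.ParserWarning (illustration only)."""


def handle_row(row, expected_len, on_bad_lines):
    """Return the fields to keep for one row, or None to skip the row."""
    if len(row) <= expected_len:
        # Good line: keep it unchanged.
        return row
    new_row = on_bad_lines(row)
    if new_row is None:
        # The callable asked for the bad line to be skipped.
        return None
    if len(new_row) > expected_len:
        # Always warn and truncate; index_col plays no role in this decision.
        warnings.warn(
            f"expected {expected_len} fields, got {len(new_row)}; "
            "extra fields dropped",
            ParserWarning,
            stacklevel=2,
        )
        new_row = new_row[:expected_len]
    return new_row


def rows_to_keep(rows, expected_len, on_bad_lines):
    """Apply handle_row to every row and drop the skipped ones."""
    kept = []
    for row in rows:
        result = handle_row(row, expected_len, on_bad_lines)
        if result is not None:
            kept.append(result)
    return kept


rows = [["1", "2"], ["3", "4", "5"], ["6", "7"]]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The callable returns the bad line unchanged, so it is still too long.
    kept = rows_to_keep(rows, expected_len=2, on_bad_lines=lambda r: r)

print(kept)         # [['1', '2'], ['3', '4'], ['6', '7']]
print(len(caught))  # 1
```

The good rows pass through untouched, which mirrors the second bullet of the commit message, and the single over-long row produces exactly one warning before being truncated.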

@skalwaghe-56


 DOC: whatsnew entry for on_bad_lines regression fix (GH#61837)

86d45aa

@skalwaghe-56 skalwaghe-56 force-pushed the fix-issue-61837 branch from c103dcc to 86d45aa

September 24, 2025 10:02

Labels

Bug IO CSV

3 participants

@skalwaghe-56 @rhshadrach @simonjayhawkins
