TST: groupby.sum with large integers #62372

Open
SergioGarcia00 wants to merge 3 commits into pandas-dev:main
base: main
from SergioGarcia00:tst/groupby-large-int-sum

SergioGarcia00 commented Sep 18, 2025

This PR adds regression tests for groupby.sum with large integers (int64, uint64, and nullable dtypes).
These tests would have failed before the bug was fixed, and they now pass, ensuring no regressions in the future.
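
As a rough illustration of the kind of check described above, here is a minimal sketch. It is not the PR's actual test code: the helper name `groupby_sum_matches_exact` and the choice of `2**62` as the large value are assumptions made for this example, and it presumes a pandas version in which the bug is already fixed.

```python
import pandas as pd


def groupby_sum_matches_exact(dtype: str, big: int = 2**62) -> bool:
    """Hypothetical check: does groupby.sum return the exact integer sum?"""
    df = pd.DataFrame(
        {"gb": ["A", "A"], "val": pd.Series([14, big], dtype=dtype)}
    )
    result = df.groupby("gb")["val"].sum()
    # Compare as Python ints so the check itself cannot lose precision.
    return int(result.loc["A"]) == 14 + big


# Plain and nullable 64-bit integer dtypes, as listed in the PR description.
for dtype in ["int64", "uint64", "Int64", "UInt64"]:
    print(dtype, groupby_sum_matches_exact(dtype))
```

The default value is above 2**53 on purpose, so a sum that passed through float64 anywhere along the way would not come back exact.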

Contributor

Alvaro-Kothe left a comment

I don't think these tests actually verify the error, mainly because everything is below 64 bits.

You should have a test using object dtype with large integers >64 bits and >128 bits.

Author

Oh! I misunderstood the task, sorry. I will change this with a new approach.

Contributor

Sorry, I installed pandas==1.0.4 on Python 3.8.20 and ran the reproduction. The errors occur for n = 2 ** 54 through 2 ** 63. The dtypes are int64 and uint64.

import pandas as pd
for i in range(129):
    n = 2 ** i
    df = pd.DataFrame([['A', 14], ['A', n]], columns=['gb', 'val'])
    gb_sum = df.groupby('gb').sum().values[0][0]
    df_sum = df.sum().values[1]
    if gb_sum != df_sum:
        print(df["val"])
        print(f"Trying n = 2 ** {i} '{n}'...")
        print(f"df.sum().values[1] '{df_sum}' != df.groupby('gb').sum().values[0][0] '{gb_sum}")

Output:

0 14
1 18014398509481984
Name: val, dtype: int64
Trying n = 2 ** 54 '18014398509481984'...
df.sum().values[1] '18014398509481998' != df.groupby('gb').sum().values[0][0] '18014398509482000
0 14
1 36028797018963968
Name: val, dtype: int64
Trying n = 2 ** 55 '36028797018963968'...
df.sum().values[1] '36028797018963982' != df.groupby('gb').sum().values[0][0] '36028797018963984
0 14
1 72057594037927936
Name: val, dtype: int64
Trying n = 2 ** 56 '72057594037927936'...
df.sum().values[1] '72057594037927950' != df.groupby('gb').sum().values[0][0] '72057594037927952
0 14
1 144115188075855872
Name: val, dtype: int64
Trying n = 2 ** 57 '144115188075855872'...
df.sum().values[1] '144115188075855886' != df.groupby('gb').sum().values[0][0] '144115188075855872
0 14
1 288230376151711744
Name: val, dtype: int64
Trying n = 2 ** 58 '288230376151711744'...
df.sum().values[1] '288230376151711758' != df.groupby('gb').sum().values[0][0] '288230376151711744
0 14
1 576460752303423488
Name: val, dtype: int64
Trying n = 2 ** 59 '576460752303423488'...
df.sum().values[1] '576460752303423502' != df.groupby('gb').sum().values[0][0] '576460752303423488
0 14
1 1152921504606846976
Name: val, dtype: int64
Trying n = 2 ** 60 '1152921504606846976'...
df.sum().values[1] '1152921504606846990' != df.groupby('gb').sum().values[0][0] '1152921504606846976
0 14
1 2305843009213693952
Name: val, dtype: int64
Trying n = 2 ** 61 '2305843009213693952'...
df.sum().values[1] '2305843009213693966' != df.groupby('gb').sum().values[0][0] '2305843009213693952
0 14
1 4611686018427387904
Name: val, dtype: int64
Trying n = 2 ** 62 '4611686018427387904'...
df.sum().values[1] '4611686018427387918' != df.groupby('gb').sum().values[0][0] '4611686018427387904
0 14
1 9223372036854775808
Name: val, dtype: uint64
Trying n = 2 ** 63 '9223372036854775808'...
df.sum().values[1] '9223372036854775822' != df.groupby('gb').sum().values[0][0] '9223372036854775808
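
The thread does not say why the two sums differ. One reading that fits the numbers, offered here as an assumption rather than a confirmed diagnosis, is that the old groupby path passed the values through float64, which cannot represent every integer above 2**53. The sketch below uses only plain Python (the helper name `float64_round_trip` is made up for this example) and rounds the exact sums through a float.

```python
def float64_round_trip(n: int) -> int:
    """Cast an integer to float64 and back, as a lossy intermediate step would."""
    return int(float(n))


for i in (53, 54, 55, 56, 57, 63):
    exact = 2**i + 14
    rounded = float64_round_trip(exact)
    # exact == rounded only while the sum is representable in float64.
    print(i, exact, rounded, exact == rounded)
```

For 2 ** 54 this yields 18014398509482000 and for 2 ** 57 it yields 2 ** 57 unchanged, the same values the groupby column shows in the output above, while 2 ** 53 + 14 survives the round trip intact.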

Author

Then the tests are correct, but the bug I thought was fixed still exists, right?

Contributor

the bug I thought was fixed still exists, right?

It was fixed. I can't reproduce it on main.

Author

SergioGarcia00 commented Sep 19, 2025

the bug I thought was fixed still exists, right?

It was fixed. I can't reproduce it on main.

Hi, now I see that the issue has been closed. Did this even help?

Contributor

Alvaro-Kothe commented Sep 20, 2025

Did this even help?

Your tests are fine, but according to #34681 (comment), the problem raised in the issue was already tested on #48018.

Reviewers
1 more reviewer

Alvaro-Kothe requested changes

Successfully merging this pull request may close these issues.

BUG: groupby.sum() is inconsistent with df.sum() for large integers
