TST: groupby.sum with large integers #62372
Conversation
I don't think these tests actually verify the error, mainly because everything is below 64 bits. You should have a test using `object` dtype with large integers >64 bits and >128 bits.
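For example, a minimal sketch of such a test (the test name and exact expectations here are illustrative, not taken from the PR):

```python
import pandas as pd
import pandas._testing as tm


def test_groupby_sum_object_dtype_large_integers():
    # Values above 64 bits (and above 128 bits) only fit in Python ints,
    # so the column must be object dtype; the sum should stay exact.
    big = 2**70 + 7
    huge = 2**130 + 11
    df = pd.DataFrame({"gb": ["A", "A"], "val": [big, huge]}, dtype=object)
    result = df.groupby("gb")["val"].sum()
    expected = pd.Series(
        [big + huge],
        index=pd.Index(["A"], name="gb"),
        name="val",
        dtype=object,
    )
    tm.assert_series_equal(result, expected)
```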
Oh, I misunderstood the task, sorry. I will change this with a new approach.
Sorry, I installed pandas==1.0.4 on Python 3.8.20 and ran the reproduction. The errors occur at 54 to 63 bits. The dtypes are `int64` and `uint64`.
```python
import pandas as pd

for i in range(129):
    n = 2 ** i
    df = pd.DataFrame([['A', 14], ['A', n]], columns=['gb', 'val'])
    gb_sum = df.groupby('gb').sum().values[0][0]
    df_sum = df.sum().values[1]
    if gb_sum != df_sum:
        print(df["val"])
        print(f"Trying n = 2 ** {i} '{n}'...")
        print(f"df.sum().values[1] '{df_sum}' != df.groupby('gb').sum().values[0][0] '{gb_sum}'")
```
Output:
```text
0 14
1 18014398509481984
Name: val, dtype: int64
Trying n = 2 ** 54 '18014398509481984'...
df.sum().values[1] '18014398509481998' != df.groupby('gb').sum().values[0][0] '18014398509482000'
0 14
1 36028797018963968
Name: val, dtype: int64
Trying n = 2 ** 55 '36028797018963968'...
df.sum().values[1] '36028797018963982' != df.groupby('gb').sum().values[0][0] '36028797018963984'
0 14
1 72057594037927936
Name: val, dtype: int64
Trying n = 2 ** 56 '72057594037927936'...
df.sum().values[1] '72057594037927950' != df.groupby('gb').sum().values[0][0] '72057594037927952'
0 14
1 144115188075855872
Name: val, dtype: int64
Trying n = 2 ** 57 '144115188075855872'...
df.sum().values[1] '144115188075855886' != df.groupby('gb').sum().values[0][0] '144115188075855872'
0 14
1 288230376151711744
Name: val, dtype: int64
Trying n = 2 ** 58 '288230376151711744'...
df.sum().values[1] '288230376151711758' != df.groupby('gb').sum().values[0][0] '288230376151711744'
0 14
1 576460752303423488
Name: val, dtype: int64
Trying n = 2 ** 59 '576460752303423488'...
df.sum().values[1] '576460752303423502' != df.groupby('gb').sum().values[0][0] '576460752303423488'
0 14
1 1152921504606846976
Name: val, dtype: int64
Trying n = 2 ** 60 '1152921504606846976'...
df.sum().values[1] '1152921504606846990' != df.groupby('gb').sum().values[0][0] '1152921504606846976'
0 14
1 2305843009213693952
Name: val, dtype: int64
Trying n = 2 ** 61 '2305843009213693952'...
df.sum().values[1] '2305843009213693966' != df.groupby('gb').sum().values[0][0] '2305843009213693952'
0 14
1 4611686018427387904
Name: val, dtype: int64
Trying n = 2 ** 62 '4611686018427387904'...
df.sum().values[1] '4611686018427387918' != df.groupby('gb').sum().values[0][0] '4611686018427387904'
0 14
1 9223372036854775808
Name: val, dtype: uint64
Trying n = 2 ** 63 '9223372036854775808'...
df.sum().values[1] '9223372036854775822' != df.groupby('gb').sum().values[0][0] '9223372036854775808'
```
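(A note on the pattern, my own reading rather than anything stated above: the discrepancies start exactly at 2**54, and each wrong value equals the true sum round-tripped through float64, whose 53-bit significand spaces representable integers 4 apart in [2**54, 2**55), 8 apart in the next octave, and so on. That is consistent with a float64 accumulator in the old groupby path.)

```python
# The wrong groupby result above is exactly the true sum rounded to float64.
true_sum = 2**54 + 14           # 18014398509481998, the correct df.sum() value
rounded = int(float(true_sum))  # nearest representable double is a multiple of 4
print(rounded)                  # 18014398509482000, the value groupby.sum produced
```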
Then the tests are correct, but the bug I thought was fixed still exists, right?
> the bug I thought was fixed still exists, right?

It was fixed. I can't reproduce it on main.
Hi, now I see that the issue has been closed. Did this even help?
> Did this even help?

Your tests are fine, but according to #34681 (comment), the problem raised in the issue was already tested in #48018.
This PR adds regression tests for `groupby.sum` with large integers (`int64`, `uint64`, and nullable dtypes). These tests would have failed before the bug was fixed, and they now pass, ensuring no regressions in the future.
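As an illustration, a sketch of the shape such a regression test could take (the parametrization and names are mine, not the PR's actual code):

```python
import pytest
import pandas as pd
import pandas._testing as tm


@pytest.mark.parametrize("dtype", ["int64", "uint64", "Int64", "UInt64"])
def test_groupby_sum_large_integers_no_precision_loss(dtype):
    # 2**62 + 14 is exactly representable in (u)int64 but not in float64,
    # so a float64-based accumulator would round it and fail this check.
    n = 2**62
    df = pd.DataFrame({"gb": ["A", "A"], "val": [14, n]})
    df["val"] = df["val"].astype(dtype)
    result = df.groupby("gb")["val"].sum()
    expected = pd.Series(
        [n + 14],
        index=pd.Index(["A"], name="gb"),
        name="val",
        dtype=dtype,
    )
    tm.assert_series_equal(result, expected)
```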