
fix: re-enable Comet abs #2595


Open

hsiang-c wants to merge 8 commits into apache:main from hsiang-c:enable_abs

Contributor

@hsiang-c commented Oct 16, 2025

Which issue does this PR close?

Closes #1890
Partially closes #2314

Rationale for this change

What changes are included in this PR?

  • Implemented Spark's ANSI-mode behavior for abs: it throws org.apache.spark.SparkArithmeticException when the input is the MIN_VALUE of a Spark IntegralType (see doc).
  • In CometTestBase, changed the types of columns _9, _10, _11 and _12 from UINT_8/16/32/64 to INT_8/16/32/64 because the test data actually contains negative values.
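
The overflow case behind the first bullet can be illustrated with a minimal sketch. It is not the code in this PR: `AbsSketch`, `absAnsi` and `absLegacy` are hypothetical names, the error message is made up, and a plain `ArithmeticException` stands in for `org.apache.spark.SparkArithmeticException`. It only shows why MIN_VALUE is special: in two's complement, its negation does not fit in the same type.

```scala
// Illustrative sketch only; names and the exception type are assumptions.
object AbsSketch {
  /** ANSI-style abs: fail on overflow rather than return a negative result. */
  def absAnsi(x: Int): Int =
    if (x == Int.MinValue)
      // Stand-in for org.apache.spark.SparkArithmeticException.
      throw new ArithmeticException("integer overflow in abs")
    else math.abs(x)

  /** Non-ANSI abs: two's-complement wraparound, so MIN_VALUE maps to itself. */
  def absLegacy(x: Int): Int = math.abs(x)
}
```

With ANSI mode off, `absLegacy(Int.MinValue)` silently returns `Int.MinValue`, which is still negative. The same reasoning applies to the MIN_VALUE of the other integral types.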

How are these changes tested?

  • Unit tests with MIN_VALUE and with decimal values of different precision and scale.
  • Spark SQL tests


codecov-commenter commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.33%. Comparing base (f09f8af) to head (1ddff98).
⚠️ Report is 621 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2595      +/-   ##
============================================
+ Coverage   56.12%   59.33%   +3.20%
- Complexity    976     1444     +468
============================================
  Files         119      146      +27
  Lines       11743    13758    +2015
  Branches     2251     2353     +102
============================================
+ Hits         6591     8163    +1572
- Misses       4012     4373     +361
- Partials     1140     1222      +82


Seq(2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 15, 16, 17).foreach { col =>
checkSparkAnswerAndOperator(s"SELECT abs(_${col}) FROM tbl")
test("abs") {
Seq(true, false).foreach { ansi_enabled =>
Contributor Author


This is the change: the test now runs with ANSI mode both on and off.

| optional int32 _10(UINT_16);
| optional int32 _11(UINT_32);
| optional int64 _12(UINT_64);
| optional int32 _9(INT_8);
Contributor Author


We store negative values in these columns, so I think the schema should not use unsigned integer types.

// CometTestBase.scala
 record.add(8, (-i).toByte)
 record.add(9, (-i).toShort)
 record.add(10, -i)
 record.add(11, (-i).toLong)

Contributor

@parthchandra commented Oct 17, 2025


We have UINT here to make sure we cover all the types that Parquet has. The data files created here are specifically designed to test whether Parquet readers can handle all types correctly. Negative values stored in a UINT Parquet type test the values around the boundary of the allowed range.
To illustrate with an example: when you store the value -1 in a UINT_8 field, what gets stored is the bit pattern 0xff. On reading, this comes back as the value 255, which is the maximum value for a UINT_8.
This is both correct and desirable.
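
The round trip described above can be reproduced with plain bit masking. This is a sketch of the arithmetic only, not of any Parquet reader; `UnsignedRoundTrip` and its methods are hypothetical names introduced for illustration.

```scala
// Illustrative sketch only; not part of CometTestBase or any Parquet API.
object UnsignedRoundTrip {
  /** Reinterpret the 8 stored bits as an unsigned value (UINT_8). */
  def readAsUInt8(stored: Byte): Int = stored & 0xff

  /** Reinterpret the 16 stored bits as an unsigned value (UINT_16). */
  def readAsUInt16(stored: Short): Int = stored & 0xffff

  /** Reinterpret the 32 stored bits as an unsigned value (UINT_32). */
  def readAsUInt32(stored: Int): Long = stored & 0xffffffffL
}
```

For example, `UnsignedRoundTrip.readAsUInt8((-1).toByte)` yields 255, so writing `-i` into these columns exercises the upper end of each unsigned range.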

@hsiang-c hsiang-c marked this pull request as ready for review October 17, 2025 20:31
Contributor

Thanks @hsiang-c. What do you think of implementing a Spark-flavored abs in DataFusion, as I did recently for concat in apache/datafusion#18128?



Reviewers

@parthchandra left review comments


Development

Successfully merging this pull request may close these issues.

  • Unsupported expressions found in spark sql unit test
  • Add support for abs
