Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix: Fall back to Spark for trunc / date_trunc functions when format string is unsupported, or is not a literal value #2634

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
andygrove merged 18 commits into apache:main from andygrove:date-trunc
Oct 30, 2025

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Oct 22, 2025
edited
Loading

Which issue does this PR close?

Closes #2621

There is a follow-up issue #2649 for another bug discovered while working on this PR.

Rationale for this change

Both the trunc and date_trunc functions accept a format string parameter, such as year or month.

My understanding is that most users would specify a literal string for this parameter. However, it is possible to specify a column, or any other valid Spark expression. Unfortunately, Comet is not compatible with Spark for this use case if the expression evaluates to a format string that is not supported. The query will fail with an error such as:

Unsupported format: "foo" for function 'date_trunc'

The correct behavior would be to return NULL instead.

scala> spark.sql("select now(), a, date_trunc(a, now()) from t1").show
+--------------------+-----+--------------------+
| now()| a|date_trunc(a, now())|
+--------------------+-----+--------------------+
|2025年10月23日 15:57:...| year| 2025年01月01日 00:00:00|
|2025年10月23日 15:57:...|month| 2025年10月01日 00:00:00|
|2025年10月23日 15:57:...| foo| NULL|
+--------------------+-----+--------------------+

The same is true for literal format strings:

scala> spark.sql("select now(), date_trunc('foo', now())").show
+--------------------+----------------------+
| now()|date_trunc(foo, now())|
+--------------------+----------------------+
|2025年10月23日 15:59:...| NULL|
+--------------------+----------------------+

What changes are included in this PR?

  • Fall back to Spark for invalid literal format string
  • Fall back to Spark for non-literal format string
  • Implement an improved test
  • Remove tests specific to the format being an array, which is no longer supported

How are these changes tested?

@andygrove andygrove changed the title (削除) fix: Fix various bugs in trunc / date_trunc functions [WIP] (削除ここまで) (追記) fix: Fix various bugs in trunc / date_trunc functions (追記ここまで) Oct 23, 2025
Copy link

codecov-commenter commented Oct 23, 2025
edited
Loading

Codecov Report

❌ Patch coverage is 91.17647% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.29%. Comparing base (f09f8af) to head (8d15e51).
⚠️ Report is 644 commits behind head on main.

Files with missing lines Patch % Lines
...c/main/scala/org/apache/comet/serde/datetime.scala 93.93% 0 Missing and 2 partials ⚠️
...on/src/main/scala/org/apache/comet/CometConf.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@ Coverage Diff @@
## main #2634 +/- ##
============================================
+ Coverage 56.12% 59.29% +3.16% 
- Complexity 976 1449 +473 
============================================
 Files 119 147 +28 
 Lines 11743 13789 +2046 
 Branches 2251 2369 +118 
============================================
+ Hits 6591 8176 +1585 
- Misses 4012 4388 +376 
- Partials 1140 1225 +85 

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andygrove andygrove changed the title (削除) fix: Fix various bugs in trunc / date_trunc functions (削除ここまで) (追記) fix: Fall back to Spark for trunc / date_trunc functions when format string is not a literal value (追記ここまで) Oct 23, 2025
@andygrove andygrove marked this pull request as ready for review October 25, 2025 15:40
@andygrove andygrove changed the title (削除) fix: Fall back to Spark for trunc / date_trunc functions when format string is not a literal value (削除ここまで) (追記) fix: Fall back to Spark for trunc / date_trunc functions when format string is unsupported, or is not a literal value (追記ここまで) Oct 27, 2025
}
}

test("trunc with format array") {
Copy link
Member Author

@andygrove andygrove Oct 27, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we no longer support format being an array

Copy link
Contributor

@parthchandra parthchandra Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it is better to keep this functionality rather than lose it for a case that no user has complained about yet.

Copy link
Member Author

@andygrove andygrove Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I made it Incompatible rather than Unsupported and re-instated the tests

"minute",
"hour",
"week",
"quarter")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

week and quarter are listed twice

andygrove reacted with thumbs up emoji
Copy link
Contributor

@hsiang-c hsiang-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

object CometTruncDate extends CometExpressionSerde[TruncDate] {

val supportedFormats: Seq[String] =
Seq("year", "yyyy", "yy", "quarter", "mon", "month", "mm", "week")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of possible formats are:

fmt - the format representing the unit to be truncated to
"YEAR", "YYYY", "YY" - truncate to the first date of the year that the ts falls in, the time part will be zero out
"QUARTER" - truncate to the first date of the quarter that the ts falls in, the time part will be zero out
"MONTH", "MM", "MON" - truncate to the first date of the month that the ts falls in, the time part will be zero out
"WEEK" - truncate to the Monday of the week that the ts falls in, the time part will be zero out
"DAY", "DD" - zero out the time part
"HOUR" - zero out the minute and second with fraction part
"MINUTE"- zero out the second with fraction part
"SECOND" - zero out the second fraction part
"MILLISECOND" - zero out the microseconds
"MICROSECOND" - everything remains

Am I missing a reason why Comet doesn't support smaller than week formats?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@andygrove andygrove Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very confusing in Spark, but date_trunc is actually TruncTimestamp and not TruncDate.

comphead reacted with thumbs up emoji
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove lgtm
Btw when reviewing the code I found custom DateTrunc impl in Comet which probably not in use anymore spark-expr/src/datetime_funcs/date_trunc.rs

If it is I'll create a clean up PR

Copy link
Member Author

Thanks @andygrove lgtm Btw when reviewing the code I found custom DateTrunc impl in Comet which probably not in use anymore spark-expr/src/datetime_funcs/date_trunc.rs

If it is I'll create a clean up PR

I assumed that those were the implementations that we were using, but I could be mistaken

Copy link
Member Author

Thanks for the reviews @hsiang-c and @comphead

@andygrove andygrove merged commit 977a189 into apache:main Oct 30, 2025
103 checks passed
@andygrove andygrove deleted the date-trunc branch October 30, 2025 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@comphead comphead comphead approved these changes

@parthchandra parthchandra Awaiting requested review from parthchandra

@mbutrovich mbutrovich Awaiting requested review from mbutrovich

+2 more reviewers

@martin-g martin-g martin-g left review comments

@hsiang-c hsiang-c hsiang-c approved these changes

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

InternalError: Unsupported format: "" for function 'date_trunc'.

AltStyle によって変換されたページ (->オリジナル) /