-
Notifications
You must be signed in to change notification settings - Fork 247
fix: Fall back to Spark for trunc / date_trunc functions when format string is unsupported, or is not a literal value
#2634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
trunc / date_trunc functions [WIP] (削除ここまで)trunc / date_trunc functions (追記ここまで)
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@ ## main #2634 +/- ## ============================================ + Coverage 56.12% 59.29% +3.16% - Complexity 976 1449 +473 ============================================ Files 119 147 +28 Lines 11743 13789 +2046 Branches 2251 2369 +118 ============================================ + Hits 6591 8176 +1585 - Misses 4012 4388 +376 - Partials 1140 1225 +85 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
trunc / date_trunc functions (削除ここまで)trunc / date_trunc functions when format string is not a literal value (追記ここまで)
trunc / date_trunc functions when format string is not a literal value (削除ここまで)trunc / date_trunc functions when format string is unsupported, or is not a literal value (追記ここまで)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we no longer support format being an array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it is better to keep this functionality rather than lose it for a case that no user has complained about yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I made it Incompatible rather than Unsupported and re-instated the tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
week and quarter are listed twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list of possible formats are:
fmt - the format representing the unit to be truncated to
"YEAR", "YYYY", "YY" - truncate to the first date of the year that the ts falls in, the time part will be zero out
"QUARTER" - truncate to the first date of the quarter that the ts falls in, the time part will be zero out
"MONTH", "MM", "MON" - truncate to the first date of the month that the ts falls in, the time part will be zero out
"WEEK" - truncate to the Monday of the week that the ts falls in, the time part will be zero out
"DAY", "DD" - zero out the time part
"HOUR" - zero out the minute and second with fraction part
"MINUTE"- zero out the second with fraction part
"SECOND" - zero out the second fraction part
"MILLISECOND" - zero out the microseconds
"MICROSECOND" - everything remains
Am I missing a reason why Comet doesn't support smaller than week formats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very confusing in Spark, but date_trunc is actually TruncTimestamp and not TruncDate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andygrove lgtm
Btw when reviewing the code I found custom DateTrunc impl in Comet which probably not in use anymore spark-expr/src/datetime_funcs/date_trunc.rs
If it is I'll create a clean up PR
Thanks @andygrove lgtm Btw when reviewing the code I found custom DateTrunc impl in Comet which probably not in use anymore
spark-expr/src/datetime_funcs/date_trunc.rsIf it is I'll create a clean up PR
I assumed that those were the implementations that we were using, but I could be mistaken
Uh oh!
There was an error while loading. Please reload this page.
Which issue does this PR close?
Closes #2621
There is a follow-up issue #2649 for another bug discovered while working on this PR.
Rationale for this change
Both the
truncanddate_truncfunctions accept a format string parameter, such asyearormonth.My understanding is that most users would specify a literal string for this parameter. However, it is possible to specify a column, or any other valid Spark expression. Unfortunately, Comet is not compatible with Spark for this use case if the expression evaluates to a format string that is not supported. The query will fail with an error such as:
The correct behavior would be to return
NULLinstead.The same is true for literal format strings:
What changes are included in this PR?
How are these changes tested?