[Help] Spark 4.x status update and new sub-issues · apache/gluten · Discussion #11925

baibaichen
Apr 13, 2026
Collaborator

Hi all,

I've completed a systematic triage of Spark 4.x-related test issues. Below is a summary of the newly created issues and how to get involved.

1. Simple suite fixes in #11550 — self-assign welcome

Several disabled suites in #11550 need only simple fixes: exclude the failing tests and rewrite them with testGluten, with no changes to the core layer. These suites are marked with an empty Owner or "RC" in the table. If you have permission to assign issues, feel free to assign yourself directly. Otherwise, leave a comment on the issue and I'll update the owner for you.

These include GlutenExplainSuite, GlutenPlannerSuite, GlutenProjectedOrderingAndPartitioningSuite, GlutenRemoveRedundantProjectsSuite, and GlutenRemoveRedundantSortsSuite, among others.

2. Bug sub-issues under #11550

These require deeper fixes at the Gluten core or C++ layer:

Issue	Description
#11911	Enable Structured Streaming test suites (20 disabled suites)
#11912	JNI and Velox exception handling loses Spark error condition and exception type
#11913	Velox split function returns incorrect results with limit parameter (SPARK-49968)
#11914	Support Parquet struct field compatibility improvements (SPARK-53535)
#11915	Support checksum-based shuffle writers (SPARK-53322)
#11916	Diagnose and enable TODO SQL query test files
#11917	Velox decimal arithmetic does not respect allowPrecisionLoss context (SPARK-53968)
#11918	CastTransformer does not pass per-expression timezone for timestamp formatting

3. Feature sub-issues under #11910 (Spark 4.x new feature tracking)

These are new Spark 4.x features that Gluten/Velox does not yet support natively:

Issue	Description	Spark
#11919	Add TimeType (TIME data type) support (SPARK-51162)	4.1
#11920	Support dual-mode ColumnarToRow nodes (SPARK-51474)	4.1
#11921	Support NullType Parquet read/write (SPARK-54220)	4.1
#11922	Support memory shuffle spill by size threshold (SPARK-49386)	4.1

The existing issues #11371 (Variant) and #10134 (ANSI mode) have also been linked under #11910.

Note on Variant and ANSI

I've taken #11371 (Variant) and #10134 (ANSI mode) mainly to coordinate the overall effort, not to work on them exclusively. As we dig deeper, more sub-issues may be created. Contributions are very welcome.

Feel free to pick up any issue that interests you. Questions and discussions are welcome in the respective issue threads.

Replies: 1 comment

wecharyu
Apr 15, 2026