March 06, 2026: Weekly Status Update in Gluten #11714
This weekly update is generated by LLMs. You're welcome to join us on GitHub for in-depth discussions.
Overall Activity Summary
Over the past 7 days the Gluten project saw 60+ pull requests and 20+ active issues. The community is preparing for the upcoming 1.6.0 release while also advancing major features such as ANSI mode support, Parquet type widening, and dynamic filtering optimizations. Most development remains focused on the Velox backend, with significant performance improvements and bug fixes.
Key Ongoing Projects
Dynamic Filter Pushdown & Performance Optimizations
- @acvictor is leading the dynamic filtering work with two PRs: [GLUTEN-11605][VL] Push dynamic filters down to ValueStream #11657, and [GLUTEN-11708][VL] Translate might_contain as a subfield filter for scan-level bloom filter pushdown #11711
- @JkSelf's [GLUTEN-7548][VL] Optimize BHJ in velox backend #8931 optimizes broadcast hash joins by building hash tables once per executor, showing a 1.29x performance improvement on TPC-DS Q23a
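As a rough illustration of the idea behind scan-level bloom filter pushdown, the sketch below builds a small bloom filter from join build-side keys and uses a might_contain check to drop probe-side rows while scanning. It is a toy model, not Gluten or Velox code: the `BloomFilter` class, its sizing parameters, and the `scan_with_dynamic_filter` helper are all hypothetical names chosen for this example.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical; not the Velox or Spark implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bitset

    def _positions(self, value):
        # Derive num_hashes bit positions by hashing the value with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def scan_with_dynamic_filter(probe_rows, key, bloom):
    """Keep only the rows whose join key the filter cannot rule out."""
    return [row for row in probe_rows if bloom.might_contain(row[key])]


# Build side of the join: a handful of keys.
build_keys = [3, 7, 42]
bloom = BloomFilter()
for k in build_keys:
    bloom.add(k)

# Probe side: rows are filtered during the scan, before reaching the join.
probe_rows = [{"id": i} for i in range(100)]
survivors = scan_with_dynamic_filter(probe_rows, "id", bloom)
print(len(survivors), "of", len(probe_rows), "rows reach the join")
```

A bloom filter can return false positives but never false negatives, so the join result is unchanged; the filter only reduces how many rows the join has to process.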
ANSI Mode Support Expansion
- @malinjawi completed ANSI-compliant string-to-boolean casting ([GLUTEN-10134][VL] Add ANSI mode support for cast string to boolean #11437) and is actively working on expanding ANSI support for other type conversions
- @n0r0shi added ANSI mode decimal arithmetic with overflow checking ([VL] Support ANSI mode decimal Add/Subtract with checked overflow #11705)
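To make the ANSI versus non-ANSI distinction concrete, here is a minimal Python sketch of the semantics these PRs target: under ANSI mode an invalid cast or a decimal overflow raises an error, while in legacy mode the result is NULL (modelled as `None`). This is an illustration only, not the Gluten or Velox implementation. The accepted boolean literals, the rounding mode, and the function names are assumptions of this sketch.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed sets of accepted literals (case-insensitive, surrounding whitespace ignored).
TRUE_STRINGS = {"t", "true", "y", "yes", "1"}
FALSE_STRINGS = {"f", "false", "n", "no", "0"}


def cast_string_to_boolean(value, ansi_enabled):
    """Hypothetical model of CAST(string AS BOOLEAN)."""
    if value is None:
        return None
    normalized = value.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    if ansi_enabled:
        # ANSI mode: invalid input is an error rather than a silent NULL.
        raise ValueError(f"cannot cast {value!r} to BOOLEAN")
    return None


def decimal_add(left, right, precision, scale, ansi_enabled):
    """Hypothetical model of decimal addition into DECIMAL(precision, scale)."""
    # Rounding mode is an assumption of this sketch.
    quantum = Decimal(1).scaleb(-scale)
    result = (left + right).quantize(quantum, rounding=ROUND_HALF_UP)
    # DECIMAL(p, s) holds absolute values strictly below 10^(p - s).
    limit = Decimal(10) ** (precision - scale)
    if abs(result) >= limit:
        if ansi_enabled:
            # ANSI mode: overflow is an error rather than a silent NULL.
            raise ArithmeticError(
                f"{result} does not fit in DECIMAL({precision}, {scale})"
            )
        return None
    return result


print(cast_string_to_boolean(" Yes ", ansi_enabled=True))
print(cast_string_to_boolean("maybe", ansi_enabled=False))
print(decimal_add(Decimal("999.99"), Decimal("0.01"), 5, 2, ansi_enabled=False))
```

Calling either function with `ansi_enabled=True` on the failing inputs raises instead of returning `None`, which is the behaviour change the native backend has to reproduce.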
Parquet Type Widening & Schema Evolution
- @baibaichen is enabling GlutenParquetTypeWideningSuite for Spark 4.0/4.1 ([GLUTEN-11683][VL] Enable GlutenParquetTypeWideningSuite for Spark 4.0 and 4.1 #11684, [GLUTEN-11683][VL] Fix SPARK-18108 and parquet-thrift compatibility #11689) to support SPARK-40876, fixing type conversion issues and enabling 45+ previously failing tests
Release Preparation
- @zhztheplayer coordinated the 1.6.0 release process with multiple PRs ([CORE] Release 1.6: Prepare for 1.6.0-rc1 #11700, [INFRA] Release 1.6: Port "Cleanup for TLP release process (#11696)" #11701, [CORE] Release 1.6: Bump version to 1.6.0 (RC1) #11702, [INFRA] Cleanup for TLP release process #11696), including version bumps and release script updates
Priority Items
Critical Bug Fixes Needed:
- [VL][BUG]Spark UTs from suite DynamicPartitionPruningHiveScanSuite are failing #11692: Dynamic partition pruning regression in Hive scans is causing test failures. @acvictor has a draft PR, [VL][BUG] Fix DPP regression for Hive scans and add DynamicPartitionPruningHiveScanSuite #11710
- [VL] CrossRelNode's expression is not validated in native validation #11678: Native validation does not check CrossRelNode's expression. @wecharyu has an open PR, [GLUTEN-11678][VL] Native validation should check CrossRelNode's expression #11679
- [VL] Iceberg tests failed and were skipped #11630: Iceberg test failures require immediate attention; multiple fix PRs have been attempted
Performance Critical:
- [GLUTEN-11605][VL] Push dynamic filters down to ValueStream #11657: Dynamic filter pushdown to ValueStream (589 additions), needs review
- [GLUTEN-7548][VL] Optimize BHJ in velox backend #8931: BHJ optimization (2243 additions, 263 comments), a long-running optimization effort
Notable Discussions
#11585: Useful Velox PRs Tracking - @FelixYBW maintains a comprehensive tracker of 100+ Velox PRs submitted by the Gluten community that haven't been merged upstream, including critical fixes for ANSI mode, Parquet reading, and performance optimizations.
#11713: Apache Gluten Graduation Tasks - @weiting-chen coordinates Gluten's transition from Apache Incubator to Top Level Project, involving repository renaming, documentation updates, and process changes.
#8429: Gluten Slack Channel - @zhouyuan announced the new ASF workspace Slack channel for real-time community discussions.
Emerging Trends
- ANSI Mode as Default: With Spark 4.0 enabling ANSI by default, the community is rapidly implementing ANSI-compliant functions and type conversions
- Dynamic Filtering Revolution: Multiple PRs focus on pushing filters closer to storage for significant performance gains
- Release Quality Focus: Extensive test suite fixes and infrastructure improvements ahead of the 1.6.0 release
- Cross-Backend Compatibility: Increased attention to ensuring features work across Velox, ClickHouse, and other backends
Good First Issues
#11699: S3 IMDS Configuration - Add support for Velox's new S3 IMDS configuration options. Good for contributors familiar with cloud storage configurations.
#11703: Iceberg Configuration Mapping - Map Iceberg writer configurations to Velox equivalents. Requires understanding of both Iceberg and Velox configuration systems.
#11513: Fix input_file_name() for Iceberg - Resolve the issue where input_file_name() returns empty strings on Iceberg tables. Good introduction to Iceberg integration.
#10134: ANSI Mode Support - Contribute to the comprehensive ANSI mode implementation. Multiple sub-tasks available for different type casting functions, suitable for contributors wanting to learn Spark's type system.
#11622: TIMESTAMP_NTZ Type Support - Implement support for Spark's TIMESTAMP_NTZ type in the Velox backend. Good for learning type system integration between Spark and native engines.