March 13, 2026: Weekly Status Update in Gluten #11761
GlutenPerfBot started this conversation in General
This weekly update is generated by LLMs. You're welcome to join us on GitHub for in-depth discussions.
Overall Activity Summary
The Gluten project celebrated its graduation as an Apache Top-Level Project (TLP) this week, with extensive cleanup efforts to remove "incubator" references across the codebase. Development activity remained strong with 47 merged PRs and 21 open PRs, focusing heavily on Velox backend improvements, Spark 4.x compatibility, and memory management optimizations.
Key Ongoing Projects
- TLP Graduation Cleanup: Major effort by @weiting-chen and team to remove all incubator references from source code, CI workflows, Dockerfiles, and documentation (Update repository references from incubator-gluten to gluten after TLP graduation #11735, Remove Incubating references from source code #11737, Remove DISCLAIMER file after TLP graduation #11738, Update release scripts and template for TLP graduation #11739, Update GitHub CI workflows for TLP graduation #11741, Update dev scripts and Dockerfiles for TLP graduation #11742)
- Spark 4.x Test Suite Enablement: @baibaichen continues enabling disabled test suites, with significant progress on variant handling and XML functions ([GLUTEN-11550][VL][UT] Enable Variant test suites #11726, [GLUTEN-11550][VL] Enable GlutenDataFrameSubquerySuite for Spark 4.1 #11727, [GLUTEN-11550][UT] Enable GlutenXmlFunctionsSuite for Spark 4.0 and 4.1 #11725)
- Velox Backend Optimization: Multiple performance improvements including hash join optimizations ([GLUTEN-7548][VL] Optimize BHJ in velox backend #8931 by @JkSelf), dynamic filter pushdown ([GLUTEN-11605][VL] Push dynamic filters down to ValueStream #11657 by @acvictor), and memory management enhancements
- Parquet Type Widening Support: @baibaichen is leading the effort to enable GlutenParquetTypeWideningSuite with 84 tests for Spark 4.x compatibility ([VL] Support type widening in Parquet reader (SPARK-40876) #11683, [GLUTEN-11683][VL] Enable GlutenParquetTypeWideningSuite for Spark 4.0 and 4.1 #11684, [GLUTEN-11683][VL] Fix SPARK-18108 and parquet-thrift compatibility #11689, [GLUTEN-11683][VL] Add Parquet type widening support #11719)
Priority Items
- Memory Management Issues: Critical OOM issues reported (OOM but memory is enough #11747 by @FelixYBW) requiring immediate attention for production deployments
- Dynamic Partition Pruning Regression: Hive scan DPP failures ([VL][BUG]Spark UTs from suite DynamicPartitionPruningHiveScanSuite are failing #11692 by @manikumararyas) affecting Spark 4.x users
- Columnar Shuffle Buffer Issues: DirectByteBuffer availability problems (Columnar shuffle with Velox fails with UnsupportedOperationException: DirectByteBuffer not available #11716 by @ijbgreen) blocking columnar shuffle adoption
- Complex Type Support: Native write validation incorrectly rejecting supported ArrayType/MapType/StructType ([VL] [BUG] Complex types already supported in Velox are considered not supported by Gluten #11746 by @VvanFalleaves)
Notable Discussions
- [VL] useful Velox PRs not merged into upstream #11585: Tracks useful Velox PRs that have not yet been merged upstream, giving visibility into potential performance improvements
- Gluten Slack Channel #gluten #8429: New ASF Slack channel set up for community communication and support
- Spark 4.x: Tracking disabled test suites #11550: Tracks disabled Spark 4.x test suites; 51 unique suites still need attention across both Spark versions
Emerging Trends
- Graduation Momentum: Strong community response to TLP status with increased contribution activity
- Spark 4.x Focus: Major push to achieve full Spark 4.0/4.1 compatibility, with 79% of previously disabled test suites now enabled
- Memory Optimization: Multiple PRs addressing memory management, off-heap allocation, and OOM prevention
- Velox Integration Deepening: Daily Velox version updates and close collaboration on upstream features
Good First Issues
- [VL] Support TIMESTAMP_NTZ Type #11622: Well-defined scope with clear implementation steps for the Velox backend
- Map iceberg configuration with Velox configuration #11703: Straightforward configuration mapping task with clear requirements
- [VL] Add ANSI mode support #10134: Large but well-organized tracking issue with multiple sub-tasks suitable for new contributors
- [VL] Adding Configrations for S3 IMDS #11699: Simple configuration addition task with a clear implementation path