February 20, 2026: Weekly Status Update in Gluten #11638
This weekly update is generated by LLMs. You're welcome to join us on GitHub for in-depth discussions.
Overall Activity Summary
The Apache Gluten project has been highly active over the past 7 days with 42 pull requests and 20+ issues, focusing on major infrastructure improvements, performance optimizations, and Spark 4.x compatibility. The community is preparing for the 1.6.0 release while advancing multiple backend enhancements.
Key Ongoing Projects
Build System Modernization
- Gradle Build Support: @liuneng1994 is leading a comprehensive effort ([WIP] [BUILD] Add Gradle build support and replace Maven in CI #11576 ) to add Gradle as an alternative to Maven, featuring multi-version support, native C++ integration, and significant build performance improvements
- Incremental Build Optimization: @baibaichen delivered two PRs ([GLUTEN-11559][Build] Improve incremental build time for test-compile phase #11560, [GLUTEN-11559][VL] Add incremental C++ build script for fast development iteration #11595) that cut incremental build times from ~3 minutes to under 30 seconds by adopting the Ninja build system and smart caching
Performance & Memory Management
- Native Delta Statistics Writer: @zhztheplayer achieved a 61% performance improvement ([GLUTEN-10215][VL] Delta write: Native statistics tracker to eliminate C2R overhead #11419) by replacing the C2R (columnar-to-row) step with native Velox aggregation tasks
- Broadcast Hash Join Optimization: @JkSelf implemented executor-level hash table caching ([GLUTEN-7548][VL] Optimize BHJ in velox backend #8931 ) showing 1.29x performance improvement in TPC-DS benchmarks
- Memory Management: Multiple items address insufficient Spark off-heap execution memory in the RSS shuffle writer and in window operations ([VL][1.5] Not enough spark off-heap execution memory on rss shuffle writer #11542, [VL][1.5] Not enough spark off-heap execution memory on window #11540)
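As a rough illustration of the executor-level hash table caching idea mentioned above, the sketch below builds the broadcast-side hash table once per process and lets later tasks reuse it. This is not Gluten's or Velox's implementation: `BuildOnceCache`, `build_hash_table`, `probe`, and the `"broadcast_0"` key are hypothetical names chosen for the example, and a Python dict stands in for the native hash table.

```python
import threading
from typing import Callable, Dict, Hashable, List, Tuple


class BuildOnceCache:
    """Hypothetical process-wide cache: the first caller for a key builds the
    value, and every later caller with the same key reuses it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[Hashable, dict] = {}
        self.builds = 0  # counts how many times a table was actually built

    def get_or_build(self, key: Hashable, build: Callable[[], dict]) -> dict:
        # Holding the lock during the build keeps the sketch simple; it also
        # serializes builds for unrelated keys, which a real system would avoid.
        with self._lock:
            if key not in self._entries:
                self._entries[key] = build()
                self.builds += 1
            return self._entries[key]

    def release(self, key: Hashable) -> None:
        # Drop the cached table, e.g. once the broadcast is no longer needed.
        with self._lock:
            self._entries.pop(key, None)


def build_hash_table(rows: List[Tuple[int, str]]) -> dict:
    """Group build-side rows by join key."""
    table: Dict[int, List[str]] = {}
    for key, value in rows:
        table.setdefault(key, []).append(value)
    return table


def probe(table: dict, keys: List[int]) -> List[Tuple[int, str]]:
    """Return (key, value) pairs for every probe key that has a match."""
    return [(k, v) for k in keys for v in table.get(k, [])]


cache = BuildOnceCache()
build_side = [(1, "a"), (2, "b"), (1, "c")]


def task(probe_keys: List[int]) -> List[Tuple[int, str]]:
    # Each task asks the shared cache instead of building its own table.
    table = cache.get_or_build("broadcast_0", lambda: build_hash_table(build_side))
    return probe(table, probe_keys)


results = [task([1]), task([2, 3])]
```

Both tasks probe the same table, so the build cost is paid once per process instead of once per task. The release step matters in practice because a cached table holds memory until it is explicitly dropped.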
Spark 4.x Compatibility
- Python 3.10 Migration: @ReemaAlzaid completed CI updates ([VL] Update CI Python to 3.10 for Spark 4.1 and enable ArrowEvalPythonExecSuite tests #11481 , [VL][CI] Migrate Spark 4.1 tests to CentOS 9 #11519 ) to support Spark 4.1's Python requirements
- Test Suite Stabilization: @baibaichen and team are systematically re-enabling disabled test suites (Spark 4.x: Tracking disabled test suites #11550, [GLUTEN-11550][UT] Enable GlutenXmlExpressionsSuite for spark4x and exclude 'from_xml- invalid data' #11580); the tracking effort covers 51 unique suites across Spark 4.0 and 4.1
Priority Items
Critical Infrastructure
- GPU CI Infrastructure: @zhouyuan temporarily disabled GPU CI ([GLUTEN-11611][VL] Temporary disable GPU CI job #11612) due to FBOS upgrade compatibility issues; the CI containers need to be updated before the job can be re-enabled
- S3 Integration Testing: @Mariamalmesfer enabled comprehensive S3 integration tests ([VL] Add S3 integration gluten tests #11516 ) closing a long-standing gap
Function Support Expansion
- ANSI Mode Implementation: @philo-he is coordinating comprehensive ANSI SQL compliance ([VL] Add ANSI mode support #10134 ) with multiple contributors working on type casting and arithmetic functions
- Missing Spark Functions: @zhztheplayer added support for approx_count_distinct_for_intervals ([VL] Add support for approx_count_distinct_for_intervals #11599), essential for Spark CBO + histogram functionality
Notable Discussions
Release Planning
- Gluten 1.6.0 Release: @zhztheplayer is coordinating the upcoming release (Gluten Release 1.6.0 #11603); the follow-on version bump to 1.7.0-SNAPSHOT is already complete ([CORE] Bump version to 1.7.0-SNAPSHOT #11592)
New Backend Introduction
- Bolt Backend Integration: @WangGuangxin initiated discussion (Add a new backend: Bolt #10929 ) about integrating Bolt, a Velox fork from ByteDance with production-hardened features and LLVM-based JIT compilation
Emerging Trends
- AI-Driven Development: Multiple PRs explicitly mention AI tooling usage (Claude, GitHub Copilot) for development acceleration
- Production Optimization: Focus shifting from basic functionality to production-ready features like memory management, performance tuning, and comprehensive testing
- Multi-Backend Strategy: Growing interest in supporting multiple execution backends beyond Velox
- Build Performance: Significant engineering effort on developer experience improvements
Good First Issues
#10134: ANSI Mode Support
Skills needed: Scala, SQL, Type Systems
Why it's good: Comprehensive tracking issue with individual tasks that can be picked up independently, excellent for learning Spark SQL internals
#11513: Input_file_name() returns "" on iceberg tables
Skills needed: Java/Scala, Iceberg integration
Why it's good: Well-defined bug with clear scope, good introduction to Gluten's data lake integration
#11501: Docker Dependency Caching
Skills needed: Docker, CI/CD, Maven
Why it's good: Straightforward infrastructure improvement with clear requirements to pre-install Java dependencies in CI Docker images for faster builds