-
Notifications
You must be signed in to change notification settings - Fork 618
-
While benchmarking TPC-H 1TB on Gluten+Velox, we identified a major performance bottleneck compared to Databricks Photon in shuffle-heavy queries (Q8, Q9, Q17, Q18, Q21).
The Evidence
- Photon: Effectively prunes 95-99% of
lineitemrecords using Bloom Filters, keeping Disk I/O below 10%. - Velox: Shows negligible filtering, leading to 30-50% Disk I/O.
- Total Runtime Impact: Photon achieved a 1.83x speedup overall, but when I/O-heavy queries are excluded, the gap narrows to 1.13x.
The Root Cause
Spark has a default limit of 10MB for Bloom Filter, so when we increased the limit to 1GB to make comparable with Databricks Photon, bloom filter is created after config change but filter is negligable.
The Velox backend seems limited by a hardcoded Bloom Filter size of 4,194,304 bits (). This limit is too low to maintain low false-positive rates for 1TB cardinality, rendering the filter ineffective.
Questions for Maintainers
- Are there plans to allow the
maxNumBitsfor Bloom Filters to scale beyond the current hardcoded limit? - Why is Velox failing to generate/utilize the filter as effectively as Photon at this scale?
Spark Plan for Q17
Velox
localhost_18080_history_app-20260202185100-0001_SQL_execution__id=266
Beta Was this translation helpful? Give feedback.
All reactions
Replies: 4 comments 11 replies
-
@FelixYBW can you please look.
Beta Was this translation helpful? Give feedback.
All reactions
-
@shadowmmu Thanks for sharing the findings and analysis.
Gluten actually allows to config this via:
spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits
Looks like in your example run for Q17 with Gluten, Spark run time filter is not triggered. Have you also tried to lower the application side threshold?
spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold = 0
Please also note by default DBX enabled the local caching feature which can significantly improve performance for subsequent queries in a power test run. It also collects runtime statistics, helping later queries generate better execution plans(their CBO is enabled by default)
Beta Was this translation helpful? Give feedback.
All reactions
-
These configs should respect Spark config except maxNumBits since memory issue. CC @zhouyuan
spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits
val RUNTIME_BLOOM_FILTER_MAX_NUM_BITS =
buildConf("spark.sql.optimizer.runtime.bloomFilter.maxNumBits")
.doc("The max number of bits to use for the runtime bloom filter")
.version("3.3.0")
.longConf
.createWithDefault(67108864L)
Beta Was this translation helpful? Give feedback.
All reactions
-
Yes @jinchengchenghh, and also we have dedicated gluten configs as well for the above configs. But like is there any way to increase the values bloomFilter.maxNumBits beyond its default values to check its efficiency on larger scale.
Thanks -khem
Beta Was this translation helpful? Give feedback.
All reactions
-
Please update the value 256 to a big value https://github.com/facebookincubator/velox/blob/main/velox/common/memory/MemoryAllocator.h#L492 to relax the memory restriction and take a try.
I don't try that, but this config default value is set by the size class memory limit.
Beta Was this translation helpful? Give feedback.
All reactions
-
I think it will also require to update here as well to increase the hard-coded limit https://github.com/facebookincubator/velox/blob/da0d2fdba364c5e14ea98536ecae57d3d7562041/velox/core/QueryConfig.h#L1245
Beta Was this translation helpful? Give feedback.
All reactions
-
yes
Beta Was this translation helpful? Give feedback.
All reactions
-
@zhouyuan do you have any suggestion about how can I overcome this runtime filter bottleneck?
Also what should be the performance difference between photon and Velox.
Beta Was this translation helpful? Give feedback.
All reactions
-
Hi @shadowmmu Thanks for the detailed information — I see the issue now. This appears to be a hard limit in the memory allocator, and modifying line may help(https://github.com/facebookincubator/velox/blob/main/velox/common/memory/MemoryAllocator.h#L492)
Cc: @zhli1142015
To improve BHJ performance, Gluten has a WIP patch (#8931
) that should significantly improve performance for large BHJ workloads.
On the query planning side, DBX performs much better than vanilla Spark, especially on the DS benchmark. There are also several useful optimizations from the Spark community, but not been merged. Some of the cloud vendors also improved on the spark catalyst in their product to improve the planner.
For the Aggregation perf diff, could you please also the the example queries?
thanks, -yuan
Beta Was this translation helpful? Give feedback.
All reactions
-
Thanks @zhouyuan for you effort,
Its now all clear about the BHJ, bloom filter and Optimized query plan.
For the bloom filter part, need to study Velox code if we can increase the limit without breaking anything.
if you have anything that might help, please share it here.
For the aggregation We have observed the difference in TPCH Q18 for instance.
Here is the related query plans
Velox
image
Photon
image
Beta Was this translation helpful? Give feedback.
All reactions
-
Hi @shadowmmu Thanks for the detailed information — I see the issue now. This appears to be a hard limit in the memory allocator, and modifying line may help(https://github.com/facebookincubator/velox/blob/main/velox/common/memory/MemoryAllocator.h#L492) Cc: @zhli1142015
To improve BHJ performance, Gluten has a WIP patch (#8931 ) that should significantly improve performance for large BHJ workloads.
On the query planning side, DBX performs much better than vanilla Spark, especially on the DS benchmark. There are also several useful optimizations from the Spark community, but not been merged. Some of the cloud vendors also improved on the spark catalyst in their product to improve the planner.
For the Aggregation perf diff, could you please also the the example queries?
thanks, -yuan
Internally, we’ve removed this limitation and are using Spark’s default value of 64 MB for bloomFilter.maxNumBits. @shadowmmu Is 1 GB the default setting for all queries? What is the overall impact of this?
Beta Was this translation helpful? Give feedback.
All reactions
-
Hi @zhli1142015 , the default limit is 10MB from spark side, I increased the limit to 1GB to compare it with Databrick's photon but it wasn't effective at all, like even after filtering it was nearly returning all the rows.
Performance wise Velox is nearly 5x slower for Q17 compared to Photon with the default limit of 10 MB for bloom filter.
And on full TPCH suite its 2x slower.
Changing to 1GB hasn't effected anything.
And btw how did you increased the bloomFilter.maxNumBits?
Beta Was this translation helpful? Give feedback.
All reactions
-
Velox uses a very simple and different BloomFilter https://github.com/facebookincubator/velox/blob/main/velox/common/base/BloomFilter.h#L120 vs Spark BloomFilter, this may affect the filter efficiency.
Beta Was this translation helpful? Give feedback.