9 questions
0 votes · 1 answer · 57 views
Why are 2 tables bucketed by col1 and joined on (col1, col2) shuffled?
// Enable all bucketing optimizations
spark.conf.set("spark.sql.requireAllClusterKeysForDistribution", "false")
spark.conf.set("spark.sql.sources.bucketing.enabled"...
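A minimal sketch of the configuration the excerpt is reaching for (assumptions: a running SparkSession `spark`, and Spark 3.3+ for `spark.sql.requireAllClusterKeysForDistribution`):

```python
# Keep bucketing-based joins enabled (this is the default).
spark.conf.set("spark.sql.sources.bucketing.enabled", "true")
# Allow the bucket distribution to satisfy a join whose keys are a
# superset of the bucket column (Spark 3.3+).
spark.conf.set("spark.sql.requireAllClusterKeysForDistribution", "false")
# Disable auto-broadcast so the plan shows a sort-merge join, making it
# easy to check whether an Exchange (shuffle) was actually avoided.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
```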
0 votes · 1 answer · 231 views
Bucket records into batches of a certain size in Snowflake
What would be the best way to bucket records into batches of a predefined size? I would like to tag each record with a batch/bucket number for further processing.
For example, let's say I have 1110 ...
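One common approach is the SQL idiom `CEIL(ROW_NUMBER() OVER (ORDER BY ...) / batch_size)`; the same arithmetic, sketched in plain Python (the record values here are placeholders):

```python
def assign_batches(records, batch_size):
    """Tag each record with a 1-based batch number, mirroring the SQL
    idiom CEIL(ROW_NUMBER() OVER (ORDER BY ...) / batch_size)."""
    return [(rec, i // batch_size + 1) for i, rec in enumerate(records)]

tagged = assign_batches(range(1110), 100)
# 1110 records at 100 per batch -> 12 batches; the last holds the final 10.
```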
0 votes · 1 answer · 272 views
'save' does not support bucketBy and sortBy right now
I am trying to apply bucketing to my dataframe when saving it to HDFS using the command below.
df.write
.format("parquet")
.bucketBy(200,"groupIdProjection")
.sortBy("...
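A sketch of the usual fix (assumptions: an existing SparkSession and DataFrame `df`; the table name is hypothetical). `bucketBy` records bucket metadata in the metastore, so it only works with `saveAsTable`; the path-based `save()` is what raises this error:

```python
# bucketBy requires a metastore table target, so use saveAsTable
# instead of save(path).
(df.write
   .format("parquet")
   .bucketBy(200, "groupIdProjection")
   .sortBy("groupIdProjection")
   .saveAsTable("my_bucketed_table"))  # hypothetical table name
```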
1 vote · 0 answers · 113 views
What is bucketBy equivalent in spark dataframe V2 API or Iceberg?
The Spark DataFrame V1 API has a bucketBy option.
df0.write
.bucketBy(50, "userid")
.saveAsTable("myHiveTable")
I don't see a similar option in the DataFrameWriterV2 API.
What ...
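A sketch of the closest V2 equivalent (assumptions: a SparkSession with an Iceberg catalog configured; the table name is hypothetical). In the V2 writer, bucketing is expressed as a partition transform via `pyspark.sql.functions.bucket`:

```python
from pyspark.sql.functions import bucket, col

# Iceberg treats bucketing as a partition transform rather than a
# writer option, so it goes into partitionedBy.
(df0.writeTo("catalog.db.my_table")   # hypothetical catalog/table
    .partitionedBy(bucket(50, col("userid")))
    .create())
```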
2 votes · 0 answers · 278 views
Why does Spark shuffle the data while joining two partitioned & bucketed tables?
I am trying to create a view on top of two tables.
Table 1:
Partitioned by col1
Bucketed by col2 (number of buckets: 3600)
Table 2:
Partitioned by col1
Bucketed by col2 (number of buckets: 3600)
View:
Table1
...
1 vote · 1 answer · 76 views
CQL: retrieve time-series data by time range
I have sensors at different locations, each measuring multiple parameters. There will be around 2 million measurements per day per sensor. I need to query by location/time range, but the range ...
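A common Cassandra pattern for this volume is to add a time bucket to the partition key so no single partition grows unbounded; a minimal sketch of the bucket computation (the sensor id and daily granularity are assumptions, chosen because ~2 million rows/day/sensor fits a day-sized partition):

```python
from datetime import datetime, timezone

def day_bucket(ts: datetime) -> str:
    """Day-level bucket key; one bucket per sensor per day keeps each
    Cassandra partition at a manageable size."""
    return ts.strftime("%Y-%m-%d")

# Partition key = (sensor_id, bucket); a time-range query then touches
# only the buckets between the start and end of the requested range.
key = ("sensor-42", day_bucket(datetime(2024, 5, 1, 13, 30, tzinfo=timezone.utc)))
# key -> ("sensor-42", "2024-05-01")
```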
1 vote · 0 answers · 752 views
Bucketed joins in PySpark/Iceberg
I'm trying to perform a join between two tables in PySpark using the Iceberg format. I'm trying to use bucketing to improve performance and avoid a shuffle, but it appears to have no effect ...
1 vote · 0 answers · 514 views
Bucketing values in Python
I want to split my values into buckets, associating each value with a bucket via a hash.
For hashing I am using Python's built-in hash() function, whose range is -sys.maxsize to sys.maxsize.
I have created a function to ...
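A minimal sketch of hash-based bucketing (the function name is hypothetical). The key detail is that `hash()` can be negative, but Python's `%` always returns a non-negative result for a positive modulus, so no `abs()` is needed:

```python
def bucket_for(value, n_buckets: int) -> int:
    """Map a value into one of n_buckets using Python's built-in hash().
    hash() may be negative, but % with a positive modulus always yields
    a value in [0, n_buckets)."""
    return hash(value) % n_buckets

buckets = [bucket_for(v, 10) for v in ("a", "b", "c")]
assert all(0 <= b < 10 for b in buckets)
```

Note that string hashes are randomized per interpreter run (PYTHONHASHSEED), so bucket assignments for strings are stable within a process but not across runs.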
0 votes · 1 answer · 374 views
Can I increase the number of buckets after table creation in Hive?
In Hive, once a table is created with n buckets, is there any way to increase the number of buckets?