Newest 'data-compaction' Questions

Stack Overflow


13 questions

Advice

0 votes

0 replies

31 views

Sorted Runs vs SST Files in RocksDB Universal Compaction

The documentation here states that in Universal Compaction with num_levels=1, the entire database can be written into a single SST file. I understand that the entire database will be written into ...

cmcnealy

asked Nov 2, 2025 at 19:01

0 votes

0 answers

31 views

Is the Apache Kafka compaction key (MD5 hash) collision-safe?

When I read the code of SkimpyOffsetMap (https://github.com/apache/kafka/blob/4.0.0/storage/src/main/java/org/apache/kafka/storage/internals/log/SkimpyOffsetMap.java#L148), I see if (Arrays.equals(...

raphaelauv

1,039

asked Jun 24, 2025 at 1:27

0 votes

2 answers

385 views

Data in Hive table is changed after running a compaction in PySpark

Following up on a previously asked question (link added). In short: I wrote a file compactor in Spark. It works by reading all files under a directory into a DataFrame, performing coalesce over ...

Liran Eliyahu

asked Jul 16, 2023 at 13:30

1 vote

1 answer

376 views

Directory size increased after compaction using PySpark

I wrote a file compactor using PySpark. It works by reading all the content of a directory into a Spark DataFrame and then performing a repartition action in order to reduce the number ...

Liran Eliyahu

asked Jul 5, 2023 at 7:44

1 vote

0 answers

45 views

Using multiple TTL values in a Cassandra table

What are the disadvantages of using multiple TTL values (one at the table level and another for specific data rows, to override the TTL for those rows) in a Cassandra table? Will it result in incomplete data ...

Cassandra Thrift

asked Nov 1, 2022 at 16:01

0 votes

2 answers

3k views

Kafka - changing log.cleanup.policy for an existing topic

I have a Kafka topic that receives a great many messages. Many of them have the same key, and I'm interested only in the latest messages. Looking around, this topic seems perfect for the config log.cleanup....

freedev

30.9k

asked Oct 25, 2022 at 16:18

1 vote

1 answer

508 views

Does etcd's storage footprint grow linearly with respect to keys and values?

I noticed that, when running some stress tests on a Kubernetes cluster, etcd snapshot sizes didn't increase much, even as I added more and more stuff to my cluster. I collected snapshots via: etcdctl --...

jayunit100

17.7k

asked Oct 24, 2022 at 18:14

0 votes

1 answer

428 views

RocksDB: notification when all compaction jobs are done

I use RocksDB's bulk-loading mechanism to load a bunch of SST files generated by offline Spark tasks. In order to avoid a large amount of disk I/O during the loading and compacting process from ...

user2260241

asked May 19, 2022 at 14:11

1 vote

1 answer

697 views

CouchDB 3.2 disable auto compaction for a specific database

How can I disable auto compaction in CouchDB 3.2? I want to preserve all the history for a specific database, or else disable auto compaction completely. Note: the CouchDB 3.2 configuration has changed from 2....

Zeta

asked May 7, 2022 at 15:43

0 votes

1 answer

348 views

How to free disk space in Cassandra when a lot of tombstones have accumulated under SizeTieredCompactionStrategy

I am running cqlsh version 5.0.1 on a 6-node cluster, where I recently did a major data cleanup in a table that uses SizeTieredCompactionStrategy in order to free some disk space, but that ...

Yash Tandon

asked Jan 3, 2022 at 6:02

0 votes

1 answer

221 views

HBase: major compaction config does not take effect

I have set the config hbase.offpeak.end.hour: 22, hbase.offpeak.start.hour: 18, and hbase.hregion.majorcompaction: 86400000, but HBase still runs major compactions at random times, such as 9:00, 13:55, and so on. ...

haiwangch

asked Nov 26, 2021 at 8:06

0 votes

1 answer

716 views

How to remove old revisions of the documents in a CouchDB database?

I have a very large database with some GB of data, and when I try to compact it, it takes more than 12 hours. Is there any other way to delete old revisions? Does _revs_limit help with this? I can ...

Rahib Rasheed

asked Oct 12, 2021 at 7:23

-2 votes

1 answer

126 views

Which compaction strategy is recommended for a table with minimal updates [closed]

I am looking for a compaction strategy for data with the following characteristics: we don't need the data after 60-90 days (in extreme scenarios, maybe 180 days). Ideally insert happens and updates ...

vineeth kanaparthi

2,393

asked Jul 8, 2021 at 9:18
