13 questions
Advice
0
votes
0
replies
31
views
Sorted Runs vs SST Files in RocksDB Universal Compaction
This documentation states that in Universal Compaction with num_levels=1, the entire database can be written into a single SST file.
I understand that the entire database will be written into ...
0
votes
0
answers
31
views
is Apache Kafka compaction key (md5 hash) collision safe?
When I read the code of the SkimpyOffsetMap https://github.com/apache/kafka/blob/4.0.0/storage/src/main/java/org/apache/kafka/storage/internals/log/SkimpyOffsetMap.java#L148
I see
if (Arrays.equals(...
0
votes
2
answers
385
views
Data in hive table is changed after running a compaction in pyspark
This follows up on a previously asked question (linked).
In short:
I wrote a file compactor in spark, the way that it works is by reading all files under a directory into a dataframe, performing coalesce over ...
1
vote
1
answer
376
views
Directory size increased after compaction using pyspark
I wrote a file compactor using pyspark.
The way that it works is by reading all the content of a directory into a spark dataframe and then performing a repartition action in order to reduce the number ...
1
vote
0
answers
45
views
Using multiple TTL values in Cassandra table
What are the disadvantages of using multiple TTL values (one at the table level and another for specific data rows, to override the TTL for those rows) in a Cassandra table? Will it result in incomplete data ...
0
votes
2
answers
3k
views
Kafka - changing log.cleanup.policy to existing topic
I have a Kafka topic that receives a very large number of messages. Many of them have the same key and I'm interested only in the latest messages. Looking around, this topic seems perfect for the config log.cleanup....
1
vote
1
answer
508
views
Does etcd's storage footprint grow linearly with respect to keys and values?
I noticed that, when running some stress tests on a Kubernetes cluster, etcd snapshot sizes didn't increase much, even as I added more and more objects to my cluster.
I collected snapshots via:
etcdctl --...
0
votes
1
answer
428
views
rocksdb all compaction jobs done notification
I use RocksDB's bulk loading mechanism to load a bunch of SST files generated by offline Spark tasks. In order to avoid a large amount of disk I/O during the loading and compacting process from ...
1
vote
1
answer
697
views
CouchDB 3.2 disable auto compaction for a specific database
How can I disable auto compaction in couchdb 3.2?
I want to preserve all the history for a specific database.
Or completely disable auto compaction.
Note: the CouchDB 3.2 configuration has changed from 2....
0
votes
1
answer
348
views
How to free disk space from Cassandra when a lot of tombstones have collected in sizeTieredCompaction strategy
I am running cqlsh version 5.0.1 on a 6-node cluster, where I recently did a major data cleanup in a table that uses the sizeTieredCompaction strategy in order to free some disk space, but that ...
0
votes
1
answer
221
views
hbase: For major compaction config does not take effect
I have set the config: habse.offpeak.end.hour: 22, hbase.offpeak.start.hour: 18, hbase.hregion.majorcompaction: 86400000. But HBase still runs major compactions at random times, such as 9:00, 13:55 and so on.
...
0
votes
1
answer
716
views
How to remove old revisions of the documents in a couchdb database?
I have a very large database with some GB of data, and when I try to compact it, it takes more than 12 hours. Is there any other way to delete old revisions? Does _revs_limit help with this? I can ...
-2
votes
1
answer
126
views
Which compaction strategy is recommended for a table with minimal updates [closed]
I am looking for a compaction strategy for data that has the following characteristics:
We don't need the data after 60-90 days. In extreme scenarios, maybe 180 days.
Ideally insert happens and updates ...