key/value store optimized for disk storage

Tim Chase python.list at tim.thechases.com
Fri May 4 13:46:43 EDT 2012


On 05/04/12 12:22, Steve Howell wrote:
> Which variant do you recommend?
> """ anydbm is a generic interface to variants of the DBM database
> — dbhash (requires bsddb), gdbm, or dbm. If none of these modules
> is installed, the slow-but-simple implementation in module
> dumbdbm will be used.
> """
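The quoted text is the Python 2 `anydbm` documentation. In Python 3 the same fallback logic lives in the `dbm` package: `dbm.gnu`, `dbm.ndbm`, and `dbm.dumb`, plus `dbm.sqlite3` from 3.13 on. `dbhash` is gone. Below is a minimal Python 3 sketch for checking which backend actually gets picked on a given machine. It uses `dbm.whichdb()`. The helper name `chosen_backend` and the probe file name are hypothetical.

```python
import dbm
import os
import tempfile


def chosen_backend(directory):
    """Create a tiny database via dbm's default chooser (hypothetical helper)
    and report which backend module dbm.whichdb() detects for it."""
    path = os.path.join(directory, "probe")  # hypothetical probe file name
    with dbm.open(path, "c") as db:  # "c": create if it doesn't exist
        db[b"key"] = b"value"
    return dbm.whichdb(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(chosen_backend(tmp))  # prints the detected backend's module name
```

On Python 2 the equivalent check is `whichdb.whichdb(filename)` against a file that `anydbm` created.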

If you use the stock anydbm module, it automatically picks the best
backend it knows about from whichever ones are installed:
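  import glob
  import hashlib
  import os
  import random
  from string import letters
  import anydbm

  KB = 1024
  MB = KB * KB
  GB = MB * KB
  DESIRED_SIZE = 6 * GB  # ~6GB, per your specs
  KEYS_TO_SAMPLE = 20
  FNAME = "mydata.db"

  def disk_usage(fname):
      # Some backends (dbm, dumbdbm) add suffixes such as .db/.dir/.pag/.dat,
      # so total up every file whose name starts with fname.
      return sum(os.path.getsize(p) for p in glob.glob(fname + "*"))

  db = anydbm.open(FNAME, 'c')
  try:
      print("Generating junk data...")
      i = 0
      while disk_usage(FNAME) < DESIRED_SIZE:
          # 16-character key: truncated hex MD5 of a counter
          key = hashlib.md5(str(i)).hexdigest()[:16]
          # random 1KB-4KB value made of ASCII letters
          size = random.randrange(1*KB, 4*KB)
          value = ''.join(random.choice(letters) for _ in range(size))
          db[key] = value
          i += 1
      print("Gathering %i sample keys" % KEYS_TO_SAMPLE)
      keys_of_interest = random.sample(db.keys(), KEYS_TO_SAMPLE)
  finally:
      db.close()

  print("Reopening for a cold sample set in case it matters")
  db = anydbm.open(FNAME)  # default flag 'r' opens read-only
  try:
      print("Performing %i lookups" % KEYS_TO_SAMPLE)
      for key in keys_of_interest:
          v = db[key]
      print("Done")
  finally:
      db.close()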
(Your specs said ~6GB of data, keys of up to 16 characters, and values
of 1KB-4KB, so this should generate comparable data.)
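The script above is Python 2 and performs the lookups without timing them. The full ~6GB is also a lot of data to iterate on. Below is a rough, scaled-down sketch assuming Python 3, where `anydbm` became `dbm`. It builds records with the same shape, sums the sizes of every file the backend creates, and times random lookups. The helper names (`footprint`, `build`, `time_lookups`), the record count, and the sample size are illustrative assumptions, not anything from the original post.

```python
import dbm
import glob
import hashlib
import os
import random
import string
import tempfile
import time

KB = 1024


def footprint(path):
    """Total size in bytes of every file the dbm backend created for `path`.

    dbm.dumb and some dbm.ndbm builds add suffixes (.dat/.dir/.pag/.db),
    so sum every file matching path + "*" rather than path alone.
    """
    return sum(os.path.getsize(p) for p in glob.glob(glob.escape(path) + "*"))


def build(path, n_records):
    """Write n_records entries shaped like the thread's spec: 16-character
    keys (truncated hex MD5 of a counter) and random 1-4 KB ASCII values."""
    keys = []
    with dbm.open(path, "n") as db:  # "n": always create a new, empty database
        for i in range(n_records):
            key = hashlib.md5(str(i).encode()).hexdigest()[:16]
            size = random.randrange(1 * KB, 4 * KB)  # 1 KB <= size < 4 KB
            db[key] = "".join(random.choices(string.ascii_letters, k=size))
            keys.append(key)
    return keys


def time_lookups(path, keys, n_samples=20):
    """Reopen read-only and return the mean seconds per lookup over
    n_samples randomly chosen keys."""
    sample = random.sample(keys, n_samples)
    with dbm.open(path, "r") as db:
        start = time.perf_counter()
        for key in sample:
            db[key]  # fetch and discard the value
        elapsed = time.perf_counter() - start
    return elapsed / n_samples


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mydata.db")
        keys = build(path, n_records=2000)  # hypothetical scale; raise toward ~6GB
        print("backend:", dbm.whichdb(path))
        print("on-disk bytes:", footprint(path))
        print("mean lookup: %.1f microseconds" % (time_lookups(path, keys) * 1e6))
```

This sketch stops on a record count rather than polling the file size. That keeps runs repeatable and avoids measuring files while the backend may still be buffering writes. Once the numbers look sane, `n_records` can be scaled up toward the real spec.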
-tkc


More information about the Python-list mailing list

AltStyle によって変換されたページ (->オリジナル) /