
I am trying to design a HashTable from scratch. I am starting with an initial bucket size of 11 and trying to maintain a load factor of 0.75.

The Java documentation mentions that whenever the number of items reaches the load factor threshold, the HashTable doubles the number of buckets.

My question: if I start with an initial bucket size of 11 and double the buckets as soon as the load factor is reached, the number of buckets becomes 22. At that point I need to rehash the table, and 22 will be used to rehash the keys (because I am using the division hashing method). But since 22 is not prime, it will cause many more collisions.

What is the best way to increase the bucket size in this case?

BobDalgleish
asked Feb 20, 2020 at 14:48
  • I found a document with Java code which shows how to generate prime numbers greater than the bucket size. Maybe this is the right way to go about it. Document - cs.columbia.edu/~allen/S14/NOTES/hash.pdf (page 3) Commented Feb 20, 2020 at 16:17

2 Answers


You decide on a starting size and a factor by which to grow the table each time you resize. For example, you picked a starting size of 11 and a growth factor of 2.

Before you go any further, you write a little program that prints the smallest prime >= 11 (which is 11), the smallest prime >= 22 (which is 23), the smallest prime >= 2*23 (which is 47), and so on.

Take the output of this program and add a static array with these sizes to your hashing code (that's 11, 23, 47, 97, 197, etc.). When you want to increase the size of the table and the current size is x, find the smallest number > x in your array; that is the new size.
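A little program along those lines might look like this (a sketch; the class and method names are my own, using `BigInteger`'s built-in primality test):

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class PrimeSizes {
    // Smallest prime >= n, using BigInteger's probabilistic primality test.
    static int nextPrime(int n) {
        BigInteger p = BigInteger.valueOf(n);
        if (!p.isProbablePrime(30)) {
            p = p.nextProbablePrime();
        }
        return p.intValueExact();
    }

    // Generate the size table: each entry is the smallest prime >= factor
    // times the previous entry. With start=11, factor=2.0 this yields
    // 11, 23, 47, 97, 197, 397, ...
    static List<Integer> sizeTable(int start, double factor, int count) {
        List<Integer> sizes = new ArrayList<>();
        int size = nextPrime(start);
        for (int i = 0; i < count; i++) {
            sizes.add(size);
            size = nextPrime((int) Math.ceil(size * factor));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(sizeTable(11, 2.0, 6)); // [11, 23, 47, 97, 197, 397]
    }
}
```

Run it once, paste the printed list into your hash table as a static array, and the program never needs to ship with your code.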

You may experiment with different growth factors. For example, a factor of 1.5 instead of 2.0 will force you to resize the table more often but may reduce wasted memory. For example, if you have 1,000,000 entries, then growing to 1.5 million slots instead of 2 million saves 25% of the memory, as long as 1.5 million is enough.

Some implementations allow specifying a capacity when the hash table is created. If you support that, assume the caller's estimate is the truth or close to it. So if you are told "capacity one million", start the table with enough slots that filling in one million entries still gives you a decent load factor, say 1.5 million slots or whatever you find is a good number. And don't resize until the number of entries substantially exceeds the initial capacity: not at 1,000,001 entries, but maybe at 1,050,000.
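As a sketch of that sizing logic (the names and the 0.75 threshold here are my own choices, not from any particular library):

```java
public class CapacitySizing {
    static final double LOAD_FACTOR = 0.75;

    // Slots needed so that `capacity` entries stay under the load factor.
    // E.g. one million entries at 0.75 needs ceil(1_000_000 / 0.75) slots.
    static int slotsFor(int capacity) {
        return (int) Math.ceil(capacity / LOAD_FACTOR);
    }

    // Resize only once we exceed both the load-factor threshold and the
    // capacity the caller promised up front, so an accurate capacity hint
    // avoids any rehashing at all.
    static boolean shouldResize(int entries, int slots, int initialCapacity) {
        return entries > slots * LOAD_FACTOR && entries > initialCapacity;
    }

    public static void main(String[] args) {
        System.out.println(slotsFor(1_000_000)); // 1333334
    }
}
```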

answered Feb 22, 2020 at 11:16
  • Thanks for the explanation. I think I will take the path you have recommended. Commented Feb 22, 2020 at 17:40

You could do what .NET does: maintain an array of primes, and every time you expand the table, look up the smallest stored prime that is >= 2 times the current size. Once your table grows past the end of the array, you will have to calculate primes the old-fashioned way.
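A sketch of that lookup-with-fallback idea (the prime list below is illustrative, not .NET's actual table):

```java
public class PrimeLookup {
    // Illustrative size table; each entry is roughly double the previous.
    static final int[] PRIMES = {11, 23, 47, 97, 197, 397, 797, 1597, 3203, 6421};

    // Smallest stored prime >= min; fall back to trial division once
    // the requested size runs off the end of the table.
    static int nextSize(int min) {
        for (int p : PRIMES) {
            if (p >= min) return p;
        }
        int n = min | 1;          // primes past the table are odd
        while (!isPrime(n)) n += 2;
        return n;
    }

    // Simple trial-division primality check.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int i = 3; (long) i * i <= n; i += 2) {
            if (n % i == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(nextSize(2 * 11)); // 23
        System.out.println(nextSize(2 * 6421)); // falls back to trial division
    }
}
```

Trial division is fine here: a resize is rare and already costs a full rehash, so the extra scan for the next prime is negligible by comparison.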

answered Feb 20, 2020 at 17:05
  • Thanks, Andrew for your answer. I think I will start by maintaining an array of primes. Commented Feb 20, 2020 at 17:34
