
I am trying to design a HashTable from scratch. I am starting with an initial bucket size of 11 and trying to maintain a load factor of 0.75.

The Java documentation mentions that whenever the number of items reaches the load factor threshold, the HashTable doubles the number of buckets.

My question: if I start with an initial bucket size of 11 and double the buckets as soon as the load factor is reached, the number of buckets becomes 22. At that point I need to rehash the table, and 22 will be used to rehash the keys (because I am using the division hashing method). But since 22 is not prime, it will cause many more collisions.

What is the best way to increase the bucket size in this case?

BobDalgleish
asked Feb 20, 2020 at 14:48
  • I found a document with Java code which shows how to generate prime numbers greater than the bucket size. Maybe this is the right way to go about it. Document - cs.columbia.edu/~allen/S14/NOTES/hash.pdf (page 3) Commented Feb 20, 2020 at 16:17

2 Answers


You decide on a starting size and a factor by which to grow the table each time you resize. For example, you picked a starting size of 11 and a growth factor of 2.

Before you go any further, you write a little program that prints the smallest prime >= 11 (which is 11), the smallest prime >= 22 (which is 23), the smallest prime >= 2*23 (which is 47), and so on.

Take the output of this program and add a static array with these sizes to your hashing code (that's 11, 23, 47, 97, 197, etc.). When you want to increase the size of the table and the current size is x, find the smallest number > x in your array; that is the new size.
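A little program along those lines might look like this (a sketch; the class and method names are my own, using `BigInteger`'s built-in primality test):

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class PrimeSizes {
    // Smallest prime >= n, using BigInteger's probabilistic primality test.
    static int nextPrime(int n) {
        BigInteger p = BigInteger.valueOf(n);
        if (!p.isProbablePrime(30)) {
            p = p.nextProbablePrime();
        }
        return p.intValueExact();
    }

    // Generate the size table: each entry is the smallest prime >= factor
    // times the previous entry. With start=11, factor=2.0 this yields
    // 11, 23, 47, 97, 197, 397, ...
    static List<Integer> sizeTable(int start, double factor, int count) {
        List<Integer> sizes = new ArrayList<>();
        int size = nextPrime(start);
        for (int i = 0; i < count; i++) {
            sizes.add(size);
            size = nextPrime((int) Math.ceil(size * factor));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(sizeTable(11, 2.0, 6)); // [11, 23, 47, 97, 197, 397]
    }
}
```

Run it once, paste the printed list into your hash table as a static array, and the program never needs to ship with your code.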

You may experiment with different growth factors. For example, a factor of 1.5 instead of 2.0 will force you to resize the table more often but may reduce wasted memory. For example, if you have 1,000,000 entries, then growing to 1.5 million slots instead of 2 million saves 25% of the memory, as long as 1.5 million is enough.

Some implementations allow specifying a capacity when the hash table is created. If you support that, assume the caller's estimate is the truth or close to it. So if you are told "capacity one million", start the table with enough slots that filling in one million entries still gives you a decent load factor, say 1.5 million slots or whatever you find is a good number. And don't resize until the number of entries substantially exceeds the initial capacity: not at 1,000,001 entries, but maybe at 1,050,000.
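As a sketch of that sizing logic (the names and the 0.75 threshold here are my own choices, not from any particular library):

```java
public class CapacitySizing {
    static final double LOAD_FACTOR = 0.75;

    // Slots needed so that `capacity` entries stay under the load factor.
    // E.g. one million entries at 0.75 needs ceil(1_000_000 / 0.75) slots.
    static int slotsFor(int capacity) {
        return (int) Math.ceil(capacity / LOAD_FACTOR);
    }

    // Resize only once we exceed both the load-factor threshold and the
    // capacity the caller promised up front, so an accurate capacity hint
    // avoids any rehashing at all.
    static boolean shouldResize(int entries, int slots, int initialCapacity) {
        return entries > slots * LOAD_FACTOR && entries > initialCapacity;
    }

    public static void main(String[] args) {
        System.out.println(slotsFor(1_000_000)); // 1333334
    }
}
```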

answered Feb 22, 2020 at 11:16
  • Thanks for the explanation. I think I will take the path you have recommended. Commented Feb 22, 2020 at 17:40

You could do what .NET does: maintain an array of primes, and every time you expand the table, look up the smallest stored prime that is >= 2 times the current size. Once your table grows past the end of the array, you will have to calculate primes the old-fashioned way.
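A sketch of that lookup-with-fallback idea (the prime list below is illustrative, not .NET's actual table):

```java
public class PrimeLookup {
    // Illustrative size table; each entry is roughly double the previous.
    static final int[] PRIMES = {11, 23, 47, 97, 197, 397, 797, 1597, 3203, 6421};

    // Smallest stored prime >= min; fall back to trial division once
    // the requested size runs off the end of the table.
    static int nextSize(int min) {
        for (int p : PRIMES) {
            if (p >= min) return p;
        }
        int n = min | 1;          // primes past the table are odd
        while (!isPrime(n)) n += 2;
        return n;
    }

    // Simple trial-division primality check.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int i = 3; (long) i * i <= n; i += 2) {
            if (n % i == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(nextSize(2 * 11)); // 23
        System.out.println(nextSize(2 * 6421)); // falls back to trial division
    }
}
```

Trial division is fine here: a resize is rare and already costs a full rehash, so the extra scan for the next prime is negligible by comparison.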

answered Feb 20, 2020 at 17:05
  • Thanks, Andrew for your answer. I think I will start by maintaining an array of primes. Commented Feb 20, 2020 at 17:34
