Skip to main content
Code Review

Return to Answer

added 1406 characters in body
Source Link
Gareth Rees
  • 50.1k
  • 3
  • 130
  • 210
  1. The functions malloc and realloc might fail if there is not enough memory, and in this case they would return NULL, and since the code does not check the result, this would lead to undefined behaviour. It's necessary to check the results of these functions.

    The functions malloc and realloc might fail if there is not enough memory, and in this case they would return NULL, and since the code does not check the result, this would lead to undefined behaviour. It's necessary to check the results of these functions.

Update: a comment was skeptical about this recommendation, noting that on some operating systems, trying to write through a null pointer will result in a segmentation fault, and for some programs this is acceptable behaviour. This might indeed be fine in some cases, but when writing a reusable component like a hash table library, it is important to do better than that. The problems are: (i) Writing through a null pointer does not result in a segmentation fault in all situations, for example the code might be running on an embedded controller using an operating system that does not protect memory at address zero, or on hardware that lacks memory protection at all. The world does not consist only of Windows and Linux. (ii) In the presence of an optimizing compiler, there is no guarantee that you will merely get a segmentation fault. Writing through a null pointer is undefined behaviour, and might lead to data corruption or a security vulnerability instead of (or as well as) a crash. See this article by Harry Lee for some examples. (iii) In many cases it's not acceptable to crash, for example in safety-critical software. There are techniques for coping with memory exhaustion such as refusing requests until memory pressure abates, or garbage collection.

  1. If the hash table is full, and you call insert with a key that hashes to zero (modulo the size of the table) then the while loop will never terminate and the program will hang.

  2. If the hash table is full, and you call insert with a key that hashes to something other than zero (modulo the size of the table) then the new key will overwrite an old key without any kind of warning or error.

  3. In Prime_num, if old_size is 57, then the function does not find a suitable new size, and so control runs off the end of the function and nothing is returned. This is because the next prime number after 2*old_size (114) is 127, which is more than old_size * 2 + 10 (124). (If you compiled with warnings turned on then your compiler would have complained about this.)

  4. In get_item, there is no way for the caller to tell if the key was found or not. If it was found, then a bucket structure from the hash table is returned. But if it was not found, then a newly allocated bucket structure is returned. There is no way for the caller to tell these apart—the newly allocated bucket structure could contain any data whatsoever.

  5. In get_item, if the key is found, then a bucket structure is allocated and neither freed nor returned to the caller. This is a memory leak.

  6. In resize_HashTable, the tmp_array is allocated but never freed. This is a memory leak.

  7. In resize_HashTable, the original bucket structures are not freed. This is a memory leak.

  8. In contains, there is no check against the occupied flag. This means that if a key is not present in the table then it will be compared against all the keys in the table, including keys that were deleted or never initialized.

  9. The algorithm in delete is incorrect: if the deleted key is in the middle of a chain of keys with identical hashes, then deleting it leaves all the following keys in the chain unfindable.

  1. The functions malloc and realloc might fail if there is not enough memory, and in this case they would return NULL, and since the code does not check the result, this would lead to undefined behaviour. It's necessary to check the results of these functions.

  2. If the hash table is full, and you call insert with a key that hashes to zero (modulo the size of the table) then the while loop will never terminate and the program will hang.

  3. If the hash table is full, and you call insert with a key that hashes to something other than zero (modulo the size of the table) then the new key will overwrite an old key without any kind of warning or error.

  4. In Prime_num, if old_size is 57, then the function does not find a suitable new size, and so control runs off the end of the function and nothing is returned. This is because the next prime number after 2*old_size (114) is 127, which is more than old_size * 2 + 10 (124). (If you compiled with warnings turned on then your compiler would have complained about this.)

  5. In get_item, there is no way for the caller to tell if the key was found or not. If it was found, then a bucket structure from the hash table is returned. But if it was not found, then a newly allocated bucket structure is returned. There is no way for the caller to tell these apart—the newly allocated bucket structure could contain any data whatsoever.

  6. In get_item, if the key is found, then a bucket structure is allocated and neither freed nor returned to the caller. This is a memory leak.

  7. In resize_HashTable, the tmp_array is allocated but never freed. This is a memory leak.

  8. In resize_HashTable, the original bucket structures are not freed. This is a memory leak.

  9. In contains, there is no check against the occupied flag. This means that if a key is not present in the table then it will be compared against all the keys in the table, including keys that were deleted or never initialized.

  10. The algorithm in delete is incorrect: if the deleted key is in the middle of a chain of keys with identical hashes, then deleting it leaves all the following keys in the chain unfindable.

  1. The functions malloc and realloc might fail if there is not enough memory, and in this case they would return NULL, and since the code does not check the result, this would lead to undefined behaviour. It's necessary to check the results of these functions.

Update: a comment was skeptical about this recommendation, noting that on some operating systems, trying to write through a null pointer will result in a segmentation fault, and for some programs this is acceptable behaviour. This might indeed be fine in some cases, but when writing a reusable component like a hash table library, it is important to do better than that. The problems are: (i) Writing through a null pointer does not result in a segmentation fault in all situations, for example the code might be running on an embedded controller using an operating system that does not protect memory at address zero, or on hardware that lacks memory protection at all. The world does not consist only of Windows and Linux. (ii) In the presence of an optimizing compiler, there is no guarantee that you will merely get a segmentation fault. Writing through a null pointer is undefined behaviour, and might lead to data corruption or a security vulnerability instead of (or as well as) a crash. See this article by Harry Lee for some examples. (iii) In many cases it's not acceptable to crash, for example in safety-critical software. There are techniques for coping with memory exhaustion such as refusing requests until memory pressure abates, or garbage collection.

  1. If the hash table is full, and you call insert with a key that hashes to zero (modulo the size of the table) then the while loop will never terminate and the program will hang.

  2. If the hash table is full, and you call insert with a key that hashes to something other than zero (modulo the size of the table) then the new key will overwrite an old key without any kind of warning or error.

  3. In Prime_num, if old_size is 57, then the function does not find a suitable new size, and so control runs off the end of the function and nothing is returned. This is because the next prime number after 2*old_size (114) is 127, which is more than old_size * 2 + 10 (124). (If you compiled with warnings turned on then your compiler would have complained about this.)

  4. In get_item, there is no way for the caller to tell if the key was found or not. If it was found, then a bucket structure from the hash table is returned. But if it was not found, then a newly allocated bucket structure is returned. There is no way for the caller to tell these apart—the newly allocated bucket structure could contain any data whatsoever.

  5. In get_item, if the key is found, then a bucket structure is allocated and neither freed nor returned to the caller. This is a memory leak.

  6. In resize_HashTable, the tmp_array is allocated but never freed. This is a memory leak.

  7. In resize_HashTable, the original bucket structures are not freed. This is a memory leak.

  8. In contains, there is no check against the occupied flag. This means that if a key is not present in the table then it will be compared against all the keys in the table, including keys that were deleted or never initialized.

  9. The algorithm in delete is incorrect: if the deleted key is in the middle of a chain of keys with identical hashes, then deleting it leaves all the following keys in the chain unfindable.

Source Link
Gareth Rees
  • 50.1k
  • 3
  • 130
  • 210

I spotted some bugs:

  1. The functions malloc and realloc might fail if there is not enough memory, and in this case they would return NULL, and since the code does not check the result, this would lead to undefined behaviour. It's necessary to check the results of these functions.

  2. If the hash table is full, and you call insert with a key that hashes to zero (modulo the size of the table) then the while loop will never terminate and the program will hang.

  3. If the hash table is full, and you call insert with a key that hashes to something other than zero (modulo the size of the table) then the new key will overwrite an old key without any kind of warning or error.

  4. In Prime_num, if old_size is 57, then the function does not find a suitable new size, and so control runs off the end of the function and nothing is returned. This is because the next prime number after 2*old_size (114) is 127, which is more than old_size * 2 + 10 (124). (If you compiled with warnings turned on then your compiler would have complained about this.)

  5. In get_item, there is no way for the caller to tell if the key was found or not. If it was found, then a bucket structure from the hash table is returned. But if it was not found, then a newly allocated bucket structure is returned. There is no way for the caller to tell these apart—the newly allocated bucket structure could contain any data whatsoever.

  6. In get_item, if the key is found, then a bucket structure is allocated and neither freed nor returned to the caller. This is a memory leak.

  7. In resize_HashTable, the tmp_array is allocated but never freed. This is a memory leak.

  8. In resize_HashTable, the original bucket structures are not freed. This is a memory leak.

  9. In contains, there is no check against the occupied flag. This means that if a key is not present in the table then it will be compared against all the keys in the table, including keys that were deleted or never initialized.

  10. The algorithm in delete is incorrect: if the deleted key is in the middle of a chain of keys with identical hashes, then deleting it leaves all the following keys in the chain unfindable.

This is a common mistake! In The Art of Computer Programming, Knuth comments, "Many computer programmers have great faith in algorithms, and they are surprised to find that the obvious way to delete records from a hash table doesn't work." (Vol. III, p. 533.)

If you want to get this right, Knuth gives an algorithm (pp. 533–4) for deletion in a open hash table with linear probing.

But a simpler approach is to mark the deleted key with a flag meaning "there was a key here but it was deleted" which you later remove when growing or shrinking the table.

lang-c

AltStyle によって変換されたページ (->オリジナル) /