6
\$\begingroup\$

I've isolated this function, cons, for hash-consing in C. It is where the bottleneck of my program is. How could its performance be improved?

var32 cons(var32 x, var32 y){
 var32 ptr;
 uid id, hashedId;
 map newMap, currentMap;
 u32 colls;
 #ifdef __DEBUG__
 ++consCount;
 colls = 0;
 #endif
 // Before anything, compute the unique id of that node.
 id = (((uid)(x.ptr))<<32) | (uid)y.ptr;
 // Try to find the node. If it already exists, return its ptr.
 hashedId = id % MAP_HASH_SIZE;
 currentMap = mapByHashedId[hashedId];
 if (currentMap) do {
 if (mapId[currentMap] == id)
 return mapPtr[currentMap];
 #ifdef __DEBUG__
 ++colls;
 #endif
 } while ((currentMap = mapNext[currentMap]));
 #ifdef __DEBUG__
 if (colls<16) ++collisionCount[colls];
 ++allocCount;
 #endif
 // If does not exist, then alloc a new ptr to it.
 ptr = nodeNewPtr[nodeNewPtrIndex--];
 // Add that new ptr to the map.
 newMap = mapNewPtr[mapNewPtrIndex--];
 if (mapByHashedId[hashedId]) 
 mapNext[newMap] = mapByHashedId[hashedId];
 mapId[newMap] = id;
 mapPtr[newMap] = ptr;
 mapByHashedId[hashedId] = newMap;
 // Fill that new ptr.
 nodeLeft[GETPTR(ptr)] = x;
 nodeRight[GETPTR(ptr)] = y;
 return ptr;
};

Here is a link for complete source with tests!

asked Jul 14, 2014 at 14:22
\$\endgroup\$
2
  • \$\begingroup\$ Your code link has gone dead, so I cannot see the complete source. \$\endgroup\$ Commented Oct 25, 2014 at 14:31
  • \$\begingroup\$ I'm sorry, @rolfl. Fixed it now. \$\endgroup\$ Commented Oct 26, 2014 at 0:07

1 Answer 1

6
\$\begingroup\$

Your code here is hard to follow because your code style and format are inconvenient.

Structures like this:

currentMap = mapByHashedId[hashedId];
if (currentMap) do {
 if (mapId[currentMap] == id)
 return mapPtr[currentMap];
 #ifdef __DEBUG__
 ++colls;
 #endif
} while ((currentMap = mapNext[currentMap]));

are .... irregular. a better format would be:

currentMap = mapByHashedId[hashedId];
while (currentMap)
{ 
 if (mapId[currentMap] == id)
 return mapPtr[currentMap];
 #ifdef __DEBUG__
 ++colls;
 #endif
 currentMap = mapNext[currentMap]
}

Additionally, this is the only loop in your method. All the other code are simple array and pointer manipulations. This method can only be slow because of three reasons:

  1. it is called many, many times
  2. the while loop iterates many many times
  3. there's something else you're not showing.

Assuming the number of calls are appropriate, and you're not missing anything, the only alternative is that the while loop loops often. This indicates there are many collisions. Collisions happen for two reasons:

  1. your MAP_HASH_SIZE is too small
  2. your hashing function is irregular

I am going to go with both of those. In fact, I am going to guess that MAP_HASH_SIZE is something like 8192, or some other power of 2. The reason this is my (educated) guess, is that your hash function is vulnerable to inconsistencies.

Consider how you build your function, using two 32-bit values with the x value shifted and or'd with the y value. Let's also use some test data:

 x y
--- ---
0x1 0x1
0x2 0x1
0x3 0x1
0x4 0x1

Now, in your function you would produce the ID by shifting the X, and the hash, by the modulo of the table size. Let's assume the table size is 32768 (a nice big value that should eliminate collisions, right)?

 x y id hashedId
--- --- ---------- --------
0x1 0x1 0x10000001 0x1
0x2 0x1 0x20000001 0x1
0x3 0x1 0x30000001 0x1
0x4 0x1 0x40000001 0x1

Because your MAP_HASH_SIZE is likely a power-of-2, the % MAP_HASH_SIZE is essentially the same as a bitmask of the low-order bits (or 0x7fff for size 32768)

If all your y values are similar, you will end up with many collisions in just a few buckets in the hash table.

For the hash function I would consider guaranteeing a minimum size (say 16 bits, or 64K), and then having a function which at least shifts some of the x value in to the range, and does an Xor....

hashedId = ((x << 7) ^ (y * 31)) % MAP_HASH_SIZE;

The above function will give you better distribution (perhaps not perfect) even if the hash size is a power of 2> about 16 bits.

answered Oct 25, 2014 at 14:29
\$\endgroup\$
1
  • \$\begingroup\$ Great answer. Since this question is kinda old, I've learned some of those issues in the meanwhile, and it has improved a lot in too many (some obvious) ways. I'll be using your hash function. Thanks! \$\endgroup\$ Commented Oct 26, 2014 at 0:12

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.