I've isolated this function, cons, for hash-consing in C. It is the bottleneck of my program. How could its performance be improved?
var32 cons(var32 x, var32 y) {
    var32 ptr;
    uid id, hashedId;
    map newMap, currentMap;
    u32 colls;
#ifdef __DEBUG__
    ++consCount;
    colls = 0;
#endif
    // Before anything, compute the unique id of that node.
    id = (((uid)(x.ptr)) << 32) | (uid)y.ptr;
    // Try to find the node. If it already exists, return its ptr.
    hashedId = id % MAP_HASH_SIZE;
    currentMap = mapByHashedId[hashedId];
    if (currentMap) do {
        if (mapId[currentMap] == id)
            return mapPtr[currentMap];
#ifdef __DEBUG__
        ++colls;
#endif
    } while ((currentMap = mapNext[currentMap]));
#ifdef __DEBUG__
    if (colls < 16) ++collisionCount[colls];
    ++allocCount;
#endif
    // If it does not exist, then alloc a new ptr for it.
    ptr = nodeNewPtr[nodeNewPtrIndex--];
    // Add that new ptr to the map.
    newMap = mapNewPtr[mapNewPtrIndex--];
    if (mapByHashedId[hashedId])
        mapNext[newMap] = mapByHashedId[hashedId];
    mapId[newMap] = id;
    mapPtr[newMap] = ptr;
    mapByHashedId[hashedId] = newMap;
    // Fill that new ptr.
    nodeLeft[GETPTR(ptr)] = x;
    nodeRight[GETPTR(ptr)] = y;
    return ptr;
}
Your code link has gone dead, so I cannot see the complete source. (rolfl, Oct 25, 2014)

I'm sorry, @rolfl. Fixed it now. (MaiaVictor, Oct 26, 2014)
1 Answer
Your code here is hard to follow because its style and formatting are inconsistent.
Structures like this:
currentMap = mapByHashedId[hashedId];
if (currentMap) do {
    if (mapId[currentMap] == id)
        return mapPtr[currentMap];
#ifdef __DEBUG__
    ++colls;
#endif
} while ((currentMap = mapNext[currentMap]));
are irregular. A better format would be:
currentMap = mapByHashedId[hashedId];
while (currentMap)
{
    if (mapId[currentMap] == id)
        return mapPtr[currentMap];
#ifdef __DEBUG__
    ++colls;
#endif
    currentMap = mapNext[currentMap];
}
Additionally, this is the only loop in your method. All the other code is simple array and pointer manipulation. This method can only be slow for one of three reasons:
- it is called many, many times
- the while loop iterates many many times
- there's something else you're not showing.
Assuming the number of calls is appropriate, and you're not missing anything, the only alternative is that the while loop iterates often. This indicates there are many collisions. Collisions happen for two reasons:
- your MAP_HASH_SIZE is too small
- your hashing function is irregular
I am going to go with both of those. In fact, I am going to guess that MAP_HASH_SIZE is something like 8192, or some other power of 2. The reason this is my (educated) guess is that your hash function is vulnerable to inconsistencies.
Consider how you build your function, using two 32-bit values with the x value shifted and or'd with the y value. Let's also use some test data:
x y
--- ---
0x1 0x1
0x2 0x1
0x3 0x1
0x4 0x1
Now, in your function you would produce the id by shifting the x value, and the hashed id by taking the modulo with the table size. Let's assume the table size is 32768 (a nice big value that should eliminate collisions, right?):
x y id hashedId
--- --- ---------- --------
0x1 0x1 0x100000001 0x1
0x2 0x1 0x200000001 0x1
0x3 0x1 0x300000001 0x1
0x4 0x1 0x400000001 0x1
Because your MAP_HASH_SIZE is likely a power of 2, the % MAP_HASH_SIZE is essentially the same as a bitmask of the low-order bits (0x7fff for size 32768).
If all your y values are similar, you will end up with many collisions in just a few buckets of the hash table.
For the hash function I would consider guaranteeing a minimum size (say 16 bits, or 64K), and then having a function which at least shifts some of the x value into that range and does an xor:
hashedId = (((uid)x.ptr << 7) ^ ((uid)y.ptr * 31)) % MAP_HASH_SIZE;
The above function will give you better distribution (perhaps not perfect) even if the hash size is a power of 2 above about 16 bits.
Great answer. Since this question is kinda old, I've learned some of those issues in the meanwhile, and the code has improved in many (some obvious) ways. I'll be using your hash function. Thanks! (MaiaVictor, Oct 26, 2014)