Skip to main content
Code Review

Return to Revisions

9 of 9
replaced http://codereview.stackexchange.com/ with https://codereview.stackexchange.com/

A bitwise common prefix algorithm in Python

Is my algorithm slow because it has problems? Or, is any bitwise common prefix solution not going to give me the performance I'm looking for?

After profiling my algorithm, I found that over 60% of the time was spent on one line len(os.path.commonprefix((item1, item2))). I'm only looking for the length of the prefix

To solve this I tried to write a bitwise prefix solution

def bit_prefix(a, b):
 min_len = min(len(a), len(b))
 if min_len > 0:
 x = str(bin(
 int(a[::-1].encode('hex'), 16) ^ int(b[::-1].encode('hex'), 16)))
 y = x.strip('0')
 if len(y) is 1:
 return min_len
 else:
 return (len(x) - len(y)) / 8
 else:
 return 0

I've only gotten a marginal improvement in speed with long prefixes

a = 'a' * 1000000 + 'z'
b = 'a' * 900000 + 'z'
timeit.timeit(lambda: bit_prefix(a, b), number=100)
Out[34]: 6.340372534867129
timeit.timeit(lambda: len(os.path.commonprefix((a, b))), number=100)
Out[35]: 7.5483549056534684
print bit_prefix(a, b), len(os.path.commonprefix((a, b)))
900000 900000

And my algorithm performs more poorly with short prefixes

a = 'aaz'
b = 'az'
timeit.timeit(lambda: bit_prefix(a, b), number=1000000)
Out[42]: 3.968956086175467
timeit.timeit(lambda: len(os.path.commonprefix((a, b))), number=1000000)
Out[43]: 1.1592788235707303
print bit_prefix(a, b), len(os.path.commonprefix((a, b)))
1 1

If my algorithm isn't broken and a bitwise solution won't give me the performance boost I'm looking for, can you refer me to a common prefix solution that would outperform os.path.commonprefix?

Here's the bit_pefix profile

Line # Hits Time Per Hit % Time Line Contents
==============================================================
 29 @profile
 30 def bit_prefix(a, b):
 31 36140 81099 2.2 4.2 min_len = min(len(a), len(b))
 32 36140 49386 1.4 2.5 if min_len > 0:
 33 36140 49739 1.4 2.6 x = str(
 34 36140 47846 1.3 2.5 bin(
 35 36140 47232 1.3 2.4 int(
 36 36140 89499 2.5 4.6 a[::-1]
 37 36140 242994 6.7 12.5 .encode('hex'),
 38 36140 164442 4.6 8.4 16)
 39 ^ 
 40 36140 49601 1.4 2.5 int(
 41 36140 88745 2.5 4.6 b[::-1]
 42 36140 216488 6.0 11.1 .encode('hex'),
 43 36140 504425 14.0 25.9 16)))
 44 36140 187571 5.2 9.6 y = x.strip('0')
 45 36140 61027 1.7 3.1 if len(y) is 1:
 46 return min_len
 47 else:
 48 36140 67507 1.9 3.5 return (len(x) - len(y)) / 8
 49 else:
 50 return 0

Update

I've created another algorithm that seems to be faster than others. Here's the Code Review for that.

lang-py

AltStyle によって変換されたページ (->オリジナル) /