The code looks good, this probably improves the runtime in some cases, but I am not sure if hash matches happen often enough for this to be effective on average.
###Efficiency/Logic###
Efficiency/Logic
It's probably a good idea to check for the pattern size at the beginning of your method, to avoid overflows. This technically won't happen in python as it will be automatically change the storage to big-int, but the performance of that will be much worse.
Minor optimization for runtime is to save the value of
base ** (len(pattern) - 1)
before your loop. Exponentiation is an expensive function, so it's better to do it only one time.
###Readability###
Readability
- Try to avoid one letter variables like i and j, except for numeric loop indices. A more expressive variable name would improve readability a lot.
- Look for repeated parts in your code, and see if it makes sense to extract them into functions. One example in your code is calculating the letter index which is repeated multiple times.
- Might make more sense to return
None
instead of-1
in case of no match.
Modified code
def letter_position(letter):
return ord(letter) - ord('A')
def rabin_karp_match(source, pattern):
base = 26
p_hash = 0
s_hash = 0
for letter in pattern:
p_hash = p_hash * base + letter_position(letter)
first_letter_base = base ** (len(pattern) - 1)
for i, letter in enumerate(source):
s_hash = s_hash * base + letter_position(letter)
i < len(pattern) - 1:
continue
match_start_index = i - len(pattern) + 1
if s_hash == p_hash:
return match_start_indexj
s_hash = s_hash - letter_position(source[match_start_index]) * first_letter_base
return None
The code looks good, this probably improves the runtime in some cases, but I am not sure if hash matches happen often enough for this to be effective on average.
###Efficiency/Logic###
It's probably a good idea to check for the pattern size at the beginning of your method, to avoid overflows. This technically won't happen in python as it will be automatically change the storage to big-int, but the performance of that will be much worse.
Minor optimization for runtime is to save the value of
base ** (len(pattern) - 1)
before your loop. Exponentiation is an expensive function, so it's better to do it only one time.
###Readability###
- Try to avoid one letter variables like i and j, except for numeric loop indices. A more expressive variable name would improve readability a lot.
- Look for repeated parts in your code, and see if it makes sense to extract them into functions. One example in your code is calculating the letter index which is repeated multiple times.
- Might make more sense to return
None
instead of-1
in case of no match.
Modified code
def letter_position(letter):
return ord(letter) - ord('A')
def rabin_karp_match(source, pattern):
base = 26
p_hash = 0
s_hash = 0
for letter in pattern:
p_hash = p_hash * base + letter_position(letter)
first_letter_base = base ** (len(pattern) - 1)
for i, letter in enumerate(source):
s_hash = s_hash * base + letter_position(letter)
i < len(pattern) - 1:
continue
match_start_index = i - len(pattern) + 1
if s_hash == p_hash:
return match_start_indexj
s_hash = s_hash - letter_position(source[match_start_index]) * first_letter_base
return None
The code looks good, this probably improves the runtime in some cases, but I am not sure if hash matches happen often enough for this to be effective on average.
Efficiency/Logic
It's probably a good idea to check for the pattern size at the beginning of your method, to avoid overflows. This technically won't happen in python as it will be automatically change the storage to big-int, but the performance of that will be much worse.
Minor optimization for runtime is to save the value of
base ** (len(pattern) - 1)
before your loop. Exponentiation is an expensive function, so it's better to do it only one time.
Readability
- Try to avoid one letter variables like i and j, except for numeric loop indices. A more expressive variable name would improve readability a lot.
- Look for repeated parts in your code, and see if it makes sense to extract them into functions. One example in your code is calculating the letter index which is repeated multiple times.
- Might make more sense to return
None
instead of-1
in case of no match.
Modified code
def letter_position(letter):
return ord(letter) - ord('A')
def rabin_karp_match(source, pattern):
base = 26
p_hash = 0
s_hash = 0
for letter in pattern:
p_hash = p_hash * base + letter_position(letter)
first_letter_base = base ** (len(pattern) - 1)
for i, letter in enumerate(source):
s_hash = s_hash * base + letter_position(letter)
i < len(pattern) - 1:
continue
match_start_index = i - len(pattern) + 1
if s_hash == p_hash:
return match_start_indexj
s_hash = s_hash - letter_position(source[match_start_index]) * first_letter_base
return None
The code looks good, this probably improves the runtime in some cases, but I am not sure if hash matches happen often enough for this to be effective on average.
###Efficiency/Logic###
It's probably a good idea to check for the pattern size at the beginning of your method, to avoid overflows. This technically won't happen in python as it will be automatically change the storage to big-int, but the performance of that will be much worse.
Minor optimization for runtime is to save the value of
base ** (len(pattern) - 1)
before your loop. Exponentiation is an expensive function, so it's better to do it only one time.
###Readability###
- Try to avoid one letter variables like i and j, except for numeric loop indices. A more expressive variable name would improve readability a lot.
- Look for repeated parts in your code, and see if it makes sense to extract them into functions. One example in your code is calculating the letter index which is repeated multiple times.
- Might make more sense to return
None
instead of-1
in case of no match.
Modified code
def letter_position(letter):
return ord(letter) - ord('A')
def rabin_karp_match(source, pattern):
base = 26
p_hash = 0
s_hash = 0
for letter in pattern:
p_hash = p_hash * base + letter_position(letter)
first_letter_base = base ** (len(pattern) - 1)
for i, letter in enumerate(source):
s_hash = s_hash * base + letter_position(letter)
i < len(pattern) - 1:
continue
match_start_index = i - len(pattern) + 1
if s_hash == p_hash:
return match_start_indexj
s_hash = s_hash - letter_position(source[match_start_index]) * first_letter_base
return None