Return to Answer

Commonmark migration

edited Jun 10, 2020 at 13:24

The code looks good, this probably improves the runtime in some cases, but I am not sure if hash matches happen often enough for this to be effective on average.

###Efficiency/Logic###

Efficiency/Logic

It's probably a good idea to check for the pattern size at the beginning of your method, to avoid overflows. This technically won't happen in python as it will be automatically change the storage to big-int, but the performance of that will be much worse.
Minor optimization for runtime is to save the value of base ** (len(pattern) - 1) before your loop. Exponentiation is an expensive function, so it's better to do it only one time.

###Readability###

Readability

Try to avoid one letter variables like i and j, except for numeric loop indices. A more expressive variable name would improve readability a lot.
Look for repeated parts in your code, and see if it makes sense to extract them into functions. One example in your code is calculating the letter index which is repeated multiple times.
Might make more sense to return None instead of -1 in case of no match.

Modified code

def letter_position(letter):
 return ord(letter) - ord('A')
def rabin_karp_match(source, pattern):
 base = 26
 p_hash = 0
 s_hash = 0
 for letter in pattern:
 p_hash = p_hash * base + letter_position(letter)
 first_letter_base = base ** (len(pattern) - 1)
 for i, letter in enumerate(source):
 s_hash = s_hash * base + letter_position(letter)
 i < len(pattern) - 1:
 continue
 
 match_start_index = i - len(pattern) + 1
 if s_hash == p_hash:
 return match_start_indexj
 s_hash = s_hash - letter_position(source[match_start_index]) * first_letter_base
 return None

The code looks good, this probably improves the runtime in some cases, but I am not sure if hash matches happen often enough for this to be effective on average.

###Efficiency/Logic###

It's probably a good idea to check for the pattern size at the beginning of your method, to avoid overflows. This technically won't happen in python as it will be automatically change the storage to big-int, but the performance of that will be much worse.
Minor optimization for runtime is to save the value of base ** (len(pattern) - 1) before your loop. Exponentiation is an expensive function, so it's better to do it only one time.

###Readability###

Try to avoid one letter variables like i and j, except for numeric loop indices. A more expressive variable name would improve readability a lot.
Look for repeated parts in your code, and see if it makes sense to extract them into functions. One example in your code is calculating the letter index which is repeated multiple times.
Might make more sense to return None instead of -1 in case of no match.

Modified code

def letter_position(letter):
 return ord(letter) - ord('A')
def rabin_karp_match(source, pattern):
 base = 26
 p_hash = 0
 s_hash = 0
 for letter in pattern:
 p_hash = p_hash * base + letter_position(letter)
 first_letter_base = base ** (len(pattern) - 1)
 for i, letter in enumerate(source):
 s_hash = s_hash * base + letter_position(letter)
 i < len(pattern) - 1:
 continue
 
 match_start_index = i - len(pattern) + 1
 if s_hash == p_hash:
 return match_start_indexj
 s_hash = s_hash - letter_position(source[match_start_index]) * first_letter_base
 return None

The code looks good, this probably improves the runtime in some cases, but I am not sure if hash matches happen often enough for this to be effective on average.

Efficiency/Logic

It's probably a good idea to check for the pattern size at the beginning of your method, to avoid overflows. This technically won't happen in python as it will be automatically change the storage to big-int, but the performance of that will be much worse.
Minor optimization for runtime is to save the value of base ** (len(pattern) - 1) before your loop. Exponentiation is an expensive function, so it's better to do it only one time.

Readability

Try to avoid one letter variables like i and j, except for numeric loop indices. A more expressive variable name would improve readability a lot.
Look for repeated parts in your code, and see if it makes sense to extract them into functions. One example in your code is calculating the letter index which is repeated multiple times.
Might make more sense to return None instead of -1 in case of no match.

Modified code

def letter_position(letter):
 return ord(letter) - ord('A')
def rabin_karp_match(source, pattern):
 base = 26
 p_hash = 0
 s_hash = 0
 for letter in pattern:
 p_hash = p_hash * base + letter_position(letter)
 first_letter_base = base ** (len(pattern) - 1)
 for i, letter in enumerate(source):
 s_hash = s_hash * base + letter_position(letter)
 i < len(pattern) - 1:
 continue
 
 match_start_index = i - len(pattern) + 1
 if s_hash == p_hash:
 return match_start_indexj
 s_hash = s_hash - letter_position(source[match_start_index]) * first_letter_base
 return None

Source Link

answered Feb 3, 2017 at 2:07

Hesham Attia

answered Feb 3, 2017 at 2:07

Hesham Attia

The code looks good, this probably improves the runtime in some cases, but I am not sure if hash matches happen often enough for this to be effective on average.

###Efficiency/Logic###

It's probably a good idea to check for the pattern size at the beginning of your method, to avoid overflows. This technically won't happen in python as it will be automatically change the storage to big-int, but the performance of that will be much worse.
Minor optimization for runtime is to save the value of base ** (len(pattern) - 1) before your loop. Exponentiation is an expensive function, so it's better to do it only one time.

###Readability###

Try to avoid one letter variables like i and j, except for numeric loop indices. A more expressive variable name would improve readability a lot.
Look for repeated parts in your code, and see if it makes sense to extract them into functions. One example in your code is calculating the letter index which is repeated multiple times.
Might make more sense to return None instead of -1 in case of no match.

Modified code

def letter_position(letter):
 return ord(letter) - ord('A')
def rabin_karp_match(source, pattern):
 base = 26
 p_hash = 0
 s_hash = 0
 for letter in pattern:
 p_hash = p_hash * base + letter_position(letter)
 first_letter_base = base ** (len(pattern) - 1)
 for i, letter in enumerate(source):
 s_hash = s_hash * base + letter_position(letter)
 i < len(pattern) - 1:
 continue
 
 match_start_index = i - len(pattern) + 1
 if s_hash == p_hash:
 return match_start_indexj
 s_hash = s_hash - letter_position(source[match_start_index]) * first_letter_base
 return None

lang-py