I'm just learning about the regular expressions, and found this problem on the internet.It's a nice practice for a beginner in regexp.
Problem instruction:
Have the function take the string parameter which will contain single digit numbers, letters and question marks, and check if there are exactly 3 question marks between every pair of two numbers that add up to (n).
The things I'm most interested from this code review:
- The regexp I used is quite simple and limited. I isolated all the strings with the 3 question marks between the pair of two numbers, with it. Then I used a separate function to iterate over the array holding those strings, and checked the sum of the first and last character for each string. I'm wondering is it possible to do the checking in the regexp itself?
- Usefulness of the testing function used for the regexp search. What would be an appropriate way to test something like this?
- Naming convention
- Code cleanliness
- Structuring of the code
- Commenting of the code sample
import string
import random
import re
def str_generrator( size, chars = string.ascii_lowercase + string.digits ):
""" Generates a random string of lowercase letters and single digit
numbers, with the length specified by the user.
Arguments:
type:<int> - An integer representing the length of the generated string.
type:<str> - A string containing the digits and lowercase letters.
Return:
type:<str> - Random generated string.
"""
return ''.join( random.SystemRandom().choice( chars ) for c in range( size ))
#----------------------------------------------------------------------------------
def pattern_search( str ):
"""Finds a specific string from the input using a regexp text
string for describing a search pattern. The function will check if there
are exactly 3 question marks between every pair of two numbers.
Arguments:
type:<str> - A string where the search will be preformed.
Return:
type:<arr> - An array containing the results of the search.
"""
return re.findall( r'\d[A-Za-z]*?\?[A-Za-z]*?\?[A-Za-z]*?\?[A-Za-z]*?\d', str )
#----------------------------------------------------------------------------------
def shuffle_string( string1, string2 = '???????????????????' ):
"""Joins and makes a random shuffle of two input strings.
Creating a new string with the length of first input string.
Arguments:
type:<str> - Input string one
type:<str> - Input string two
Return:
type:<str> - String containing mixed values of input strings.
"""
# Concatenate the inputs, and convert it to the list.
string = string1 + string2
strlst = list( string )
# Shuffles the list, converts it to the string and returns the results.
random.SystemRandom().shuffle( strlst )
shuffled_string = ''.join( strlst[:len( string1 )])
return shuffled_string
#----------------------------------------------------------------------------------
def calc_additions( num, str_arr ):
"""Calculates if the input string first and last character add up to the
number specified by the user.
Arguments:
type<int> - A required sum for the input strings characters.
type<arr> - An array with strings on which the calculations are
preformed.
Return:
type<arr> - An array holdin the results of the calculations.
"""
results = []
for string in str_arr:
if( int( string[0] ) + int( string[-1] ) == num ):
results.append( string )
return results
#----------------------------------------------------------------------------------
def main():
search_str = ''
patterns = []
# Test for the regexp search.
for c in range( 1000 ):
search_str = shuffle_string( str_generrator(40) )
patterns = pattern_search( search_str )
print(' Search string : {}'.format( search_str ))
print(' Patterns found : {}'.format( patterns ))
print(' Valid patterns : {}'.format( calc_additions( 10, patterns)))
#----------------------------------------------------------------------------------
if __name__ == "__main__":
main()
1 Answer 1
Good job documenting the code!
Here are some of the things I would improve in the proposed code and also some random ideas and high-level thoughts:
- remove the
#---...--
lines - just use 2 blank lines between the top-level blocks like function definitions - you are overusing spaces in the expressions and statements. Remove the extra spaces after the
(
and before the)
. When defining keyword arguments, PEP8 recommends to not use spaces, e.g.chars=string.ascii_lowercase + string.digits
(no spaces around the=
) - variable and function naming issues.
str_generrator
should probably be renamed to something more descriptive likegenerate_input_string()
orgenerate_test_string()
. Variable names likestr
andstring
shadowing the built-instr
and the importedstring
module. We can also use some more descriptive names elsewhere - e.g.desired_sum
instead ofnum
in thecalc_additions()
function - loop variable
c
is unused, use_
for this kind of throwaway variable names search_str
andpatterns
are unused in themain()
function, remove them- typo:
holdin
->holding
you can use a list comprehension together with an extended unpacking in the
calc_additions()
function:def calc_additions(desired_sum, strings): """docstring here """ return [string for first_string, *_, last_string in strings if (int(first_string) + int(last_string)) == desired_sum]
you can use
f-strings
to report results. Consider, may be, even using a multi-linef-string
, like:print(f""" Search string : {search_str} Patterns found : {patterns} Valid patterns : {calc_additions( 10, patterns)}""")
using type annotations would also be a good idea to improve on self-documenting
- you can pre-compile the regular expression using
re.compile()
and then using it to callfindall()
on - also, look into
.finditer()
to have an iterator and avoid having an extra list of matches - it might also make sense to use a regular expression verbose mode to improve readability of the expression
- what if you would initialize
random.SystemRandom()
instance once and re-use?
if
statement in parens. See python.org/dev/peps/pep-0008 for more style guide. You don't need to "convert" string to list to call use withjoin()
. Just use the string itself.SystemRandom
might be a bit excessive for this task... it's just a test after all. Also, your regexp doesn't do what the description says. A regex cannot do this in principle, since the language you describe isn't regular. Howevertry: map(int, s.split('???')); return True; except: return False
would do. \$\endgroup\$\d+([?]{3}\d+)*
regexp. Sorry for confusing you. Well, another proof that you shouldn't believe everything you read on the internet :) \$\endgroup\$if sum == n then count('?') == 3
. But you seem to be checkingif count('?') == 3 then sum == n
. \$\endgroup\$