I'm working with a database that has typos in the values. To compensate for them I've made a function called standardizer()
which removes spaces and converts all letters to uppercase. I do this so the value red in from the database can correctly be interpreted by the program. Also in values starting with the prefix 'sp_etc_'
, I noticed a common mistake that the 't'
is left out, giving 'sp_ec_'
so I compensate for this as well. Bellow is an example:
import sys
#remove all spaces and convert to uppercase, for comparision purposes
def standardizer(str):
str = str.replace("sp_ec_", "sp_etc_")#this is a common typo in the db, the 't' is left out
str = str.replace(" ", "")
str = str.upper()
return str
#this function is for testing purposes and would actually read in values from db
def getUserInput():
return ' this is a test'
def main():
str1 = "I'mok"
str2 = 'this is a test'
str3 = 'two spaces'
str4 = ' spaces at front and end '
str5 = 'sp_ec_blah'
print(str1, standardizer(str1))
print(str2, standardizer(str2))
print(str3, standardizer(str3))
print(str4, standardizer(str4))
print(str5, standardizer(str5))
#this is an example of how the function would actually be used
print(standardizer(str2), standardizer(getUserInput()))
if standardizer(str2) == standardizer(getUserInput()):
print('matched')
else:
print('not a match')
if __name__ == '__main__':
main()
Any suggestions on the standardizer()
function? First off I think it needs a better name. I'm wondering if I should break it into two functions, one being for the missing 't' and the other being for removing spaces and making upper case (by the way, from what I've seen it's more common to convert everything to upper case than lower case, for comparison purposes). Also, how would you comment something like this?
-
1\$\begingroup\$ I think the term that is commonly used is "canonical" instead of "standard". \$\endgroup\$mkrieger1– mkrieger12015年08月06日 15:15:40 +00:00Commented Aug 6, 2015 at 15:15
2 Answers 2
You can simplify avoiding reassignement:
def standardizer(str):
return str.replace("sp_ec_", "sp_etc_").replace(" ", "").upper()
You should use a for loop to avoid repetition in the following lines:
print(str1, standardizer(str1))
print(str2, standardizer(str2))
print(str3, standardizer(str3))
print(str4, standardizer(str4))
print(str5, standardizer(str5))
-
\$\begingroup\$ Yes that's clear. Question about python and indentation, why doesn't having it on a new line mess it up? If you continue the command on a new line do you just need to add 1 more indentation? \$\endgroup\$Celeritas– Celeritas2015年08月06日 07:02:30 +00:00Commented Aug 6, 2015 at 7:02
-
\$\begingroup\$ @Celeritas The only sensible answer is that it works with newlines because the grammar says so. Newlines are for readibility only. \$\endgroup\$Caridorc– Caridorc2015年08月06日 07:04:43 +00:00Commented Aug 6, 2015 at 7:04
-
\$\begingroup\$ oh well then wouldn't you say that's less readable having all those functions chained together? \$\endgroup\$Celeritas– Celeritas2015年08月06日 16:50:29 +00:00Commented Aug 6, 2015 at 16:50
-
\$\begingroup\$ @Celeritas Reassigning a variable like that is noise. You should write in code exactly what You want. DRY \$\endgroup\$Caridorc– Caridorc2015年08月06日 19:38:12 +00:00Commented Aug 6, 2015 at 19:38
Don't test like that: with a main
function printing stuff.
To verify it works correctly, you have to read and understand the output.
Doc tests are perfect for this task:
def sanitize(text):
"""
>>> sanitize("I'mok")
"I'MOK"
>>> sanitize('this is a test')
'THISISATEST'
>>> sanitize('two spaces')
'TWOSPACES'
>>> sanitize(' spaces at front and end ')
'SPACESATFRONTANDEND'
>>> sanitize('sp_ec_blah')
'SP_ETC_BLAH'
"""
text = text.replace("sp_ec_", "sp_etc_")
text = text.replace(" ", "")
text = text.upper()
return text
If your script is in a file called sanitizer.py
, you can run the doc tests with:
python -m doctest sanitizer.py
In this implementation, there is no noise, no messiness, a doc string that explains nicely what the function is expected to do, and the doctest verifies that it actually does it.
Other improvements:
str
shadows the name of a built-in. Better rename that variable to something else.- "standardizer" is not a good name for a function, because it's a noun. Verbs are better, for example "standardize". I went further and used "sanitize", which is more common for this kind of purpose.