1
\$\begingroup\$

I'm working with a database that has typos in the values. To compensate for them I've made a function called standardizer() which removes spaces and converts all letters to uppercase. I do this so the value red in from the database can correctly be interpreted by the program. Also in values starting with the prefix 'sp_etc_', I noticed a common mistake that the 't' is left out, giving 'sp_ec_' so I compensate for this as well. Bellow is an example:

import sys
#remove all spaces and convert to uppercase, for comparision purposes
def standardizer(str):
 str = str.replace("sp_ec_", "sp_etc_")#this is a common typo in the db, the 't' is left out
 str = str.replace(" ", "")
 str = str.upper()
 return str
#this function is for testing purposes and would actually read in values from db
def getUserInput():
 return ' this is a test'
def main():
 str1 = "I'mok"
 str2 = 'this is a test'
 str3 = 'two spaces'
 str4 = ' spaces at front and end '
 str5 = 'sp_ec_blah'
 print(str1, standardizer(str1))
 print(str2, standardizer(str2))
 print(str3, standardizer(str3))
 print(str4, standardizer(str4))
 print(str5, standardizer(str5))
 #this is an example of how the function would actually be used
 print(standardizer(str2), standardizer(getUserInput()))
 if standardizer(str2) == standardizer(getUserInput()):
 print('matched')
 else:
 print('not a match')
if __name__ == '__main__':
 main()

Any suggestions on the standardizer() function? First off I think it needs a better name. I'm wondering if I should break it into two functions, one being for the missing 't' and the other being for removing spaces and making upper case (by the way, from what I've seen it's more common to convert everything to upper case than lower case, for comparison purposes). Also, how would you comment something like this?

jonrsharpe
14k2 gold badges36 silver badges62 bronze badges
asked Aug 6, 2015 at 6:10
\$\endgroup\$
1
  • 1
    \$\begingroup\$ I think the term that is commonly used is "canonical" instead of "standard". \$\endgroup\$ Commented Aug 6, 2015 at 15:15

2 Answers 2

1
\$\begingroup\$

You can simplify avoiding reassignement:

def standardizer(str):
 return str.replace("sp_ec_", "sp_etc_").replace(" ", "").upper()

You should use a for loop to avoid repetition in the following lines:

print(str1, standardizer(str1))
print(str2, standardizer(str2))
print(str3, standardizer(str3))
print(str4, standardizer(str4))
print(str5, standardizer(str5))
answered Aug 6, 2015 at 6:36
\$\endgroup\$
4
  • \$\begingroup\$ Yes that's clear. Question about python and indentation, why doesn't having it on a new line mess it up? If you continue the command on a new line do you just need to add 1 more indentation? \$\endgroup\$ Commented Aug 6, 2015 at 7:02
  • \$\begingroup\$ @Celeritas The only sensible answer is that it works with newlines because the grammar says so. Newlines are for readibility only. \$\endgroup\$ Commented Aug 6, 2015 at 7:04
  • \$\begingroup\$ oh well then wouldn't you say that's less readable having all those functions chained together? \$\endgroup\$ Commented Aug 6, 2015 at 16:50
  • \$\begingroup\$ @Celeritas Reassigning a variable like that is noise. You should write in code exactly what You want. DRY \$\endgroup\$ Commented Aug 6, 2015 at 19:38
1
\$\begingroup\$

Don't test like that: with a main function printing stuff. To verify it works correctly, you have to read and understand the output. Doc tests are perfect for this task:

def sanitize(text):
 """
 >>> sanitize("I'mok")
 "I'MOK"
 >>> sanitize('this is a test')
 'THISISATEST'
 >>> sanitize('two spaces')
 'TWOSPACES'
 >>> sanitize(' spaces at front and end ')
 'SPACESATFRONTANDEND'
 >>> sanitize('sp_ec_blah')
 'SP_ETC_BLAH'
 """
 text = text.replace("sp_ec_", "sp_etc_")
 text = text.replace(" ", "")
 text = text.upper()
 return text

If your script is in a file called sanitizer.py, you can run the doc tests with:

python -m doctest sanitizer.py

In this implementation, there is no noise, no messiness, a doc string that explains nicely what the function is expected to do, and the doctest verifies that it actually does it.

Other improvements:

  • str shadows the name of a built-in. Better rename that variable to something else.
  • "standardizer" is not a good name for a function, because it's a noun. Verbs are better, for example "standardize". I went further and used "sanitize", which is more common for this kind of purpose.
answered Aug 6, 2015 at 17:55
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.