I'm working on a Python function that takes a piece of text and indicates if words are modified by words like very or not.
I'm looking for instance to label very nice differently from not nice. This does what it's supposed to do but I feel it might be clunky or there might be a more very efficient approach I'm not aware of.
def polarizer(input_sentence, modifier_dict, window_length):
"""
The polarizer attaches the label given in modifier_dict to all items in input_sentence occurring a window_length after the modifier.
modifier_dict needs to be formatted like so: {'not':'-', 'yes':'+', ..}
Returns list with labels added.
"""
#find positions of modifiers
indices= [(range(n+1, n+1+window_length), modifier_dict[i]) for n,i in enumerate(input_sentence) if i in modifier_dict.keys()]
#iterate over modifiers
for ind in indices:
for n in ind[0]:
#add label unless there is already a label attached
if n < len(input_sentence) and not input_sentence[n].endswith(tuple(modifier_dict.values())):
input_sentence[n]=input_sentence[n]+ind[1]
outputsentence= input_sentence
return outputsentence
#Test sentences
modifier_dict={'very':'+'}
test1="This is fine".split(" ")
test2="This is very fine".split(" ")
print polarizer(test1, modifier_dict, 2)
print polarizer(test2, modifier_dict, 2)
1 Answer 1
When there're comments in the code that tell what it does, it usually indicates that the following piece of code should be a separate function with a meaningful name. I'd create a separate function that the positions of the words to be modified (and it something like
get_indices_to_modify
) and a separate function that checks if a word already has a label.You can make your code more readable the tuples from the
indices
list:for modification_range, label in indices
looks better thanfor ind in indices: # Do something with ind[0] and ind[1]
You can also use the
+=
operator in this line:input_sentence[n] += ind[1]
The names
n
andi
are kind of confusing in your code.i
normally stands for a counter whilen
stands for the number of elements. For instance, I'd changen
andi
in the first line of yourpolarize
function toidx
andword
(wheni
is not the counter, the thing gets pretty confusing).Assuming that it's python 2, you can save some space by using
xrange
instead ofrange
. It can a have a serious impact if thewindow_length
gets really big.What is the point of creating the
outputsentence
variable just to return it? Why not return theinput_sentence
directly?The
polarize
function actually changes theinput_sentence
. Your documentation doesn't say anything about it. This could lead to unexpected results for the users of your code. You should either make a copy and not change the input, or document the fact the input is changed explicitly.Instead of printing the return value of the function to the standard output (and checking that it's correct by looking at it), I'd recommend to write unit-tests. This way, you would be able to check that code remains correct after your change it automatically (moreover, it's quite easy to overlook something when you just look at the output with your eyes).
-
\$\begingroup\$ Thanks so much! That makes a lot of sense & I'll implement those! Quick follow-up question: You write I'd recommend to write unit-tests. What exactly does that mean in this context? Making sure that every piece of the code returns sth valid or throws an appropriate error? (I googled around but could not quite see how it applies to this..) Thanks! \$\endgroup\$patrick– patrick2017年03月13日 11:46:01 +00:00Commented Mar 13, 2017 at 11:46
-
\$\begingroup\$ @patrick What you do now is printing the output to the screen and seeing if it matches the desired output. You can do a similar thing with unit-tests (but it would compare the output and the desired answer automatically). And yes, testing different edge cases (including invalid input) is a good idea. \$\endgroup\$kraskevich– kraskevich2017年03月13日 20:38:45 +00:00Commented Mar 13, 2017 at 20:38
-
\$\begingroup\$ I see, that makes sense! The printout thing I mainly set up for here&I'll set up something more systematic for the real deal. Thanks so much for taking the time to look at this! \$\endgroup\$patrick– patrick2017年03月13日 21:44:06 +00:00Commented Mar 13, 2017 at 21:44