This basic function to detect negation while performing sentiment analysis. So that not good
kind of words can be considered a negative.
This code is working fine. Can someone help to improve it or find if any bug present?
def negate_sequence(self,text):
"""
Detects negations and transforms negated words into "not_" form.
"""
negation = False
delims = "?.,!:;"
result = []
#Here rather then applying split, we can directly feed our extracted symptoms list
words = text.split()
prev = None
pprev = None
for word in words:
# stripped = word.strip(delchars)
stripped = word.strip(delims).lower()
negated = "not_" + stripped if negation else stripped
result.append(negated)
if prev:
bigram = prev + " " + negated
result.append(bigram)
if pprev:
trigram = pprev + " " + bigram
result.append(trigram)
pprev = prev
prev = negated
if any(neg in word for neg in ["not", "n't", "no"]):
negation = not negation
if any(c in word for c in delims):
negation = False
return result
2 Answers 2
Here are a few suggestions on how to improve the code:
State the intent
What is the purpose of the function? What is the input, what is the output? In human words, what does the algorithm do? A docstring
answering those basic questions would be helpful. A series of concise unit tests
would be great.
I ran the function to have a look at the output:
>>> text = "The weather is not good."
>>> result = negate_sequence(self=None, text=text)
>>> print(result)
['the', 'weather', 'the weather', 'is', 'weather is', 'the weather is', 'not', 'is not',
'weather is not', 'not_good', 'not not_good', 'is not not_good']
This doesn't ring a bell with me, so I stopped trying to understand the purpose.
Avoid stateful loops
Iteration i
is coupled to iteration i-1
by the negation
variable, this makes the logic hard to understand and error prone. If you work on bigrams/trigrams, I'd create a list of bigrams/trigrams and iterate over the tuples. This decouples the iterations.
Breakup long functions
This has almost endless benefits, as a starting point see this article. Some possibilities:
- Have the text broken up into all lowercase and without punctuation by
extract_words(text)
- Have the list of trigrams created by
make_trigrams(words)
- Inspect the trigrams by
process(trigrams)
- If needed, have some kind of
aggregate(results)
Once this is done, I guess we are much better to prepared to identify bugs and to further improve functionality.
I can give a suggestion.
Can you just make it more useful by replacing it with real autonyms? That would be way more useful than making it negative without parsing up front.
-
1\$\begingroup\$ Welcome to Code Review. Please note that this question is more than two and a half years old. Also, currently your answer is not a good one since you do not provide a reasoning for your recommendation. \$\endgroup\$AlexV– AlexV2019年05月27日 14:00:11 +00:00Commented May 27, 2019 at 14:00
Explore related questions
See similar questions with these tags.