2
\$\begingroup\$

Is there a more elegant and Pythonic way to handle counters in this solution to an exercise from the NLTK book? The exercise asks to print out the context (one word forward and one word back) for every verb of a particular type (tagged VN).

import nltk

wsj = nltk.corpus.treebank.tagged_words(simplify_tags=True)
cdf = nltk.ConditionalFreqDist((tag, word) for (word, tag) in wsj)
wordlist = cdf['VN'].keys()

# Bug 1: strange exceptions in the for loop when iterating over wsj
# Solution: wsj is a custom NLTK type "ConcatenatedCorpusView"
# cast wsj into a native python type "list" for better iteration.
# I am guessing ConcatenatedCorpusView chokes on empty tuples
wsj_list = list(wsj)

# Bug 2: repeated words return index of the first word only
# Solution: to deal with repeated words
# we keep indexing from the last location. The index method
# takes a second parameter
starts_at = 0
for t in wsj_list:
    if t[0] in wordlist and t[1] == 'VN':
        ndx = wsj_list.index(t, starts_at)
        starts_at = ndx + 1
        print wsj_list[ndx-1:ndx+1], ndx

asked Apr 29, 2014 at 18:32
\$\endgroup\$

1 Answer

4
\$\begingroup\$

You could use enumerate to get the index. This makes the code both simpler and more efficient, because it avoids the linear search that index performs for every match. I would also suggest unpacking t into (word, tag) to improve readability.

for ndx, (word, tag) in enumerate(wsj_list):
    if word in wordlist and tag == 'VN':
        print wsj_list[ndx-1:ndx+1], ndx
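
For illustration, here is a minimal, self-contained sketch of the same enumerate idea. It makes a few assumptions that are not in the original code: it uses Python 3 syntax, a small hand-made sample list stands in for the treebank corpus, and verb_contexts is a hypothetical helper name. It also extends the slice to ndx + 2, since a slice ending at ndx + 1 stops at the matching word and leaves out the word that follows, and it clamps the start so that a match at index 0 does not produce an empty slice. The membership test against wordlist is dropped here because every token tagged VN is already a key of cdf['VN'].

```python
def verb_contexts(tagged_words, target_tag="VN"):
    """Yield (index, context) for every token tagged target_tag.

    context holds the previous token, the match, and the next token,
    with fewer items at the start or end of the list.
    """
    for ndx, (word, tag) in enumerate(tagged_words):
        if tag == target_tag:
            start = max(ndx - 1, 0)  # avoid a negative start index at ndx == 0
            yield ndx, tagged_words[start:ndx + 2]


# Hand-made sample data (an assumption; not taken from the treebank corpus).
sample = [
    ("The", "DET"),
    ("bill", "N"),
    ("was", "V"),
    ("passed", "VN"),
    ("yesterday", "ADV"),
    ("and", "CNJ"),
    ("signed", "VN"),
]

contexts = list(verb_contexts(sample))
for ndx, context in contexts:
    print(ndx, context)
```

With the real corpus, the same function could be called on wsj_list instead of sample. Writing it as a generator keeps the printing separate from the search, which is a design choice rather than a requirement.
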
answered Apr 29, 2014 at 18:53
\$\endgroup\$
