How can I change my two functions to be more readable functions (i.e. in steps), because I am using pycharm.
import collections
def compute_word_importance(fpath1,fpath2):
mylist1 = []
mylist2 = []
with open(fpath1, 'rt', encoding='UTF-8') as f:
for line1 in f:
for word in line1.split(" "):
if word.strip('\n'):
mylist1.append(word.strip())
with open(fpath2, 'rt', encoding='UTF-8') as x:
for line2 in x:
for word in line2.split(" "):
if word.strip('\n'):
mylist2.append(word.strip())
if mylist1==0 and mylist2==0:
return None
c=collections.Counter(mylist1)
c.subtract(mylist2)
return c
2 Answers 2
I think your for
loops can be changed to:
with open(fpath1, 'rt', encoding='UTF-8') as f:
for line in f:
mylist1.append(word.strip() for word in line.split())
with open(fpath2, 'rt', encoding='UTF-8') as f:
for line in f:
mylist2.append(word.strip() for word in line.split())
At this point in time one realises that these loops are a little to similar, so let us continue with joining them into something like:
for filepath in (fpath1, fpath2):
with open(filepath, 'rt', encoding='UTF-8') as inputfile:
for line in inputfile:
mylistX.append(word.strip() for word in line.split())
But then we see that we can't append everything into mylist1
or mylist2
, which already in the names are indicating with the addition of 1
and 2
that they should really be an array already. So lets do that:
lists = []
for filepath in (fpath1, fpath2):
tmp_list = []
with open(filepath, 'rt', encoding='UTF-8') as inputfile:
for line in inputfile:
tmp_list.append(word.strip() for word in line.split())
lists.append(tmp_list)
And then to wrap it up we need some changes to the end of the code to adjust to the newly created list of lists:
if len(lists[0]) == 0 or len(lists[1]) == 0:
return None
counter = collections.Counter(lists[0])
counter.subtract(lists[1])
return counter
It is left as an exercise for the user to extend the parameters into a list of filepaths, and extend the logic to return a counter with all remaining files subtracted if that is a wanted extension.
Change lines like this:
for word in line1.split(" "):
To lines like:
for word in (w.strip() for w in line1.split(" ")):
UPDATE:
Here are two ways you could tighten up your function:
1:
def compute_word_importance(fpath1,fpath2):
count = collections.Counter()
with open(fpath1, 'rt', encoding='UTF-8') as fp:
for line in f:
count.update([w.strip() for word in line.split()])
with open(fpath2, 'rt', encoding='UTF-8') as fp:
for line in fp:
count.subtract([w.strip() for word in line.split()])
return len(counter) and counter or none
2:
def compute_word_importance(fpath1,fpath2):
def get_words(path):
words = []
with open(path, 'rt', encoding='UTF-8') as fp:
for line1 in f:
words.extend([w.strip() for word in line.split()])
counter = collections.Counter(get_words(fpath1))
counter.subtract(get_words(fpath2))
return len(counter) and counter or none
-
\$\begingroup\$ +one. better than my suggestion in the comment above. And pythonic. And uses a generator expression. \$\endgroup\$aneroid– aneroid2015年12月10日 02:49:47 +00:00Commented Dec 10, 2015 at 2:49
-
\$\begingroup\$ is there any possible way to make my function more readable because it looks complicated to me \$\endgroup\$misha– misha2015年12月10日 02:59:12 +00:00Commented Dec 10, 2015 at 2:59
-
\$\begingroup\$ i updated my post , Can someone please edit my function to simpler way \$\endgroup\$misha– misha2015年12月10日 02:59:23 +00:00Commented Dec 10, 2015 at 2:59
-
\$\begingroup\$ Instead of
(w.strip() for w in line1.split(" "))
you could usemap
:map(str.strip, line1.split())
. You also don't need to pass" "
tosplit
since it will automatically default to splitting based on whitespace. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年12月14日 09:23:44 +00:00Commented Dec 14, 2015 at 9:23