I wrote this script to collect evidence of the number of days worked for the purpose of claiming some government tax credits.
I'm looking for some ways to clean it up. I'm especially wondering if there is a cleaner way to uniquify a list than list(set(my_list)),
and maybe a better way to do:
d = dict(zip(commiters, [0 for x in xrange(len(commiters))]))
import os
from pprint import pprint
lines = os.popen('git log --all').read().split('\n')
author_lines = filter(lambda str: str.startswith('Author'), lines)
date_lines = filter(lambda str: str.startswith('Date'), lines)
author_lines = map(lambda str: str[8:], author_lines)
date_lines = map(lambda str: str[8:18].strip(), date_lines)
lines = zip(author_lines, date_lines)
lines = sorted(list(set(lines)), key = lambda tup: tup[0])
commiters = list(set(map(lambda tup: tup[0], lines)))
d = dict(zip(commiters, [0 for x in xrange(len(commiters))]))
for item in lines:
    d[item[0]] += 1
pprint(d)
1 Answer
For this part:
author_lines = filter(lambda str: str.startswith('Author'), lines)
date_lines = filter(lambda str: str.startswith('Date'), lines)
author_lines = map(lambda str: str[8:], author_lines)
date_lines = map(lambda str: str[8:18].strip(), date_lines)
This might be clearer. Not that I have anything against map or filter, but list comprehensions combine them nicely when you need both:
author_lines = [line[8:] for line in lines if line.startswith('Author')]
date_lines = [line[8:18].strip() for line in lines if line.startswith('Date')]
This:
lines = sorted(list(set(lines)), key = lambda tup: tup[0])
Can become:
lines = sorted(set(lines), key = lambda tup: tup[0])
for slightly less repetition (sorted accepts any iterable and always returns a list).
And are you sure the key is even necessary? Tuples sort element by element just fine. The only reason to sort by the first element alone is if you want to preserve the original order of lines with the same author, rather than sorting them by the date line.
... Actually, why are you even sorting this at all? I don't see anything in the rest of the code that will work any differently whether it's sorted or not.
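To illustrate the point about the key, here is a small sketch using made-up (author, date) pairs; the names and dates are hypothetical sample data, not output from a real repository. Without a key, ties on the author are broken by the date string (compared as text, so not chronologically).

```python
# Hypothetical sample data: (author, date) pairs, not taken from a real repository.
pairs = [
    ("Bob <bob@example.com>", "Tue Jul 3"),
    ("Alice <alice@example.com>", "Wed Jul 4"),
    ("Alice <alice@example.com>", "Mon Jul 2"),
    ("Bob <bob@example.com>", "Tue Jul 3"),  # duplicate, removed by set()
]

# No key: tuples compare element by element, so pairs with the same
# author are ordered by the date string (lexicographically).
by_author_then_date = sorted(set(pairs))

# Key on the first element only: the authors come out grouped and sorted,
# but the order within one author's pairs is whatever the set yields.
by_author_only = sorted(set(pairs), key=lambda tup: tup[0])
```

Either way the later counting step gives the same result, which is why the sort can be dropped.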
For this:
commiters = list(set(map(lambda tup: tup[0], lines)))
Why are you zipping author_lines and date_lines and then unzipping again? Just do:
commiters = set(author_lines)
or am I missing something?
And this:
d = dict(zip(commiters, [0 for x in xrange(len(commiters))]))
for item in lines:
    d[item[0]] += 1
You're just getting commit counts, right? Use Counter:
import collections
d = collections.Counter([author_line for author_line,_ in lines])
Or, if your Python version doesn't have collections.Counter (it was added in Python 2.7):
import collections
d = collections.defaultdict(lambda: 0)
for author_line, _ in lines:
    d[author_line] += 1
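If you'd rather keep a plain dict, as in the original question, dict.fromkeys is probably the closest replacement for the dict(zip(...)) line. A minimal sketch with hypothetical sample data (the author names and dates below are made up); it also shows that defaultdict(int) behaves like defaultdict(lambda: 0):

```python
import collections

# Hypothetical sample data standing in for the (author, date) pairs.
lines = [("alice", "Mon Jul 2"), ("alice", "Wed Jul 4"), ("bob", "Tue Jul 3")]
commiters = set(author for author, _ in lines)

# dict.fromkeys gives every key the same initial value (0 here).
d = dict.fromkeys(commiters, 0)
for author, _ in lines:
    d[author] += 1

# defaultdict(int) creates missing entries as int(), which is 0.
dd = collections.defaultdict(int)
for author, _ in lines:
    dd[author] += 1
```

Sharing one initial value across all keys is safe for an immutable value such as 0; with a mutable value like a list, every key would refer to the same object.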
... Wait, are you even using date_lines anywhere? If not, what are they there for?
date_lines is there so that list(set(lines)) will leave behind only the lines that have a unique author and day of commit. I'm trying to count the number of days on which a particular author made a commit. It's true that they don't need to be sorted; that was left over from when I changed my algorithm slightly. I also like your version of collecting the author_lines and date_lines. I'll look into the collections library more. I'm not sure if I like Counter, but I definitely like the for author_line,_ in lines syntax. – Drew, Jul 4, 2012 at 14:32
@Drew - Oh, right! That makes sense. In that case I'd probably do the list(set(lines)) as you did and then just do a list comprehension to discard all the date lines, since you no longer need them and they only seem to get in the way later. (Also, I'd probably rename some of your variables, like lines, to make it more obvious what they're doing, like author_day_pairs or something.) – weronika, Jul 4, 2012 at 15:25
Oooh, getting rid of the date_lines afterwards is a great idea. Thanks! – Drew, Jul 4, 2012 at 21:13
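For reference, the suggestions in this thread could be combined roughly as follows. This is only a sketch: the function name days_per_author and the sample log text are hypothetical, and it takes the log as a string instead of calling git so that it can be run anywhere. It keeps the same slicing as the original script, so a "day" is still the weekday, month and day-of-month without the year.

```python
import collections


def days_per_author(log_text):
    """Count the distinct commit days per author in `git log` output."""
    lines = log_text.split('\n')
    authors = [line[8:] for line in lines if line.startswith('Author')]
    days = [line[8:18].strip() for line in lines if line.startswith('Date')]
    # set() keeps one (author, day) pair per author per day.
    author_day_pairs = set(zip(authors, days))
    # Drop the day; each remaining entry stands for one distinct day.
    return collections.Counter(author for author, _ in author_day_pairs)


# Hypothetical sample in the default `git log` layout, not real history.
sample_log = """commit 1111111
Author: Alice <alice@example.com>
Date:   Wed Jul 4 14:32:51 2012 +0000

    first

commit 2222222
Author: Alice <alice@example.com>
Date:   Wed Jul 4 09:10:00 2012 +0000

    second

commit 3333333
Author: Bob <bob@example.com>
Date:   Tue Jul 3 11:00:00 2012 +0000

    third

commit 4444444
Author: Alice <alice@example.com>
Date:   Mon Jul 2 08:00:00 2012 +0000

    fourth
"""

counts = days_per_author(sample_log)
```

With the sample above, Alice's two commits on the same day collapse into one pair, so she is counted for two days and Bob for one.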