I just finished writing this Python script to calculate daily additions and subtractions from my git log to use in making pretty graphs. This is a rewrite of something I wrote previously in Perl. I have made an attempt to clean it up, but I feel that some of my list comprehensions are messy to say the least, and I'm abusing certain Python features.
#!/usr/bin/env python
# get-pmstats.py
# Henry J Schmale
# November 25, 2017
#
# Calculates the additions and deletions per day within a git repository
# by parsing out the git log. It opens the log itself.
# Produces output as a CSV
#
# This segments out certain file wildcards
import subprocess
from datetime import datetime
from fnmatch import fnmatch


def chomp_int(val):
    try:
        return int(val)
    except ValueError:
        return 0


def make_fn_matcher(args):
    return lambda x: fnmatch(''.join(map(str, args[2:])), x)


def print_results(changes_by_date):
    print('date,ins,del')
    for key,vals in changes_by_date.items():
        print(','.join(map(str, [key, vals[0], vals[1]])))


EXCLUDED_WILDCARDS = ['*.eps', '*.CSV', '*jquery*']
changes_by_date = {}
git_log = subprocess.Popen(
    'git log --numstat --pretty="%at"',
    stdout=subprocess.PIPE,
    shell=True)
date = None
day_changes = [0, 0]
for line in git_log.stdout:
    args = line.decode('utf8').rstrip().split()
    if len(args) == 1:
        old_date = date
        date = datetime.fromtimestamp(int(args[0]))
        if day_changes != [0, 0] and date.date() != old_date.date():
            changes_by_date[str(date.date())] = day_changes
            day_changes = [0, 0]
    elif len(args) >= 3:
        # Don't count changesets for excluded file types
        if True in map(make_fn_matcher(args), EXCLUDED_WILDCARDS):
            continue
        day_changes = [sum(x) for x in zip(day_changes, map(chomp_int, args[0:2]))]
print_results(changes_by_date)
asked Nov 26, 2017 at 2:29
1 Answer
I would apply the following improvements:
- use `collections.defaultdict` to count the added and deleted lines separately. This simplifies the counting logic and removes the need to check for the old date at all
- use `any()` to check for the excluded wildcards
- unpack `args` into added lines, deleted lines and a filename
- switch to `gitpython` from `subprocess`
The new version of the code:
from collections import defaultdict
from datetime import datetime
from fnmatch import fnmatch

import git


EXCLUDED_WILDCARDS = ['*.eps', '*.CSV', '*jquery*']


def chomp_int(val):
    try:
        return int(val)
    except ValueError:
        return 0


repo = git.Repo(".")
git_log = repo.git.log(numstat=True, pretty="%at").encode("utf-8")
added, deleted = defaultdict(int), defaultdict(int)
for line in git_log.splitlines():
    args = line.decode('utf8').rstrip().split()
    if len(args) == 1:
        date = datetime.fromtimestamp(int(args[0])).date()
    elif len(args) >= 3:
        added_lines, deleted_lines, filename = args
        # Don't count changesets for excluded file types
        if any(fnmatch(filename, wildcard) for wildcard in EXCLUDED_WILDCARDS):
            continue
        added[date] += chomp_int(added_lines)
        deleted[date] += chomp_int(deleted_lines)
for date, added_lines in added.items():
    print(date, added_lines, deleted[date])
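The question's script emits CSV with a `date,ins,del` header, while the loop above prints space-separated values. If the CSV format matters for the graphing step, a minimal sketch along these lines could restore it. It assumes `added` and `deleted` are the date-keyed mappings built above; `write_csv` is a hypothetical helper name, and sorting the rows by date is an extra choice made here, not something the original code does.

```python
import csv
import sys
from datetime import date


def write_csv(added, deleted, out=None):
    """Write one 'date,ins,del' row per day, oldest day first."""
    if out is None:
        out = sys.stdout
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['date', 'ins', 'del'])
    # Union of keys, so a day with only deletions is still reported.
    for day in sorted(set(added) | set(deleted)):
        # .get() avoids creating new keys in a defaultdict while reading it.
        writer.writerow([day.isoformat(), added.get(day, 0), deleted.get(day, 0)])


if __name__ == '__main__':
    # Tiny hand-made sample instead of a real repository.
    write_csv({date(2017, 11, 25): 10}, {date(2017, 11, 25): 3})
```

With the answer's variables, the final `for` loop would then become a single `write_csv(added, deleted)` call.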
answered Nov 26, 2017 at 4:03
- Looks like you have potential for an error in your `elif`'s tuple unpacking. (Commented Nov 26, 2017 at 14:40)
- @Peilonrayz good point. I was thinking of adding a `*_` there, but I experimented with a couple of repositories I had cloned and could not find a case where `args` has more than 3 elements; I'm not sure whether that is possible with `git log --numstat --pretty="%at"`. But yes, it should be `== 3` if I am assuming that. Thanks! (alecxe, commented Nov 26, 2017 at 15:30)
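One way to sidestep the unpacking concern raised in these comments is to split each line on tabs rather than on arbitrary whitespace, since `--numstat` separates its three fields with tab characters. The sketch below is an illustration under that assumption, not part of the original answer; `parse_numstat_line` is a hypothetical helper name. Limiting the split to two tabs keeps the whole path in one piece even if it contains spaces or rename information such as `old => new`.

```python
def chomp_int(val):
    """Return int(val), or 0 for non-numeric counts such as '-'."""
    try:
        return int(val)
    except ValueError:
        return 0


def parse_numstat_line(line):
    """Split one numstat line into (added, deleted, path), else return None.

    Assumes the three fields are tab-separated; maxsplit=2 leaves anything
    after the second tab untouched, so the path is never broken apart.
    """
    parts = line.rstrip('\n').split('\t', 2)
    if len(parts) != 3:
        # Timestamp lines and blank lines have no tabs and end up here.
        return None
    added, deleted, path = parts
    return chomp_int(added), chomp_int(deleted), path


if __name__ == '__main__':
    print(parse_numstat_line('3\t1\tsrc/my file.py'))
```

In the answer's loop, a `None` result would mark a line that is either a timestamp or blank, and the three-way unpacking could no longer raise `ValueError` on a path containing whitespace.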