I cleaned up a large repository for my team with this Python code. For every developer on the team, my goal was to check whether any of their bad email addresses appear in a commit and, if so, replace the author and committer information with the correct values. I use git filter-branch and a for loop in Bash.
Because I can't make an array of arrays in Bash, I wrote a Python script to handle all the developers on my team.
Any ideas on how I can optimize this code? git filter-branch takes a long time.
# coding=utf-8
import subprocess
import os


def generate_command(dev):
    emails_string = ""
    for email in dev["emails"]:
        emails_string += '"%s" ' % email
    return """git filter-branch -f --env-filter 'OLD_EMAILS=(%s)
CORRECT_NAME="%s"
CORRECT_EMAIL="%s"
for email in ${OLD_EMAILS[@]};
do
    if [ "$GIT_COMMITTER_EMAIL" = "$email" ]
    then
        export GIT_COMMITTER_NAME="$CORRECT_NAME"
        export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
    fi
    if [ "$GIT_AUTHOR_EMAIL" = "$email" ]
    then
        export GIT_AUTHOR_NAME="$CORRECT_NAME"
        export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
    fi
done' --tag-name-filter cat -- --branches --tags""" % (emails_string.strip(),
                                                        dev["author_name"],
                                                        dev["author_email"])


developers = [
    {
        "emails": ["[email protected]", "[email protected]"],
        "author_name": "first dev",
        "author_email": "[email protected]"
    },
    {
        "emails": ["[email protected]", "[email protected]"],
        "author_name": "second dev",
        "author_email": "[email protected]"
    }
]

if __name__ == '__main__':
    for developer in developers:
        subprocess.call(generate_command(developer), shell=True)
Don't run filter-branch more than once. Write an external script that you can pass the author/committer information to and that gives you back the correct information, then use that in a single-pass run of filter-branch. filter-branch will refuse to run a second time anyway unless you force it or clean up the history saved by the previous filter-branch run. – Etan Reisner, Dec 11, 2014 at 18:34
2 Answers
First reaction: wow, this is scary: a Python script generating Bash, which in turn runs more Bash inside it. But I see the env-filter technique comes straight from an example in the docs.
I would have written this in pure Bash, using a helper function that takes as parameters:
- author name
- author email
- one or more bad email addresses
And then, for each bad email address, call git filter-branch like you did, but all in pure Bash.
As far as the Python part is concerned, this can be done better:
emails_string = ""
for email in dev["emails"]:
    emails_string += '"%s" ' % email
Using a list comprehension:
emails_string = " ".join(['"%s"' % email for email in dev["emails"]])
With this, you don't need to .strip() the emails_string when you generate the command string.
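To see why the .strip() becomes unnecessary, here is a minimal sketch of the join-based version as a standalone function. The name quote_emails and the example.com addresses are made up for illustration; they are not part of the original script.

```python
def quote_emails(emails):
    """Join addresses into a space-separated run of double-quoted words.

    The separator only goes *between* items, so there is no trailing
    space to strip afterwards.
    """
    return " ".join('"%s"' % email for email in emails)


# Hypothetical addresses, for illustration only.
print(quote_emails(["old@example.com", "older@example.com"]))
```

A generator expression is used here instead of a list; for a handful of addresses either form works.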
Expanding on my comment.
You don't want to run filter-branch more than once. It is a very expensive process, and in general you shouldn't need to do that anyway. Not to mention that, by default, it will refuse to run a second time in a given repo if it finds an existing refs/original ref, unless you force it (with the --force flag).
The "problem" that sent you down this multiple-filter-branch path was that you couldn't figure out how to do the mapping of "bad" email addresses to "good" usernames and "good" email addresses in Bash, so you wanted to use Python for its more featureful arrays and so on.
OK, but that's the only bit you "need" Python for, so just use it for that. Write a Python script that you can pass the original email information to and which gives you back the correct information to use in the env-filter.
Something like this:
import sys

developers = [
    {
        "emails": ["[email protected]", "[email protected]"],
        "author_name": "first dev",
        "author_email": "[email protected]"
    },
    {
        "emails": ["[email protected]", "[email protected]"],
        "author_name": "second dev",
        "author_email": "[email protected]"
    }
]

# Usage: python email_map.py <email> <COMMITTER|AUTHOR>
# Prints shell export statements for the env-filter to eval;
# prints nothing if the email is not a known bad address.
for dev in developers:
    if sys.argv[1] in dev["emails"]:
        print('export GIT_%s_NAME="%s"' % (sys.argv[2], dev["author_name"]))
        print('export GIT_%s_EMAIL="%s"' % (sys.argv[2], dev["author_email"]))
        break
Which you can use in your env-filter script as:
eval "$(python email_map.py "$GIT_COMMITTER_EMAIL" COMMITTER)"
eval "$(python email_map.py "$GIT_AUTHOR_EMAIL" AUTHOR)"
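If you still want to drive the whole thing from Python, the single run could be assembled roughly like this. This is only a sketch under a few assumptions: it targets Python 3 (for shlex.quote), the function names are made up, the filter-branch flags are the ones from the question, and the script path is made absolute on the assumption that filter-branch may evaluate the filter from a temporary working directory rather than from the repository root.

```python
import os
import shlex
import subprocess

# Assumed location of the mapping script; adjust to where it really lives.
MAP_SCRIPT = os.path.abspath("email_map.py")


def build_env_filter(map_script=MAP_SCRIPT):
    """Return the shell snippet that filter-branch evaluates for each commit."""
    quoted = shlex.quote(map_script)
    return "\n".join([
        'eval "$(python %s "$GIT_COMMITTER_EMAIL" COMMITTER)"' % quoted,
        'eval "$(python %s "$GIT_AUTHOR_EMAIL" AUTHOR)"' % quoted,
    ])


def build_command(map_script=MAP_SCRIPT):
    """Return the argument list for ONE filter-branch pass over all branches and tags."""
    return [
        "git", "filter-branch", "-f",
        "--env-filter", build_env_filter(map_script),
        "--tag-name-filter", "cat",
        "--", "--branches", "--tags",
    ]


def run_once(repo_dir):
    """Run the single pass inside repo_dir (not called automatically)."""
    # Passing a list avoids shell=True and the nested quoting that comes with it.
    return subprocess.call(build_command(), cwd=repo_dir)
```

Because the command is a list rather than one shell string, the env-filter text reaches git as a single argument and there is no outer layer of single quotes to get wrong. Try it on a throwaway clone first, since filter-branch rewrites history.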