5
\$\begingroup\$

I cleaned up a large repository for my team with the Python code below. The goal was to check every commit for outdated email addresses belonging to each developer on my team and replace them with the correct name and email. I use git filter-branch with a for loop in Bash.

Because Bash does not support nested arrays, I wrote a Python script to handle all the developers on my team.

Any ideas on how I can optimize this code? git filter-branch takes a long time.

# coding=utf-8
import subprocess
import os


def generate_command(dev):
    emails_string = ""
    for email in dev["emails"]:
        emails_string += '"%s" ' % email
    return """git filter-branch -f --env-filter 'OLD_EMAILS=(%s)
CORRECT_NAME="%s"
CORRECT_EMAIL="%s"
for email in ${OLD_EMAILS[@]};
do
    if [ "$GIT_COMMITTER_EMAIL" = "$email" ]
    then
        export GIT_COMMITTER_NAME="$CORRECT_NAME"
        export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
    fi
    if [ "$GIT_AUTHOR_EMAIL" = "$email" ]
    then
        export GIT_AUTHOR_NAME="$CORRECT_NAME"
        export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
    fi
done' --tag-name-filter cat -- --branches --tags""" % (emails_string.strip(),
                                                       dev["author_name"],
                                                       dev["author_email"])


developers = [
    {
        "emails": ["[email protected]", "[email protected]"],
        "author_name": "first dev",
        "author_email": "[email protected]"
    },
    {
        "emails": ["[email protected]", "[email protected]"],
        "author_name": "second dev",
        "author_email": "[email protected]"
    }
]

if __name__ == '__main__':
    for developer in developers:
        subprocess.call(generate_command(developer), shell=True)
Jamal
asked Dec 11, 2014 at 16:09
\$\endgroup\$
1
  • \$\begingroup\$ Don't run filter-branch more than once. Write an external script that you can pass the author/committer information to and that gives you back the correct information, then use it in a single-pass run of filter-branch. filter-branch will refuse to run a second time anyway unless you force it or clean up the history saved by the previous filter-branch run. \$\endgroup\$ Commented Dec 11, 2014 at 18:34

2 Answers

3
\$\begingroup\$

First reaction: wow, this is scary: a Python script generating Bash, which in turn runs more Bash inside it. But I see the --env-filter technique comes straight from an example in the docs.

I would have written this in pure Bash, using a helper function that takes as parameters:

  • author name
  • author email
  • one or more bad email addresses

And then for each bad email address, call git filter-branch like you did, but all in pure Bash.

As far as the Python part is concerned, this can be done better:

emails_string = ""
for email in dev["emails"]:
    emails_string += '"%s" ' % email

Using a list comprehension:

emails_string = " ".join(['"%s"' % email for email in dev["emails"]])

With this, you don't need to .strip() the emails_string when you generate the command string.
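
As a quick check of the difference, here is a minimal sketch comparing the two approaches. The addresses are hypothetical placeholders, and `shlex.quote` (Python 3.3+) appears only as an optional alternative to hand-written double quotes; the original script does not use it.

```python
import shlex

# Hypothetical placeholder addresses, for illustration only.
emails = ["old1@example.com", "old2@example.com"]

# Original approach: concatenation leaves a trailing space to strip.
looped = ""
for email in emails:
    looped += '"%s" ' % email

# join() puts the separator only between items, so no strip() is needed.
joined = " ".join('"%s"' % email for email in emails)

# Optional: let shlex.quote decide how each word must be quoted for the shell.
quoted = " ".join(shlex.quote(email) for email in emails)

print(repr(looped))
print(repr(joined))
print(quoted)
```

The generator expression passed to `join()` also avoids building the intermediate list, although for a handful of addresses that makes no practical difference.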

answered Dec 11, 2014 at 19:26
\$\endgroup\$
2
\$\begingroup\$

Expanding on my comment.

You don't want to run filter-branch more than once. It is a very expensive process, and in general you shouldn't need to anyway. On top of that, by default it refuses to run a second time in a given repo if it finds an existing refs/original ref, unless you force it with the --force flag.

Your "problem" that led you down this multiple-filter-branch path was that you couldn't figure out how to do the mapping of "bad" email addresses to "good" usernames and "good" email addresses in Bash, so you wanted to use Python for its more featureful arrays and so on.

Ok, but that's the only bit you "need" python for so just use it for that. Write a python script that you can pass the original email information to and which will give you back the correct information to use in the env-filter.

Something like this:

import sys

developers = [
    {
        "emails": ["[email protected]", "[email protected]"],
        "author_name": "first dev",
        "author_email": "[email protected]"
    },
    {
        "emails": ["[email protected]", "[email protected]"],
        "author_name": "second dev",
        "author_email": "[email protected]"
    }
]

# Usage: python email_map.py <email> <COMMITTER|AUTHOR>
# Prints export statements only when the email belongs to a known developer.
for dev in developers:
    if sys.argv[1] in dev["emails"]:
        print('export GIT_%s_NAME="%s"' % (sys.argv[2], dev["author_name"]))
        print('export GIT_%s_EMAIL="%s"' % (sys.argv[2], dev["author_email"]))
        break

Which you can use in your env-filter script as:

eval "$(python email_map.py "$GIT_COMMITTER_EMAIL" COMMITTER)"
eval "$(python email_map.py "$GIT_AUTHOR_EMAIL" AUTHOR)"
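
A possible variation on the same single-pass idea, not part of the approach above: since the mapping is static, Python could generate the whole env-filter once as a POSIX `case` statement, so that no Python process is started per commit. The sketch below is only that, a sketch. The function names (`case_block`, `build_env_filter`, `build_command`) and the addresses are hypothetical, and it assumes that names and emails contain no double quotes or shell glob characters.

```python
import subprocess

# Hypothetical placeholder data with the same shape as the developers list.
developers = [
    {
        "emails": ["old1@example.com", "old2@example.com"],
        "author_name": "first dev",
        "author_email": "first@example.com",
    },
    {
        "emails": ["old3@example.com"],
        "author_name": "second dev",
        "author_email": "second@example.com",
    },
]


def case_block(role, devs):
    """Build a `case` statement that corrects GIT_<role>_NAME and GIT_<role>_EMAIL."""
    lines = ['case "$GIT_%s_EMAIL" in' % role]
    for dev in devs:
        # All old addresses of one developer share a single case branch.
        # Assumption: addresses contain no glob characters or quotes.
        lines.append("    %s)" % "|".join(dev["emails"]))
        lines.append('        export GIT_%s_NAME="%s"' % (role, dev["author_name"]))
        lines.append('        export GIT_%s_EMAIL="%s"' % (role, dev["author_email"]))
        lines.append("        ;;")
    lines.append("esac")
    return "\n".join(lines)


def build_env_filter(devs):
    """Combine the committer and author corrections into one filter script."""
    return "\n".join(case_block(role, devs) for role in ("COMMITTER", "AUTHOR"))


def build_command(devs):
    """Return the filter-branch invocation as an argument list (no shell=True)."""
    return ["git", "filter-branch", "-f", "--env-filter", build_env_filter(devs),
            "--tag-name-filter", "cat", "--", "--branches", "--tags"]


if __name__ == "__main__":
    # Print the generated filter for inspection before running anything.
    print(build_env_filter(developers))
    # Uncomment to rewrite history in the current repository (destructive):
    # subprocess.call(build_command(developers))
```

Passing the command as a list means the filter script reaches git as a single argument, which avoids the extra layer of quoting that a `shell=True` string would need.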
answered Dec 11, 2014 at 20:33
\$\endgroup\$
