A while ago, someone at our office thought it'd be a great idea to start tracking a number of fairly large binary files in one of our more important repositories. We noticed our builds were slowing down (considerably), and fetching new changes from the remotes could take up to a minute.
We eventually noticed that there were quite a number of reasonably large objects in our repos, and I was appointed to clean up the mess.
Seeing as these objects had been added, removed, updated, renamed and replaced in various commits, across a fair number of branches, I decided to write a script to rewrite the heads of all of these branches automatically, instead of going through all the files and all the commits individually.
Not being a Bash ninja, I stuck to what I know, using a lot of `$(<command> | grep | sed | awk)` trickery. I've always gotten by using this approach, but I'd really like to know if bash offers some features I've yet to uncover that would enable me to write, essentially, better scripts. Hence, I'd like to get some feedback on the type of scripts I'm currently writing, and how I could do better:
```bash
#!/usr/bin/env bash
SCRIPT=$(basename ${BASH_SOURCE[0]})
verbose=false
idxfile="packidx.log"
forcepush=false
filterflag="--index-filter"
#get current branch
currentbranch=$(git branch | grep '*' | awk '{print $2}')
function Help {
    echo "Usage $SCRIPT [-svfh][-i value]:"
    echo " -i [packidx.log]: specify an existing file, containing sorted git verify-pack -v output"
    echo " Default is to create or prompt to reuse an existing packidx.log file"
    echo " -v : verbose output"
    echo " -s : slow, use tree-filter instead of index-filter when removing objects"
    echo " -f : Force push. Whenever an object is removed from a branch, perform a force-push"
    echo " -h : Help. Display this message"
}
function AfterFilter {
    if [ "$verbose" = true ] ; then
        echo 'cleaning up .git/refs/original and .git/logs, then gc the git DB'
    fi
    rm -Rf .git/refs/original
    rm -Rf .git/logs/
    git gc
    if [ "$verbose" = true ] ; then
        echo 'object-count stats after filter'
        git count-objects -v
    fi
    git prune --expire now
    if [ "$verbose" = true ] ; then
        echo 'object-count stats after prune'
        git count-objects -v
    fi
    echo ''
    if [ "$forcepush" = true ] ; then
        git push --force
    else
        read -p 'push the rewritten head? [Y/n]: ' -n 1 -r
        if [[ ! $REPLY =~ ^[nN]$ ]] ; then
            git push --force
        fi
    fi
}
if [ $# -gt 0 ] ; then
    while getopts :isvfh flag ; do
        case $flag in
            i)
                idxfile=$OPTARG
                ;;
            f)
                forcepush=true
                ;;
            v)
                verbose=true
                ;;
            s)
                filterflag="--tree-filter"
                ;;
            h)
                Help
                exit 0
                ;;
            \?)
                Help
                exit 1
                ;;
        esac
    done
fi
if [ ! -f $idxfile ]; then
    REPLY=y
else
    read -p "create $idxfile file? [y/N]: " -n 1 -r
fi
if [[ $REPLY =~ ^[yY]$ ]]
then
    echo "Creating $idxfile on branch $currentbranch"
    git gc
    packfile=$(ls .git/objects/pack/*.idx)
    git verify-pack -v "$packfile" | sort -k 3 -n > packidx.log
fi
for objectref in $(tac packidx.log | grep blob | cut -d " " -f1); do
    if [ "$verbose" = true ] ; then
        echo 'object-count stats'
        git count-objects -v
    fi
    if [ "$verbose" = true ] ; then
        echo "get filename for object $objectref"
    fi
    filename=$(git rev-list --objects --all | grep $objectref | sed -n -e "s/^$objectref //p")
    read -p "process all commits modifying $filename? [y/N] " -n 1 -r
    if [[ $REPLY =~ ^[Yy]$ ]]
    then
        if [ "$verbose" = true ] ; then
            echo "get all commits modifying $filename"
            git log --oneline --branches -- "$filename"
        fi
        # output is for user info only, use commit refs here:
        commits=() #array of commits
        commitlength=0
        for commit in $(git log --oneline --branches -- "$filename" | awk '{print $1;}'); do
            commits[commitlength]=$commit
            commitlength=$((commitlength+1))
        done
        if (( commitlength == 0 )) ; then
            echo "No commits found for $filename, must be a dangling object"
        else
            commitlength=$((commitlength-1)) #last commit
            for (( i=commitlength; i>0; i--)); do
            #while [ $commitlength -ge 0 ] ; do
                for branch in $(git branch --contains ${commits[$i]} | cut -c 3-) ; do
                    #which branch is rewritten is considered vital info, verbose or not
                    #echo this line
                    echo "rewriting $branch for commit ${commits[$i]}"
                    if [[ ! "$branch" =~ "$currentbranch" ]] ; then
                        git checkout $branch
                    fi
                    git filter-branch --force $filterflag "git rm --ignore-unmatch --cached $filename" --prune-empty -- "${commits[$i]}"^..
                    AfterFilter
                    if [ "$verbose" = true ] ; then
                        echo "$branch rewritten"
                    fi
                    if [[ ! "$branch" =~ "$currentbranch" ]] ; then
                        #return to current branch
                        git checkout $currentbranch
                    fi
                done
                echo $i
            done
            #checkout the initial branch
            git checkout "$currentbranch"
        fi
    fi
    read -p 'continue? [Y/n]: ' -n 1 -r
    if [[ $REPLY =~ ^[nN]$ ]]
    then
        break
    fi
done
echo '' #insert blank line
read -p "remove $idxfile? [y/N]: " -n 1 -r
if [[ $REPLY =~ ^[yY]$ ]]; then
    rm $idxfile
fi
```
Answer
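The shebang `#!/usr/bin/env bash` is not strictly needed. `env` is meant for running a command in a modified environment, and since you don't use any of its options, all it does here is look up `bash` in your `PATH` and then run it. Basically, you're just spending an extra program invocation. If bash is at `/bin/bash` on every machine that runs this script, use it directly: `#!/bin/bash`. (The `env` form remains the more portable choice where bash may be installed elsewhere, e.g. `/usr/local/bin/bash`.)

You name your variable `SCRIPT`, which is a useful name but also uppercase. By convention, uppercase names are used for environment variables and for bash's own variables. While this is not wrong as such, you should avoid it to prevent overwriting an important variable like `PATH`.

You get the script name from the command `basename ${BASH_SOURCE[0]}`. While this is also not wrong, the modern way would be the parameter expansion `${0##*/}`, which is more efficient since it doesn't call any program. (Strictly speaking, `${BASH_SOURCE[0]##*/}` is the exact equivalent: `$0` and `${BASH_SOURCE[0]}` only differ when the script is sourced, in which case `$0` names the calling shell.) Also, since you're only using this variable once, in the `Help` function, it would make more sense to embed `${0##*/}` there directly instead of creating a new variable, as `$0` is a ready-made variable.

You name your usage function `Help`. This is not wrong, of course, but a more traditional Unix-ish name would be `usage`.

In the `Help` function, you call `echo` several times, which is repetitive and not so nice to read. (Speed is not really the issue: `echo` is a builtin, whereas `cat` is an external program.) A traditional solution is a here-document passed to `cat`, like this:

```bash
cat << __EOF__
Usage ....
..
__EOF__
```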
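Putting the points about `${0##*/}`, the name `usage` and the here-document together, a minimal sketch of the help function might look like this. The option texts are copied from the original `Help`; the `__EOF__` delimiter is an arbitrary choice, and because it is unquoted, `${0##*/}` is expanded inside the text.

```bash
#!/usr/bin/env bash

# Print the help text in one here-document.
# ${0##*/} removes everything up to the last "/" from $0, giving the same
# result as "$(basename "$0")" for a normal path without starting a process.
usage() {
    cat << __EOF__
Usage ${0##*/} [-svfh][-i value]:
 -i [packidx.log]: specify an existing file, containing sorted git verify-pack -v output
 Default is to create or prompt to reuse an existing packidx.log file
 -v : verbose output
 -s : slow, use tree-filter instead of index-filter when removing objects
 -f : Force push. Whenever an object is removed from a branch, perform a force-push
 -h : Help. Display this message
__EOF__
}
```

Calling `usage` from the `h)` and `\?)` branches of the `getopts` loop works exactly as `Help` did.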
You test `[ "$verbose" = true ]`. If you already know that `$verbose` will be either `true` or `false` (which you do, in this case), you can just write `if $verbose; then`, because `true` is treated as a command this time: it simply returns 0 (and `false` returns 1).

Since you are regularly checking for true values, you can use a shortcut: `$var && if_true_commands`. So instead of saying `if [ "$verbose" = true ] ; then echo ..` you can just say `$verbose && echo ..`. The same works with `||` for the negated case. For example, you say:

```bash
if [[ ! $REPLY =~ ^[nN]$ ]] ; then
    git push --force
fi
```

You can say:

```bash
[[ $REPLY =~ ^[nN]$ ]] || git push --force
```

This is not all that different, but that's how a lot of modern bash scripts do it, and you have to admit, it's fancier.
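A small runnable sketch of these idioms, assuming `$verbose` only ever holds `true` or `false`. `log` and `maybe_push` are hypothetical helper names, and the push is replaced by an `echo` so the snippet can be tried outside a repository.

```bash
#!/usr/bin/env bash

verbose=false   # the original script sets this to true for -v

# $verbose expands to the word true or false, which bash then runs as a
# builtin command: true exits with 0, false with 1.
if $verbose; then
    echo 'object-count stats'
fi

# Print the arguments only in verbose mode (hypothetical helper).
log() {
    $verbose && echo "$@"
}

# Push unless the key pressed was n or N; an empty reply (just Enter) counts
# as yes, as in the original prompt. The echo stands in for git push --force.
maybe_push() {
    local reply=$1
    [[ $reply =~ ^[nN]$ ]] || echo 'git push --force'
}
```

One caveat with the `&&` form: when `$verbose` is `false`, the line's exit status is 1, so a function ending in `$verbose && echo ...` returns 1. Under `set -e`, calling such a function as a plain command would then abort the script, which the `if $verbose; then ... fi` form avoids.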
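Small notes:

You say `packfile=$(ls .git/objects/pack/*.idx)`. However, bash has already expanded the glob before `ls` runs, so you've effectively listed the files yourself and `ls` just prints the names back. So, practically, using `echo`, which is a builtin, would be slightly more efficient.

I noticed that you usually omit double quotes around variables and command substitutions. Unquoted expansions undergo word splitting and pathname expansion, so a value containing whitespace (a file name with a space in it, for example) is broken into several arguments, and one containing `*` or `?` may be replaced by matching file names. Depending on the command, you then end up with just the first word, or with bash errors and warnings about the extra words.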
You use `echo ''` to output an empty line. The quotes aren't needed: a quoted empty string is passed to `echo` as a single empty argument (bash only drops *unquoted* expansions that come out empty), so `echo` prints nothing followed by a newline, which is exactly what plain `echo` with no arguments does, and what you wanted after all.

For further syntax checking and suggestions, you can check shellcheck.net.
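To see what bash actually passes in each case, a hypothetical helper `nargs` that prints its argument count is enough:

```bash
#!/usr/bin/env bash

# Print how many arguments the function received.
nargs() { echo "$#"; }

empty=''

nargs ''          # 1: a quoted empty string is a real (empty) argument
nargs "$empty"    # 1: so is a quoted expansion that comes out empty
nargs $empty      # 0: an unquoted empty expansion is removed entirely
nargs             # 0
```

The last two lines are also why unquoted variables are risky in tests such as `[ ! -f $idxfile ]`: if the value is empty, the argument silently disappears and the test sees a different expression than intended.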
- I didn't know about shellcheck.net, nifty! (jacwah, Jul 24, 2015 at 19:07)
- ditto... thanks for letting me know about shellcheck (Elias Van Ootegem, Jul 26, 2015 at 11:25)
- … `git rm`?
- … `git filter-branch` bit. The problem is that the repo has had several hundred MB of binary files added to it. You can remove these files from Git using `git rm`, but the commits containing those files still hold a reference to them, so `git rm` will not remove the objects from the repo. The result is a `git clone` that pulls close to 2 GB, half of which is old binary files that shouldn't have been tracked in the first place.
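The point made in that comment, that `git rm` alone leaves the old blobs in the repository, can be checked in a throwaway repository. A minimal sketch, assuming `git` is on `PATH`; the repository, the file name `big.bin`, the helper `git_demo` and the committer identity are all made up for the demonstration:

```bash
#!/usr/bin/env bash

# Throwaway repository in a temporary directory (left in place for inspection).
demo_repo=$(mktemp -d)

# Run git inside the demo repository with a fixed identity and no signing.
git_demo() {
    git -C "$demo_repo" -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false "$@"
}

git_demo init -q
printf 'pretend this is a large binary\n' > "$demo_repo/big.bin"
git_demo add big.bin
git_demo commit -q -m 'add big.bin'
blob=$(git_demo rev-parse HEAD:big.bin)   # object id of the file's content

git_demo rm -q big.bin
git_demo commit -q -m 'remove big.bin'

# The file is gone from the latest commit...
if git_demo cat-file -e HEAD:big.bin 2>/dev/null; then
    echo 'big.bin is still in HEAD'
else
    echo 'big.bin is gone from HEAD'
fi
# ...but the blob is still stored and reachable through the first commit.
git_demo cat-file -e "$blob" && echo "blob $blob is still in the object database"
git_demo rev-list --objects --all | grep "^$blob "
```

Only rewriting every commit that references the blob (what the question's `git filter-branch ... "git rm --ignore-unmatch --cached $filename"` loop does), then dropping `.git/refs/original` and the reflogs and running `git gc` and `git prune`, makes the object unreachable so Git can actually delete it. Remove `$demo_repo` when done.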