This is my first bash script, so I'm asking you to tell me what you think.
The script should check every directory, and if it finds a PDF file in that directory: move all the PDF files to the parent directory, rename them after the directory, and remove the directory.
So if I had this folder structure:

folder1
    file.pdf
    file.txt
folder2
    some.pdf
    another.pdf
folder3
    some.txt

the script should change it to:

folder1.pdf
folder2.pdf
folder2_2.pdf
folder3
    some.txt
Here is the script:
#!/bin/bash
working_directory="$HOME/MEGAsync/downloads/books"
cd $working_directory
# for each directory in working directory
for D in `find . -type d`
do
    # skip '.'
    if [[ $D != "." ]]; then
        # get folder name
        folder_name=${D##*/}
        # and cd to folder
        cd $D
        # number of pdf files in the current directory
        declare -i pdf_count=0
        # for each file in the current directory
        for f in *; do
            # get file without path
            file=$(basename "$f")
            extension="${file##*.}"
            filename="${file%.*}"
            if [[ $extension == 'pdf' ]]; then
                pdf_count=pdf_count+1
                if [[ pdf_count < 2 ]]; then
                    new_filename="${working_directory}/${folder_name}.pdf";
                else
                    new_filename="${working_directory}/${folder_name}_${pdf_count}.pdf";
                fi
                echo cp $working_directory/$folder_name/$f $new_filename
                cp $working_directory/$folder_name/$f $new_filename
            fi
        done
        # return to parent folder
        cd ..
        # delete directory if pdf-file was found
        if [[ $pdf_count != "0" ]]; then
            echo rm -R $working_directory/$folder_name
            rm -R $working_directory/$folder_name
        fi
    fi
done
2 Answers
For a first script, this is great, and I am impressed.
- your structure is neat
- you are using loops properly
- you are using variable substitution which is 'advanced'
It appears the script will work for all the standard cases, and I have seen much worse in 'production' code.
So, as a beginner, well done.
What's next?
- the edge cases are going to kill this script - spaces in file names.
- the logging and error handling could be better
- I recommend basename and dirname instead of the variable substitutions you use:
  - dirname: http://linux.die.net/man/1/dirname
  - basename: http://linux.die.net/man/1/basename (note that "basename path/to/file.pdf .pdf" returns "file")
- consistency in your for...do loops - you write "do" on the next line once, and then "for ... ; do" the next time. Either is fine, but be consistent.
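To illustrate the basename/dirname suggestion above, here is a small sketch (the path is made up purely for the example):

```shell
#!/bin/bash
path="books/folder2/some.pdf"   # illustrative path, not from the script

dirname "$path"          # prints: books/folder2
basename "$path"         # prints: some.pdf
basename "$path" .pdf    # prints: some  (the .pdf suffix is stripped)
```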
All of the error handling, logging, and spaces-in-filenames issues can be solved easily by using a function for running commands. Also, the "$@" construct in bash is 'special'.
Consider the bash function:
function runcmd {
    echo "$@"
    # Fail if the command fails...
    "$@" || exit 1
}
So, this function will print the command, and then will run it, and exit the script if the command fails.
Now, instead of structures like:

echo cp $working_directory/$folder_name/$f $new_filename
cp $working_directory/$folder_name/$f $new_filename
you can do:
runcmd cp "$working_directory/$folder_name/$f" "$new_filename"
and you will still get the echo, you will get the new error handling, and you will handle spaces in names just fine.
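To see how this handles spaces, here is a self-contained sketch of runcmd in action, using a throwaway temp directory rather than the real book folders:

```shell
#!/bin/bash
function runcmd {
    echo "$@"
    # Fail fast if the command fails...
    "$@" || exit 1
}

tmp=$(mktemp -d)
touch "$tmp/name with spaces.pdf"

# "$@" keeps each quoted argument intact, so the filename with spaces
# reaches cp as a single argument, and the echo shows what ran.
runcmd cp "$tmp/name with spaces.pdf" "$tmp/copy.pdf"

rm -rf "$tmp"
```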
Safety
As @rolfl said, the script will fail if the directories contain spaces. In any case, a for loop over the output of the find command is really not pretty. You can get around the issue by changing the outermost for loop to a while loop instead:

find . -type d | while read D; do
    # ...
done
And there's an even better alternative. The current program will not work for deeply nested directories. If the only purpose of the find command is to find directories directly under $working_directory, then there is a better way to write this loop:
for D in */; do
    # ...
done
The */ pattern will match only directories, and this will work well even if some directories contain spaces.
And by the way, don't use `cmd` style command substitution (it's deprecated); use the modern way: $(cmd)
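One concrete advantage of $( ) is that it nests without the backslash-escaping backticks would need (the path here is illustrative):

```shell
#!/bin/bash
# Name of the parent directory of a path, via nested substitution:
parent=$(basename "$(dirname "/tmp/books/folder1")")
echo "$parent"   # prints: books

# The backtick equivalent would need escaping: `basename \`dirname ...\``
```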
Formatting
Deeply nested code can be hard to read. I recommend skipping the "." directory this way instead:

find . -type d | while read D; do
    [[ $D == . ]] && continue
    # ...
done
That is, instead of nesting deeper under an if, just invert the condition and do a continue instead. And you don't need to quote the ".".
However, if you use the "for D in */" tip from the previous point, you will no longer need to worry about the "." directory at all, so you can drop the check completely.
Simplify
Instead of this:
folder_name=${D##*/}
Better to get the folder name like this:
folder_name=$(basename "$D")
Changing the working directory
Instead of this:
cd "$D"
# do something ...
cd ..
I would recommend this:
pushd "$D"
# do something ...
popd
The end result is the same, with the difference that you don't need to specify where to go back: it will return to the right place on its own. It's more robust that way.
UPDATE
As you pointed out, if you want to prevent pushd and popd from producing output, you have to redirect their output to /dev/null:

pushd "$D" >/dev/null
# do something ...
popd >/dev/null

Also note that because of the cd .., the script will not work if there are deeply nested directories, as cd .. may not go back far enough. (This kind of problem is part of the reason why it's better to avoid changing directories in the middle of a script.) Using pushd and popd helps in this situation, because popd will bring you back to the directory of the last pushd, no matter how deep you went. A similar alternative is using cd "$OLDPWD".
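A sketch of that point: even if you cd several levels deeper after the pushd, a single popd returns to where you started (temp directories, purely for illustration):

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b/c"

start=$PWD
pushd "$tmp/a" >/dev/null
cd b/c                  # wander deeper; a single "cd .." would not get back
popd >/dev/null         # returns straight to $start, regardless of depth
echo "$PWD"             # the directory we started in
```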
Commenting
Finally, most of the comments are pointless; the code you commented already explains itself, doesn't it?
Suggested implementation
This is equivalent to your implementation but shorter and simpler:
#!/bin/bash
working_directory="$HOME/MEGAsync/downloads/books"
cd "$working_directory" || exit 1
for D in */; do
    folder_name=$(basename "$D")
    set -- "$D"/*.pdf
    if [ -f "1ドル" ]; then
        declare -i i=0
        for f in "$D"/*.pdf; do
            i+=1
            if [[ $i == 1 ]]; then
                new_filename="$folder_name.pdf"
            else
                new_filename="${folder_name}_$i.pdf"
            fi
            cp "$f" "$new_filename"
        done
        rm -R "$D"
    fi
done
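The set -- "$D"/*.pdf line deserves a note: it is a standard idiom for testing whether a glob matched anything. If no .pdf file exists, the pattern stays unexpanded, so the -f test fails. A minimal sketch with made-up directory names:

```shell
#!/bin/bash
has_pdf() {
    set -- "1ドル"/*.pdf
    # If the glob matched nothing, "1ドル" is the literal unexpanded
    # pattern, which is not an existing file, so -f fails.
    [ -f "1ドル" ]
}

tmp=$(mktemp -d)
mkdir "$tmp/full" "$tmp/empty"
touch "$tmp/full/a.pdf"

has_pdf "$tmp/full"  && echo "full has a pdf"
has_pdf "$tmp/empty" || echo "empty has none"
```

(Inside a function, set -- only changes the function's own positional parameters, so the script's arguments are untouched.)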