This is my first bash script, so I'm asking you to tell me what you think.
The script should check every directory, and if it finds a PDF file in that directory: move all the PDF files to the parent directory, rename them after the directory, and remove the directory.
So if I had this folder structure:

folder1
    file.pdf
    file.txt
folder2
    some.pdf
    another.pdf
folder3
    some.txt

the script should change it to:

folder1.pdf
folder2.pdf
folder2_2.pdf
folder3
    some.txt
Here is the script:
#!/bin/bash
working_directory="$HOME/MEGAsync/downloads/books"
cd $working_directory
# for each directory in working directory
for D in `find . -type d`
do
    # skip '.'
    if [[ $D != "." ]]; then
        # get folder name
        folder_name=${D##*/}
        # and cd to folder
        cd $D
        # number of pdf files in the current directory
        declare -i pdf_count=0
        # for each file in the current directory
        for f in *; do
            # get file without path
            file=$(basename "$f")
            extension="${file##*.}"
            filename="${file%.*}"
            if [[ $extension == 'pdf' ]]; then
                pdf_count=pdf_count+1
                if [[ pdf_count < 2 ]]; then
                    new_filename="${working_directory}/${folder_name}.pdf";
                else
                    new_filename="${working_directory}/${folder_name}_${pdf_count}.pdf";
                fi
                echo cp $working_directory/$folder_name/$f $new_filename
                cp $working_directory/$folder_name/$f $new_filename
            fi
        done
        # return to parent folder
        cd ..
        # delete directory if pdf-file was found
        if [[ $pdf_count != "0" ]]; then
            echo rm -R $working_directory/$folder_name
            rm -R $working_directory/$folder_name
        fi
    fi
done
2 Answers
For a first script, this is great, and I am impressed.
- your structure is neat
- you are using loops properly
- you are using variable substitution which is 'advanced'
It appears the script will work for all the standard cases, and I have seen much worse in 'production' code.
So, as a beginner, well done.
What's next?
- the edge cases are going to kill this script - spaces in file names.
- the logging and error handling could be better
- I recommend basename and dirname instead of the variable substitutions you use:
  - dirname: http://linux.die.net/man/1/dirname
  - basename: http://linux.die.net/man/1/basename (note that "basename path/to/file.pdf .pdf" returns "file")
- consistency in your for...do loops - you write "do" on the next line once, and then "for ... ; do" the next time. Either is fine, but be consistent.
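To illustrate the basename/dirname suggestion above, here is a small sketch (the path is made up purely for the example):

```shell
#!/bin/bash
path="books/folder2/some.pdf"   # illustrative path, not from the script

dirname "$path"          # prints: books/folder2
basename "$path"         # prints: some.pdf
basename "$path" .pdf    # prints: some  (the .pdf suffix is stripped)
```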
All of the error handling, logging, and spaces-in-filenames issues can be solved easily by using a function for running commands. Also, the "$@" construct in bash is 'special'.
Consider the bash function:
function runcmd {
    echo "$@"
    # Fail if the command fails...
    "$@" || exit 1
}
So, this function will print the command, and then will run it, and exit the script if the command fails.
Now, instead of structures like:

echo cp $working_directory/$folder_name/$f $new_filename
cp $working_directory/$folder_name/$f $new_filename
you can do:
runcmd cp "$working_directory/$folder_name/$f" "$new_filename"
and you will still get the echo, you will get the new error handling, and you will handle spaces in names just fine.
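To see how this handles spaces, here is a self-contained sketch of runcmd in action, using a throwaway temp directory rather than the real book folders:

```shell
#!/bin/bash
function runcmd {
    echo "$@"
    # Fail fast if the command fails...
    "$@" || exit 1
}

tmp=$(mktemp -d)
touch "$tmp/name with spaces.pdf"

# "$@" keeps each quoted argument intact, so the filename with spaces
# reaches cp as a single argument, and the echo shows what ran.
runcmd cp "$tmp/name with spaces.pdf" "$tmp/copy.pdf"

rm -rf "$tmp"
```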
Safety
As @rolfl said, the script will fail if the directories contain spaces. In any case, a for loop over the output of the find command is really not pretty. You can get around the issue by changing the outermost for loop to a while loop instead:

find . -type d | while read D; do
    # ...
done
And there's an even better alternative. The current program will not work for deeply nested directories. If the only purpose of the find command is to find directories directly under $working_directory, then there is a better way to write this loop:
for D in */; do
    # ...
done
The */ pattern will match only directories, and this will work well even if some directories contain spaces.
And by the way, don't use `cmd` style command substitution (it's deprecated); use the modern way: $(cmd)
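One concrete advantage of $( ) is that it nests without the backslash-escaping backticks would need (the path here is illustrative):

```shell
#!/bin/bash
# Name of the parent directory of a path, via nested substitution:
parent=$(basename "$(dirname "/tmp/books/folder1")")
echo "$parent"   # prints: books

# The backtick equivalent would need escaping: `basename \`dirname ...\``
```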
Formatting
Deeply nested code can be hard to read. I recommend skipping the "." directory this way instead:

find . -type d | while read D; do
    [[ $D == . ]] && continue
    # ...
done
That is, instead of nesting deeper under an if, just invert the condition and do a continue instead. And you don't need to quote the ".".
However, if you use the "for D in */" tip from the previous point, you will no longer need to worry about the "." directory at all, so you can drop the check completely.
Simplify
Instead of this:
folder_name=${D##*/}
Better to get the folder name like this:
folder_name=$(basename "$D")
Changing the working directory
Instead of this:
cd "$D"
# do something ...
cd ..
I would recommend this:
pushd "$D"
# do something ...
popd
The end result is the same, with the difference that you don't need to specify where to go back: it will return to the right place on its own. It's more robust that way.
UPDATE
As you pointed out, if you want to prevent pushd and popd from producing output, you have to redirect their output to /dev/null:

pushd "$D" >/dev/null
# do something ...
popd >/dev/null

Also note that because of the cd .., the script will not work if there are deeply nested directories, as cd .. may not go back far enough. (This kind of problem is part of the reason why it's better to avoid changing directories in the middle of a script.) Using pushd and popd helps in this situation, because popd will bring you back to the directory of the last pushd, no matter how deep you went. A similar alternative is using cd "$OLDPWD".
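A sketch of that point: even if you cd several levels deeper after the pushd, a single popd returns to where you started (temp directories, purely for illustration):

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b/c"

start=$PWD
pushd "$tmp/a" >/dev/null
cd b/c                  # wander deeper; a single "cd .." would not get back
popd >/dev/null         # returns straight to $start, regardless of depth
echo "$PWD"             # the directory we started in
```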
Commenting
Finally, most of the comments are pointless; the code you commented already explains itself, doesn't it?
Suggested implementation
This is equivalent to your implementation but shorter and simpler:
#!/bin/bash
working_directory="$HOME/MEGAsync/downloads/books"
cd "$working_directory" || exit 1
for D in */; do
    folder_name=$(basename "$D")
    set -- "$D"/*.pdf
    if [ -f "1ドル" ]; then
        declare -i i=0
        for f in "$D"/*.pdf; do
            i+=1
            if [[ $i == 1 ]]; then
                new_filename="$folder_name.pdf"
            else
                new_filename="${folder_name}_$i.pdf"
            fi
            cp "$f" "$new_filename"
        done
        rm -R "$D"
    fi
done
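The set -- "$D"/*.pdf line deserves a note: it is a standard idiom for testing whether a glob matched anything. If no .pdf file exists, the pattern stays unexpanded, so the -f test fails. A minimal sketch with made-up directory names:

```shell
#!/bin/bash
has_pdf() {
    set -- "1ドル"/*.pdf
    # If the glob matched nothing, "1ドル" is the literal unexpanded
    # pattern, which is not an existing file, so -f fails.
    [ -f "1ドル" ]
}

tmp=$(mktemp -d)
mkdir "$tmp/full" "$tmp/empty"
touch "$tmp/full/a.pdf"

has_pdf "$tmp/full"  && echo "full has a pdf"
has_pdf "$tmp/empty" || echo "empty has none"
```

(Inside a function, set -- only changes the function's own positional parameters, so the script's arguments are untouched.)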