Filetype backup script

Question 1

This was one of my assignments in which I had to create a backup script that will individually compress all files (of an arbitrary number) of file-types (indicated by their .extension).

Example: backup a b c will take all files ending in .a, .b and .c and compress them using tar, keeping the original file, and naming the new compressed file the same as before except with .tar.gz at the end.

The -t flag can be used to specify the target directory.

The -d flag can be used to specify the destination directory.

Example: backup -t ~/Desktop/stuff -d goesHere gif where goesHere is a directory in your current path and all files ending with .gif come from ~/Desktop/stuff.

$ ls
. .. backup cat.gif dog.gif funny_video.mov media
$ ./backup -d media gif mov
backup: compressed .gif files
backup: compressed .mov files
backup: files saved to "media"
$ cd media
$ ls
. .. cat.gif.tar.gz dog.gif.tar.gz funny_video.mov.tar.gz
$

Both flags are optional, allowing you to specify either one or both. If you specify both flags you must use the order -t <directory> -d <directory>.

Granted I really don't like the specifications of the assignment, but that is what they are. The fact we're asked to compress all files by their .extension is kind of anti-*nix anyways (everyone knows cat.png could be a plain text file).

#!/bin/bash
list_begin=1 # where the filetype arguments begins
list_end=$ # where the filetype arguments end
target_directory=$(pwd)
destination_directory=$(pwd)
if [[ $# < 1 ]]; then
 echo "backup: not enough arguments"
 exit
fi
# determine if any flags are used and if so update directories and $list_begin
if [[ 1ドル == "-t" ]]; then # -t
 target_directory=2ドル
 list_begin=3
 if [[ 3ドル == "-d" ]]; then # -t, -td
 destination_directory=4ドル
 list_begin=5
 fi
elif [[ 1ドル == "-d" ]]; then # -d
 destination_directory=2ドル
 list_begin=3
fi
# confirm directories are real
if [[ ! -d "${target_directory}" ]]; then # invalid -t
 echo "backup: \"${target_directory}\" is not valid"
 exit
elif [[ ! -d "${destination_directory}" ]]; then # invalid -d
 echo "backup: \"${destination_directory}\" is not valid"
 exit
fi
declare -a filetypes # array to store filetypes
for ((i=${list_begin}, j=0; i<=${list_end}; i++, j++)); do # fetch all filetype arguments and add to array
 eval dir=\${$i}
 filetypes[${j}]=${dir}
done
for i in "${filetypes[@]}"; do # for each filetype
 f_list="${target_directory}/*.${i}" # directory to iterate through
 for file in ${f_list}; do # for each file of that filetype in the target directory
 if [[ -f "${file}" ]]; then
 tar -czf ${destination_directory}/$(basename "$file").tar.gz -P ${file} # compress the file in the destination directory
 fi
 done
 echo "backup: finished .${i} files"
done
echo "backup: files saved to \"${destination_directory}\""

Question 2

You have to archive them with tar, but only one file per archive? I'm not surprised you're unimpressed with the requirements!

Question 3

@TobySpeight Yep. To be fair we could use any compression tool but I just picked tar.

Question 4

tar isn't a compression tool. Yes, it has options to pass the result through gzip or bzip2 or xz etc., but that's just a bonus of GNU tar, so if compression's all you want, it's simpler to stick to the tool-for-the-job, and you'll get simpler code that way. I'll look at adapting my answer in the knowledge that tar isn't part of the requirement.

Question 5

As you say, the requirements are, let's say, unconventional. But in some ways, that's good preparation for crazy requirements you'll get in working life! Except that as a student, you might not get to ask the "five whys" to determine whether the requirements are actually grounded, and to get them changed.

There are still some clarifications of the requirements:

Should only regular files be archived, or should we include directories and devices, too?
Should the archive contain full pathnames of the contained files, or relative names, or just basenames? If only basenames, then how should we deal with conflicts?
Should the destination directory be created if it doesn't already exist?

As this is a homework assignment, you may not be in a position to ask these questions - in that case, you should assume a position and state it clearly in your answer. If possible, localise the effects of a wrong assumption, to make it easy to change when you get more information (again, that's a good reflection of working life).

Working through the code from the beginning:

list_begin=1 # where the filetype arguments begins
list_end=$ # where the filetype arguments end
target_directory=$(pwd)
destination_directory=$(pwd)

It looks like you can save two invocations of pwd by using . as the default directory name - you never need an absolute path.

A more common approach is to remove option arguments using shift as they are parsed; then all of $* are your inputs. I'm assuming $ there is a typo for $#?

Here's the outline:

while test -n "${1+:}"
do
 case "1ドル" in
 -t|--target)
 target="2ドル"
 shift 2
 ;;
 -d|--destination)
 destination="2ドル"
 shift 2
 ;;
 *)
 process_file "1ドル"
 shift
 ;;
 esac
done

I've included long-name alternatives just to show how it's done, not because that's a requirement. This approach allows the user to specify -t and -d more than once, to change those settings during a run. It's not clear whether you are required to enforce the ordering constraints you were given, or if that's just limiting the problem space to make it easier. If the former, you can just process them in turn:

if [ "-t" = "1ドル" ]
then
 target="2ドル"
 shift 2
fi
if [ "-d" = "1ドル" ]
then
 destination="2ドル"
 shift 2
fi

and so on.

I've also fixed the quoting above - you'll find that if you miss it out, then people will want at target or destination with whitespace in the name!

# confirm directories are real
if [[ ! -d "${target_directory}" ]]; then # invalid -t
 echo "backup: \"${target_directory}\" is not valid"
 exit
elif [[ ! -d "${destination_directory}" ]]; then # invalid -d
 echo "backup: \"${destination_directory}\" is not valid"
 exit
fi

You don't need to use Bash [[ here - it's more portable to stick to plain old [ (aka test). You'll want your error messages to go to the error stream (>&2), and you'll want to exit with a non-zero value, so that anything executing your program will know it didn't succeed. I like to define a small function to do that:

die() {
 echo "$@" >&2
 exit 1
}

Then I can write:

test -d "$target_directory" || die "0ドル: $target_directory is not a directory"
test -r "$target_directory" && test -x "$target_directory" || die "0ドル: $target_directory cannot be read"
test -d "$destination_directory" || die "0ドル: $destination_directory is not a directory"
test -w "$destination_directory" || die "0ドル: $destination_directory is not writable"

declare -a filetypes # array to store filetypes
for ((i=${list_begin}, j=0; i<=${list_end}; i++, j++)); do # fetch all filetype arguments and add to array
 eval dir=\${$i}
 filetypes[${j}]=${dir}
done

I don't think it's necessary to do all this copying. You can simply process each argument as you come to it.

for i in "${filetypes[@]}"; do # for each filetype
 f_list="${target_directory}/*.${i}" # directory to iterate through

You shouldn't be quoting the * there, because you want the shell to expand it. If you expand it later, you may well end up splitting $i.

My version

#!/bin/sh
set -e
die() {
 echo "$@" >&2
 exit 1
}
process_suffix() {
 for file in "${source-.}"/*."1ドル"
 do
 # Warning: may overwrite existing archive
 test -e "$file" \
 && tar cfz "${destination:-.}/${file##*/}.tar.gz" -C "${source-.}" -- "${file#${source-.}/}"
 done
 echo "0ドル: finished *.1ドル files"
}
while test -n "${1+:}"
do
 case "1ドル" in
 -t|--source)
 test -d "2ドル" || die "0ドル: 2ドル is not a directory"
 test -r "2ドル" && test -x "2ドル" || die "0ドル: 2ドル cannot be read"
 source="2ドル"
 shift 2
 ;;
 -d|--destination)
 test -d "2ドル" || die "0ドル: 2ドル is not a directory"
 test -w "2ドル" || die "0ドル: 2ドル is not writable"
 destination="2ドル"
 shift 2
 ;;
 *)
 process_suffix "1ドル"
 shift
 ;;
 esac
done
echo "0ドル: files saved to $destination"

Additional notes:

Everything is now POSIX shell, so we can use /bin/sh which may be faster, more available, etc (bash, ksh, dash etc. are all sufficiently POSIX-compliant).
I've given -- to tar to indicate that subsequent arguments are filenames, even if they begin with -.
I default uninitialised source and destination directories - see the Parameter Expansion section of the shell man page.
Archive filenames are all relative to the source directory.
I use "0ドル" in messages to show the actual name the program was invoked as.
Test cases should include file and directory names containing spaces or beginning with - (that's a good general rule for testing shell scripts).
Remember that (by default) a non-matching glob is treated literally. So if there are no matching files, we'll want to ignore *.suffix when archiving (but still archive a file that really does have * in the name).

Question 6

Clarifications of the requirements: Yes, not specified, no.

Question 7

"I'm assuming $ there is a typo for $#?" Yes, whoops.

Question 8

And the assignment wanted us to use bash strictly. Good feedback, thanks!

Question 9

In case it's not clear - Bash will execute this standard shell script; other shells will also execute it (which wasn't the case with the original /bin/bash script in the question).

Toby Speight 89k14 gold badges104 silver badges327 bronze badges · Accepted Answer · 2016-10-11 09:10:31Z

As you say, the requirements are, let's say, unconventional. But in some ways, that's good preparation for crazy requirements you'll get in working life! Except that as a student, you might not get to ask the "five whys" to determine whether the requirements are actually grounded, and to get them changed.

There are still some clarifications of the requirements:

Should only regular files be archived, or should we include directories and devices, too?
Should the archive contain full pathnames of the contained files, or relative names, or just basenames? If only basenames, then how should we deal with conflicts?
Should the destination directory be created if it doesn't already exist?

As this is a homework assignment, you may not be in a position to ask these questions - in that case, you should assume a position and state it clearly in your answer. If possible, localise the effects of a wrong assumption, to make it easy to change when you get more information (again, that's a good reflection of working life).

Working through the code from the beginning:

list_begin=1 # where the filetype arguments begins
list_end=$ # where the filetype arguments end
target_directory=$(pwd)
destination_directory=$(pwd)

It looks like you can save two invocations of pwd by using . as the default directory name - you never need an absolute path.

A more common approach is to remove option arguments using shift as they are parsed; then all of $* are your inputs. I'm assuming $ there is a typo for $#?

Here's the outline:

while test -n "${1+:}"
do
 case "1ドル" in
 -t|--target)
 target="2ドル"
 shift 2
 ;;
 -d|--destination)
 destination="2ドル"
 shift 2
 ;;
 *)
 process_file "1ドル"
 shift
 ;;
 esac
done

I've included long-name alternatives just to show how it's done, not because that's a requirement. This approach allows the user to specify -t and -d more than once, to change those settings during a run. It's not clear whether you are required to enforce the ordering constraints you were given, or if that's just limiting the problem space to make it easier. If the former, you can just process them in turn:

if [ "-t" = "1ドル" ]
then
 target="2ドル"
 shift 2
fi
if [ "-d" = "1ドル" ]
then
 destination="2ドル"
 shift 2
fi

and so on.

I've also fixed the quoting above - you'll find that if you miss it out, then people will want at target or destination with whitespace in the name!

# confirm directories are real
if [[ ! -d "${target_directory}" ]]; then # invalid -t
 echo "backup: \"${target_directory}\" is not valid"
 exit
elif [[ ! -d "${destination_directory}" ]]; then # invalid -d
 echo "backup: \"${destination_directory}\" is not valid"
 exit
fi

You don't need to use Bash [[ here - it's more portable to stick to plain old [ (aka test). You'll want your error messages to go to the error stream (>&2), and you'll want to exit with a non-zero value, so that anything executing your program will know it didn't succeed. I like to define a small function to do that:

die() {
 echo "$@" >&2
 exit 1
}

Then I can write:

test -d "$target_directory" || die "0ドル: $target_directory is not a directory"
test -r "$target_directory" && test -x "$target_directory" || die "0ドル: $target_directory cannot be read"
test -d "$destination_directory" || die "0ドル: $destination_directory is not a directory"
test -w "$destination_directory" || die "0ドル: $destination_directory is not writable"

declare -a filetypes # array to store filetypes
for ((i=${list_begin}, j=0; i<=${list_end}; i++, j++)); do # fetch all filetype arguments and add to array
 eval dir=\${$i}
 filetypes[${j}]=${dir}
done

I don't think it's necessary to do all this copying. You can simply process each argument as you come to it.

for i in "${filetypes[@]}"; do # for each filetype
 f_list="${target_directory}/*.${i}" # directory to iterate through

You shouldn't be quoting the * there, because you want the shell to expand it. If you expand it later, you may well end up splitting $i.

My version

#!/bin/sh
set -e
die() {
 echo "$@" >&2
 exit 1
}
process_suffix() {
 for file in "${source-.}"/*."1ドル"
 do
 # Warning: may overwrite existing archive
 test -e "$file" \
 && tar cfz "${destination:-.}/${file##*/}.tar.gz" -C "${source-.}" -- "${file#${source-.}/}"
 done
 echo "0ドル: finished *.1ドル files"
}
while test -n "${1+:}"
do
 case "1ドル" in
 -t|--source)
 test -d "2ドル" || die "0ドル: 2ドル is not a directory"
 test -r "2ドル" && test -x "2ドル" || die "0ドル: 2ドル cannot be read"
 source="2ドル"
 shift 2
 ;;
 -d|--destination)
 test -d "2ドル" || die "0ドル: 2ドル is not a directory"
 test -w "2ドル" || die "0ドル: 2ドル is not writable"
 destination="2ドル"
 shift 2
 ;;
 *)
 process_suffix "1ドル"
 shift
 ;;
 esac
done
echo "0ドル: files saved to $destination"

Additional notes:

Everything is now POSIX shell, so we can use /bin/sh which may be faster, more available, etc (bash, ksh, dash etc. are all sufficiently POSIX-compliant).
I've given -- to tar to indicate that subsequent arguments are filenames, even if they begin with -.
I default uninitialised source and destination directories - see the Parameter Expansion section of the shell man page.
Archive filenames are all relative to the source directory.
I use "0ドル" in messages to show the actual name the program was invoked as.
Test cases should include file and directory names containing spaces or beginning with - (that's a good general rule for testing shell scripts).
Remember that (by default) a non-matching glob is treated literally. So if there are no matching files, we'll want to ignore *.suffix when archiving (but still archive a file that really does have * in the name).

And the assignment wanted us to use bash strictly. Good feedback, thanks!
In case it's not clear - Bash will execute this standard shell script; other shells will also execute it (which wasn't the case with the original /bin/bash script in the question).

Stack Exchange Network

Filetype backup script

1 Answer 1

My version

You must log in to answer this question.

Hot Network Questions

Filetype backup script

1 Answer 1

My version

You must log in to answer this question.

Related

Hot Network Questions