This is a Bash script that copies files stored inside disk images to a directory, using a defined structure provided via a JSON file. I've included the external programs it requires and the test I used so that you can test it too.
Any comments regarding programming style and improvements are welcome.
Overview
The following is a Bash shell script that copies files stored inside disk images into a directory in the filesystem.
The script takes two parameters:
- The first one is optional and defines a root directory (existing or not) that will contain the files being copied.
- The second one, optional when the first one is given, is a path to a valid JSON-formatted file that describes:
  - which disk images will be opened,
  - which files inside each disk image will be copied, and
  - which path inside the root directory will be used as the destination for the files being copied.
The first parameter defaults to the current directory when not given. The second one defaults to a file named steps.json located in the current directory. If the first parameter is not given, the second one can't be given either.
Prerequisites
This script requires the following external programs to work correctly (installation instructions for Ubuntu are in parentheses):
- The JSON parsing program jq (sudo apt install jq).
- The disk image manipulation utility udisksctl (sudo apt install udisks2).
Script
The complete script is below. Its name is imgdisk-copy.sh and it should be marked as executable. It can be in any directory where it can be executed. For the purpose of the test below, it is placed in a directory where it can read and write.
#!/bin/bash
# Copying files contained inside disk images via JSON recipe.
# Aura Lesse Programmer
# December 12th, 2018
# Is a string contained in another? Return 0 if so; 1 if not.
# By fjarlq, from https://stackoverflow.com/a/8811800/5397930
contains() {
string="$1"
substring="$2"
if test "${string#*$substring}" != "$string"; then
return 0
else
return 1
fi
}
# Obtain the absolute path of a given directory.
# By dogbane, from https://stackoverflow.com/a/3915420
abspath() {
dir="$1"
echo "$(cd "$(dirname "$dir")"; pwd -P)/$(basename "$dir")"
}
# The main script starts here.
# If no first parameter is given, assume current directory.
if [ -z "$1" ]; then
DESTROOT="."
else
# Omit any trailing slash
DESTROOT=$(abspath "${1%/}")
fi
# If no second parameter is given, assume file "steps.json".
# If no first parameter is given, this can't be either.
if [ -z "$2" ]; then
CONF="./steps.json"
else
CONF="$2"
fi
# Create the root directory where the files will be put.
mkdir -p "$DESTROOT"
# How many disks will be processed?
LIMIT=$(cat "$CONF" | jq -r length)
i=0
while [ "$i" -lt "$LIMIT" ]; do
# For each disk, get its file name.
DISK=$(cat "$CONF" | jq -r .["$i"].disk)
echo "$DISK"
# Setup a loop device for the disk and get its name.
RES=$(udisksctl loop-setup -f "$DISK")
LOOP=$(echo "$RES" | cut -f5 -d' ' | head -c -2)
# Using the loop device obtained, mount the disk.
# Obtain the mount root directory afterwards.
RES=$(udisksctl mount -b "$LOOP")
SRCDIR=$(echo "$RES" | sed -nE 's|.*at (.*)\.|\1|p')
# How many file sets will be copied?
NOITEMS=$(cat "$CONF" | jq -r ".["$i"].files | length")
j=0
while [ "$j" -lt "$NOITEMS" ]; do
# For each file set, obtain which files will be copied and where.
FSRC=$(cat "$CONF" | jq -r .["$i"].files["$j"].src)
FDEST=$(cat "$CONF" | jq -r .["$i"].files["$j"].dest)
# Make the destination directory.
mkdir -p "$DESTROOT"/"$FDEST"
echo " ""$FSRC"
if contains "$FSRC" "\*"; then
# If a wildcard is used in the file set, copy by file expansion (option -t).
pushd "$SRCDIR" > /dev/null
cp -t "$DESTROOT"/"$FDEST" $FSRC
popd > /dev/null
else
# Else, copy normally.
cp "$SRCDIR"/"$FSRC" "$DESTROOT"/"$FDEST"
fi
j=$(($j + 1))
done
# Once all the file sets are copied, unmount the disk
# and delete its associated loop device.
udisksctl unmount -b "$LOOP" > /dev/null
udisksctl loop-delete -b "$LOOP"
i=$(($i + 1))
done
Test set
This script was tested with the following disk set: Microsoft C Compiler 4.0. The first 3 .img disks inside the ZIP (disk01.img, disk02.img and disk03.img) should be placed in the same directory as the script.
The corresponding JSON recipe used for the test is below. It's named steps.json and, for convenience, placed in the same directory as the script.
[
{
"disk": "disk01.img",
"files": [
{ "src": "*", "dest": "bin" }
]
},
{
"disk": "disk02.img",
"files": [
{ "src": "*.EXE", "dest": "bin" }
]
},
{
"disk": "disk03.img",
"files": [
{ "src": "LINK.EXE", "dest": "bin" },
{ "src": "*.H", "dest": "include" },
{ "src": "SYS/*.H", "dest": "include/sys" },
{ "src": "SLIBC.LIB", "dest": "lib" },
{ "src": "SLIBFP.LIB", "dest": "lib" },
{ "src": "EM.LIB", "dest": "lib" },
{ "src": "LIBH.LIB", "dest": "lib" }
]
}
]
The test is performed by opening a terminal and executing the following command:
./imgdisk-copy.sh testing/
The command will output each disk image name as it is mounted, and under it the names of the files being copied (unexpanded), as follows:
disk01.img
*
disk02.img
*.EXE
disk03.img
LINK.EXE
*.H
SYS/*.H
SLIBC.LIB
SLIBFP.LIB
EM.LIB
LIBH.LIB
The result will be a directory named testing, created next to the script, with the following structure:
testing/
├── bin
│ ├── C1.EXE
│ ├── C2.EXE
│ ├── C3.EXE
│ ├── CL.EXE
│ ├── CV.EXE
│ ├── EXEMOD.EXE
│ ├── EXEPACK.EXE
│ ├── LIB.EXE
│ ├── LINK.EXE
│ ├── MAKE.EXE
│ ├── MSC.EXE
│ └── SETENV.EXE
├── include
│ ├── sys
│ │ ├── LOCKING.H
│ │ ├── STAT.H
│ │ ├── TIMEB.H
│ │ ├── TYPES.H
│ │ └── UTIME.H
│ ├── ASSERT.H
│ ├── CONIO.H
│ ├── CTYPE.H
│ ├── DIRECT.H
│ ├── DOS.H
│ ├── ERRNO.H
│ ├── FCNTL.H
│ ├── FLOAT.H
│ ├── IO.H
│ ├── LIMITS.H
│ ├── MALLOC.H
│ ├── MATH.H
│ ├── MEMORY.H
│ ├── PROCESS.H
│ ├── SEARCH.H
│ ├── SETJMP.H
│ ├── SHARE.H
│ ├── SIGNAL.H
│ ├── STDARG.H
│ ├── STDDEF.H
│ ├── STDIO.H
│ ├── STDLIB.H
│ ├── STRING.H
│ ├── TIME.H
│ ├── V2TOV3.H
│ └── VARARGS.H
└── lib
├── EM.LIB
├── LIBH.LIB
├── SLIBC.LIB
└── SLIBFP.LIB
3 Answers
Instead of cat "$x" | command or echo "$x" | command, use command <"$x" (vs cat) or command <<<"$x" (vs echo): it saves a fork.
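As a small illustration of the two replacements above, here is a sketch that uses a throwaway file whose name contains a space. The file name and the variables lines, text and upper are made up for the example.

```bash
#!/bin/bash
# Hypothetical sample file; the name deliberately contains a space.
tmp=$(mktemp -d)
file="$tmp/sample file.txt"
printf 'one\ntwo\nthree\n' > "$file"

# Instead of: cat "$file" | wc -l
lines=$(wc -l < "$file")

# Instead of: echo "$text" | tr 'a-z' 'A-Z'
text="hello"
upper=$(tr 'a-z' 'A-Z' <<< "$text")

# $((lines)) strips any padding some wc implementations add.
echo "lines=$((lines)) upper=$upper"
rm -rf "$tmp"
```

The quotes around "$file" are kept on purpose: an unquoted redirection target that expands to several words makes bash report an ambiguous redirect.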
Instead of if [ x -lt y ] use if [[ x -lt y ]]: [[ is a bash keyword (see help [[ for details), so its operands are not subject to word splitting or pathname expansion, and it adds some functionality such as pattern matching.
Functions return their last exit value already, so contains() can be shortened to contains() { test "${1#*$2}" != "$1"; }. Whether you prefer this is up to you.
Use bash's defaulting mechanism instead of if [[ -z, as in CONF=${2:-./steps.json}
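A minimal sketch of how both optional parameters could be defaulted this way. The function name resolve_args and the lower-case variable names are hypothetical, chosen only for this example.

```bash
#!/bin/bash
# Hypothetical helper that resolves the two optional parameters:
# root directory first, JSON recipe second.
resolve_args() {
    local destroot=${1:-.}            # default: current directory
    local conf=${2:-./steps.json}     # default: steps.json in the current directory
    destroot=${destroot%/}            # omit one trailing slash, as the script does
    printf '%s\n%s\n' "$destroot" "$conf"
}

resolve_args                          # no arguments: both defaults apply
resolve_args testing/ recipe.json     # both arguments given
```

The ${var:-default} form also applies the default when the argument is present but empty, which matches the -z tests it replaces.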
Use for ((i=0; i<LIMIT; i++)) instead of i=0; while ...
Test the exit values of things that shouldn't fail, as in mkdir -p "$DESTROOT" || exit 1. Any invocation of cd or pushd should be checked for success, always! A general-purpose DIE() function can replace the naked exit and take an error message as an argument. If nothing should fail, set -e or trap DIE ERR (the first argument is a function name) does this globally.
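One possible shape for such a helper, as a sketch only. The name die and the temporary destination directory are assumptions made for the example, not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper: print a message to stderr and exit with a failure status.
die() {
    printf '%s\n' "$*" >&2
    exit 1
}

# A throwaway destination stands in for "$DESTROOT".
destroot=$(mktemp -d)/out
mkdir -p "$destroot" || die "cannot create $destroot"
cd "$destroot" || die "cannot enter $destroot"
echo "working in $PWD"
```

Sending the message to stderr keeps it out of any output that a caller might capture or pipe.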
Constructions like jq -r ".["$i"].files | length" and echo " ""$FSRC" are kind of weird, and the inner double quotes should probably be removed.
In a language where every variable is a global, it's a good habit to use fewer variables. For example, RES=$(foo); LOOP=$( echo "$RES" | ...) can just be LOOP=$( foo | ...)
Your get-conf pattern should be in a function like get_conf() { jq -r "$1" <<<"$conf"; }, where conf holds the recipe's contents rather than its file name.
Pruning code paths is important in an interpreted language. Since the wildcard copy method works for regular copies too, just use that one unconditionally and remove if contains ... "\*"
You don't need to escape wildcards like * in double quotes. When in doubt about what will be interpolated, use single quotes. Quoting in bash can be very complex and take a long time to learn; an advanced understanding of it will help to avoid common bugs.
Since you are using commands that aren't standard, it's a good idea to set PATH in the script, or as an optional config directive, and to check that they're there before you begin, as in require() { for cmd in "$@"; do type "$cmd" >/dev/null || exit 1; done; } followed by require jq udisksctl
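A slightly expanded sketch of the same idea. It swaps type for command -v (a deliberate substitution, not what the answer wrote), names the missing command, and returns instead of exiting so the caller decides what to do.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if every named command can be found.
require() {
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            return 1
        fi
    done
}

# In the real script this would be: require jq udisksctl || exit 1
if require sh; then
    echo "all dependencies found"
fi
```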
Read CONF just once, into a variable: conf=$(<"$CONF"), and query that. Then you can safely edit the config file while the script runs.
The other answer gave some really good advice; this is intended as a complementary answer with still more things to think about.
Put default arguments at the top of the script
If someone wanted to change the default arguments, they'd have to hunt through the code to find them. I typically prefer to put them at the top of the script and then only overwrite them if command line arguments are passed. For example:
#!/bin/bash
# default arguments
TARGET=./target
JSON=steps.json
# Command line args are both optional: TARGET JSON
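# -n is true when the argument is non-empty, so the defaults are
# only overwritten when an argument was actually passed.
if [[ -n "$1" ]] ; then
    TARGET="$1"
fi
if [[ -n "$2" ]] ; then
    JSON="$2"
fi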
Use install to copy files
DOS archives may or may not have proper permission bits set, and a complex path may need to be created before the file is copied. We can manage all of this easily with install, which is also a basic part of every Linux installation:
echo "installing $src on $disk to $dst"
install -p --mode=664 -D "$TMPDIR"/$src -t "$TARGET"/$dst/
With the -p argument we preserve the original timestamp. The --mode argument explicitly sets the mode for each file (you could, of course, change this to something else if you cared to). The combination of -D and -t tells install to create the destination directory if it doesn't already exist.
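To see those options in action without mounting anything, here is a sketch that assumes GNU coreutils (for install -D -t and stat -c). Two throwaway directories stand in for the mounted disk and the target root; the variable names src and target are made up for the example.

```bash
#!/bin/bash
# Throwaway directories standing in for the mounted disk and the target root.
src=$(mktemp -d)
target=$(mktemp -d)
printf 'demo\n' > "$src/LINK.EXE"

# -D creates bin/ under the target, -t names the destination directory,
# -p keeps the timestamp, --mode sets the permission bits.
install -p --mode=664 -D "$src/LINK.EXE" -t "$target/bin/"

# Show the resulting permission bits and name (GNU stat syntax).
stat -c '%a %n' "$target/bin/LINK.EXE"
```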
Do more with jq
Since you already depend on jq, it makes sense to use its capabilities more thoroughly. As you know, it has the ability to apply one or more filters sequentially to the result of the previous step. We can use this to great advantage and only call jq once, like this:
# use jq to create disk, src, dst triplets to feed to inst
jq -r -c '.[] | {disk, file: .files[]} | {disk, src: .file.src, dst: .file.dest} | [.disk,.src,.dst] |@sh ' "$JSON" | while read line
do inst ${line}
done
As you can see from the comment, this extracts disk, src, dst triplets.
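A variation on the same idea, offered as a sketch rather than as the answer's method: emitting tab-separated lines with jq's @tsv lets read split the fields directly, so no eval is needed afterwards. The inline recipe and the function name triplets are hypothetical, and the loop is skipped when jq is not installed.

```bash
#!/bin/bash
# Hypothetical recipe with the same shape as steps.json.
recipe='[{"disk":"disk03.img","files":[
  {"src":"LINK.EXE","dest":"bin"},
  {"src":"SYS/*.H","dest":"include/sys"}]}]'

# Emit one tab-separated "disk, src, dest" line per file set.
triplets() {
    jq -r '.[] | .disk as $d | .files[] | [$d, .src, .dest] | @tsv' <<< "$recipe"
}

if command -v jq >/dev/null 2>&1; then
    # IFS is a tab for this read only; -r keeps backslashes literal.
    while IFS=$'\t' read -r disk src dst; do
        echo "disk=$disk src=$src dst=$dst"
    done < <(triplets)
fi
```

One caveat of this sketch: @tsv escapes tabs and backslashes inside values, so names containing those characters would arrive in escaped form.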
Create a function to do the work
Given the above advice, what we need is the inst routine to actually do the work. Here's one way to write that:
# working variables
TMPDIR=
LASTDISK=
# given disk, src, dst triplet
# mount the disk in a temporary dir
# (if not already mounted)
# and install from src to dst
# src may contain wildcards
function inst () {
disk=$(eval echo $1)
src=$(eval echo $2)
dst=$(eval echo $3)
if [[ "$disk" != "$LASTDISK" ]] ; then
cleanup
TMPDIR="$(mktemp -d)"
echo "mounting $disk on $TMPDIR"
if sudo mount -r "$disk" "$TMPDIR" ; then
LASTDISK="$disk"
else
echo "Failed to mount $disk"
sudo rmdir "$TMPDIR"
fi
fi
echo "installing $src on $disk to $dst"
install -p --mode=664 -D "$TMPDIR"/$src -t "$TARGET"/$dst/
}
Notice that I've used a number of bash-isms here that make this non-portable, but since you've explicitly called out bash, I'm assuming this is OK. I've also chosen to use sudo mount and sudo umount instead of udisksctl. Either could work, of course; it's a matter of preference as to which is used. On one hand, mount is always available, but on the other, it requires sudo privileges. Most of this will be self-explanatory, except for cleanup, which is described in the next suggestion.
Use a cleanup function
It's annoying when a script fails for some reason and then leaves temporary files or other junk lying around as a result. One technique that's handy for this is to use bash's trap feature.
# un mount and remove bind dir TMPDIR if
# TMPDIR is not empty
function cleanup {
if [[ ! -z "$TMPDIR" ]] ; then
sudo umount "$TMPDIR"
sudo rm -rf "$TMPDIR"
fi
}
# rest of script ...
trap cleanup EXIT
This tells bash that no matter how we get to the exit (either normally or via some fatal error) it needs to invoke the specified function, which I typically name cleanup for obvious reasons.
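The effect can be observed without sudo or a real mount. In the sketch below, the function run_job and the variable workdir are hypothetical; the cleanup only removes a temporary directory, standing in for the unmount step.

```bash
#!/bin/bash
# The function body is a subshell, so the EXIT trap fires when it ends.
run_job() (
    workdir=$(mktemp -d) || exit 1
    cleanup() { rm -rf "$workdir"; }
    trap cleanup EXIT

    echo "$workdir"     # report the directory so the caller can inspect it
    false               # simulate a failing step...
    exit 1              # ...followed by an early exit
)

dir=$(run_job)
if [ ! -e "$dir" ]; then
    echo "temporary directory was removed"
fi
```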
Consider plain POSIX shell
We're not using any of Bash's features other than pushd/popd (which I'll comment on later), so we can reduce overhead by using /bin/sh as interpreter. That makes it more portable as well as more efficient.
Naming
Convention says that shell variables are normally all lower-case, with the exception of those that are exported into the environment to affect the behaviour of sub-processes.
Quoting
Here, we're using $substring as a wildcard pattern:
if test "${string#*$substring}" != "$string"; then
    return 0
else
    return 1
fi
But we want it to match literally (i.e. if we have * or ? in the substring, it should only match * or ? in the string), so we need to write "${string#*"$substring"}".
Also, if $command; then return 0; else return 1; fi
is an antipattern. Shell functions return the status of the last command executed, so we can pass that on:
contains() {
string="$1"
substring="$2"
test "${string#*"$substring"}" != "$string"
}
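The difference between the two forms shows up as soon as the substring holds a pattern character. In this sketch the function names contains_pattern and contains_literal are made up to put the two variants side by side.

```bash
#!/bin/bash
# Unquoted: $2 is treated as a pattern inside the expansion.
contains_pattern() { test "${1#*$2}" != "$1"; }
# Quoted: $2 is matched literally.
contains_literal() { test "${1#*"$2"}" != "$1"; }

name="LINK.EXE"     # contains neither ? nor *
contains_pattern "$name" '?' && echo "pattern version: match (false positive)"
contains_literal "$name" '?' || echo "literal version: no match"
contains_literal 'SYS/*.H' '*' && echo "literal version: match on a real asterisk"
```

All three messages are printed: the unquoted version lets ? match the first character of the name, while the quoted version only matches a real ? or *.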
We can then use it more naturally:
if contains "$FSRC" '*' # a wildcard is used in the file set
then
We have $i wrongly outside of quotes here:
NOITEMS=$(cat "$CONF" | jq -r ".["$i"].files | length")
I think that was intended to be
jq -r ".[$i].files | length"
Or better (avoiding problems when $i contains jq syntax), pass it in as a JSON number with --argjson; --arg would bind it as a string, which cannot be used to index an array:
jq --argjson i "$i" -r '.[$i].files | length'
Arithmetic
j=$(($j + 1))
i=$(($i + 1))
Within arithmetic expansion, variables can be expanded without writing $:
j=$((j + 1))
i=$((i + 1))
We might even use ++, e.g. : $((++i))
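Both forms side by side in a small counting loop, as a sketch. The variable limit is a stand-in for the number of disks, and the ++ form is used here under bash, since not every /bin/sh is required to support it.

```bash
#!/bin/bash
limit=3

# while loop with a $-free arithmetic expansion
i=0
while [ "$i" -lt "$limit" ]; do
    echo "disk index $i"
    i=$((i + 1))
done

# the same count using pre-increment; ':' discards the expanded value
j=0
while [ "$j" -lt "$limit" ]; do
    : $((++j))
done
echo "i=$i j=$j"
```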
Useless cat
ShellCheck reports these, which are easily fixed:
LIMIT=$(cat "$CONF" | jq -r length)
DISK=$(cat "$CONF" | jq -r .["$i"].disk)
NOITEMS=$(cat "$CONF" | jq --argjson i "$i" -r '.[$i].files | length')
FSRC=$(cat "$CONF" | jq -r .["$i"].files["$j"].src)
FDEST=$(cat "$CONF" | jq -r .["$i"].files["$j"].dest)
They become simply (using --argjson so that the indices reach jq as numbers):
limit=$(<"$conf" jq -r length)
disk=$(<"$conf" jq --argjson i "$i" -r '.[$i].disk')
noitems=$(<"$conf" jq --argjson i "$i" -r '.[$i].files | length')
fsrc=$(<"$conf" jq --argjson i "$i" --argjson j "$j" -r '.[$i].files[$j].src')
fdest=$(<"$conf" jq --argjson i "$i" --argjson j "$j" -r '.[$i].files[$j].dest')
Also, we have places where we capture output into a variable and then use echo to send that to another command:
res=$(udisksctl loop-setup -f "$disk")
loop=$(echo "$res" | cut -f5 -d' ' | head -c -2)
There's no need to wait for one command to finish before starting the next, so just pipeline them:
loop=$(udisksctl loop-setup -f "$disk" | cut -f5 -d' ' | head -c -2)
Working directory
Always think about what happens when changing working directory fails. Usually when this happens, we want to abort, but sometimes there's a possible recovery action. Consider this line:
echo "$(cd "$(dirname "$dir")"; pwd -P)/$(basename "$dir")"
If cd fails here, we get completely the wrong result. We need to test its result before using pwd:
(cd "$(dirname "$dir")" || exit; echo "$(pwd -P)/$(basename "$dir")")
That said, if your target has realpath, you could use that instead of this function.
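Wrapped back into the function, the checked version could look like the sketch below. The subshell function body is my own choice, made so that the cd never affects the caller; the example paths are hypothetical.

```bash
#!/bin/bash
# Print the absolute path of "$1"; the parent directory must exist.
abspath() (
    cd "$(dirname "$1")" || exit
    echo "$(pwd -P)/$(basename "$1")"
)

# The last path component does not have to exist yet, only its parent.
abspath /tmp/testing || echo "could not resolve" >&2

# A missing parent now yields a failure status instead of a wrong path.
abspath /no/such/parent/testing 2>/dev/null || echo "failed as expected"
```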
The other place we change directory is here:
pushd "$SRCDIR" > /dev/null
cp -t "$DESTROOT"/"$FDEST" $FSRC
popd > /dev/null
The directory stack functions are better suited to interactive use (which is why you had to redirect to null), and they can fail just like cd. Better to use cd in a subshell:
(cd "$srcdir" && cp -t "$destroot/$fdest" $fsrc)
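Here is a self-contained sketch of that subshell copy, assuming GNU cp (for the -t option). Two temporary directories stand in for the mounted image and the destination root, and the file names are invented for the example.

```bash
#!/bin/bash
srcdir=$(mktemp -d)      # stands in for the mounted disk image
destroot=$(mktemp -d)    # stands in for the destination root
touch "$srcdir/A.H" "$srcdir/B.H" "$srcdir/C.LIB"
fsrc='*.H'
fdest='include'

mkdir -p "$destroot/$fdest" || exit 1
# $fsrc is intentionally unquoted so the shell expands the wildcard
# relative to $srcdir; the cd only affects the subshell.
(cd "$srcdir" && cp -t "$destroot/$fdest" $fsrc) || echo "copy failed" >&2

ls "$destroot/$fdest"
```

Because the cd happens inside the parentheses, the script's own working directory is unchanged afterwards, whether or not the copy succeeds.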
Error handling
There are many actions in this script that can fail (for a number of different reasons), but we blindly assume that everything succeeds and make no effort to stop processing and return an appropriate status. A lot more thought needs to be put into dealing with errors, and into ensuring (with trap) that the cleanup actions are always done, even when we exit early due to error.