Serial copying from disk images to folder in Bash

Question 1

This is a Bash script that copies files stored inside disk images to a directory, using a defined structure provided via a JSON file. I've included the external programs it requires and the test I used so that you can test it too.

Any comments regarding programming style and improvements are welcome.

Overview

The following is a Bash shell script that copies files stored inside disk images into a directory in the filesystem.

The script takes two parameters:

The first one is optional and defines a root directory (existing or not) that will contain the files being copied.
The second one, optional when the first one is given, is a path to a valid JSON-formatted file that describes:
1. which disk images will be opened,
2. which files inside each disk image will be copied, and
3. which path inside the directory root will be used as the destination for the files being copied.

The first parameter defaults to the current directory when not given. The second one defaults to a file named steps.json located at the current directory. If the first parameter is not given, the second one can't be either.

Prerequisites

This script requires the following external programs to work correctly (installation instructions for Ubuntu are between parentheses):

The JSON parsing program jq (sudo apt install jq).
The disk image manipulation utility udisksctl (sudo apt install udisks2)

Script

The complete script is below. Its name is imgdisk-copy.sh and it should be marked as executable. It can be in any directory where it can be executed. For the purpose of the test below, it is placed in a directory where it can read and write.

#!/bin/bash
# Copying files contained inside disk images via JSON recipe.
# Aura Lesse Programmer
# December 12th, 2018
# Is a string contained in another? Return 0 if so; 1 if not.
# By fjarlq, from https://stackoverflow.com/a/8811800/5397930
contains() {
    string="$1"
    substring="$2"
    if test "${string#*$substring}" != "$string"; then
        return 0
    else
        return 1
    fi
}
# Obtain the absolute path of a given directory.
# By dogbane, from https://stackoverflow.com/a/3915420
abspath() {
    dir="$1"
    echo "$(cd "$(dirname "$dir")"; pwd -P)/$(basename "$dir")"
}
# The main script starts here.
# If no first parameter is given, assume current directory.
if [ -z "$1" ]; then
    DESTROOT="."
else
    # Omit any trailing slash
    DESTROOT=$(abspath "${1%/}")
fi
# If no second parameter is given, assume file "steps.json".
# If no first parameter is given, this can't be either.
if [ -z "$2" ]; then
    CONF="./steps.json"
else
    CONF="$2"
fi
# Create the root directory where the files will be put.
mkdir -p "$DESTROOT"
# How many disks will be processed?
LIMIT=$(cat "$CONF" | jq -r length)
i=0
while [ "$i" -lt "$LIMIT" ]; do
    # For each disk, get its file name.
    DISK=$(cat "$CONF" | jq -r .["$i"].disk)
    echo "$DISK"
    # Setup a loop device for the disk and get its name.
    RES=$(udisksctl loop-setup -f "$DISK")
    LOOP=$(echo "$RES" | cut -f5 -d' ' | head -c -2)
    # Using the loop device obtained, mount the disk.
    # Obtain the mount root directory afterwards.
    RES=$(udisksctl mount -b "$LOOP")
    SRCDIR=$(echo "$RES" | sed -nE 's|.*at (.*)\.|\1|p')
    # How many file sets will be copied?
    NOITEMS=$(cat "$CONF" | jq -r ".["$i"].files | length")
    j=0
    while [ "$j" -lt "$NOITEMS" ]; do
        # For each file set, obtain which files will be copied and where.
        FSRC=$(cat "$CONF" | jq -r .["$i"].files["$j"].src)
        FDEST=$(cat "$CONF" | jq -r .["$i"].files["$j"].dest)
        # Make the destination directory.
        mkdir -p "$DESTROOT"/"$FDEST"
        echo " ""$FSRC"
        if contains "$FSRC" "\*"; then
            # If a wildcard is used in the file set, copy by file expansion (option -t).
            pushd "$SRCDIR" > /dev/null
            cp -t "$DESTROOT"/"$FDEST" $FSRC
            popd > /dev/null
        else
            # Else, copy normally.
            cp "$SRCDIR"/"$FSRC" "$DESTROOT"/"$FDEST"
        fi
        j=$(($j + 1))
    done
    # Once all the file sets are copied, unmount the disk
    # and delete its associated loop device.
    udisksctl unmount -b "$LOOP" > /dev/null
    udisksctl loop-delete -b "$LOOP"
    i=$(($i + 1))
done

Test set

This script was tested with the following disk set: Microsoft C Compiler 4.0. The first 3 .img disks inside the ZIP (disk01.img, disk02.img and disk03.img) should be placed in the same directory as the script.

The corresponding JSON recipe used for the test is below. It's named steps.json and, for convenience, placed in the same directory as the script.

[
    {
        "disk": "disk01.img",
        "files": [
            { "src": "*", "dest": "bin" }
        ]
    },
    {
        "disk": "disk02.img",
        "files": [
            { "src": "*.EXE", "dest": "bin" }
        ]
    },
    {
        "disk": "disk03.img",
        "files": [
            { "src": "LINK.EXE", "dest": "bin" },
            { "src": "*.H", "dest": "include" },
            { "src": "SYS/*.H", "dest": "include/sys" },
            { "src": "SLIBC.LIB", "dest": "lib" },
            { "src": "SLIBFP.LIB", "dest": "lib" },
            { "src": "EM.LIB", "dest": "lib" },
            { "src": "LIBH.LIB", "dest": "lib" }
        ]
    }
]

The test is performed by opening a terminal and executing the following command:

./imgdisk-copy.sh testing/

The command will output each disk image name as it is mounted, and under it the names of the files being copied (unexpanded), as follows:

disk01.img
 *
disk02.img
 *.EXE
disk03.img
 LINK.EXE
 *.H
 SYS/*.H
 SLIBC.LIB
 SLIBFP.LIB
 EM.LIB
 LIBH.LIB

The result will be a directory named testing, next to the script, with the following structure:

testing/
├── bin
│  ├── C1.EXE
│  ├── C2.EXE
│  ├── C3.EXE
│  ├── CL.EXE
│  ├── CV.EXE
│  ├── EXEMOD.EXE
│  ├── EXEPACK.EXE
│  ├── LIB.EXE
│  ├── LINK.EXE
│  ├── MAKE.EXE
│  ├── MSC.EXE
│  └── SETENV.EXE
├── include
│  ├── sys
│  │  ├── LOCKING.H
│  │  ├── STAT.H
│  │  ├── TIMEB.H
│  │  ├── TYPES.H
│  │  └── UTIME.H
│  ├── ASSERT.H
│  ├── CONIO.H
│  ├── CTYPE.H
│  ├── DIRECT.H
│  ├── DOS.H
│  ├── ERRNO.H
│  ├── FCNTL.H
│  ├── FLOAT.H
│  ├── IO.H
│  ├── LIMITS.H
│  ├── MALLOC.H
│  ├── MATH.H
│  ├── MEMORY.H
│  ├── PROCESS.H
│  ├── SEARCH.H
│  ├── SETJMP.H
│  ├── SHARE.H
│  ├── SIGNAL.H
│  ├── STDARG.H
│  ├── STDDEF.H
│  ├── STDIO.H
│  ├── STDLIB.H
│  ├── STRING.H
│  ├── TIME.H
│  ├── V2TOV3.H
│  └── VARARGS.H
└── lib
   ├── EM.LIB
   ├── LIBH.LIB
   ├── SLIBC.LIB
   └── SLIBFP.LIB

Question 2

Instead of cat "$x" | command or echo "$x" | command, use command <"$x" (vs cat) or command <<<"$x" (vs echo): it saves a fork.

Instead of if [ x -lt y ] use if [[ x -lt y ]]: [[ is a bash keyword (help [[ for details) whose operands are not subject to word splitting or globbing, and it adds some functionality.

Functions return their last exit value already so contains() can be shortened to contains() { test "${1#*$2}" != "$1"; } Whether you prefer this is up to you.

Use bash's defaulting mechanism instead of if [[ -z, as in CONF=${2:-./steps.json}
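As a rough sketch, the defaulting idiom could cover both parameters of the script. The wrapper name parse_args and the lower-case variable names are illustrative assumptions, not part of the original code:

```bash
# Hypothetical wrapper so the defaults can be tried without running the whole script.
parse_args() {
    # ${1:-.} expands to "." when $1 is unset or empty.
    destroot=${1:-.}
    case $destroot in
        /) ;;                         # keep the filesystem root as-is
        *) destroot=${destroot%/} ;;  # otherwise drop one trailing slash
    esac
    conf=${2:-./steps.json}
}

parse_args testing/
echo "$destroot $conf"    # prints: testing ./steps.json
```

The :- form treats an empty argument the same as a missing one, which matches what the -z tests in the script do.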

Use for ((i=0; i<$LIMIT; i++)) instead of i=0; while ...

Test the exit values of things that shouldn't fail, as in mkdir -p "$DESTROOT" || exit 1. Any invocation of cd or pushd should be checked for success, always! A general purpose DIE() function can replace the naked exit and take an error message as an argument. If nothing should fail, set -e or trap DIE ERR (the first argument is a function name) does this globally.
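A minimal sketch of that pattern follows. The names die and prepare_destroot are made up for illustration; only mkdir -p comes from the script under review:

```bash
# Hypothetical helper: print a message on stderr and stop with a failure status.
die() {
    printf 'error: %s\n' "$*" >&2
    exit 1
}

# Hypothetical wrapper around the directory setup done by the script.
prepare_destroot() {
    mkdir -p "$1" || die "cannot create directory $1"
    [ -d "$1" ] || die "$1 is not a directory"
}
```

Calling prepare_destroot "$DESTROOT" early in the script would stop the run before any disk image is touched if the destination cannot be created.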

Constructions like jq -r ".["$i"].files | length" and echo " ""$FSRC" are kind of weird and the inner double quotes probably should be removed.

In a language where every variable is a global, it's a good habit to use fewer variables. For example, RES=$(foo); LOOP=$( echo "$RES" | ...) can just be LOOP=$( foo | ...)

Your get-conf pattern should be in a function like get_conf() { jq -r "$1" <<<"$CONF"; }

Pruning code paths is important in an interpreted language. Since the wildcard copy method works for regular copies too, just use that one unconditionally and remove if contains ... "\*"

You don't need to escape wildcards like * in double quotes. When in doubt about what will be interpolated, use single quotes. Quoting in bash can be very complex and take a long time to learn; an advanced understanding of it will help to avoid common bugs.

Since you are using commands that aren't standard, it's a good idea to set PATH in the script, or as an optional config directive, and to check that they're there before you begin, as in require() { for cmd in "$@"; do type $cmd >/dev/null || exit 1; done; } followed by require jq udisksctl

Read CONF just once, into a variable: conf=$(<$CONF), and query that. Then you can edit the config while the script runs.

Question 3

The other answer gave some really good advice; this is intended as a complementary answer with still more things to think about.

Put default arguments at the top of the script

If someone wanted to change the default arguments, they'd have to hunt through the code to find them. I typically prefer to put them at the top of the script and then only overwrite them if command line arguments are passed. For example:

#!/bin/bash
# default arguments
TARGET=./target
JSON=steps.json
# Command line args are both optional: TARGET JSON
# (-n is true when the argument is non-empty)
if [[ -n "$1" ]] ; then
    TARGET="$1"
fi
if [[ -n "$2" ]] ; then
    JSON="$2"
fi

Use `install` to copy files

DOS archives may or may not have proper permissions bits set and may need to have a complex path created before copying the file. We can manage all of this easily with install which is also a basic part of every Linux installation:

echo "installing $src on $disk to $dst"
install -p --mode=664 -D "$TMPDIR"/$src -t "$TARGET"/$dst/

With the -p argument we preserve the original timestamp. The mode argument explicitly sets the mode for each file (you could, of course, change this to something else if you cared to). The combination of -D and -t tells install to create the destination directory if it doesn't already exist.

Do more with `jq`

Since you're already requiring a dependency on jq, it makes sense to use its capabilities more thoroughly. As you know, it has the ability to apply one or more filters sequentially to the result of the previous step. We can use this to great advantage and only call jq once like this:

# use jq to create disk, src, dst triplets to feed to inst
jq -r -c '.[] | {disk, file: .files[]} | {disk, src: .file.src, dst: .file.dest} | [.disk,.src,.dst] |@sh ' "$JSON" | while read line 
 do inst ${line}
done

As you can see from the comment, this extracts disk, src, dst triplets.
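As an alternative sketch, the same triplets can be emitted tab-separated with jq's @tsv filter and split by read, which avoids re-parsing shell-quoted words later. The names each_triplet and show are hypothetical, and the sketch assumes that no field is empty or contains a tab or backslash (which @tsv would escape):

```bash
# Hypothetical helper: call "$2" once per (disk, src, dest) triplet in recipe "$1".
each_triplet() {
    tab=$(printf '\t')
    jq -r '.[] | .disk as $disk | .files[] | [$disk, .src, .dest] | @tsv' "$1" |
    while IFS=$tab read -r disk src dst; do
        "$2" "$disk" "$src" "$dst"
    done
}

# Example callback that only prints what would be copied.
show() {
    printf '%s: %s -> %s\n' "$1" "$2" "$3"
}
```

Running each_triplet steps.json show lists one line per file set. Because the loop is the last stage of a pipeline, variables assigned inside the callback are not visible to the caller afterwards.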

Create a function to do the work

Given the above advice, what we need is the inst routine to actually do the work. Here's one way to write that:

# working variables
TMPDIR=
LASTDISK=
# given disk, src, dst triplet
# mount the disk in a temporary dir
# (if not already mounted)
# and install from src to dst
# src may contain wildcards
function inst () {
    # each argument arrives shell-quoted by jq's @sh; eval strips the quotes
    disk=$(eval echo $1)
    src=$(eval echo $2)
    dst=$(eval echo $3)
    if [[ "$disk" != "$LASTDISK" ]] ; then
        cleanup
        TMPDIR="$(mktemp -d)"
        echo "mounting $disk on $TMPDIR"
        if sudo mount -r "$disk" "$TMPDIR" ; then
            LASTDISK="$disk"
        else
            echo "Failed to mount $disk"
            sudo rmdir "$TMPDIR"
            # nothing is mounted: reset state and skip the copy
            TMPDIR=
            LASTDISK=
            return 1
        fi
    fi
    echo "installing $src on $disk to $dst"
    install -p --mode=664 -D "$TMPDIR"/$src -t "$TARGET"/$dst/
}

Notice that I've used a number of bash-isms here that make this non-portable, but since you've explicitly called out bash, I'm assuming this is OK. I've also chosen to use sudo mount and sudo umount instead of udisksctl. Either could work, of course; it's a matter of preference as to which is used. On one hand, mount is always available but on the other, it requires sudo privileges. Most of this will be self-explanatory, except for cleanup which is described in the next suggestion.

Use a cleanup function

It's annoying when a script fails for some reason and then leaves temporary files or other junk lying around as a result. One technique that's handy for this is to use bash's trap builtin.

# unmount and remove bind dir TMPDIR if
# TMPDIR is not empty
function cleanup {
    if [[ ! -z "$TMPDIR" ]] ; then
        sudo umount "$TMPDIR"
        sudo rm -rf "$TMPDIR"
    fi
}
# rest of script ...
trap cleanup EXIT

This tells bash that no matter how we get to the exit (either normally or via some fatal error) it needs to invoke the specified function, which I typically name cleanup for obvious reasons.
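To illustrate the behaviour without mounting anything, here is a small self-contained sketch. The function name run_with_cleanup and the simulated failure are assumptions made for the demo:

```bash
# Hypothetical demo: the EXIT trap fires even when the body bails out early.
# The parentheses make the function body run in its own subshell.
run_with_cleanup() (
    workdir=$(mktemp -d) || exit 1
    trap 'rm -rf "$workdir"' EXIT
    echo "$workdir"     # report the directory so the caller can inspect it
    false || exit 1     # simulate a failing step
    echo "not reached"
)
```

After dir=$(run_with_cleanup), the function has returned a failure status and the directory named in $dir no longer exists, because the trap removed it on the way out.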

Question 4

Consider plain POSIX shell

We're not using any of Bash's features other than pushd/popd (which I'll comment on later), so we can reduce overhead by using /bin/sh as interpreter. That makes it more portable as well as more efficient.

Naming

Convention says that shell variables are normally all lower-case, with the exception of those that are exported into the environment to affect the behaviour of sub-processes.

Quoting

Here, we're using $substring as a wildcard pattern:

if test "${string#*$substring}" != "$string"; then
    return 0
else
    return 1
fi

But we want it to match literally (i.e. if we have * or ? in the substring, it should only match * or ? in the string), so we need to write "${string#*"$substring"}".

Also, if $command; then return 0; else return 1; fi is an antipattern. Shell functions return the status of the last command executed, so we can pass that on:

contains() {
    string="$1"
    substring="$2"
    test "${string#*"$substring"}" != "$string"
}

We can then use it more naturally:

 if contains "$FSRC" '*' # a wildcard is used in the file set
 then
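To see why the quoting matters, here is a small sketch comparing the two forms. The names contains_pattern, contains_literal and report are illustrative only. It also suggests why the original call passed "\*" rather than a bare asterisk: with the unquoted expansion, a bare * never matches anything useful.

```bash
contains_pattern() { test "${1#*$2}" != "$1"; }    # original form: $2 acts as a glob
contains_literal() { test "${1#*"$2"}" != "$1"; }  # quoted form: $2 is matched literally

# Hypothetical helper: run the given check and print its verdict.
report() {
    if "$@"; then echo "$*: yes"; else echo "$*: no"; fi
}

report contains_pattern 'LINK.EXE' '?'   # yes (false positive: "?" matched "L")
report contains_literal 'LINK.EXE' '?'   # no
report contains_pattern 'SYS/*.H' '*'    # no  (false negative: "**" matches the empty prefix)
report contains_literal 'SYS/*.H' '*'    # yes
```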

We have $i wrongly outside of quotes here:

NOITEMS=$(cat "$CONF" | jq -r ".["$i"].files | length")

I think that was intended to be

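jq -r ".[$i].files | length"

Or better, pass it as a jq variable (avoiding problems when $i contains jq syntax); --argjson makes it a number, which array indexing requires, whereas --arg would pass a string:

jq --argjson i "$i" -r '.[$i].files | length'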

Arithmetic

 j=$(($j + 1))

i=$(($i + 1))

Within arithmetic expansion, variables can be expanded without writing $:

 j=$((j + 1))

 i=$((i + 1))

We might even use ++, e.g. : $((++i))
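Put together, a counting loop in this style might look like the sketch below; the variable limit and the message are placeholders. Note that POSIX does not require shells to support ++ inside $(( )), so the i + 1 form is the portable one:

```bash
i=0
limit=3
while [ "$i" -lt "$limit" ]; do
    echo "disk index $i"
    i=$((i + 1))    # no $ needed on i inside the arithmetic expansion
done
```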

Useless cat

Shellcheck reports these, which are easily fixed:

LIMIT=$(cat "$CONF" | jq -r length)
DISK=$(cat "$CONF" | jq -r .["$i"].disk)
NOITEMS=$(cat "$CONF" | jq --argjson i "$i" -r '.[$i].files | length')
FSRC=$(cat "$CONF" | jq -r .["$i"].files["$j"].src)
FDEST=$(cat "$CONF" | jq -r .["$i"].files["$j"].dest)

They become simply:

limit=$(<"$conf" jq -r length)
disk=$(<"$conf" jq --argjson i "$i" -r '.[$i].disk')
noitems=$(<"$conf" jq --argjson i "$i" -r '.[$i].files | length')
fsrc=$(<"$conf" jq --argjson i "$i" --argjson j "$j" -r '.[$i].files[$j].src')
fdest=$(<"$conf" jq --argjson i "$i" --argjson j "$j" -r '.[$i].files[$j].dest')

Also, we have places where we capture output into a variable and then use echo to send that to another command:

res=$(udisksctl loop-setup -f "$disk")
loop=$(echo "$res" | cut -f5 -d' ' | head -c -2)

There's no need to wait for one command to finish before starting the next, so just pipeline them:

 loop=$(udisksctl loop-setup -f "$disk" | cut -f5 -d' ' | head -c -2)

Working directory

Always think about what happens when changing working directory fails. Usually when this happens, we want to abort, but sometimes there's a possible recovery action. Consider this line:

echo "$(cd "$(dirname "$dir")"; pwd -P)/$(basename "$dir")"

If cd fails here, we get completely the wrong result. We need to test its result before using pwd:

 (cd "$(dirname "$dir")" || exit; echo "$(pwd -P)/$(basename "$dir")")

That said, if your target has realpath, you could use that instead of this function.
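A possible shape for that, as a sketch: it assumes GNU realpath, whose -m option accepts paths that do not exist yet, and falls back to the cd/pwd approach when realpath or that option is unavailable.

```bash
# Hypothetical replacement for abspath(): prefer realpath, fall back to cd/pwd.
abspath() {
    # -m (GNU coreutils) canonicalizes even when the final component is missing.
    realpath -m -- "$1" 2>/dev/null && return
    # Fallback: resolve the parent directory in a subshell, then append the name.
    (cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd -P)" "$(basename "$1")")
}
```

The fallback keeps the requirement that the parent directory exists, and it fails with a non-zero status instead of printing a wrong path when cd does not succeed.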

The other place we change directory is here:

 pushd "$SRCDIR" > /dev/null
 cp -t "$DESTROOT"/"$FDEST" $FSRC
 popd > /dev/null

The directory stack functions are better suited to interactive use (which is why you had to redirect to null), and they can fail just like cd. Better to use cd in a subshell:

 (cd "$srcdir" && cp -t "$destroot"/"$fdest" $fsrc)
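Wrapped in a function, the subshell approach could look like the following sketch. The name copy_matching is hypothetical, the destination is assumed to be an absolute path (a relative one would be resolved inside the source directory after the cd), and cp -t is the GNU option already used by the script:

```bash
# Hypothetical helper: copy files matching glob "$2" from directory "$1" into "$3".
# The parentheses run the body in a subshell, so the caller's directory is untouched.
copy_matching() (
    cd "$1" || exit
    # $2 is deliberately unquoted so the shell expands the wildcard here.
    cp -t "$3" -- $2
)
```

A call such as copy_matching "$srcdir" "$fsrc" "$destroot/$fdest" returns non-zero if either the cd or the cp fails, so the caller can test it directly.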

Error handling

There are many actions in this script that can fail (for a number of different reasons), but we blindly assume that everything succeeds and make no effort to stop processing and return an appropriate status. A lot more thought needs to be put into dealing with errors, and into ensuring (with trap) that the cleanup actions are always done, even when we exit early due to error.
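One way this might be structured is sketched below. The names cleanup, open_disk, loop, mounted and srcdir are illustrative, and the parsing assumes the udisksctl message formats that the original script already relies on ("... as /dev/loopN." and "... at DIR."). The ${loop%.} expansion is swapped in for head -c -2 to strip the trailing period.

```bash
# State shared with the cleanup handler.
loop=
mounted=

cleanup() {
    # Undo only what was actually set up; errors during teardown are ignored.
    [ -n "$mounted" ] && udisksctl unmount -b "$loop" >/dev/null 2>&1
    [ -n "$loop" ] && udisksctl loop-delete -b "$loop" >/dev/null 2>&1
    loop= mounted=
}
trap cleanup EXIT
trap 'exit 1' INT TERM   # turn signals into a normal exit so the EXIT trap runs

# Set up a loop device for image "$1" and mount it; sets $loop and $srcdir.
open_disk() {
    loop=$(udisksctl loop-setup -f "$1" | cut -f5 -d' ')
    loop=${loop%.}
    [ -n "$loop" ] || return 1
    srcdir=$(udisksctl mount -b "$loop" | sed -nE 's|.*at (.*)\.|\1|p')
    [ -n "$srcdir" ] || return 1
    mounted=yes
}
```

The main loop would then call open_disk "$disk" || exit 1, copy the files, and call cleanup before moving on to the next image; if anything exits early, the trap performs the same teardown.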

Oh My Goodness · Accepted Answer · 2018-12-24 09:04:06Z