I have written a BASH script to parse and compile HTML code into a single page. The script works as expected, although my code is not completely re-usable and the standard output messages could be more helpful.
Besides my intended improvements, are there any other problems with my code?
#!/bin/bash
#Append each section of the book from each retrieved webpage
function help
{
cat << EOF
This script compiles a book from the 'content' element found in multiple pages.
Written by Ben Cottrell.
Usage: 0ドル [pages] [output]
EOF
}
function compile
{
LIST=$(cat 1ドル)
FILENAME=""
OUTPUT=2ドル
cat << HEADER > $OUTPUT
<!DOCTYPE html>
<!-- Generated at $(date) -->
<html>
<head>
<meta charset="UTF-8"></meta>
<title>Template</title>
<link rel="stylesheet" href="default.css"/>
</head>
<body>
HEADER
mkdir _pages
for PAGE in $LIST; do
echo $PAGE
wget -q --directory-prefix="_pages/" $PAGE
#Append the entry
if [ $? == 0 ]; then
#Extracts the title from a tag within a retrieved HTML document
FILENAME=$(basename $PAGE)
cat << CONTENT>> $OUTPUT
<h3>$(xmllint --html --xpath '/html/head/title/text()' "_pages/$FILENAME")</h3>
<a href="$PAGE">Retrieved from $PAGE</a>
<div class="section">
$(xmllint --html --xpath "//div[@id='content']/node()" "_pages/$FILENAME")
</div>
CONTENT
else
cat << SECTION >> $OUTPUT
<h3>$(basename $PAGE)</h3>
<div class="section">
Unavailable
</div>
SECTION
fi
done
cat << FINAL >> $OUTPUT
</body>
</html>
FINAL
}
if [ $# == 2 ]; then
#Check if the 'xmllint' executable and the files exist
if [ ! $(which xmllint) ]; then
cat << EOF
The xmllint executable is missing from your system.
Please install the program first, before using this script.
EOF
fi
if [ -x 1ドル ]; then
echo "A file containing a list of URL's is missing."
fi
compile 1ドル 2ドル
else
help
fi
-
\$\begingroup\$ This is kind of a cool concept, and I understand what's happening... I feel good reading this :P \$\endgroup\$Canadian Luke– Canadian Luke2015年07月27日 17:18:23 +00:00Commented Jul 27, 2015 at 17:18
2 Answers 2
Copy paste your code on shellcheck.net, it will give you some interesting recommendations.
Among other things, interestingly, it points out a parsing error for this:
cat << CONTENT>> $OUTPUT
Though it actually works for me as it is, it's better to add a space after "CONTENT" to make it clear, like this:
cat << CONTENT >> $OUTPUT
When declaring functions, instead of this:
function help {
The modern convention is this writing style:
help() {
The here-documents disrupt the logical flow of the script, for example here:
if [ ! $(which xmllint) ]; then cat << EOF The xmllint executable is missing from your system. Please install the program first, before using this script. EOF fi
You can mitigate that disruption by moving the printing logic to a helper function.
In addition to @janos's points:
cat
vs echo
:
You've used quite a few HEREDOCs in this script, some of them pretty short. I'd tend to use echo
for these instead, which not only makes it more explicit that the output is a string, but keeps it all inline, e.g.:
cat << EOF
The xmllint executable is missing from your system.
Please install the program first, before using this script.
EOF
could be converted to:
echo "The xmllint executable is missing from your system."
echo "Please install the program first, before using this script."
You could also move these messages out into separate variables, which would also minimise their impact on the script.
This also has the questionable benefit that you're not relying on an external program to do the printing, since echo
is a bash built-in.
Tests
Strictly speaking, when used with [
, the use of ==
is a bashism which is not portable to other shells. Even in bash, it's the operator intended for use when comparing strings, rather than numbers.
Use =
for strings, and -eq
for bare numbers, and check help test
for other test operators.
This brings me on to:
if [ -x 1ドル ]; then
echo "A file containing a list of URL's is missing."
fi
I'm not quite sure why you're checking whether your file of URLs is executable. If you just want to check whether it exists, you want the -a
operator, but it's a little more effective to check whether it exists and is readable, which is -r
.
As shellcheck says, you should also double-quote these variables: this stops things like filenames with spaces breaking your script, and also forces empty variables to expand to empty strings, rather than expanding to nothing at all and causing syntax errors.
Errors
In a few places you've printed out error messages, but the script doesn't stop, it carries on regardless. Is this intentional?
Also, error messages should, in general, go to stderr
.
I tend to define some exit codes and do something like:
# Define these all in one place
ERR_INVALID_ARGS=5
#Then later, if args are invalid
echo "Error: invalid arguments provided" >&2 # redirects to stderr
exit $ERR_INVALID_ARGS
More general stuff
You could, if you wanted, make the output a command-line option using getopt
, allow any number of input files as an argument using a while loop and shift
, and output to stdout
by default.
You might also want to check whether the file exists before you write to it, and you could do something like read
a Y/N check whether you want to overwrite, rather than always appending.
None of the above is really a big problem here. The script is reasonably structured, and your biggest worry (IMO) is that you could expand an empty variable and get a syntax error, or accidentally screw up an existing file.
-
\$\begingroup\$
echo -e
is not a good recommendation, because the flags ofecho
are not portable.printf
is more portable. \$\endgroup\$janos– janos2015年08月04日 11:58:09 +00:00Commented Aug 4, 2015 at 11:58 -
\$\begingroup\$ @janos: Good point, POSIX standards don't require the
-e
option (it's in both the GNU version of/bin/echo
and thebash
version, but not thedash
version, for example). Removed it. \$\endgroup\$Aesin– Aesin2015年08月04日 12:28:01 +00:00Commented Aug 4, 2015 at 12:28