9
\$\begingroup\$

I have written a BASH script to parse and compile HTML code into a single page. The script works as expected, although my code is not completely re-usable and the standard output messages could be more helpful.

Besides my intended improvements, are there any other problems with my code?

#!/bin/bash
#Append each section of the book from each retrieved webpage
function help
{
 cat << EOF
This script compiles a book from the 'content' element found in multiple pages. 
Written by Ben Cottrell.
Usage: 0ドル [pages] [output]
EOF
}
function compile
{
 LIST=$(cat 1ドル)
 FILENAME=""
 OUTPUT=2ドル
 cat << HEADER > $OUTPUT
<!DOCTYPE html>
<!-- Generated at $(date) -->
<html>
 <head>
 <meta charset="UTF-8"></meta>
 <title>Template</title>
 <link rel="stylesheet" href="default.css"/>
 </head>
 <body>
HEADER
 mkdir _pages
 for PAGE in $LIST; do
 echo $PAGE
 wget -q --directory-prefix="_pages/" $PAGE
 #Append the entry
 if [ $? == 0 ]; then
 #Extracts the title from a tag within a retrieved HTML document 
 FILENAME=$(basename $PAGE)
 cat << CONTENT>> $OUTPUT
 <h3>$(xmllint --html --xpath '/html/head/title/text()' "_pages/$FILENAME")</h3>
 <a href="$PAGE">Retrieved from $PAGE</a>
 <div class="section">
 $(xmllint --html --xpath "//div[@id='content']/node()" "_pages/$FILENAME")
 </div>
CONTENT
 else
 cat << SECTION >> $OUTPUT
 <h3>$(basename $PAGE)</h3>
 <div class="section">
 Unavailable
 </div>
SECTION
 fi
 done
 cat << FINAL >> $OUTPUT
 </body>
</html> 
FINAL
}
if [ $# == 2 ]; then
 #Check if the 'xmllint' executable and the files exist
 if [ ! $(which xmllint) ]; then
 cat << EOF
 The xmllint executable is missing from your system.
 Please install the program first, before using this script.
EOF
 fi
 if [ -x 1ドル ]; then
 echo "A file containing a list of URL's is missing."
 fi
 compile 1ドル 2ドル
else
 help
fi
asked Jul 27, 2015 at 11:08
\$\endgroup\$
1
  • \$\begingroup\$ This is kind of a cool concept, and I understand what's happening... I feel good reading this :P \$\endgroup\$ Commented Jul 27, 2015 at 17:18

2 Answers 2

7
\$\begingroup\$

Copy paste your code on shellcheck.net, it will give you some interesting recommendations.

Among other things, interestingly, it points out a parsing error for this:

cat << CONTENT>> $OUTPUT

Though it actually works for me as it is, it's better to add a space after "CONTENT" to make it clear, like this:

cat << CONTENT >> $OUTPUT

When declaring functions, instead of this:

function help
{

The modern convention is this writing style:

help() {

The here-documents disrupt the logical flow of the script, for example here:

 if [ ! $(which xmllint) ]; then
 cat << EOF
 The xmllint executable is missing from your system.
 Please install the program first, before using this script.
EOF
 fi

You can mitigate that disruption by moving the printing logic to a helper function.

answered Jul 27, 2015 at 11:22
\$\endgroup\$
0
6
\$\begingroup\$

In addition to @janos's points:

cat vs echo:

You've used quite a few HEREDOCs in this script, some of them pretty short. I'd tend to use echo for these instead, which not only makes it more explicit that the output is a string, but keeps it all inline, e.g.:

 cat << EOF
 The xmllint executable is missing from your system.
 Please install the program first, before using this script.
EOF

could be converted to:

echo "The xmllint executable is missing from your system."
echo "Please install the program first, before using this script."

You could also move these messages out into separate variables, which would also minimise their impact on the script.

This also has the questionable benefit that you're not relying on an external program to do the printing, since echo is a bash built-in.

Tests

Strictly speaking, when used with [, the use of == is a bashism which is not portable to other shells. Even in bash, it's the operator intended for use when comparing strings, rather than numbers.

Use = for strings, and -eq for bare numbers, and check help test for other test operators.

This brings me on to:

if [ -x 1ドル ]; then
 echo "A file containing a list of URL's is missing."
fi

I'm not quite sure why you're checking whether your file of URLs is executable. If you just want to check whether it exists, you want the -a operator, but it's a little more effective to check whether it exists and is readable, which is -r.

As shellcheck says, you should also double-quote these variables: this stops things like filenames with spaces breaking your script, and also forces empty variables to expand to empty strings, rather than expanding to nothing at all and causing syntax errors.

Errors

In a few places you've printed out error messages, but the script doesn't stop, it carries on regardless. Is this intentional?

Also, error messages should, in general, go to stderr.

I tend to define some exit codes and do something like:

# Define these all in one place
ERR_INVALID_ARGS=5
#Then later, if args are invalid
echo "Error: invalid arguments provided" >&2 # redirects to stderr
exit $ERR_INVALID_ARGS

More general stuff

You could, if you wanted, make the output a command-line option using getopt, allow any number of input files as an argument using a while loop and shift, and output to stdout by default.

You might also want to check whether the file exists before you write to it, and you could do something like read a Y/N check whether you want to overwrite, rather than always appending.

None of the above is really a big problem here. The script is reasonably structured, and your biggest worry (IMO) is that you could expand an empty variable and get a syntax error, or accidentally screw up an existing file.

answered Jul 27, 2015 at 17:57
\$\endgroup\$
2
  • \$\begingroup\$ echo -e is not a good recommendation, because the flags of echo are not portable. printf is more portable. \$\endgroup\$ Commented Aug 4, 2015 at 11:58
  • \$\begingroup\$ @janos: Good point, POSIX standards don't require the -e option (it's in both the GNU version of /bin/echo and the bash version, but not the dash version, for example). Removed it. \$\endgroup\$ Commented Aug 4, 2015 at 12:28

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.