I learned Bash a million years ago. I just wrote this simple script used to get the first lot of HTML comments from a file, and spit it out in order to create a README.md
file.
It is just. So. Ugly. I read bits and pieces over the years, and I am sure it can be improved so much...
Here we go:
#!/bin/bash
IFS=''
active='0';
cat hot-form-validator.html | while read "line";do
echo $line | grep '\-\->' > /dev/null
if [ $active = '1' -a $? = '0' ];then
exit 0;
fi;
suppress=0;
echo $line | grep '^ *@' > /dev/null
if [ $? = '0' ];then suppress='1'; fi;
if [ $active = '1' -a $suppress = '0' ];then echo $line;fi;
echo $line | grep "<!--" > /dev/null
if [ $? = '0' ];then active='1'; fi;
done
Questions:
Is there a better way to do
grep
and then check$?
? Back in the day it was the way to go, but...Should
active
be a proper number rather than a string with a number? I know, it could be anything... but having a string that can be0
or1
just feels wrong.Is there a better way to preserve spaces, rather than zapping
IFS
?Any more pearls of wisdom, other than quitting my (short lived) career of bash scripter?
3 Answers 3
A bug
If your html contains a backspace character e.g
<p>The special character \n is a way to include read line</p>
your code interprets it a
The special character n is a way to include read line.
to avoid this, add "-r" switch to your read line
cat hot-form-validator.html | while read -r "line";do
- You want to check if your file exists before reading it
if [ -f $filename ]; then #do something else #do something fi
A Bash loop is not the best approach. If you are doing a lot of line-by-line processing, and invoking grep
a lot, then awk
would be a more appropriate tool.
awk '/<!--/ { ACTIVE = 1; next }
/-->/ { exit }
ACTIVE { print }' < hot-form-validator.html
But line-by-line processing is not an appropriate way to parse HTML. xsltproc
, for example, could do a proper job of extracting the first comment in an HTML file:
xsltproc --html first-comment.xsl hot-form-validator.html
... where first-comment.xml
contains:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/"><xsl:value-of select="//comment()[1]"/></xsl:template>
</xsl:stylesheet>
-
\$\begingroup\$ I don't really want turn xsltproc a requirement to generate the documentation -- especially since I control the contents of the starting file. But, I am SOLD on AWK! What's the best resource to learn it? \$\endgroup\$Merc– Merc2016年07月19日 02:14:20 +00:00Commented Jul 19, 2016 at 2:14
-
\$\begingroup\$ But also... what about the script in the original post? What if I really wanted to improve it? \$\endgroup\$Merc– Merc2016年07月19日 02:14:40 +00:00Commented Jul 19, 2016 at 2:14
-
\$\begingroup\$ This tutorial seems good. Or, if you are a masochist, you could try to figure it all out from the man page. \$\endgroup\$200_success– 200_success2016年07月19日 02:31:17 +00:00Commented Jul 19, 2016 at 2:31
Note that you can get a bunch of Bash tips instantly by pasting your code on http://www.shellcheck.net/
A couple of things jump in the eye:
- Too many unnecessary
;
at the end of lines - Instead of
grep ... > /dev/null
, it would be shorter to writegrep -q ...
- Too many statements on a line when it would be more readable if split to multiple lines
- Instead of
cat file | while ...
, it would be better to writewhile ... < file
- Instead of
echo ... | grep ...
, it would be better to use here-strings,grep ... <<< ...
- Instead of
grep ...
and then checking the value of$?
in anif
, you could move thegrep
inside theif
- Quoting where it's not needed
- Not quoting where it's needed
With the above points improved:
#!/bin/bash
IFS=
active=0
while read "line"; do
if [ $active = 1 ] && grep -q '\-\->' <<< "$line"; then
exit 0
fi
suppress=0
grep -q '^ *@' <<< "$line" && suppress=1
[ $active = 1 -a $suppress = 0 ] && echo $line
grep -q "<!--" <<< "$line" && active=1
done < hot-form-validator.html
I rewrote using grep -q ... <<< ...
only for the sake of an example of using grep
inside if
statements.
But a better solution is to use native Bash pattern matching with [[ ... ]]
, for example instead of:
grep -q '^ *@' <<< "$line" && suppress=1
This is better, as it doesn't spawn a grep
process:
[[ "$line" =~ ^\ *@ ]] && suppress=1