I learned Bash a million years ago. I just wrote this simple script used to get the first lot of HTML comments from a file, and spit it out in order to create a README.md file.
It is just. So. Ugly. I read bits and pieces over the years, and I am sure it can be improved so much...
Here we go:
#!/bin/bash
IFS=''
active='0';
cat hot-form-validator.html | while read "line";do
echo $line | grep '\-\->' > /dev/null
if [ $active = '1' -a $? = '0' ];then
exit 0;
fi;
suppress=0;
echo $line | grep '^ *@' > /dev/null
if [ $? = '0' ];then suppress='1'; fi;
if [ $active = '1' -a $suppress = '0' ];then echo $line;fi;
echo $line | grep "<!--" > /dev/null
if [ $? = '0' ];then active='1'; fi;
done
Questions:
Is there a better way to do
grepand then check$?? Back in the day it was the way to go, but...Should
activebe a proper number rather than a string with a number? I know, it could be anything... but having a string that can be0or1just feels wrong.Is there a better way to preserve spaces, rather than zapping
IFS?Any more pearls of wisdom, other than quitting my (short lived) career of bash scripter?
3 Answers 3
A bug
If your html contains a backspace character e.g
<p>The special character \n is a way to include read line</p>
your code interprets it a
The special character n is a way to include read line.
to avoid this, add "-r" switch to your read line
cat hot-form-validator.html | while read -r "line";do
- You want to check if your file exists before reading it
if [ -f $filename ]; then #do something else #do something fi
A Bash loop is not the best approach. If you are doing a lot of line-by-line processing, and invoking grep a lot, then awk would be a more appropriate tool.
awk '/<!--/ { ACTIVE = 1; next }
/-->/ { exit }
ACTIVE { print }' < hot-form-validator.html
But line-by-line processing is not an appropriate way to parse HTML. xsltproc, for example, could do a proper job of extracting the first comment in an HTML file:
xsltproc --html first-comment.xsl hot-form-validator.html
... where first-comment.xml contains:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/"><xsl:value-of select="//comment()[1]"/></xsl:template>
</xsl:stylesheet>
-
\$\begingroup\$ I don't really want turn xsltproc a requirement to generate the documentation -- especially since I control the contents of the starting file. But, I am SOLD on AWK! What's the best resource to learn it? \$\endgroup\$Merc– Merc2016年07月19日 02:14:20 +00:00Commented Jul 19, 2016 at 2:14
-
\$\begingroup\$ But also... what about the script in the original post? What if I really wanted to improve it? \$\endgroup\$Merc– Merc2016年07月19日 02:14:40 +00:00Commented Jul 19, 2016 at 2:14
-
\$\begingroup\$ This tutorial seems good. Or, if you are a masochist, you could try to figure it all out from the man page. \$\endgroup\$200_success– 200_success2016年07月19日 02:31:17 +00:00Commented Jul 19, 2016 at 2:31
Note that you can get a bunch of Bash tips instantly by pasting your code on http://www.shellcheck.net/
A couple of things jump in the eye:
- Too many unnecessary
;at the end of lines - Instead of
grep ... > /dev/null, it would be shorter to writegrep -q ... - Too many statements on a line when it would be more readable if split to multiple lines
- Instead of
cat file | while ..., it would be better to writewhile ... < file - Instead of
echo ... | grep ..., it would be better to use here-strings,grep ... <<< ... - Instead of
grep ...and then checking the value of$?in anif, you could move thegrepinside theif - Quoting where it's not needed
- Not quoting where it's needed
With the above points improved:
#!/bin/bash
IFS=
active=0
while read "line"; do
if [ $active = 1 ] && grep -q '\-\->' <<< "$line"; then
exit 0
fi
suppress=0
grep -q '^ *@' <<< "$line" && suppress=1
[ $active = 1 -a $suppress = 0 ] && echo $line
grep -q "<!--" <<< "$line" && active=1
done < hot-form-validator.html
I rewrote using grep -q ... <<< ... only for the sake of an example of using grep inside if statements.
But a better solution is to use native Bash pattern matching with [[ ... ]], for example instead of:
grep -q '^ *@' <<< "$line" && suppress=1
This is better, as it doesn't spawn a grep process:
[[ "$line" =~ ^\ *@ ]] && suppress=1