7
\$\begingroup\$

I learned Bash a million years ago. I just wrote this simple script used to get the first lot of HTML comments from a file, and spit it out in order to create a README.md file.

It is just. So. Ugly. I read bits and pieces over the years, and I am sure it can be improved so much...

Here we go:

#!/bin/bash
IFS=''
active='0';
cat hot-form-validator.html | while read "line";do
 echo $line | grep '\-\->' > /dev/null
 if [ $active = '1' -a $? = '0' ];then 
 exit 0;
 fi;
 suppress=0;
 echo $line | grep '^ *@' > /dev/null
 if [ $? = '0' ];then suppress='1'; fi;
 if [ $active = '1' -a $suppress = '0' ];then echo $line;fi;
 echo $line | grep "<!--" > /dev/null
 if [ $? = '0' ];then active='1'; fi;
done

Questions:

  • Is there a better way to do grep and then check $?? Back in the day it was the way to go, but...

  • Should active be a proper number rather than a string with a number? I know, it could be anything... but having a string that can be 0 or 1 just feels wrong.

  • Is there a better way to preserve spaces, rather than zapping IFS?

  • Any more pearls of wisdom, other than quitting my (short lived) career of bash scripter?

200_success
145k22 gold badges190 silver badges478 bronze badges
asked Jul 18, 2016 at 15:51
\$\endgroup\$

3 Answers 3

4
\$\begingroup\$

A bug

  • If your html contains a backspace character e.g

    <p>The special character \n is a way to include read line</p>

your code interprets it a

The special character n is a way to include read line.

to avoid this, add "-r" switch to your read line

cat hot-form-validator.html | while read -r "line";do

  • You want to check if your file exists before reading it
if [ -f $filename ]; then
#do something
else
#do something
fi
answered Jul 18, 2016 at 18:03
\$\endgroup\$
2
\$\begingroup\$

A Bash loop is not the best approach. If you are doing a lot of line-by-line processing, and invoking grep a lot, then awk would be a more appropriate tool.

awk '/<!--/ { ACTIVE = 1; next }
 /-->/ { exit }
 ACTIVE { print }' < hot-form-validator.html

But line-by-line processing is not an appropriate way to parse HTML. xsltproc, for example, could do a proper job of extracting the first comment in an HTML file:

xsltproc --html first-comment.xsl hot-form-validator.html

... where first-comment.xml contains:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:template match="/"><xsl:value-of select="//comment()[1]"/></xsl:template>
</xsl:stylesheet>
answered Jul 18, 2016 at 19:01
\$\endgroup\$
3
  • \$\begingroup\$ I don't really want turn xsltproc a requirement to generate the documentation -- especially since I control the contents of the starting file. But, I am SOLD on AWK! What's the best resource to learn it? \$\endgroup\$ Commented Jul 19, 2016 at 2:14
  • \$\begingroup\$ But also... what about the script in the original post? What if I really wanted to improve it? \$\endgroup\$ Commented Jul 19, 2016 at 2:14
  • \$\begingroup\$ This tutorial seems good. Or, if you are a masochist, you could try to figure it all out from the man page. \$\endgroup\$ Commented Jul 19, 2016 at 2:31
1
\$\begingroup\$

Note that you can get a bunch of Bash tips instantly by pasting your code on http://www.shellcheck.net/

A couple of things jump in the eye:

  • Too many unnecessary ; at the end of lines
  • Instead of grep ... > /dev/null, it would be shorter to write grep -q ...
  • Too many statements on a line when it would be more readable if split to multiple lines
  • Instead of cat file | while ..., it would be better to write while ... < file
  • Instead of echo ... | grep ..., it would be better to use here-strings, grep ... <<< ...
  • Instead of grep ... and then checking the value of $? in an if, you could move the grep inside the if
  • Quoting where it's not needed
  • Not quoting where it's needed

With the above points improved:

#!/bin/bash
IFS=
active=0
while read "line"; do
 if [ $active = 1 ] && grep -q '\-\->' <<< "$line"; then 
 exit 0
 fi
 suppress=0
 grep -q '^ *@' <<< "$line" && suppress=1
 [ $active = 1 -a $suppress = 0 ] && echo $line
 grep -q "<!--" <<< "$line" && active=1 
done < hot-form-validator.html

I rewrote using grep -q ... <<< ... only for the sake of an example of using grep inside if statements. But a better solution is to use native Bash pattern matching with [[ ... ]], for example instead of:

 grep -q '^ *@' <<< "$line" && suppress=1

This is better, as it doesn't spawn a grep process:

 [[ "$line" =~ ^\ *@ ]] && suppress=1
answered Jul 30, 2016 at 21:45
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.