I'm adding self tests to C++ code to ensure there are no NDEBUG and Posix assert dependencies (back story below). The first test looks for inclusion of <assert.h> and <cassert>:
FAILED=0
COUNT=$($EGREP -c '(assert.h|cassert)' *.h *.cpp)
if [[ "$COUNT" -ne "0" ]]; then
    FAILED=1
    echo "Found Posix assert headers" | tee -a "$TEST_RESULTS"
fi
It's producing:
************************************
Testing: No Posix assert
./cryptest.sh: line 1130: [[: 3way: value too great for base (error token is "3way")
...
When I debug it I see:
bash -x ./cryptest.sh
...
++ egrep -c '(assert.h|cassert)' 3way.h adler32.h aes.h ...
+ COUNT='3way.h:0
adler32.h:0
aes.h:0
...
So each file gets its own line and own count.
The grep man page states the following. It does not discuss multi-line output.
-c, --count
Only a count of selected lines is written to standard output.
The behavior appears to have something to do with Output Control (from the man page) and -l, --files-with-matches. I also tried the -L, --files-without-match option. It produces a similar error.
My question is, how can I have grep fold the results into one count?
Or maybe I should ask, are grep and egrep the right tools for the job? If they are not, then what should I use?
This is a Bash shell script that executes on every platform we support. Every platform includes BSDs, Linux, OS X, Solaris and Unix (and all the mobile variants, like Android and iOS). We have to work to get what we need in terms of tools like grep and egrep:
GREP=grep
EGREP=egrep
SED=sed
AWK=awk
DISASS=objdump
DISASSARGS=("--disassemble")
...
# Fixup
if [[ "$IS_SOLARIS" -ne "0" ]]; then
    IS_X64=$(isainfo 2>/dev/null | "$GREP" -i -c "amd64")
    if [[ "$IS_X64" -ne "0" ]]; then
        IS_X86=0
    fi
    # Need something more powerful than the non-Posix versions
    if [[ (-e "/usr/gnu/bin/grep") ]]; then
        GREP=/usr/gnu/bin/grep;
    fi
    if [[ (-e "/usr/gnu/bin/egrep") ]]; then
        EGREP=/usr/gnu/bin/egrep;
    fi
    if [[ (-e "/usr/gnu/bin/sed") ]]; then
        SED=/usr/gnu/bin/sed;
    fi
    if [[ (-e "/usr/gnu/bin/awk") ]]; then
        AWK=/usr/gnu/bin/awk;
    else
        AWK=nawk;
    fi
    DISASS=dis
    DISASSARGS=()
fi
...
Back story
Our project recently took CVE-2016-7420 due to users building the project with other tools, like Autotools and CMake. The CVE is a direct result of omitting -DNDEBUG for release/production builds. The other tools don't configure the way we do, and we did not tell users either (1) they can't use other build tools, or (2) they must define -DNDEBUG for release/production.
Our remediations are cutting much deeper than "simply define NDEBUG for release/production" in documentation. We are gutting all dependencies on NDEBUG and Posix assert so folks cannot accidentally get into the configuration. We are also requiring users to ask for a debug configuration by defining DEBUG or _DEBUG; otherwise, they get the release configuration.
While an assert and the SIGABRT that follows are usually annoying in release builds, considered benign in debug builds, and taken for granted, we observe:
- We are a security library (we handle sensitive information)
- A failed assert egresses sensitive information to the file system (core files and crash reports)
- A failed assert egresses sensitive information to platform vendors like Apple (CrashReporter), Apport (Ubuntu), Microsoft (Windows Error Reporting)
- Companies like Apple, Google and Microsoft cooperate with government to mine the sensitive information
2 Answers
Note: the following is based on the GNU implementation of grep; however, I think it should apply in your case as well.
As noted in the GNU grep manual (emphasis mine):
grep searches the named input FILEs for lines containing a match to the
given PATTERN. If no files are specified, or if the file "-" is given,
grep searches standard input. By default, grep prints the matching
lines.
Also,
-c, --count
Suppress normal output; instead print a count of matching lines
**for each input file**. With the -v, --invert-match option (see
below), count non-matching lines.
(and the default behavior is to prefix such output with the file name, although that can be suppressed using the -h option).
By concatenating your target files into a single input stream and piping that to grep, you should be able to override both these behaviors and get a single count without prefix:
COUNT=$(cat *.h *.cpp | $EGREP -c '(assert.h|cassert)')
IMHO this would qualify as a useful use of cat; probably what you have been advised against is Useless Use of Cat
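To see the difference between the two invocations side by side, here is a minimal sketch that runs both against a throwaway directory. The file names and contents are made up for the demonstration, and it calls grep -E directly where the script would use its own "$EGREP" setting; the -- after cat guards against file names that begin with -.

```shell
#!/usr/bin/env bash
# Sketch: fold per-file counts into one number by giving grep a single stream.
# The directory, file names and contents below are invented for this demo.
workdir=$(mktemp -d)
printf '#include <cassert>\nint x;\n' > "$workdir/a.h"
printf '#include <assert.h>\n#include <cassert>\n' > "$workdir/b.cpp"
printf 'int y;\n' > "$workdir/c.cpp"

# With several file arguments, -c prints one "name:count" line per file.
per_file=$(cd "$workdir" && grep -E -c 'assert\.h|cassert' *.h *.cpp)

# With one input stream, -c prints a single bare number.
COUNT=$(cd "$workdir" && cat -- *.h *.cpp | grep -E -c 'assert\.h|cassert')

echo "$per_file"
echo "total: $COUNT"
rm -rf "$workdir"
```

Because COUNT now holds a plain integer, the original [[ "$COUNT" -ne "0" ]] comparison should work unchanged.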
- Perfect; thank you very much. My apologies for asking for the answer. There's too much I don't know about bash, so I try to follow the rules others tell me. – user56041, Sep 17, 2016 at 18:26
- Having multiple file arguments is a good indicator that a cat is useful. But you should say cat -- *.h *.cpp to prevent a filename that begins with - from being interpreted as an option string. – Scott - Слава Україні, Sep 23, 2016 at 1:20
steeldriver’s answer (do cat files | grep -c <token>) was my first thought when I read your question title. But I see that, in your script snippet, you aren’t using the count, beyond comparing it to zero, i.e., you’re asking "how many are there?" when you want to know "are there any?". Consider using -q:
if "$EGREP" -q -- 'assert\.h|cassert' *.h *.cpp
then
    FAILED=1
    echo "Found Posix assert headers" ...
fi
Notes:
- You should always quote your shell variable references (e.g., "$EGREP") unless you have a good reason not to, and you’re sure you know what you’re doing. If you have defined EGREP=grep -e, that would be a reasonably good reason to say $EGREP without quotes, but see this answer to Security implications of forgetting to quote a variable in bash/POSIX shells.
- -q (or, equivalently, --quiet or --silent) means "Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected." This not only gives you the functional behavior that you want (i.e., the same functional behavior as steeldriver’s answer), but with the performance benefit that grep will exit as soon as it finds a match, and doesn’t need to read all the files.
- It’s advised to put -- between a command’s options and its arguments in order to prevent a filename that begins with - from being interpreted as an option string.
- You don’t need to have parentheses around your entire regular expression.
- grep 'assert.h' will match assert h, assert,h, assert3h, assertph, etc. If you don’t care, that’s up to you. If you want to match only assert.h, grep for assert\.h.
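The notes above can be tried out with a small sketch like the following. has_assert_include is a hypothetical helper name, the two files are invented for the demonstration, and grep -E stands in for whatever "$EGREP" resolves to on a given platform.

```shell
#!/usr/bin/env bash
# Sketch of the notes above: exit status via -q, "--" before file names,
# and an escaped dot. has_assert_include is a hypothetical helper name.
has_assert_include() {
    # Succeeds (status 0) if any named file contains a match; prints nothing.
    grep -E -q -- 'assert\.h|cassert' "$@"
}

workdir=$(mktemp -d)
printf '#include <cassert>\n' > "$workdir/uses_assert.cpp"
printf '// assertph is not a header\n' > "$workdir/clean.cpp"

FAILED=0
if has_assert_include "$workdir"/*.cpp; then
    FAILED=1
    echo "Found Posix assert headers"
fi

# An unescaped dot matches any character; an escaped dot matches only ".".
# grep -c exits non-zero when the count is 0, hence the "|| true".
loose=$(grep -c 'assert.h' "$workdir/clean.cpp" || true)
strict=$(grep -c 'assert\.h' "$workdir/clean.cpp" || true)
echo "unescaped: $loose, escaped: $strict"
rm -rf "$workdir"
```

Wrapping the test in a function is only one way to arrange it; the if statement in the answer works just as well inline.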
- [...] egrep -c <whatever> *.h *.c by cat *.h *.c | egrep -c <whatever>?
- [...] grep produces and figure out what extra processing it needs. One common approach for summing columns involves awk. Please show some effort of trying to solve the problem.
- [...] cat <files> | grep because grep <files> is the preferred way to do things on Unix and Linux. If that's the solution, then please post it as an answer so others can critique it.
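For completeness, the awk route mentioned in the comments could look roughly like this. It is a sketch, not something taken from either answer: sum_counts is a hypothetical helper name, and the input lines imitate the name:count output shown in the question.

```shell
#!/usr/bin/env bash
# Sketch: sum the per-file "name:count" lines that grep -c prints.
# sum_counts is a hypothetical helper name.
sum_counts() {
    # Split on ":" and add up the last field, so a colon inside a
    # file name does not disturb the total.
    awk -F: '{ total += $NF } END { print total + 0 }'
}

# Sample input imitating grep -c output for three files.
TOTAL=$(printf '3way.h:0\nadler32.h:0\naes.h:2\n' | sum_counts)
echo "$TOTAL"
```

In the script this would sit at the end of the existing pipeline, for example "$EGREP" -c ... *.h *.cpp | sum_counts, with "$AWK" substituted for awk on platforms that need it.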