I am looking for a way to grep source code without getting false positives from comments. For instance, if I search for foo in this .c source file:
/*
* foo has changed [...] and is now a 2-parameters function
*/
// foo(24)
foo(42, 28);
A naive grep will find 3 occurrences where I want only one. I have seen this way to do it on StackOverflow, but it does not meet my needs: PHP is not available on the platform. I have also found this way for one-line comments, but it only solves part of my problem.
I need to use classical scripting tools (awk, sed, bash, grep, etc) and I need it to be fast even if there are thousands of files.
Do you know if and how it is possible to grep source code, and only source code?
Building a tags table might be a better approach, depending on what you're doing. – Gilles 'SO- stop being evil', Mar 1, 2012 at 12:57
3 Answers
grep works on plain text and knows nothing about the underlying syntax of your C program. Therefore, in order not to search inside comments, you have several options:
Strip C comments before the search. You can do this using
gcc -fpreprocessed -dD -E yourfile.c
For details, please see https://stackoverflow.com/questions/2394017/remove-comments-from-c-c-code
Write/use some hacky half-working scripts like the ones you have already found (e.g. they work by skipping lines starting with // or /*) in order to handle the details of all possible C/C++ comments (again, see the previous link for some scary test cases). You may still get false positives, but you do not have to preprocess anything.
Use more advanced tools for doing "semantic search" in the code. I have found "coccigrep": http://home.regit.org/software/coccigrep/ This kind of tool allows searching for specific language constructs (e.g. an update of a structure with a given name), and it certainly ignores comments.
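As a rough illustration of the first option, the gcc command above can be wrapped so that its output feeds grep directly. This is only a sketch: the helper names strip_c_comments and grep_code are made up for this example, and the -P flag (which suppresses linemarkers such as # 1 "file.c") is an addition to the command shown above, so that the file name itself can never match the pattern.

```shell
#!/bin/sh
# strip_c_comments FILE
# Hypothetical helper: prints FILE with its comments removed.
strip_c_comments() {
    # -fpreprocessed: treat the input as already preprocessed, so macros and
    #                 #include lines are left alone and only comments go away
    # -dD:            keep #define directives in the output
    # -P:             omit linemarkers (assumption: added for this sketch)
    gcc -fpreprocessed -dD -E -P "$1"
}

# grep_code PATTERN FILE
# Hypothetical helper: searches only what survives the comment stripping.
grep_code() {
    strip_c_comments "$2" | grep -- "$1"
}
```

Calling grep_code foo yourfile.c on the sample from the question should report only the real call to foo. Any line numbers added with grep -n would refer to the stripped output rather than the original file.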
You can try a naive approach to match non-comments like this:
$ egrep -v "^(//|/\*| \*)" sourcecode
This will only inverse match against prefixed comments - that is, lines starting with either //, /*, * or */ - and hence it will not leave out blocks that are commented out with the /* and */ pair.
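To actually search with this filter while keeping the original line numbers, one possibility is to run the search first and discard comment-looking hits afterwards. The sketch below assumes a slightly generalized pattern that also tolerates indentation; the function name grep_without_comment_lines is invented for this example.

```shell
#!/bin/sh
# grep_without_comment_lines PATTERN FILE
# Hypothetical helper: search first (so -n reports real line numbers), then
# drop hits whose line starts, after optional indentation, with //, /* or *.
grep_without_comment_lines() {
    grep -n -- "$1" "$2" |
        grep -Ev '^[0-9]+:[[:space:]]*(//|/\*|\*)'
}
```

On the five-line sample from the question, grep_without_comment_lines foo sample.c prints only 5:foo(42, 28);. The same caveats apply as above: a code line that starts with * (such as a pointer dereference) is dropped by mistake, and a match inside a trailing comment or an unprefixed block-comment line is still reported.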
Modified slightly to work for indented comments: $ egrep -v "^[[:space:]]*(//|/\*|\*)" sourcecode – mbonness, Nov 20, 2019 at 21:49
Here is a specific variation for all of the rest of us late-comers to this question:
ls -1 src/*.c | xargs -i sh -c "echo;gcc -fpreprocessed -dD -E {} 2>&1 | grep -wi -e one -e two -e three -n | sed 's:^:{}\::'" | cat -s
A list of C source files
ls -1 src/*.c
is piped to xargs, which executes the preprocessor in a child shell
gcc -fpreprocessed -dD -E {} 2>&1
which is subsequently piped into a desired grep command
grep -wi -e one -e two -e three -n
which is then piped into sed to prefix each line with the current file name
sed 's:^:{}\::'
Finally, all the repeated blank lines are collapsed to single lines using cat:
cat -s
This works on a RHEL6 system, but I assume it is general enough for other *nix systems.
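For trees where file names may contain spaces, a variant that lets find hand each path to the child shell avoids splitting the output of ls. The following is only a sketch under a few assumptions: grep_c_tree is a made-up name, the -P flag is an addition that suppresses gcc linemarkers, and gcc errors are discarded rather than merged into the searched text.

```shell
#!/bin/sh
# grep_c_tree DIR GREP_ARGS...
# Hypothetical helper: strips comments from every *.c file below DIR with
# gcc, greps the result, and prefixes each hit with the file name.
grep_c_tree() {
    dir=$1
    shift
    # find passes each path to the child shell as $1, so unusual file names
    # survive intact; the remaining arguments are forwarded to grep as-is.
    find "$dir" -type f -name '*.c' -exec sh -c '
        file=$1
        shift
        gcc -fpreprocessed -dD -E -P "$file" 2>/dev/null |
            grep "$@" |
            while IFS= read -r line; do
                printf "%s:%s\n" "$file" "$line"
            done
    ' sh {} "$@" \;
}
```

A call equivalent to the one-liner above would be grep_c_tree src -wi -e one -e two -e three. Unlike src/*.c, find also descends into subdirectories; adding -maxdepth 1 (supported by GNU and BSD find) would restrict it to the top level.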
Good idea to use the tokenizer to remove comments. However, this makes the mistake of parsing ls. – phuclv, Jun 17, 2023 at 14:13