
I set up a directory structure as follows:

mkdir -p /tmp/test/build/lib/aaa/
cd /tmp/test
mkdir rar
echo "Hello" > foo.py
echo "Hello" > bar.py
echo "Hello" > rar/foo.py
echo "Hello" > rar/bar.py
echo "Hello" > build/lib/aaa/foo.py
echo "Hello" > build/lib/aaa/bar.py

I wish to exclude any .py files under build/lib, ideally using the glob build/lib/**/*.py. I do not want to use --exclude-dir build/lib, because there are potential collisions with non-Python projects, and because the Python files may be nested further down.

If I try to do this with grep's --exclude, it fails:

grep -r "Hello" --exclude foo.py
# Note: per the --exclude documentation, excluding every file named `foo.py` in any subdir
# is the expected behavior, which is NOT how globbing normally works.
# > bar.py:Hello
# > rar/bar.py:Hello
# > build/lib/aaa/bar.py:Hello
grep -r "Hello" --exclude build/lib/*/foo.py
# > bar.py:Hello
# > foo.py:Hello
# > rar/bar.py:Hello
# > rar/foo.py:Hello
# > build/lib/aaa/bar.py:Hello
# > build/lib/aaa/foo.py:Hello
grep -r "Hello" --exclude build/lib/aaa/foo.py
# > bar.py:Hello
# > foo.py:Hello
# > rar/bar.py:Hello
# > rar/foo.py:Hello
# > build/lib/aaa/bar.py:Hello
# > build/lib/aaa/foo.py:Hello

Am I doing something wrong, or is this a bug?

asked Aug 13 at 4:53
  • AFAIK, the exclusion pattern (which should be quoted) is only ever applied to the filename portion of the pathname. If you want a more advanced selection of files to apply the utility to, use find. Commented Aug 13 at 5:13

3 Answers


You can't do that with grep alone. From info -- grep --exclude on a GNU system:

--exclude=GLOB
Skip any command-line file with a name suffix that matches the pattern GLOB, using wildcard matching; a name suffix is either the whole name, or a trailing part that starts with a non-slash character immediately after a slash (‘/’) in the name. When searching recursively, skip any subfile whose base name matches GLOB; the base name is the part after the last slash. A pattern can use ‘*’, ‘?’, and ‘[’...‘]’ as wildcards, and ‘\’ to quote a wildcard or backslash character literally.
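To see the base-name rule in action, here is a minimal sketch (assuming GNU grep, plus mktemp -d for a throwaway directory) that rebuilds the question's tree and compares a pattern containing a slash with a bare base-name pattern:

```shell
# Throwaway copy of the question's tree (assumes mktemp -d is available).
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir -p build/lib/aaa rar
for f in foo.py bar.py rar/foo.py rar/bar.py \
         build/lib/aaa/foo.py build/lib/aaa/bar.py; do
  echo "Hello" > "$f"
done

# While recursing, GNU grep compares --exclude against base names such as
# "foo.py", so a pattern containing a slash never matches anything.
with_slash=$(grep -r --exclude='build/lib/*/foo.py' Hello . | sort)

# A bare base-name pattern skips foo.py in every directory.
base_only=$(grep -r --exclude='foo.py' Hello . | sort)

printf '%s\n' "--- with slash:" "$with_slash" "--- base name only:" "$base_only"
```

The first search still prints all six files; the second drops every foo.py, wherever it is.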

The same applies to --exclude-dir, which can only exclude directories based on their base name (here lib), not their full path relative to the starting point (build/lib).
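A similar sketch for --exclude-dir, assuming GNU grep; other/lib, notes.txt and top.txt are made-up names used only to show a second, unrelated directory called lib:

```shell
# Throwaway tree with two unrelated directories that are both named "lib".
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir -p build/lib/aaa other/lib
echo "Hello" > build/lib/aaa/foo.py
echo "Hello" > other/lib/notes.txt
echo "Hello" > top.txt

# --exclude-dir is compared against base names, so "lib" prunes BOTH
# directories, including other/lib, which was probably not meant to be skipped.
by_name=$(grep -r --exclude-dir=lib Hello . | sort)
printf '%s\n' "$by_name"
```

Only ./top.txt:Hello is printed.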

Use find instead, as you would with grep implementations that don't support the non-standard GNU -r/--exclude extensions.

To skip build/lib altogether (without even descending into it):

find . -path ./build/lib -prune -o -type f -size +4c \
 -exec grep Hello /dev/null {} +

Or if it's only the *.py files in there you want to exclude:

find . ! -path './build/lib/*.py' -type f -size +4c \
 -exec grep Hello /dev/null {} +

Or:

find . -path './build/lib/*.py' -prune -o -type f -size +4c \
 -exec grep Hello /dev/null {} +

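That last one also excludes ./build/lib/dir.py/any/file/underneath/that/misleadingly-named/directory, as -prune stops find from descending into any directory whose path matches the pattern.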
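The three find commands can be compared on a throwaway tree. This is a sketch assuming mktemp -d; README, notes.txt and the dir.py directory are made-up names chosen to expose the differences:

```shell
# Throwaway tree: a .py file under build/lib, a non-.py file in build/lib,
# and a directory misleadingly named dir.py.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir -p build/lib/aaa build/lib/dir.py
echo "Hello" > foo.py
echo "Hello" > build/lib/aaa/foo.py
echo "Hello" > build/lib/README
echo "Hello" > build/lib/dir.py/notes.txt

# 1. Prune build/lib entirely: nothing below it is searched.
prune_all=$(find . -path ./build/lib -prune -o -type f -size +4c \
  -exec grep Hello /dev/null {} + | sort)

# 2. Skip only files whose path matches; dir.py/notes.txt does not end
#    in .py, so it is still searched.
not_path=$(find . ! -path './build/lib/*.py' -type f -size +4c \
  -exec grep Hello /dev/null {} + | sort)

# 3. Prune anything matching, including the dir.py directory itself,
#    so nothing underneath it is searched.
prune_py=$(find . -path './build/lib/*.py' -prune -o -type f -size +4c \
  -exec grep Hello /dev/null {} + | sort)

printf '%s\n' "--- 1:" "$prune_all" "--- 2:" "$not_path" "--- 3:" "$prune_py"
```

The first prints only ./foo.py:Hello, the second also searches build/lib/README and build/lib/dir.py/notes.txt, and the third searches build/lib/README but not dir.py/notes.txt.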

/dev/null is there to make sure the file path is printed even if only one file is passed; with GNU grep or compatible, you can use the -H option instead.
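A minimal check of the /dev/null trick on a single made-up file, assuming GNU grep for the -H line:

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
echo "Hello" > foo.py

# With a single file argument, grep prints only the matching line...
bare=$(find . -name foo.py -exec grep Hello {} +)
# ...adding /dev/null makes it two files, so the path is printed too.
with_null=$(find . -name foo.py -exec grep Hello /dev/null {} +)
# GNU grep (or compatible) can force the path with -H instead.
with_h=$(find . -name foo.py -exec grep -H Hello {} +)

printf '%s\n' "$bare" "$with_null" "$with_h"
```

The first command prints just Hello, while the other two print ./foo.py:Hello.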

-size +4c is an optimisation to skip files of 4 bytes or fewer, which can't possibly contain Hello. It will likely gain little if anything, so it can be omitted, but it shows that using the right tool for the task (find to find files, grep to grep them) allows you to be more thorough in your filter criteria.
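To confirm that -size +4c means strictly more than 4 bytes, a small sketch with two made-up files:

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf 'Hell' > four.txt     # exactly 4 bytes, cannot contain Hello
printf 'Hello' > five.txt    # exactly 5 bytes

# With the c suffix, sizes are in bytes and +4 means "greater than 4".
big=$(find . -type f -size +4c)
printf '%s\n' "$big"
```

Only ./five.txt is listed.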

-path pattern does fnmatch() pattern matching against each file path, whereas **/ is a zsh globbing operator (now supported by some other shells, though often not by default). Both fnmatch("./build/lib/*.py", "./build/lib/file.py", flags) and fnmatch("./build/lib/*.py", "./build/lib/dir/file.py", flags) return true, as * matches any sequence of characters, / included. fnmatch("./build/lib/**/*.py", "./build/lib/file.py", flags), where ** is interpreted the same as *, returns false, since that pattern requires at least one more / after ./build/lib/.

If you wanted to exclude only the .py files directly in build/lib and not those in its subdirectories, you'd need ! '(' -path './build/lib/*.py' ! -path './build/lib/*/*.py' ')', or the non-standard -regex extension of some find implementations: ! -regex '\./build/lib/[^/]*\.py'.
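These matching rules can be checked without writing C: shell case patterns use the same pattern matching notation as find's -path, in which / is not special. This sketch uses a hypothetical matches helper for that, then tries the ! '(' ... ')' form on a throwaway tree (top.py and sub/deep.py are made-up names):

```shell
# Hypothetical helper: succeeds if STRING ($2) matches PATTERN ($1).
# In a case pattern, as with find -path, "/" is not special, so "*" can
# match across directory separators.
matches() {
  case $2 in
    $1) return 0 ;;
  esac
  return 1
}

check() {
  if matches "$1" "$2"; then echo true; else echo false; fi
}

r1=$(check './build/lib/*.py' './build/lib/file.py')
r2=$(check './build/lib/*.py' './build/lib/dir/file.py')
r3=$(check './build/lib/**/*.py' './build/lib/file.py')
printf '%s %s %s\n' "$r1" "$r2" "$r3"

# Excluding only the .py files directly in build/lib, on a throwaway tree.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir -p build/lib/sub
echo "Hello" > build/lib/top.py
echo "Hello" > build/lib/sub/deep.py
kept=$(find . -type f ! '(' -path './build/lib/*.py' ! -path './build/lib/*/*.py' ')')
printf '%s\n' "$kept"
```

This prints true true false, then ./build/lib/sub/deep.py, since only top.py sits directly in build/lib.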

In zsh, you could do:

set -o extendedglob # best in ~/.zshrc
grep -- Hello /dev/null **/*~build/lib/*.py(D.L+4)

But note that:

  • you may run into the execve() limit on the size of arguments (though the zargs function could be used to work around that).
  • the list of files is sorted. If you don't care about the order, you can add the oN qualifier to skip that sorting.
  • it makes it easier to skip hidden files: just remove the D qualifier.
  • it avoids a ./ prefix being added to file paths (which is also why we need that --; you can change the glob to ./**/*~./build... if you do want that ./ prefix).
  • with some find/fnmatch() implementations, ! -path './build/lib/*.py' may fail to skip some .py files if part of their path cannot be decoded as valid text in the user's locale. zsh globs don't have that issue.
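bash has no direct equivalent of zsh's ~ exclusion or glob qualifiers. A rough analogue, assuming bash 4 or later for globstar, is to loop over **/* and filter with [[ ]], where * also matches /. This is a swapped-in technique sketched for comparison, not the zsh approach itself, and it shares the argument-size caveat above:

```shell
# bash 4+ only (globstar); a rough analogue of the zsh command, not a port.
shopt -s globstar dotglob nullglob   # dotglob plays the role of zsh's D

# Throwaway copy of the question's tree (assumes mktemp -d is available).
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir -p build/lib/aaa rar
for f in foo.py bar.py rar/foo.py rar/bar.py \
         build/lib/aaa/foo.py build/lib/aaa/bar.py; do
  echo "Hello" > "$f"
done

files=()
for f in **/*; do
  # Regular files only, not symlinks (roughly zsh's "." qualifier).
  [[ -f $f && ! -L $f ]] || continue
  # In [[ == ]], "*" also matches "/", so this skips .py files at any
  # depth under build/lib, like zsh's ~build/lib/*.py.
  [[ $f == build/lib/*.py ]] && continue
  # More than 4 bytes, like zsh's L+4.
  (( $(wc -c < "$f") > 4 )) || continue
  files+=("$f")
done

# "--" is needed for the same reason as in the zsh version (no ./ prefix).
result=
if (( ${#files[@]} )); then
  result=$(grep -H -- Hello "${files[@]}")
fi
printf '%s\n' "$result"
```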
answered Aug 13 at 5:13

Giving grep options like -r to find files, when there's a perfectly good standard Unix tool for that with an extremely obvious name (find), was a bad idea on the part of the GNU folks, as it leads to confusing code and confused users.

Keep it simple and use find to find files and grep to g/re/p (Globally match a Regular Expression and Print the result) within the related text (the list of file names output by find and then the contents of those files), e.g.:

Print the list of files under this directory:

$ find . -type f
./bar.py
./build/lib/aaa/bar.py
./build/lib/aaa/foo.py
./foo.py
./rar/bar.py
./rar/foo.py

Remove the .py files under ./build/lib from the above output:

$ find . -type f | grep -v '^\./build/lib/.*\.py$'
./bar.py
./foo.py
./rar/bar.py
./rar/foo.py

Search for Hello within the remaining files:

$ find . -type f | grep -v '^\./build/lib/.*\.py$' | xargs grep -H 'Hello'
./bar.py:Hello
./foo.py:Hello
./rar/bar.py:Hello
./rar/foo.py:Hello
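As the comment on this answer points out, the last pipeline breaks on some file names. A NUL-delimited variant, assuming GNU find (-print0), GNU grep (-z, -H) and GNU xargs (-0, -r), could look like this; "with space.py" is a made-up name used to exercise the point:

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir -p build/lib/aaa rar
echo "Hello" > foo.py
echo "Hello" > "rar/with space.py"      # made-up name containing a blank
echo "Hello" > build/lib/aaa/foo.py

# Paths stay NUL-terminated from find through grep -z to xargs -0, so a
# blank (or quote, or backslash) in a name no longer splits or mangles it.
result=$(find . -type f -print0 |
  grep -zv '^\./build/lib/.*\.py$' |
  xargs -0r grep -H 'Hello')
printf '%s\n' "$result"
```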
answered Aug 16 at 14:28
  • That assumes (for the last command at least) that file paths don't contain blanks, newlines, single quotes, double quotes or backslashes, that they're made of valid text in the user's locale, and that grep is the GNU implementation (or compatible) for that -H. grep works on lines, not file paths; find's output can't be post-processed reliably without -print0, and xargs is unreliable unless used with -0 (and you generally also want to use -r). Commented yesterday

The famous ripgrep can do this:

$ rg -g'!/build/**/*.py' Hell /tmp/test/
/tmp/test/bar.py
1:Hello
/tmp/test/rar/bar.py
1:Hello
/tmp/test/rar/foo.py
1:Hello
/tmp/test/foo.py
1:Hello
$ rg -g'!/rar/**/*.py' Hell /tmp/test/
/tmp/test/bar.py
1:Hello
/tmp/test/foo.py
1:Hello
/tmp/test/build/lib/aaa/foo.py
1:Hello
/tmp/test/build/lib/aaa/bar.py
1:Hello

From its man page:

 -g GLOB, --glob=GLOB
 Include or exclude files and directories for searching that match the
 given glob. This always overrides any other ignore logic. Multiple glob
 flags may be used. Globbing rules match .gitignore globs. Precede a
 glob with a ! to exclude it. If multiple globs match a file or
 directory, the glob given later in the command line takes precedence.

 As an extension, globs support specifying alternatives: -g 'ab{c,d}*'
 is equivalent to -g abc -g abd. Empty alternatives like -g 'ab{,c}'
 are not currently supported. Note that this syntax extension is also
 currently enabled in gitignore files, even though this syntax isn't
 supported by git itself. ripgrep may disable this syntax extension in
 gitignore files, but it will always remain available via the -g/--glob
 flag.

 When this flag is set, every file and directory is applied to it to
 test for a match. For example, if you only want to search in a
 particular directory foo, then -g foo is incorrect because foo/bar
 does not match the glob foo. Instead, you should use -g 'foo/**'.
answered Aug 13 at 17:15
  • It's worth noting that, by default, rg skips many files on its own, such as hidden ones, the ones git would ignore, and more. Commented Aug 13 at 18:41
