I set up a directory structure as follows:
mkdir -p /tmp/test/build/lib/aaa/
cd /tmp/test
mkdir rar
echo "Hello" > foo.py
echo "Hello" > bar.py
echo "Hello" > rar/foo.py
echo "Hello" > rar/bar.py
echo "Hello" > build/lib/aaa/foo.py
echo "Hello" > build/lib/aaa/bar.py
I wish to exclude any .py files under build/lib, ideally using the glob build/lib/**/*.py. I do not wish to use --exclude-dir build/lib, as there are potential collisions with non-Python projects, and because Python files may be nested further down.
If I try to do this with grep's --exclude, it fails:
grep -r "Hello" --exclude foo.py
# Note: the --exclude documentation says that excluding any file named foo.py
# in any subdirectory is the expected behavior, which is NOT how globbing should work.
# > bar.py:Hello
# > rar/bar.py:Hello
# > build/lib/aaa/bar.py:Hello
grep -r "Hello" --exclude build/lib/*/foo.py
# > bar.py:Hello
# > foo.py:Hello
# > rar/bar.py:Hello
# > rar/foo.py:Hello
# > build/lib/aaa/bar.py:Hello
# > build/lib/aaa/foo.py:Hello
grep -r "Hello" --exclude build/lib/aaa/foo.py
# > bar.py:Hello
# > foo.py:Hello
# > rar/bar.py:Hello
# > rar/foo.py:Hello
# > build/lib/aaa/bar.py:Hello
# > build/lib/aaa/foo.py:Hello
Am I doing something wrong or is this a bug?
You can't with grep alone. From info -- grep --exclude on a GNU system (emphasis mine):
--exclude=GLOB
Skip any command-line file with a name suffix that matches the pattern GLOB, using wildcard matching; a name suffix is either the whole name, or a trailing part that starts with a non-slash character immediately after a slash (‘/’) in the name. When searching recursively, skip any subfile whose base name matches GLOB; the base name is the part after the last slash. A pattern can use ‘*’, ‘?’, and ‘[’...‘]’ as wildcards, and ‘\’ to quote a wildcard or backslash character literally.
The same applies to --exclude-dir, which can only exclude directories based on their base name (here lib), not their full path relative to the starting point (build/lib).
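A minimal sketch of that base-name behavior, assuming GNU grep 3.x and a scratch directory; vendor/lib and notes.txt are made-up names standing in for the unrelated lib directory the question worries about colliding with.

```shell
#!/bin/sh
# Sketch assuming GNU grep: while recursing, --exclude-dir is compared against
# each directory's base name, so a pattern containing '/' can never match.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# vendor/lib is a hypothetical non-Python directory that happens to be named lib.
mkdir -p build/lib/aaa vendor/lib
echo Hello > build/lib/aaa/foo.py
echo Hello > vendor/lib/notes.txt
echo Hello > top.py

# Skips build/lib AND vendor/lib: only ./top.py is reported.
by_name=$(grep -r --exclude-dir=lib Hello .)

# "build/lib" is never a base name, so nothing is skipped here.
by_path=$(grep -r --exclude-dir=build/lib Hello . | LC_ALL=C sort)

printf '%s\n\n%s\n' "$by_name" "$by_path"
```

Sorting is only there to make the output order predictable.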
Use find as you would with grep implementations that don't support those non-standard GNU -r/--exclude extensions.
To skip the whole of build/lib altogether (not even descend into it):
find . -path ./build/lib -prune -o -type f -size +4c \
-exec grep Hello /dev/null {} +
Or, if it's only the *.py files in there you want to exclude:
find . ! -path './build/lib/*.py' -type f -size +4c \
-exec grep Hello /dev/null {} +
Or:
find . -path './build/lib/*.py' -prune -o -type f -size +4c \
-exec grep Hello /dev/null {} +
That last one also excludes ./build/lib/dir.py/any/file/underneath/that/misleadingly-named/directory.
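To check the three commands above side by side, here is a sketch that rebuilds the question's tree in a temporary directory, plus a hypothetical build/lib/dir.py directory, and captures each result. It assumes a POSIX sh, find and grep; the output is sorted only because find's traversal order is unspecified.

```shell
#!/bin/sh
# Sketch: recreate the question's tree and compare the three find filters.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# build/lib/dir.py is a hypothetical, misleadingly named directory.
mkdir -p build/lib/aaa rar build/lib/dir.py
for f in foo.py bar.py rar/foo.py rar/bar.py \
         build/lib/aaa/foo.py build/lib/aaa/bar.py build/lib/dir.py/file; do
  echo Hello > "$f"
done

# 1. Don't descend into build/lib at all.
pruned=$(find . -path ./build/lib -prune -o -type f -size +4c \
  -exec grep Hello /dev/null {} + | LC_ALL=C sort)

# 2. Skip paths matching ./build/lib/*.py ('*' matches '/' too),
#    but still descend into build/lib/dir.py.
negated=$(find . ! -path './build/lib/*.py' -type f -size +4c \
  -exec grep Hello /dev/null {} + | LC_ALL=C sort)

# 3. Prune anything matching ./build/lib/*.py, directories included.
pruned_py=$(find . -path './build/lib/*.py' -prune -o -type f -size +4c \
  -exec grep Hello /dev/null {} + | LC_ALL=C sort)

printf '%s\n\n' "$pruned" "$negated" "$pruned_py"
```

Only the second variant reports ./build/lib/dir.py/file, since that path does not itself end in .py.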
/dev/null is there to make sure the file path is printed even if only one file is passed; with GNU grep or compatible, you can use the -H option instead.
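A small sketch of why that matters: find ... -exec ... {} + may pass a single file in its last batch, and grep only prefixes matches with the file name when it searches more than one file. This assumes a POSIX sh; the -H line requires GNU grep or a compatible implementation.

```shell
#!/bin/sh
# Sketch: how grep decides whether to print file names.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
echo Hello > "$dir/foo.py"

# A single file argument: grep prints just the matching line.
plain=$(grep Hello "$dir/foo.py")
# /dev/null makes it two files (one always empty), so the name is printed.
with_null=$(grep Hello /dev/null "$dir/foo.py")
# GNU grep (and compatible) can force the file name with -H instead.
with_h=$(grep -H Hello "$dir/foo.py")

printf '%s\n' "$plain" "$with_null" "$with_h"
```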
-size +4c is an optimisation to skip files of 4 bytes or fewer, which can't possibly contain Hello. It will likely gain little if anything, so it can be omitted, but it shows that using the right tool for the task (find to find files, grep to grep them) allows you to be more thorough in your filter criteria.
-path pattern does fnmatch() pattern matching against each file path, while **/ is a zsh globbing operator (now supported by some other shells, though often not by default). Both fnmatch("./build/lib/*.py", "./build/lib/file.py", flags) and fnmatch("./build/lib/*.py", "./build/lib/dir/file.py", flags) will return true, as * matches any sequence of characters, / included. fnmatch("./build/lib/**/*.py", "./build/lib/file.py", flags), where ** is interpreted the same as *, would return false.
If you wanted to only exclude .py files in build/lib and not those in subdirs, you'd need ! '(' -path './build/lib/*.py' ! -path './build/lib/*/*.py' ')' or use the -regex non-standard extension of some find implementations: ! -regex '\./build/lib/[^/]*\.py'.
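That fnmatch() behavior can be observed with find itself. A sketch, assuming a scratch directory with a hypothetical build/lib/sub/ subdirectory; the last command relies on the non-standard -regex extension mentioned above (GNU find's default regex syntax is assumed).

```shell
#!/bin/sh
# Sketch: '*' in -path crosses '/', and two ways to exclude only direct children.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
mkdir -p build/lib/sub
: > build/lib/file.py
: > build/lib/sub/file.py

# Both files match, since '*' also matches the '/' in sub/file.py.
star=$(find . -type f -path './build/lib/*.py' | LC_ALL=C sort)

# Exclude only .py files directly in build/lib, using standard operators.
direct_excluded=$(find . -type f \
  ! '(' -path './build/lib/*.py' ! -path './build/lib/*/*.py' ')' | LC_ALL=C sort)

# Same result with the non-standard -regex ([^/]* cannot cross a '/').
regex_excluded=$(find . -type f ! -regex '\./build/lib/[^/]*\.py' | LC_ALL=C sort)

printf '%s\n\n' "$star" "$direct_excluded" "$regex_excluded"
```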
In zsh, you could do:
set -o extendedglob # best in ~/.zshrc
grep -- Hello /dev/null **/*~build/lib/*.py(D.L+4)
But note that:
- you may run into the execve() limit on the size of arguments (though the zargs function could be used to work around that).
- the list of files is sorted. If you don't care about the order, you can add the oN qualifier to skip that sorting.
- it makes it easier to skip hidden files: just remove the D qualifier.
- it avoids a ./ prefix being added to file paths (which is also why we need that --; you can change the glob to ./**/*~./build... if you do want that ./ prefix).
- with some find/fnmatch() implementations, ! -path './build/lib/*.py' may fail to skip some .py files if part of their path cannot be decoded as valid text in the user's locale. zsh globs don't have that issue.
The GNU folks giving grep options like -r to find files, when there's a perfectly good standard Unix tool for that with an extremely obvious name, find, was a bad idea, as it leads to confusing code and confused users.
Keep it simple and use find to find files and grep to g/re/p (Globally match a Regular Expression and Print the result) within the related text (the list of file names output by find, and then the contents of those files), e.g.:
Print the list of files under this directory:
$ find . -type f
./bar.py
./build/lib/aaa/bar.py
./build/lib/aaa/foo.py
./foo.py
./rar/bar.py
./rar/foo.py
Remove the .py files under ./build/lib from the above output:
$ find . -type f | grep -v '^\./build/lib/.*\.py$'
./bar.py
./foo.py
./rar/bar.py
./rar/foo.py
Search for Hello within the remaining files:
$ find . -type f | grep -v '^\./build/lib/.*\.py$' | xargs grep -H 'Hello'
./bar.py:Hello
./foo.py:Hello
./rar/bar.py:Hello
./rar/foo.py:Hello
That assumes (the last command at least) that file paths don't contain blanks, newlines, single quotes, double quotes or backslashes, that they're made of valid text in the user's locale, and the GNU implementation of grep (or compatible) for that -H. grep works on lines, not file paths; find's output is not post-processable without -print0, and xargs is unreliable unless used with -0 (and you generally also want to use -r). – Stéphane Chazelas, 2025-09-09 07:58:04 +00:00
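Following the points in that comment, a sketch of a sturdier version of the pipeline, assuming GNU find, xargs and grep (or implementations supporting -print0, -0, -r and -H). The .py filter moves into find itself and file names are passed NUL-delimited, so a path containing a blank (the hypothetical "rar dir" below) survives intact.

```shell
#!/bin/sh
# Sketch: filter in find, pass NUL-delimited names to xargs.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# "rar dir" is a hypothetical directory name containing a blank.
mkdir -p build/lib/aaa 'rar dir'
for f in foo.py 'rar dir/bar.py' build/lib/aaa/foo.py; do
  echo Hello > "$f"
done

# -print0/-0 keep each path as one argument; -r skips grep if nothing is found.
result=$(find . -type f ! -path './build/lib/*.py' -print0 |
  xargs -0 -r grep -H Hello | LC_ALL=C sort)

printf '%s\n' "$result"
```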
The popular ripgrep (rg) can do this:
$ rg -g'!/build/**/*.py' Hell /tmp/test/
/tmp/test/bar.py
1:Hello
/tmp/test/rar/bar.py
1:Hello
/tmp/test/rar/foo.py
1:Hello
/tmp/test/foo.py
1:Hello
$ rg -g'!/rar/**/*.py' Hell /tmp/test/
/tmp/test/bar.py
1:Hello
/tmp/test/foo.py
1:Hello
/tmp/test/build/lib/aaa/foo.py
1:Hello
/tmp/test/build/lib/aaa/bar.py
1:Hello
From its man page:
-g GLOB, --glob=GLOB
Include or exclude files and directories for searching that match the
given glob. This always overrides any other ignore logic. Multiple glob
flags may be used. Globbing rules match .gitignore globs. Precede a
glob with a ! to exclude it. If multiple globs match a file or
directory, the glob given later in the command line takes precedence.
As an extension, globs support specifying alternatives: -g 'ab{c,d}*'
is equivalent to -g abc -g abd. Empty alternatives like -g 'ab{,c}'
are not currently supported. Note that this syntax extension is also
currently enabled in gitignore files, even though this syntax isn't
supported by git itself. ripgrep may disable this syntax extension in
gitignore files, but it will always remain available via the -g/--glob
flag.
When this flag is set, every file and directory is applied to it to
test for a match. For example, if you only want to search in a
particular directory foo, then -g foo is incorrect because foo/bar
does not match the glob foo. Instead, you should use -g 'foo/**'.
Worth noting that rg will by default ignore many files of its own, like hidden ones, the ones git would ignore, and more. – Stéphane Chazelas, 2025-08-13 18:41:32 +00:00
find
.