I want to do the following:
- Define an array of globs that specify a base collection of files to include in a process.
- Define an array of globs that specify files to exclude from that process. It doesn't matter to me if this array of globs specifies files not even in the above collection.
- Build an array of files (not globs) that takes all files specified by the include glob array with any file belonging to the exclude glob array removed.
I have been struggling with this. To show some explicit progress and what I have attempted, I have tried something like:
```shell
# List all files to potentially include in the process
files_to_include=(
    'utils/*.txt'
)
# List any files here that should be excluded from the above list
files_to_exclude=(
    '*dont-use.txt'
    'utils/README.md'
)
# Empty array of files
files=()
for file in ${files_to_exclude[@]}; do
    temp=find $files_to_include -type f \( -name '*.txt' -and -not -name $file \)
    files+=$temp
done
# I want this to be the total collection of files that I care about
echo ${files[@]}
```
Obviously, this for loop logic doesn't work, but it at least got me started; I'm still struggling with the appropriate way to do this. (I also get weird "permission denied" messages, but only when trying to assign the output of `find` to `temp`, and I don't know why they occur.)

I like `find` because, from what I understand, its performance will be much better than `grep`'s. That is an actual concern here because there are a lot of files in my real use case. There are likely several different ways to do this, but I would like to have as little "magic" in my script as possible. So please help make the script performant but also very understandable.
As far as I can tell, I need a process that expands all the globs in the include array, expands all the globs in the exclude array, and then subtracts the exclude results from the include results. That's only the high-level view, though, and implementing it has been a challenge for me.
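That subtraction step can be sketched in plain bash (assuming bash 4+ for associative arrays; the sample tree and file names below are made up purely for illustration):

```shell
#!/usr/bin/env bash
shopt -s nullglob                # unmatched globs expand to nothing

# Hypothetical sample tree so the sketch is self-contained.
dir=$(mktemp -d)
mkdir -p "$dir/utils"
touch "$dir/utils/a.txt" "$dir/utils/b-dont-use.txt" "$dir/utils/README.md"
cd "$dir" || exit 1

include_globs=( 'utils/*.txt' )
exclude_globs=( 'utils/*dont-use.txt' 'utils/README.md' )

# Expand the exclude globs into an associative array for O(1) lookup.
declare -A excluded
for g in "${exclude_globs[@]}"; do
    for f in $g; do              # unquoted on purpose: let the shell expand the glob
        excluded[$f]=1
    done
done

# Expand the include globs, keeping only files not marked as excluded.
files=()
for g in "${include_globs[@]}"; do
    for f in $g; do
        [[ -f $f && -z ${excluded[$f]:-} ]] && files+=( "$f" )
    done
done

printf '%s\n' "${files[@]}"      # prints: utils/a.txt
```

The key point is that both arrays hold glob strings, and the expansion happens inside the loops where the variables are deliberately left unquoted.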
Thank you!
2 Answers
Looks like you want `files_to_include` to be globs while `files_to_exclude` should be just patterns, as otherwise, as a glob, `*dont-use.txt` would not generate (filename generation or pathname expansion being other names for globbing) a `utils/whatev-dont-use.txt`, so wouldn't exclude that file; and if `utils/*.txt` was just a pattern, it would also match on `utils/.git/foo/bar/.txt` for instance.
`zsh` has a `~` exclude-by-pattern glob operator, so there you could do:
```shell
set -o extendedglob
globs_to_include=(
    'utils/*.txt'
)
patterns_to_exclude=(
    '*dont-use.txt'
    'utils/README.md'
)
typeset -U files=(
    $~^globs_to_include~(${(j[|])~patterns_to_exclude})(ND.)
)
```
Or, without the need for `extendedglob`, do the filtering of the `patterns_to_exclude` afterwards using the `${array:#pattern}` parameter expansion operator:
```shell
typeset -U files=( $~^globs_to_include(N.) )
files=( ${files:#(${(j[|])~patterns_to_exclude})} )
```
If both arrays were meant to be patterns and you wanted to match them against the paths of every regular file in or below the current working directory, then that could be:
```shell
() {
  files=( ${${(M)@:#(${(j[|])~patterns_to_include})}:#(${(j[|])~patterns_to_exclude})} )
} **/*(ND.)
```
Or in separate steps to make it more legible:
```shell
pattern_to_include="(${(j[|])patterns_to_include})"
pattern_to_exclude="(${(j[|])patterns_to_exclude})"
files=( **/*(ND.) )
files=( ${(M)files:#$~pattern_to_include} )
files=( ${files:#$~pattern_to_exclude} )
```
If they're both meant to be globs, that would just be:
```shell
typeset -U files_to_include=(
    utils/*.txt(ND.)
)
typeset -U files_to_exclude=(
    *dont-use.txt(ND.)
    utils/README.md(ND.)
)
files=( ${files_to_include:|files_to_exclude} )
```
using the `${A:|B}` array subtraction operator.
Explanation of some of the zsh-specific syntax in there:
- `array=( elements )`: array declaration, as copied by a few shells since, including bash when it eventually added array support in 2.0. Similar to the `set -A array -- elements` of the Korn shell.
- `**/`: any level of directories, for recursive globbing.
- `extendedglob` option: needed for the `~` operator.
- `typeset -U array`: makes the array elements unique.
- `$~var`: makes the contents of `$var` be considered as a pattern.
- `$^array/more`: makes it so the expansion becomes `element1/more element2/more`, in csh-style `{element1,element2}/more` fashion.
- `${(...)param}`: those are parameter expansion flags; `j[|]` to join the elements of the array with `|`.
- `(ND.)`: those are glob qualifiers: `N` to enable nullglob for that glob, `D` for dotglob, `.` to restrict to files of type regular.
- `${array:#pattern}`: filters out the elements matching the pattern. With the `(M)` flag, that becomes filter *in*.
- `() { body; } args`: anonymous function being passed some arguments (available in the body in `$@` aka `$argv` and in `1ドル`, `2ドル`... as in regular named functions).
Let the quoting work for you rather than against you. Don't quote globs; let the shell try to expand them. Do double-quote variables to prevent them from being treated as globs. And do remember to put array expansions involving `@` in double quotes:
```shell
includes=( utils/*.txt )
excludes=( *dont-use.txt utils/README.md )

# Convert array to hash so we can easily index it
declare -A excludes_hash
for i in "${excludes[@]}"
do
    excludes_hash["$i"]=1
done

# Build list of files
files=()
for i in "${includes[@]}"
do
    [ -z "${excludes_hash[$i]}" ] && files+=("$i")
done

# Total collection of files that I care about
printf "%s\n" "${files[@]}"
```
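One caveat worth checking when adapting this: the exclude globs expand relative to the current directory, so `*dont-use.txt` only matches files at the top level and would not expand to `utils/whatever-dont-use.txt`. A self-contained check of the hash-lookup approach (the sample tree and the `utils/` prefix on the exclude glob are my additions):

```shell
#!/usr/bin/env bash
shopt -s nullglob

# Hypothetical sample tree to exercise the approach.
dir=$(mktemp -d)
mkdir -p "$dir/utils"
touch "$dir/utils/a.txt" "$dir/utils/b-dont-use.txt" "$dir/utils/README.md"
cd "$dir" || exit 1

includes=( utils/*.txt )
excludes=( utils/*dont-use.txt utils/README.md )   # note the utils/ prefix

# Convert array to hash so we can easily index it
declare -A excludes_hash
for i in "${excludes[@]}"; do
    excludes_hash["$i"]=1
done

# Build list of files
files=()
for i in "${includes[@]}"; do
    [ -z "${excludes_hash[$i]:-}" ] && files+=( "$i" )
done

printf '%s\n' "${files[@]}"    # prints: utils/a.txt
```

Because both arrays expand at assignment time, the hash subtraction then compares literal pathnames, which is exactly what makes the lookup cheap.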
You could also do `find . -type f \( -path "$include1" -o -path "$include2" ... \) ! \( -path "$exclude1" -o -path "$exclude2" \)`, though building the list of args to `find` is a pain (but there are posts on that on the site). Would `grep` really be slow, though? You need to do the comparisons in each case, either in `find` or in `grep`, so something like `find . -type f | grep -e ... -e ... | grep -v -e ... -e ...` shouldn't be too bad (barring issues with newlines in filenames and the fact that `grep` takes regexes instead of globs).
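Assembling that `find` argument list from the two arrays can be scripted; here is one possible sketch in bash (the sample tree is hypothetical, and note that in `-path` patterns `*` also matches `/`, so these behave as patterns rather than globs):

```shell
#!/usr/bin/env bash
# Hypothetical sample tree so the sketch is self-contained.
dir=$(mktemp -d)
mkdir -p "$dir/utils"
touch "$dir/utils/a.txt" "$dir/utils/b-dont-use.txt" "$dir/utils/README.md"
cd "$dir" || exit 1

include_pats=( 'utils/*.txt' )
exclude_pats=( '*dont-use.txt' 'utils/README.md' )

# Assemble: ( -path ./P1 -o -path ./P2 ) ! ( -path ./X1 -o ... )
args=( '(' )
for p in "${include_pats[@]}"; do
    args+=( -path "./$p" -o )
done
args[${#args[@]}-1]=')'          # swap the trailing -o for the closing paren
args+=( '!' '(' )
for p in "${exclude_pats[@]}"; do
    args+=( -path "./$p" -o )    # ./*dont-use.txt matches at any depth: * spans /
done
args[${#args[@]}-1]=')'

find . -type f "${args[@]}"      # prints: ./utils/a.txt
```

Keeping the arguments in an array (rather than a string) is what keeps patterns with spaces or glob characters intact when they reach `find`.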