I would like to process every directory in my parent directory, but only on the condition that these directories have no subdirectories in them.
Right now I have the following directory structure:
Music
    Band_A
        Record_A1
        Record_A2
    Band_B
        Record_B1
            CD_1
            CD_2
And the following script:
while read -r dir
do
    echo "$dir"
    /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log
done < <(find . -type d)
All it does is check whether every music file in a directory was encoded using MQA and log the output like this:
1 MQA 44.1K 10 - Bury A Friend [MQA 24_44].flac
It works and creates the logs I want in directories like Record_A1 and CD_1.
But it also creates a lot of redundant files. For example, it creates a log in the Band_A directory, containing output for all files in all subdirectories of Band_A, and it creates a log in Band_B and then in Record_B1, again containing output for all files in the respective subdirectories.
So how can I run a script and generate the logs ONLY for those directories that have no nested directories?
EDIT: also I think this script processes every subdirectory as many times as it is nested inside the parent top most directory. Not that it is critical, but still not efficient.
3 Answers
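The basic idea would be to go through all directories, test whether each one has subdirectories, and run that part of the script only if it has none. Note that find "$dir" -maxdepth 1 -type d also lists $dir itself, so a count of 1 means there are no subdirectories.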
while read -r dir ; do
    subdircount=$(find "$dir" -maxdepth 1 -type d | wc -l)
    if [[ "$subdircount" -eq 1 ]] ; then
        echo "$dir"
        /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log
    fi
done < <(find . -type d)
-
-maxdepth is not portable. Additionally, pathnames with newlines will mislead wc -l (but in this case they will mislead read as well). Still, with GNU find and a reasonably sane directory tree (like in the OP's example) this will work. +1. – Kamil Maciorowski, Commented Feb 3, 2021 at 9:34
You can use the following to find leaf directories:
find . -type d -links 2
The -links 2 option looks for files (directories in your case) that have exactly 2 hard links.
A directory has:
- a link to itself
- a link from the parent directory
- a link from each subdirectory
So a directory without subdirectories will have 2 hard links, which is what you want.
-
For some filesystems this is not true. E.g. in Btrfs every directory matches -links 1. OK, I'm not fully sure if every directory always matches this but I'm sure it's possible and very common in Btrfs. So you cannot use this approach to solve the OP's problem in general. – Kamil Maciorowski, Commented Feb 3, 2021 at 9:29
-
That's a pretty neat trick in itself to find leaf nodes in a directory tree! – ruslaniv, Commented Feb 3, 2021 at 14:08
This command
find . -name . -o -type d -print -prune
will generate non-empty output iff there is at least one directory in the current working directory (the core idea came from this answer). You can include it in your shell code as a test. Something like
if ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] ); then
...
fi
where the subshell ( ) prevents cd from changing the current working directory of the main script. You could use find "$dir" ... without cd, but what if $dir expands to something starting with -? Double dash works with cd, not with find. Well, your main find (the one in <()) starts in ., so one may think every pathname it generates will start with a dot. It seems not all implementations of find behave like this though, so it's still good to code defensively.
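As a quick check of this test, here is a minimal sketch that builds a throwaway tree in a temporary directory and classifies two directories. The names demo, leaf, parent/child and the classify function are made up for illustration; they do not come from the question.

```shell
#!/bin/sh
# Hypothetical demo tree; none of these names come from the question.
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/leaf" "$demo/parent/child"
touch "$demo/leaf/track.flac"

# Prints "leaf" if the directory given as $1 has no subdirectories,
# "not a leaf" otherwise (also when cd fails).
classify() {
    if ( cd -- "$1" && [ -z "$(find . -name . -o -type d -print -prune)" ] ); then
        echo "leaf"
    else
        echo "not a leaf"
    fi
}

leaf_result=$(classify "$demo/leaf")
parent_result=$(classify "$demo/parent")
printf '%s: %s\n' leaf "$leaf_result" parent "$parent_result"

rm -rf -- "$demo"
```

Regular files do not count: leaf contains a file but no directory, so the inner find prints nothing and the test succeeds.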
Alternatively you can build the test into the main find command (inside <()):
find . -type d -exec sh -c '
    cd -- "$1" && [ -z "$(find . -name . -o -type d -print -prune)" ]
' find-sh {} \; -print
find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?
Note this approach will run one additional sh per directory (with or without subdirectories); this is sub-optimal. I'm introducing it as a small step towards a bigger improvement, because piping to read is not the best way (if you must, then it's good you use -r; but consider IFS= as well).
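To illustrate what IFS= adds on top of -r, here is a small sketch using a made-up pathname with leading and trailing spaces. With the default IFS, read strips that whitespace; with IFS set to empty for the read, the line survives intact. Neither form copes with newlines inside a name.

```shell
#!/bin/sh
# Hypothetical pathname with two leading and two trailing spaces.
name='  ./odd dir  '

# Default IFS: read trims leading and trailing whitespace from the line.
trimmed=$(printf '%s\n' "$name" | { read -r line; printf '[%s]' "$line"; })

# Empty IFS for this read only: the line is kept exactly as it was.
intact=$(printf '%s\n' "$name" | { IFS= read -r line; printf '[%s]' "$line"; })

printf '%s\n%s\n' "$trimmed" "$intact"
```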
When piping find to read like you did, names containing newlines will make the code fail. A good practice is to run everything from within find, if possible. In your case it seems possible. The following code is standalone (not inside <()).
find . -type d -exec sh -c '
    cd -- "$1" && [ -z "$(find . -name . -o -type d -print -prune)" ] && do_something_more_with "$1"
' find-sh {} \;
The above still runs one sh per directory. Now it can be improved by:
find . -type d -exec sh -c '
    for dir do
        ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] ) && do_something_more_with "$dir"
    done
' find-sh {} +
Here I used ... && do_something_more_with "$dir" but you can choose if ... then ... instead.
In your case do_something_more_with "$dir" will be
{ echo "$dir"; /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log; }
where { } group commands so the preceding && makes the entire group run conditionally.
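The effect of the braces can be seen in a tiny sketch, where false stands in for a failed leaf-directory test. The variable names grouped and ungrouped are only for illustration.

```shell
#!/bin/sh
# With braces, both echo commands depend on the test succeeding.
grouped=$(false && { echo first; echo second; }; echo end)

# Without braces, only "echo first" is conditional; "echo second" always runs.
ungrouped=$(false && echo first; echo second; echo end)

printf 'grouped:\n%s\nungrouped:\n%s\n' "$grouped" "$ungrouped"
```

In the grouped form nothing between && and the closing brace runs, which is what you want when both the echo and the MQA_identifier pipeline should be skipped for directories that have subdirectories.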
Maybe instead of do_something_more_with "$dir" it's better to do_something_more_with . in the $dir directory. The relevant line will be:
( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] && do_something_more_with . )
But it's your decision. You can still echo "$dir" (or better printf "%s\n" "$dir"; note the entire shell code run by find -exec is single-quoted, so don't type printf '%s\n').
In case there are extremely many subdirectories in some directory, we really don't need the inner find to list them all. To tell whether the output is empty or not it's enough to break after the first line:
[ -z "$(find . -name . -o -type d -print -prune | head -n 1)" ]
(With head -c 1 we could break after the first character, but I think head -c is not portable.)
So the optimized code may look like this:
find . -type d -exec sh -c '
    for dir do
        ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune | head -n 1)" ] ) \
        && { printf "%s\n" "$dir"; /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log; }
    done
' find-sh {} +
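To try this out without the real tool, the following sketch recreates the question's layout in a temporary directory and writes a placeholder mqa.log wherever the real script would call MQA_identifier. The printf that fills the log is only a stand-in (an assumption for the demo), and the two .flac names are invented.

```shell
#!/bin/sh
# Throwaway copy of the question's directory layout.
music=$(mktemp -d) || exit 1
mkdir -p "$music/Band_A/Record_A1" "$music/Band_A/Record_A2" \
         "$music/Band_B/Record_B1/CD_1" "$music/Band_B/Record_B1/CD_2"
touch "$music/Band_A/Record_A1/a.flac" "$music/Band_B/Record_B1/CD_1/b.flac"

old_pwd=$PWD
cd "$music" || exit 1

# Same structure as the optimized command; the printf into mqa.log is a
# stand-in for the MQA_identifier | tee pipeline.
find . -type d -exec sh -c '
    for dir do
        ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune | head -n 1)" ] ) \
        && printf "%s\n" "$dir" > "$dir"/mqa.log
    done
' find-sh {} +

# List the logs that were created, sorted for stable output.
logged=$(find . -name mqa.log | LC_ALL=C sort)
printf '%s\n' "$logged"

cd "$old_pwd" || exit 1
rm -rf -- "$music"
```

Only Record_A1, Record_A2, CD_1 and CD_2 get a log; Band_A, Band_B, Record_B1 and the top-level directory are skipped because each contains at least one directory.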
also I think this script processes every subdirectory as many times as it is nested inside the parent top most directory. Not that it is critical, but still not efficient.
Unable to reproduce. find . -type d prints every directory just once. Maybe MQA_identifier works recursively. If it does then it (but not strictly the script) will process subdirectories multiple times (with tee writing to a file on a different level in the directory tree each time).
-
"Maybe MQA_identifier works recursively": yes, you're absolutely right, it is the app itself that traverses the complete directory tree from every directory. And sorry for not accepting your excellent answer, I really wish I could accept both answers! – ruslaniv, Commented Feb 3, 2021 at 14:32