How to run a bash script only on directories that have no subdirectories

Question 1

I would like to process every directory under my parent directory, but only if it has no subdirectories.

Right now I have the following directory structure:

Music
    Band_A
        Record_A1
        Record_A2
    Band_B
        Record_B1
            CD_1
            CD_2

And the following script

while read -r dir
do
    echo "$dir"
    /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log
done < <(find . -type d)

All it does is check whether every music file in a directory was encoded using MQA, and it logs output like this:

1 MQA 44.1K 10 - Bury A Friend [MQA 24_44].flac

It works and creates the logs I want in directories like Record_A1 and CD_1.

But it also creates a lot of redundant files. For example, it creates a log in the Band_A directory containing output for all files in all of Band_A's subdirectories, and it creates logs in Band_B and then in Record_B1, again containing output for all files in their respective subdirectories.

So how can I run a script and generate the logs ONLY for those directories that have no nested directories?

EDIT: also I think this script processes every subdirectory as many times as it is nested inside the parent top most directory. Not that it is critical, but still not efficient.

Answer 1 (accepted)

The basic idea is to go through all directories, test whether each one has subdirectories, and run that part of the script only if it has none.

while read -r dir ; do
    subdircount=$(find "$dir" -maxdepth 1 -type d | wc -l)
    if [[ "$subdircount" -eq 1 ]] ; then
        echo "$dir"
        /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log
    fi
done < <(find . -type d)
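
To see why the count is compared with 1, here is a minimal sketch that recreates the example tree from the question in a throwaway directory and prints only the directories without subdirectories. It is written with plain sh constructs so it does not depend on bash. The variable names root and leaves are illustrative, and -maxdepth is assumed to be available (GNU and BSD find both provide it).

```shell
#!/bin/sh
# Recreate the example tree from the question in a throwaway directory.
root=$(mktemp -d)
mkdir -p "$root/Music/Band_A/Record_A1" "$root/Music/Band_A/Record_A2" \
         "$root/Music/Band_B/Record_B1/CD_1" "$root/Music/Band_B/Record_B1/CD_2"

leaves=$(
    find "$root" -type d | LC_ALL=C sort | while IFS= read -r dir; do
        # find lists "$dir" itself too, so a count of 1 means "no subdirectories".
        subdircount=$(( $(find "$dir" -maxdepth 1 -type d | wc -l) ))
        if [ "$subdircount" -eq 1 ]; then
            printf '%s\n' "${dir#"$root"/}"
        fi
    done
)

printf '%s\n' "$leaves"
rm -rf "$root"
```

Only Record_A1, Record_A2, CD_1 and CD_2 are reported; Music, Band_A, Band_B and Record_B1 are skipped because each contains at least one directory.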

Comment

-maxdepth is not portable. Additionally pathnames with newlines will mislead wc -l (but in this case they will mislead read as well). Still with GNU find and reasonably sane directory tree (like in the OP's example) this will work. +1.
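
A small sketch of the newline problem, assuming a filesystem that allows newlines in names (most Unix filesystems do). It creates one directory whose name contains a newline, then counts directories twice: once by counting lines, and once by letting find pass pathnames as arguments. The names root, by_lines and by_args are illustrative.

```shell
#!/bin/sh
root=$(mktemp -d)
# One directory whose name contains a newline (legal, if unusual).
mkdir "$root/two
lines"

# Line-based counting sees three lines: root plus the two halves of one name.
by_lines=$(( $(find "$root" -type d | wc -l) ))

# Passing pathnames as arguments keeps each name intact: root plus one directory.
by_args=$(find "$root" -type d -exec sh -c 'echo "$#"' find-sh {} +)

printf 'lines: %s, arguments: %s\n' "$by_lines" "$by_args"
rm -rf "$root"
```

There are two directories, but the line count reports three. Anything that consumes the list line by line, including read, inherits the same miscount.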

Answer 2

You can use the following to find leaf directories:

find . -type d -links 2

The -links 2 option looks for files (directories in your case) that have exactly 2 hard links.

A directory has:

a link to itself (its own . entry)
a link from the parent directory (its name there)
a link from each subdirectory (that subdirectory's .. entry)

So a directory without subdirectories will have 2 hard links, which is what you want.

Comment

For some filesystems this is not true. E.g. in Btrfs every directory matches -links 1. OK, I'm not fully sure if every directory always matches this but I'm sure it's possible and very common in Btrfs. So you cannot use this approach to solve the OP's problem in general.
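
One way to hedge against this is to probe the filesystem before relying on link counts. The sketch below is only a heuristic: it creates a two-level probe directory and checks whether find -links 2 reports exactly the inner directory. The function name links_trick_works is hypothetical, mktemp is assumed to be available, and the probe must be created on the same filesystem as the tree you intend to scan (here ${TMPDIR:-/tmp} is used purely for illustration).

```shell
#!/bin/sh
# Succeeds only if directories under "$1" use the classic link count
# (2 for a directory without subdirectories, plus 1 per subdirectory).
links_trick_works() {
    probe=$(mktemp -d "${1%/}/links-probe.XXXXXX") || return 2
    mkdir "$probe/child" || { rm -rf "$probe"; return 2; }
    # Classic counting: "$probe" has 3 links, "$probe/child" has 2.
    found=$(find "$probe" -type d -links 2)
    rm -rf "$probe"
    [ "$found" = "$probe/child" ]
}

if links_trick_works "${TMPDIR:-/tmp}"; then
    verdict="usable"
else
    verdict="not usable"
fi
printf '%s\n' "find -links 2 is $verdict under ${TMPDIR:-/tmp}"
```

If the probe fails, one of the approaches that tests directory contents directly is the safer choice.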

Comment

That's a pretty neat trick in itself to find leaf nodes in a directory tree!

Answer 3

This command

find . -name . -o -type d -print -prune

will generate non-empty output iff there is at least one subdirectory directly inside the current working directory (the core idea came from this answer). You can include it in your shell code as a test. Something like

if ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] ); then
 ...
fi

where the subshell ( ) prevents cd from changing the current working directory of the main script. You could use find "$dir" ... without cd, but what if $dir expands to something starting with -? A double dash works with cd, not with find.

Your main find (the one in <()) starts in ., so one might think every pathname it generates will start with a dot. It seems not all implementations of find behave like this, though, so it's still good to code defensively.
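
As a minimal sketch, the test can be wrapped in a function whose body is itself a subshell. The function name has_no_subdirs and the sample directory names are illustrative, not part of the original script.

```shell
#!/bin/sh
# Succeeds if directory "$1" contains no subdirectories.
# The ( ) body runs in a subshell, so cd does not affect the caller.
has_no_subdirs() (
    cd -- "$1" && [ -z "$(find . -name . -o -type d -print -prune)" ]
)

root=$(mktemp -d)
mkdir -p "$root/Band_A/Record_A1" "$root/-odd-name"
: > "$root/Band_A/Record_A1/track.flac"   # a plain file is not a subdirectory

before=$PWD
has_no_subdirs "$root/Band_A/Record_A1" && echo "Record_A1: leaf"
has_no_subdirs "$root/Band_A" || echo "Band_A: has subdirectories"
# A name starting with "-" is safe here because cd honours "--".
( cd "$root" && has_no_subdirs -odd-name ) && echo "-odd-name: leaf"
[ "$PWD" = "$before" ] && echo "working directory unchanged"
rm -rf "$root"
```

Regular files inside a directory do not affect the result, because the inner find only prints entries of type d.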

Alternatively you can build the test into the main find command (inside <()):

find . -type d -exec sh -c '
    cd -- "$1" && [ -z "$(find . -name . -o -type d -print -prune)" ]
' find-sh {} \; -print

find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?
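
In short, as a minimal illustration that stands on its own: the first word after the script text given to sh -c becomes $0, and the remaining words become $1, $2 and so on. Without a placeholder such as find-sh, the first pathname supplied by find would land in $0 and would not be part of "$@". The variable names below are illustrative.

```shell
#!/bin/sh
# The word right after the script text is $0; real arguments start at $1.
with_name=$(sh -c 'printf "%s|" "$0" "$@"' find-sh first second)
# Without a placeholder the first argument lands in $0 and is missing from "$@".
without_name=$(sh -c 'printf "%s|" "$@"' first second)

printf '%s\n' "$with_name" "$without_name"
```

The placeholder can be any word; a descriptive one like find-sh makes error messages from the inner shell easier to trace.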

Note that this approach runs one additional sh per directory (with or without subdirectories), which is sub-optimal. I'm introducing it as a small step towards a bigger improvement, because piping to read is not the best way (if you must, then it's good that you use -r; but consider IFS= as well).

When piping find to read like you did, names containing newlines will make the code fail. A good practice is to run everything from within find, if possible. In your case it seems possible. The following code is standalone (not inside <()).

find . -type d -exec sh -c '
    cd -- "$1" && [ -z "$(find . -name . -o -type d -print -prune)" ] && do_something_more_with "$1"
' find-sh {} \;

The above still runs one sh per directory. Now it can be improved by:

find . -type d -exec sh -c '
    for dir do
        ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] ) && do_something_more_with "$dir"
    done
' find-sh {} +

Here I used ... && do_something_more_with "$dir" but you can choose if ... then ... instead.

In your case do_something_more_with "$dir" will be

{ echo "$dir"; /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log; }

where { } group commands so the preceding && makes the entire group run conditionally.
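
A tiny sketch of the difference the braces make, using false as a stand-in for a failing test. The function names ungrouped and grouped are illustrative.

```shell
#!/bin/sh
# Without braces only the first command depends on the test.
ungrouped() { false && echo "first"; echo "second"; }
# With braces the whole group is skipped when the test fails.
grouped() { false && { echo "first"; echo "second"; }; }

printf 'ungrouped: [%s]\n' "$(ungrouped)"
printf 'grouped:   [%s]\n' "$(grouped)"
```

In the ungrouped form "second" is still printed, which is the equivalent of MQA_identifier running even for directories that failed the test.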

Maybe instead of do_something_more_with "$dir" it's better to do_something_more_with . in the $dir directory. The relevant line will be:

( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] && do_something_more_with . )

But it's your decision. You can still echo "$dir" (or better printf "%s\n" "$dir"; note the entire shell code run by find -exec is single-quoted, so don't type printf '%s\n').

In case there are extremely many subdirectories in some directory, we really don't need the inner find to list them all. To tell whether the output is empty or not it's enough to break after the first line:

[ -z "$(find . -name . -o -type d -print -prune | head -n 1)" ]

(With head -c 1 we could break after the first character, but I think head -c is not portable.)

So the optimized code may look like this:

find . -type d -exec sh -c '
    for dir do
        ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune | head -n 1)" ] ) \
            && { printf "%s\n" "$dir"; /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log; }
    done
' find-sh {} +
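
To try this without the real tool, here is a sketch that runs the same structure on a throwaway copy of the example tree. Note the assumptions: ls is only a stand-in for /Users/rusl/.cliapps/MQA_identifier so that the sketch runs anywhere, the output of tee is discarded to keep the terminal output short, and the variable names root and logs are illustrative.

```shell
#!/bin/sh
# Throwaway copy of the example tree from the question.
root=$(mktemp -d)
mkdir -p "$root/Music/Band_A/Record_A1" "$root/Music/Band_A/Record_A2" \
         "$root/Music/Band_B/Record_B1/CD_1" "$root/Music/Band_B/Record_B1/CD_2"

(
    cd "$root/Music" || exit 1
    # "ls" below is a stand-in for the real MQA_identifier command.
    find . -type d -exec sh -c '
        for dir do
            ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune | head -n 1)" ] ) \
                && { printf "%s\n" "$dir"; ls "$dir" | tee "$dir"/mqa.log >/dev/null; }
        done
    ' find-sh {} +
)

# List the logs that were written: one per leaf directory, none higher up.
logs=$(cd "$root/Music" && find . -name mqa.log | LC_ALL=C sort)
printf '%s\n' "$logs"
rm -rf "$root"
```

No mqa.log appears in Music, Band_A, Band_B or Record_B1, which is the behaviour the question asks for.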

You wrote: "also I think this script processes every subdirectory as many times as it is nested inside the parent top most directory. Not that it is critical, but still not efficient."

Unable to reproduce. find . -type d prints every directory just once. Maybe MQA_identifier works recursively. If it does then it (but not strictly the script) will process subdirectories multiple times (with tee writing to a file on a different level in the directory tree each time).

Comment

"Maybe MQA_identifier works recursively": yes, you're absolutely right, it is the app itself that traverses the complete directory tree from every directory. And sorry for not accepting your excellent answer, I really wish I could accept both answers!

The accepted answer (the while read loop with find -maxdepth 1 shown above) was posted by Ljm Dullaart on 2021-02-03 at 09:22:19Z and edited by Kamil Maciorowski on Feb 3, 2021 at 9:31. The comment about -maxdepth portability is by Kamil Maciorowski, 2021-02-03 09:34:22 +00:00.
