list directory recursively, with subpath, and leaf nodes (files) first (for batch renaming part of filenames)?

Question 1

Before declaring this a duplicate, consider that I'd need this for a specific reason: batch renaming (or copying to a new name) of a tree structure that contains a common string in file and directory names. Here's an example (tried on Ubuntu 14.04, so GNU tools):

cd /tmp
mkdir myproj
mkdir -p myproj/myproj_AA/myproj_BB
touch myproj/myproj_AA/myproj_BB/myproj_CC.dat
mkdir myproj/myproj_AA/myproj_DD
touch myproj/myproj_AA/myproj_DD/myproj_EE.dat
mkdir -p myproj/myproj_XX/myproj_YY
touch myproj/myproj_XX/myproj_YY/myproj_ZZ.dat
mkdir -p myproj/myproj_XX/myproj_WW
touch myproj/myproj_XX/myproj_WW/myproj_QQ.dat
tree myproj # to visualise

This directory structure's tree looks like this:

myproj
├── myproj_AA
│   ├── myproj_BB
│   │   └── myproj_CC.dat
│   └── myproj_DD
│       └── myproj_EE.dat
└── myproj_XX
    ├── myproj_WW
    │   └── myproj_QQ.dat
    └── myproj_YY
        └── myproj_ZZ.dat

6 directories, 4 files

So, I want every entry under myproj/, including myproj itself, renamed so that myproj becomes myTESTproj wherever it occurs in a name. For that, I first need a listing with paths relative to the current directory, sorted so that the outermost children come first (I think this is equivalent to the entries with the longest relative pathnames, but I'm not sure). The order matters: if I rename (mv) a directory first and then try to rename a file inside it, the mv will still use the old directory name in its first argument and fail, since that name no longer exists.

I'm aware there is ls -R --group-directories-first myproj/ to use ls recursively and group directories first, but its output is like this:

$ ls -R --group-directories-first myproj/
myproj/:
myproj_AA myproj_XX
myproj/myproj_AA:
myproj_BB myproj_DD
myproj/myproj_AA/myproj_BB:
myproj_CC.dat
myproj/myproj_AA/myproj_DD:
myproj_EE.dat
myproj/myproj_XX:
myproj_WW myproj_YY
myproj/myproj_XX/myproj_WW:
myproj_QQ.dat
myproj/myproj_XX/myproj_YY:
myproj_ZZ.dat

... that is, it is not a plain list with subpaths, that I could easily feed to while read f; do ...

Closest I came to is to use find instead:

$ find myproj/
myproj/
myproj/myproj_AA
myproj/myproj_AA/myproj_DD
myproj/myproj_AA/myproj_DD/myproj_EE.dat
myproj/myproj_AA/myproj_BB
myproj/myproj_AA/myproj_BB/myproj_CC.dat
myproj/myproj_XX
myproj/myproj_XX/myproj_YY
myproj/myproj_XX/myproj_YY/myproj_ZZ.dat
myproj/myproj_XX/myproj_WW
myproj/myproj_XX/myproj_WW/myproj_QQ.dat

So, here I do have a plain list of subpaths, however it is sorted root node first towards leaf nodes - and I need leaf nodes first. And I'm trying stuff like find myproj/ | sort -n, but it seems to make no difference. So if I do something like:

$ find myproj/ | sort -n | while read f; do mv -v $f $(echo $f | sed 's/myproj/myTESTproj/g'); done
‘myproj/’ -> ‘myTESTproj/’
mv: cannot stat ‘myproj/myproj_AA’: No such file or directory
mv: cannot stat ‘myproj/myproj_AA/myproj_BB’: No such file or directory
mv: cannot stat ‘myproj/myproj_AA/myproj_BB/myproj_CC.dat’: No such file or directory
...

... then the intended recursive rename fails immediately, as the root node (directory) is renamed first, and thus all further references to it are invalid.

So, how can I obtain a proper recursive listing of a subdirectory with leaf nodes first, to use it in a batch rename like this?

Question 2

If you're aiming at just renaming, isn't it enough that the contents of each directory are processed before the directory itself? That is, you don't need all leaves (from all directories) first. find -depth does exactly that.

$ mkdir -p a/b c/d
$ find -depth
./a/b
./a
./c/d
./c
.

Then you could use find -exec and Bash to rename the files:

$ find -depth ! -name . -name "*myproj*" -execdir bash -c '
 for f; do mv "$f" "${f/myproj/myTESTproj}" ; done' bash {} +
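
To try the command above without touching real data, here is a minimal sketch that rebuilds the question's example tree in a scratch directory and runs the same find invocation on it. It assumes bash and GNU find; the variable name demo and the use of mktemp are illustrative choices, not part of the answer.

```shell
#!/usr/bin/env bash
# Scratch copy of the question's example tree (demo is a hypothetical name).
demo=$(mktemp -d)
mkdir -p "$demo"/myproj/myproj_AA/myproj_BB "$demo"/myproj/myproj_AA/myproj_DD \
         "$demo"/myproj/myproj_XX/myproj_YY "$demo"/myproj/myproj_XX/myproj_WW
touch "$demo"/myproj/myproj_AA/myproj_BB/myproj_CC.dat \
      "$demo"/myproj/myproj_AA/myproj_DD/myproj_EE.dat \
      "$demo"/myproj/myproj_XX/myproj_YY/myproj_ZZ.dat \
      "$demo"/myproj/myproj_XX/myproj_WW/myproj_QQ.dat

(
  cd "$demo" || exit 1
  # -depth: the contents of a directory are handled before the directory itself.
  # -execdir: bash runs inside the parent directory, so "$f" is just ./name.
  find . -depth ! -name . -name '*myproj*' -execdir bash -c '
    for f; do mv -- "$f" "${f/myproj/myTESTproj}"; done' bash {} +
  # Show the renamed tree as a flat list.
  find . | sort
)
```

Because each mv only ever sees the last path component, renaming a parent later cannot invalidate a path that was already handled.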

Question 3

Thanks @ilkkachu: "you don't need all leaves (from all directories) first?" - you know what, it turns out I really don't; I didn't think about that at all :) This answer works great for the OP Q example; for my A example (where the root node might be the only instance containing the "needle" substring) I get a warning, mv: cannot move ‘./someotherdir’ to a subdirectory of itself, but it does what it should. Thanks again!

Question 4

@sdaau, ah of course, the replacement doesn't change anything if the keyword isn't there, so it's useless to try to rename. Ok, we could add a test inside the shell snippet, or just add a filter on the find... (edited)
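
The first option mentioned in this comment, a test inside the shell snippet, could look roughly like the sketch below. It drops the -name filter on purpose and skips any entry whose name would not change. It assumes bash and GNU find; demo and new are hypothetical names, and the tree is the "only some names contain the needle" case from the thread.

```shell
#!/usr/bin/env bash
# Scratch tree where some names do not contain "myproj" (demo is a hypothetical name).
demo=$(mktemp -d)
mkdir -p "$demo"/myproj/somespecdir/someotherdir "$demo"/myproj/myproj_AA/myproj_DD
touch "$demo"/myproj/somespecdir/someotherdir/myproj_CC.dat \
      "$demo"/myproj/myproj_AA/myproj_DD/myproj_EE.dat

(
  cd "$demo" || exit 1
  find . -depth ! -name . -execdir bash -c '
    for f; do
      new=${f/myproj/myTESTproj}
      # Only call mv when the substitution actually changed the name;
      # this avoids "cannot move X to a subdirectory of itself".
      [ "$f" = "$new" ] || mv -- "$f" "$new"
    done' bash {} +
)
```

Filtering in find (with -name) starts fewer shell processes; the in-shell test is handy when the match condition is more complex than a glob.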

Question 5

If you have the Perl version of the rename command installed (sometimes known as prename) this will work for you

find myproj -depth -name '*myproj*' -exec rename -n 's!(.*)myproj!$1myTESTproj!' {} +

The -depth option to find ensures that children in any directory are listed before the directory itself. The + suffix to the -exec action allows multiple {} insertions for a single invocation of the specified command. At the cost of reduced efficiency you can replace it with \;.

When you are sure it will do what you want, remove the -n or replace it with -v.
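
The difference between the + and \; terminators can be observed directly. The sketch below does not use rename at all; it runs a tiny sh snippet that prints how many arguments each invocation received, then counts the invocations. The tree, the variable names (demo, plus_calls, semi_calls) and the use of sh -c are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Four files in a scratch tree (demo is a hypothetical name).
demo=$(mktemp -d)
mkdir -p "$demo"/myproj/myproj_AA "$demo"/myproj/myproj_XX
touch "$demo"/myproj/myproj_AA/myproj_CC.dat "$demo"/myproj/myproj_AA/myproj_EE.dat \
      "$demo"/myproj/myproj_XX/myproj_ZZ.dat "$demo"/myproj/myproj_XX/myproj_QQ.dat

# Each invocation prints one line (its argument count), so the line count
# equals the number of times the command was started.
plus_calls=$(find "$demo"/myproj -depth -type f -name '*myproj*' \
               -exec sh -c 'echo "$#"' sh {} + | wc -l | tr -d ' ')
semi_calls=$(find "$demo"/myproj -depth -type f -name '*myproj*' \
               -exec sh -c 'echo "$#"' sh {} \; | wc -l | tr -d ' ')

echo "with +:  $plus_calls invocation(s)"
echo "with \\;: $semi_calls invocation(s)"
```

With only a handful of paths, + batches them into a single command line, while \; starts the command once per path.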

Question 6

Did you mean for the intervening directory names to be eaten by the (.*)? It doesn't work if the final part of the filename doesn't contain myproj, i.e. it would try to rename ./myproj/foo to ./myTESTproj/foo (you'll get errors)
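
The effect described in this comment can be checked without renaming anything. The sketch below feeds two sample paths through sed -E with the same greedy pattern; using sed here is an assumption made for illustration (the comment is about the regex given to Perl rename, which is greedy in the same way). The variable name out is hypothetical.

```shell
#!/usr/bin/env bash
# (.*) is greedy, so the match lands on the LAST "myproj" in the whole path,
# which is not necessarily in the final path component.
out=$(printf '%s\n' \
        'myproj/myproj_AA/myproj_DD/myproj_EE.dat' \
        'myproj/somespecdir/someotherdir' |
      sed -E 's/(.*)myproj/\1myTESTproj/')
printf '%s\n' "$out"
# First path:  only the file name changes.
# Second path: the only "myproj" is the top-level directory, so the result
#              points into myTESTproj/, a directory that does not exist yet.
```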

Question 7

Thanks @roaima - I indeed have Perl rename; your answer works great for the OP Q example, for my A example (where the root node might be the only instance having the "needle" substring) I don't get any errors or warnings printed, but seeing (with -v) myproj/somespecdir renamed as myTESTproj/somespecdir printed before myproj renamed as myTESTproj doesn't look quite correct - and in the end, it actually seems to delete the entire myproj (or renamed myTESTproj) folder!? Still, good to keep this in mind, thanks!

Question 8

@ilkkachu oh I see what you mean. Fix on its way...

Question 9

@sdaau fixed for you

Question 10

I remembered what to look for once I posted the question - if the leaf nodes are those with the longest relative pathnames (which I'm not sure if it's true always, but seems to be in the OP example at least), then one simply needs a way to sort a list of strings by string length; unfortunately sort does not seem to have such an option.

But, I found https://stackoverflow.com/questions/5917576/sort-a-text-file-by-line-length-including-spaces - and from there, chose the perl solution:

$ find myproj/ | perl -e 'print sort { length($b) <=> length($a) } <>'
myproj/myproj_AA/myproj_DD/myproj_EE.dat
myproj/myproj_AA/myproj_BB/myproj_CC.dat
myproj/myproj_XX/myproj_YY/myproj_ZZ.dat
myproj/myproj_XX/myproj_WW/myproj_QQ.dat
myproj/myproj_AA/myproj_DD
myproj/myproj_AA/myproj_BB
myproj/myproj_XX/myproj_YY
myproj/myproj_XX/myproj_WW
myproj/myproj_AA
myproj/myproj_XX
myproj/
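
The same longest-first ordering can be produced without Perl by prefixing each line with its length, sorting numerically, and cutting the prefix off again. This is a minimal sketch assuming no file name contains a newline; demo and sorted are hypothetical names. On the worry about the assumption: the longest path is not always a leaf, but a child's path is always strictly longer than its parent's, so in this ordering every entry still comes before all of its ancestors.

```shell
#!/usr/bin/env bash
# Scratch copy of part of the example tree (demo is a hypothetical name).
demo=$(mktemp -d)
mkdir -p "$demo"/myproj/myproj_AA/myproj_BB "$demo"/myproj/myproj_XX/myproj_YY
touch "$demo"/myproj/myproj_AA/myproj_BB/myproj_CC.dat \
      "$demo"/myproj/myproj_XX/myproj_YY/myproj_ZZ.dat

sorted=$(
  cd "$demo" || exit 1
  find myproj |
    awk '{ print length($0) "\t" $0 }' |  # decorate: "<length><TAB><path>"
    sort -rn |                            # longest paths first
    cut -f2-                              # undecorate: drop the length column
)
printf '%s\n' "$sorted"
```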

However, the trivial sed 's/myproj/myTESTproj/g' replacement does not work here either:

$ find myproj/ | perl -e 'print sort { length($b) <=> length($a) } <>' \
> | while read f; do mv -v $f $(echo $f | sed 's/myproj/myTESTproj/g'); done
‘myproj/myproj_AA/myproj_DD/myproj_EE.dat’ -> ‘myTESTproj/myTESTproj_AA/myTESTproj_DD/myTESTproj_EE.dat’
mv: cannot move ‘myproj/myproj_AA/myproj_DD/myproj_EE.dat’ to ‘myTESTproj/myTESTproj_AA/myTESTproj_DD/myTESTproj_EE.dat’: No such file or directory
...

... so we need a sed to replace only last match in a line, which is sed -E 's/(.*)myproj/\1myTESTproj/g':

$ find myproj/ | perl -e 'print sort { length($b) <=> length($a) } <>' \
| while read f; do mv -v $f $(echo $f | sed -E 's/(.*)myproj/\1myTESTproj/g'); done
‘myproj/myproj_AA/myproj_DD/myproj_EE.dat’ -> ‘myproj/myproj_AA/myproj_DD/myTESTproj_EE.dat’
‘myproj/myproj_AA/myproj_BB/myproj_CC.dat’ -> ‘myproj/myproj_AA/myproj_BB/myTESTproj_CC.dat’
‘myproj/myproj_XX/myproj_YY/myproj_ZZ.dat’ -> ‘myproj/myproj_XX/myproj_YY/myTESTproj_ZZ.dat’
‘myproj/myproj_XX/myproj_WW/myproj_QQ.dat’ -> ‘myproj/myproj_XX/myproj_WW/myTESTproj_QQ.dat’
‘myproj/myproj_AA/myproj_DD’ -> ‘myproj/myproj_AA/myTESTproj_DD’
‘myproj/myproj_AA/myproj_BB’ -> ‘myproj/myproj_AA/myTESTproj_BB’
‘myproj/myproj_XX/myproj_YY’ -> ‘myproj/myproj_XX/myTESTproj_YY’
‘myproj/myproj_XX/myproj_WW’ -> ‘myproj/myproj_XX/myTESTproj_WW’
‘myproj/myproj_AA’ -> ‘myproj/myTESTproj_AA’
‘myproj/myproj_XX’ -> ‘myproj/myTESTproj_XX’
‘myproj/’ -> ‘myTESTproj/’
$ tree myTESTproj/
myTESTproj/
├── myTESTproj_AA
│   ├── myTESTproj_BB
│   │   └── myTESTproj_CC.dat
│   └── myTESTproj_DD
│       └── myTESTproj_EE.dat
└── myTESTproj_XX
    ├── myTESTproj_WW
    │   └── myTESTproj_QQ.dat
    └── myTESTproj_YY
        └── myTESTproj_ZZ.dat

6 directories, 4 files

I guess this does what I want it to - but, I'm not sure if the assumption of longest pathname == leaf file node is always correct; and even if it is - is there an easier way of doing this?

EDIT: this definitely fails in a case of structure like this:

myproj/somespecdir/someotherdir/myproj_CC.dat
myproj/myproj_AA/myproj_DD/myproj_EE.dat
myproj/somespecdir/someotherdir
myproj/myproj_AA/myproj_DD
myproj/somespecdir
myproj/myproj_AA
myproj/

... that is, if the first occurrence of the substring to search for and replace in the renamed path is also the last (the only one); and it occurs in the list before a path that has multiple occurrences of the substring.
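
One way to keep the while read loop and still handle this structure is to apply the substitution to the last path component only, so a match in a parent directory is never touched. This is a sketch under stated assumptions, not a tested general tool: it assumes bash and GNU find, takes its ordering from find -depth rather than from a length sort, and the names demo, path, dir and base are illustrative.

```shell
#!/usr/bin/env bash
# Scratch copy of the failing structure (demo is a hypothetical name).
demo=$(mktemp -d)
mkdir -p "$demo"/myproj/somespecdir/someotherdir "$demo"/myproj/myproj_AA/myproj_DD
touch "$demo"/myproj/somespecdir/someotherdir/myproj_CC.dat \
      "$demo"/myproj/myproj_AA/myproj_DD/myproj_EE.dat

(
  cd "$demo" || exit 1
  # -print0 with read -d '' keeps names with spaces or newlines intact.
  find myproj -depth -name '*myproj*' -print0 |
    while IFS= read -r -d '' path; do
      dir=$(dirname -- "$path")    # parent part, left untouched
      base=$(basename -- "$path")  # last component, the only part rewritten
      mv -v -- "$path" "$dir/${base//myproj/myTESTproj}"
    done
)
```

Since find -depth emits a directory only after everything below it, the parent part of each path is still valid at the moment its mv runs.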

Accepted answer (the find -depth approach): ilkkachu, 2017-12-06 20:35:30Z
