Before declaring this a duplicate, consider that I'd need this for a specific reason: batch renaming (or copying to a new name) of a tree structure that contains a common string in file and directory names. Here's an example (tried on Ubuntu 14.04, so GNU tools):
cd /tmp
mkdir myproj
mkdir -p myproj/myproj_AA/myproj_BB
touch myproj/myproj_AA/myproj_BB/myproj_CC.dat
mkdir myproj/myproj_AA/myproj_DD
touch myproj/myproj_AA/myproj_DD/myproj_EE.dat
mkdir -p myproj/myproj_XX/myproj_YY
touch myproj/myproj_XX/myproj_YY/myproj_ZZ.dat
mkdir -p myproj/myproj_XX/myproj_WW
touch myproj/myproj_XX/myproj_WW/myproj_QQ.dat
tree myproj # to visualise
This directory structure's tree looks like this:
myproj
├── myproj_AA
│   ├── myproj_BB
│   │   └── myproj_CC.dat
│   └── myproj_DD
│       └── myproj_EE.dat
└── myproj_XX
    ├── myproj_WW
    │   └── myproj_QQ.dat
    └── myproj_YY
        └── myproj_ZZ.dat
6 directories, 4 files
So, I'd want all the entries in myproj/, including myproj itself, renamed to myTESTproj instead of myproj (wherever it may occur as a name). So, first I need to obtain a listing with paths relative to the current directory, and then I need to have it sorted such that the outermost children (I think this is equivalent to files with the longest relative pathnames, but I'm not sure) come first. Otherwise, if I rename (mv) a directory first and then try to rename a file in it, the command will use the old directory name in its first argument and fail, since that name has changed.
I'm aware there is ls -R --group-directories-first myproj/ to use ls recursively and group directories first, but its output is like this:
$ ls -R --group-directories-first myproj/
myproj/:
myproj_AA myproj_XX
myproj/myproj_AA:
myproj_BB myproj_DD
myproj/myproj_AA/myproj_BB:
myproj_CC.dat
myproj/myproj_AA/myproj_DD:
myproj_EE.dat
myproj/myproj_XX:
myproj_WW myproj_YY
myproj/myproj_XX/myproj_WW:
myproj_QQ.dat
myproj/myproj_XX/myproj_YY:
myproj_ZZ.dat
... that is, it is not a plain list with subpaths, that I could easily feed to while read f; do ...
The closest I came was to use find instead:
$ find myproj/
myproj/
myproj/myproj_AA
myproj/myproj_AA/myproj_DD
myproj/myproj_AA/myproj_DD/myproj_EE.dat
myproj/myproj_AA/myproj_BB
myproj/myproj_AA/myproj_BB/myproj_CC.dat
myproj/myproj_XX
myproj/myproj_XX/myproj_YY
myproj/myproj_XX/myproj_YY/myproj_ZZ.dat
myproj/myproj_XX/myproj_WW
myproj/myproj_XX/myproj_WW/myproj_QQ.dat
So, here I do have a plain list of subpaths; however, it is sorted root node first towards leaf nodes, and I need leaf nodes first. I tried things like find myproj/ | sort -n, but it seems to make no difference. So if I do something like:
$ find myproj/ | sort -n | while read f; do mv -v $f $(echo $f | sed 's/myproj/myTESTproj/g'); done
‘myproj/’ -> ‘myTESTproj/’
mv: cannot stat ‘myproj/myproj_AA’: No such file or directory
mv: cannot stat ‘myproj/myproj_AA/myproj_BB’: No such file or directory
mv: cannot stat ‘myproj/myproj_AA/myproj_BB/myproj_CC.dat’: No such file or directory
...
... then the intended recursive rename fails immediately, as the root node (directory) is renamed first, and thus all further references to it are invalid.
So, how can I obtain a proper recursive listing of a subdirectory with leaf nodes first, to use it in a batch rename like this?
3 Answers
If you're aiming at just renaming, isn't it enough that the contents of each directory are processed before the directory itself, that is, you don't need all leaves (from all directories) first? find -depth does exactly that.
$ mkdir -p a/b c/d
$ find -depth
./a/b
./a
./c/d
./c
.
Then you could use find -exec and Bash to rename the files:
$ find -depth ! -name . -name "*myproj*" -execdir bash -c '
for f; do mv "$f" "${f/myproj/myTESTproj}" ; done' bash {} +
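A minimal sketch of the same idea wrapped in a function, so that the old and new strings become parameters. The name rename_tree and its arguments are made up for illustration. It assumes GNU find (for -execdir) and Bash, assumes the old string contains no glob characters, and replaces every occurrence of the old string in each name rather than only the first:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer above): rename_tree ROOT OLD NEW
# Renames every entry under ROOT (ROOT included) whose own name contains OLD.
rename_tree() {
    local root=$1 old=$2 new=$3
    # -depth:   emit a directory's contents before the directory itself
    # -name:    only entries whose own name contains OLD (avoids no-op renames)
    # -execdir: run the command from the entry's parent directory, so {} is ./name
    find "$root" -depth -name "*${old}*" -execdir bash -c '
        old=$1 new=$2
        shift 2
        for f; do
            mv -- "$f" "${f//"$old"/$new}"
        done' bash "$old" "$new" {} +
}

# Usage, from the directory that contains myproj:
#   rename_tree myproj myproj myTESTproj
```

Passing the strings as arguments to the inner bash, instead of splicing them into the script text, keeps quoting problems out of the -c script.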
Thanks @ilkkachu: "you don't need all leaves (from all directories) first?" - you know what, turns out I don't really; didn't think about that at all :) This answer works great for the OP Q example; for my A example (where the root node might be the only instance having the "needle" substring) I get a "mv: cannot move ‘./someotherdir’ to a subdirectory of itself" warning, but it does what it should. Thanks again! – sdaau, Dec 6, 2017 at 21:23
@sdaau, ah of course, the replacement doesn't change anything if the keyword isn't there, so it's useless to try to rename. Ok, we could add a test inside the shell snippet, or just add a filter on the find... (edited) – ilkkachu, Dec 6, 2017 at 22:00
If you have the Perl version of the rename command installed (sometimes known as prename), this will work for you:
find myproj -depth -name '*myproj*' -exec rename -n 's!(.*)myproj!$1myTESTproj!' {} +
The -depth option to find ensures that children in any directory are listed before the directory itself. The + suffix to the -exec action passes multiple matching paths in place of {} to a single invocation of the specified command. At the cost of reduced efficiency you can replace it with \;, which runs the command once per path.
When you are sure it will do what you want, remove the -n or replace it with -v.
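To see what the (.*) in the expression does, here is a small sketch that runs the same substitution through sed -E instead of rename. sed is only a stand-in for illustration, and replace_last is a made-up name. Because .* is greedy, only the last myproj on each line is replaced, and for paths selected by -name '*myproj*' that last occurrence sits in the final path component:

```shell
# Hypothetical helper: apply the answer's substitution to each line of stdin.
replace_last() {
    sed -E 's!(.*)myproj!\1myTESTproj!'
}

printf '%s\n' myproj/myproj_AA/myproj_BB myproj | replace_last
# prints:
#   myproj/myproj_AA/myTESTproj_BB
#   myTESTproj
```

Leaving the leading directories untouched matters because, with -depth, they still carry their old names at the moment each child is renamed.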
Did you mean for the intervening directory names to be eaten by the (.*)? It doesn't work if the final part of the filename doesn't contain myproj, i.e. it would try to rename ./myproj/foo to ./myTESTproj/foo (you'll get errors) – ilkkachu, Dec 6, 2017 at 21:03
Thanks @roaima - I indeed have Perl rename; your answer works great for the OP Q example, for my A example (where the root node might be the only instance having the "needle" substring) I don't get any errors or warnings printed, but seeing (with -v) myproj/somespecdir renamed as myTESTproj/somespecdir printed before myproj renamed as myTESTproj doesn't look quite correct - and in the end, it actually seems to delete the entire myproj (or renamed myTESTproj) folder!? Still, good to keep this in mind, thanks! – sdaau, Dec 6, 2017 at 21:27
@ilkkachu oh I see what you mean. Fix on its way... – Chris Davies, Dec 6, 2017 at 21:35
@sdaau fixed for you – Chris Davies, Dec 6, 2017 at 21:37
I remembered what to look for once I posted the question: if the leaf nodes are those with the longest relative pathnames (I'm not sure this is always true, but it seems to be in the OP example at least), then one simply needs a way to sort a list of strings by string length; unfortunately, sort does not seem to have such an option.
But, I found https://stackoverflow.com/questions/5917576/sort-a-text-file-by-line-length-including-spaces - and from there, chose the perl solution:
$ find myproj/ | perl -e 'print sort { length($b) <=> length($a) } <>'
myproj/myproj_AA/myproj_DD/myproj_EE.dat
myproj/myproj_AA/myproj_BB/myproj_CC.dat
myproj/myproj_XX/myproj_YY/myproj_ZZ.dat
myproj/myproj_XX/myproj_WW/myproj_QQ.dat
myproj/myproj_AA/myproj_DD
myproj/myproj_AA/myproj_BB
myproj/myproj_XX/myproj_YY
myproj/myproj_XX/myproj_WW
myproj/myproj_AA
myproj/myproj_XX
myproj/
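On the doubt about path length: a child's path is its parent's path plus a separator and a name, so it is always strictly longer, and longest-first does put every entry before its own parent. If counting path components feels less accidental than counting characters, a sketch along these lines sorts by depth instead. deepest_first is a made-up name; the sketch assumes GNU sort (for the stable -s option) and file names without newlines:

```shell
# Hypothetical helper: read paths on stdin, print the deepest ones first.
deepest_first() {
    awk '{
        p = $0
        sub(/\/+$/, "", p)           # ignore trailing slashes, e.g. "myproj/"
        n = split(p, parts, "/")     # number of path components
        print n "\t" $0              # decorate each path with its depth
    }' | sort -s -k1,1nr | cut -f2-  # sort by depth, descending; drop the depth
}

# Usage: find myproj/ | deepest_first
```

Entries of equal depth keep the order in which find printed them, since the sort is stable.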
However, the trivial sed 's/myproj/myTESTproj/g' replacement does not work here either:
$ find myproj/ | perl -e 'print sort { length($b) <=> length($a) } <>' \
> | while read f; do mv -v $f $(echo $f | sed 's/myproj/myTESTproj/g'); done
‘myproj/myproj_AA/myproj_DD/myproj_EE.dat’ -> ‘myTESTproj/myTESTproj_AA/myTESTproj_DD/myTESTproj_EE.dat’
mv: cannot move ‘myproj/myproj_AA/myproj_DD/myproj_EE.dat’ to ‘myTESTproj/myTESTproj_AA/myTESTproj_DD/myTESTproj_EE.dat’: No such file or directory
...
... so we need a sed that replaces only the last match in a line, which is sed -E 's/(.*)myproj/\1myTESTproj/g':
$ find myproj/ | perl -e 'print sort { length($b) <=> length($a) } <>' \
| while read f; do mv -v $f $(echo $f | sed -E 's/(.*)myproj/\1myTESTproj/g'); done
‘myproj/myproj_AA/myproj_DD/myproj_EE.dat’ -> ‘myproj/myproj_AA/myproj_DD/myTESTproj_EE.dat’
‘myproj/myproj_AA/myproj_BB/myproj_CC.dat’ -> ‘myproj/myproj_AA/myproj_BB/myTESTproj_CC.dat’
‘myproj/myproj_XX/myproj_YY/myproj_ZZ.dat’ -> ‘myproj/myproj_XX/myproj_YY/myTESTproj_ZZ.dat’
‘myproj/myproj_XX/myproj_WW/myproj_QQ.dat’ -> ‘myproj/myproj_XX/myproj_WW/myTESTproj_QQ.dat’
‘myproj/myproj_AA/myproj_DD’ -> ‘myproj/myproj_AA/myTESTproj_DD’
‘myproj/myproj_AA/myproj_BB’ -> ‘myproj/myproj_AA/myTESTproj_BB’
‘myproj/myproj_XX/myproj_YY’ -> ‘myproj/myproj_XX/myTESTproj_YY’
‘myproj/myproj_XX/myproj_WW’ -> ‘myproj/myproj_XX/myTESTproj_WW’
‘myproj/myproj_AA’ -> ‘myproj/myTESTproj_AA’
‘myproj/myproj_XX’ -> ‘myproj/myTESTproj_XX’
‘myproj/’ -> ‘myTESTproj/’
$ tree myTESTproj/
myTESTproj/
├── myTESTproj_AA
│   ├── myTESTproj_BB
│   │   └── myTESTproj_CC.dat
│   └── myTESTproj_DD
│       └── myTESTproj_EE.dat
└── myTESTproj_XX
    ├── myTESTproj_WW
    │   └── myTESTproj_QQ.dat
    └── myTESTproj_YY
        └── myTESTproj_ZZ.dat
6 directories, 4 files
I guess this does what I want it to - but, I'm not sure if the assumption of longest pathname == leaf file node is always correct; and even if it is - is there an easier way of doing this?
EDIT: this definitely fails in a case of structure like this:
myproj/somespecdir/someotherdir/myproj_CC.dat
myproj/myproj_AA/myproj_DD/myproj_EE.dat
myproj/somespecdir/someotherdir
myproj/myproj_AA/myproj_DD
myproj/somespecdir
myproj/myproj_AA
myproj/
... that is, if the first occurrence of the substring to search for and replace in the renamed path is also the last (the only one); and it occurs in the list before a path that has multiple occurrences of the substring.
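One way around this failure, sketched under the assumption that only the last path component should ever change: split each path into directory and name, skip names that do not contain the string, and rename within the same directory. rename_last_component is a made-up name; the sketch assumes Bash, GNU coreutils, a list in which children come before their parents, and no newlines in names:

```shell
# Hypothetical helper: rename_last_component OLD NEW   (paths on stdin)
rename_last_component() {
    local old=$1 new=$2 p dir base
    while IFS= read -r p; do
        p=${p%/}                      # "myproj/" -> "myproj"
        dir=$(dirname -- "$p")
        base=$(basename -- "$p")
        case $base in
            *"$old"*) ;;              # name contains OLD: rename it below
            *) continue ;;            # otherwise leave this entry alone
        esac
        # Only the name changes; the directory part keeps its current spelling.
        mv -v -- "$p" "$dir/${base//"$old"/$new}"
    done
}

# Usage with the length-sorted list from above:
#   find myproj/ | perl -e 'print sort { length($b) <=> length($a) } <>' \
#       | rename_last_component myproj myTESTproj
```

With this, myproj/somespecdir/someotherdir is skipped instead of being moved to a path under a myTESTproj directory that does not exist yet.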