The goal is to print the highest and second highest versions among a list of folders. Only the first three components of the version number count. For example, 2.3.0.1 is treated as 2.3.0 (the last component is ignored).
Once the script can print those two versions, it should remove all other versions, cleaning up the folders in that location so that only the current and previous versions remain.
The EXAMPLE folders that I created are:
AAA_6.6.4.12.TEST AAA_7.6.4.12.TEST AAA_75.6.4.12.TEST AAA_75.7.4.12.TEST CCC_81.0.0.0.TEST CCC_81.2.0.0.TEST CCC_81.2.3.0.TEST DDD_1.0.0.0.TEST DDD_1.0.0.1.TEST DDD_1.0.0.6.TEST DDD_1.1.0.0.TEST DDD_2.0.0.0.TEST DDD_2.0.0.1.TEST DDD_2.0.0.3.TEST DDD_3.0.0.0.TEST
This is the array that I have to compute the highest and second highest version:
new_var=( $(for arr in "${var[@]}"
do
echo $arr
done | sort) )
for folder in *
do
if [[ $folder =~ ([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.[0-9]{1,3} ]]
then
if [[ "$new_var" < "${BASH_REMATCH[1]}.${BASH_REMATCH[2]}.${BASH_REMATCH[3]}" ]]
then
new_var="${BASH_REMATCH[1]}.${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
fi
else
echo "failed"
fi
done;
echo "The highest version is: $new_var"
echo "The second highest version is: ${new_var[-2]}"
This prints the highest version correctly. However, I don't know how to get the second highest version and I don't know how to go about removing the rest of the versions from the directory.
- Can you show the expected/actual output for your test input? To me, this appears to be a simple job for sort+head, if the number of components in the version number remains constant. (Toby Speight, Sep 18, 2017 at 12:03)
3 Answers
Unnecessary code
This part is completely unnecessary, you can safely delete it:
new_var=( $(for arr in "${var[@]}"
do
echo $arr
done | sort) )
Possible bug
Since the compared terms are strings, this will be a lexical comparison:
[[ "$new_var" < "${BASH_REMATCH[1]}.${BASH_REMATCH[2]}.${BASH_REMATCH[3]}" ]]
That means that 9.0.0 will be higher than 81.2.3. If you want to make it a numeric comparison, then you need to rewrite it, comparing each term appropriately, which is trickier:
w1=${BASH_REMATCH[1]}
w2=${BASH_REMATCH[2]}
w3=${BASH_REMATCH[3]}
if ((w1 > v1 || w1 == v1 && w2 > v2 || w1 == v1 && w2 == v2 && w3 > v3))
then
v1=$w1
v2=$w2
v3=$w3
fi
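To see the difference between the two comparisons, here is a minimal sketch. The helper name version_gt is hypothetical (it is not part of the code above), and it assumes every version has at least three numeric components without leading zeros.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is strictly greater than
# version $2, comparing the first three components numerically.
# Assumption: no leading zeros (a component like 08 would be rejected
# as invalid octal inside (( ))).
version_gt() {
    local a1 a2 a3 b1 b2 b3
    IFS=. read -r a1 a2 a3 _ <<< "$1"
    IFS=. read -r b1 b2 b3 _ <<< "$2"
    (( a1 > b1 || (a1 == b1 && a2 > b2) || (a1 == b1 && a2 == b2 && a3 > b3) ))
}

# String comparison ranks 9.0.0 above 81.2.3 because "8" sorts before "9"
if [[ "81.2.3" < "9.0.0" ]]; then echo "lexical: 9.0.0 is higher"; fi

# Numeric comparison ranks 81.2.3 above 9.0.0
if version_gt 81.2.3 9.0.0; then echo "numeric: 81.2.3 is higher"; fi
```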
Next steps
For the steps you didn't implement yet, I recommend the following logic:
- Put the list of folder names into an array.
- Create a function that finds the index of the highest version; let's call it hIndex. Pass the elements of the array to the function.
- Use the hIndex function to find the highest element, and then delete it.
- Repeat the previous step. This will effectively remove the second highest element.
Something like this:
hIndex() {
local i=0 index v1 v2 v3 w1 w2 w3
for item; do
if [[ $item =~ ([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.[0-9]{1,3} ]]
then
w1=${BASH_REMATCH[1]}
w2=${BASH_REMATCH[2]}
w3=${BASH_REMATCH[3]}
if ((w1 > v1 || w1 == v1 && w2 > v2 || w1 == v1 && w2 == v2 && w3 > v3))
then
v1=$w1
v2=$w2
v3=$w3
index=$i
fi
fi
((i++))
done
echo $index
}
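index1=$(hIndex "${folders[@]}")
echo "the highest is index=$index1, ${folders[$index1]}"
# Replace rather than unset, so the remaining indexes stay aligned
folders[$index1]=dummy
index2=$(hIndex "${folders[@]}")
echo "the second highest is index=$index2, ${folders[$index2]}"
# Quote the subscripts so the shell cannot glob-expand them
unset "folders[$index1]"
unset "folders[$index2]"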
Notice that instead of deleting the highest value, I replaced it with a dummy value. I did it this way because deleting a value from a Bash array does not shift the remaining values: the array simply becomes sparse. hIndex "${folders[@]}" will not see the deleted element, so it counts positions among the remaining values, and the index it returns may no longer match the real array index. Filling the gap with a suitable placeholder is a bit lazy, but it's simple enough, and it works.
In the end, after both entries are removed with unset, folders holds every folder name except the highest and the 2nd highest. You could iterate over these elements to delete them.
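That final loop could look like the sketch below. The function name remove_folders and the DRY_RUN variable are hypothetical, introduced here only for illustration. By default it prints what it would delete, which is a cautious choice for anything built around rm -rf.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete every directory named in the arguments.
# DRY_RUN is an assumed convention: 1 (the default) only prints,
# 0 actually deletes.
remove_folders() {
    local folder
    for folder; do
        # Skip names that are not existing directories
        [[ -d $folder ]] || continue
        if [[ ${DRY_RUN:-1} == 1 ]]; then
            printf 'would remove: %s\n' "$folder"
        else
            rm -rf -- "$folder"
        fi
    done
}

# Usage, once the two highest entries have been unset from the array:
#   remove_folders "${folders[@]}"             # preview
#   DRY_RUN=0 remove_folders "${folders[@]}"   # delete for real
```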
- The objective is NOT to delete the highest and second highest value, it is to KEEP them both and delete the rest of the versions. Essentially the highest version (current) and last version (second highest) will be the only ones kept. (user126270, Dec 21, 2016 at 16:12)
- @user126270 The goal here is to delete the highest and second highest values from the array. Then, you can iterate over the remaining elements and delete them in the filesystem. I modified the end part a bit to clarify. (janos, Dec 21, 2016 at 16:18)
I think your approach is too complicated and esoteric with the use of BASH_REMATCH. Here is my solution:
versions_to_keep=(
$(find -maxdepth 1 ! -path . -type d -printf "%f\n" |
sort -V -t '_' -k 2 | tail -n 2)
)
highest_version=${versions_to_keep[-1]}
second_highest_version=${versions_to_keep[-2]}
echo "The highest version is: ${highest_version:?}"
echo "The second highest version is: ${second_highest_version:?}"
find -maxdepth 1 ! -path . -type d ! -name "${highest_version}" \
! -name "${second_highest_version}" -exec rm -rf {} +
Explanation:
- find -maxdepth 1 ! -path . -type d -printf "%f\n": find and print the basename of directories at most one level below and excluding the current directory.
- sort -V -t '_' -k 2: do a version sort by field 2, with fields delimited by _.
- tail -n 2: output the last 2 lines.
- ${highest_version:?} and ${second_highest_version:?}: if either of those variables is null or unset, print an error message and abort the script.
- find -maxdepth 1 ! -path . -type d ! -name "${highest_version}" ! -name "${second_highest_version}" -exec rm -rf {} +: find directories at most one level below the current directory whose basenames match neither ${highest_version} nor ${second_highest_version}, and delete them. If you only want to delete files or empty directories, you can replace -exec rm -rf {} + with -delete.
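Before running the deleting command, it may be worth previewing what it would match. The sketch below uses the same tests but prints instead of removing. The function name preview_deletions and its two positional parameters are hypothetical. Note that -name treats its argument as a glob pattern, so this assumes the folder names contain no glob characters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the directories the deletion command would
# remove, without touching them.
#   $1, $2: the two directory names to keep
preview_deletions() {
    find . -maxdepth 1 ! -path . -type d \
        ! -name "$1" ! -name "$2" -print
}

# Usage:
#   preview_deletions "$highest_version" "$second_highest_version"
```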
I do have a question though: wouldn't you want to keep the two highest versions per group of files, taking into account the parts preceding the version strings in the filenames? i.e. wouldn't the outcome below be more applicable to you?
Deleted: AAA_6.6.4.12.TEST AAA_7.6.4.12.TEST CCC_81.0.0.0.TEST DDD_1.0.0.0.TEST DDD_1.0.0.1.TEST DDD_1.0.0.6.TEST DDD_1.1.0.0.TEST DDD_2.0.0.0.TEST DDD_2.0.0.1.TEST
Kept: AAA_75.6.4.12.TEST AAA_75.7.4.12.TEST CCC_81.2.0.0.TEST CCC_81.2.3.0.TEST DDD_2.0.0.3.TEST DDD_3.0.0.0.TEST
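If that per-group outcome is what you want, one possible sketch follows. It is not part of the solution above. The function name to_delete_per_group is hypothetical, and it assumes GNU sort (for version keys), names of the form PREFIX_version.SUFFIX with no underscore in the prefix, and no newlines in names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names that fall outside the two highest
# versions of each prefix group (AAA, CCC, DDD, ...).
to_delete_per_group() {
    printf '%s\n' "$@" |
        # Group by prefix, then order each group from highest to lowest version
        sort -t _ -k 1,1 -k 2Vr |
        # Count entries per prefix; print everything after the first two
        awk -F _ '++seen[$1] > 2'
}

# Usage (prints one name per line):
#   to_delete_per_group */
```

Unlike the three-component rule in the question, this compares the full version string, which is what the outcome listed above implies.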
Instead of doing this in pure Bash, I believe it's better to compose the solution from the standard tools (I'm going to assume the GNU tools for this answer).
We just need to extract the version numbers from the filename and sort them numerically, which we can do like this:
top_two()
{
printf '%s\n' "${@#*_}" |
cut -d . -f 1-3 |
sort -Vr |
head -n 2
}
Demo:
files=(
AAA_6.6.4.12.TEST
AAA_7.6.4.12.TEST
AAA_75.6.4.12.TEST
AAA_75.7.4.12.TEST
CCC_81.0.0.0.TEST
CCC_81.2.0.0.TEST
CCC_81.2.3.0.TEST
DDD_1.0.0.0.TEST
DDD_1.0.0.1.TEST
DDD_1.0.0.6.TEST
DDD_1.1.0.0.TEST
DDD_2.0.0.0.TEST
DDD_2.0.0.1.TEST
DDD_2.0.0.3.TEST
DDD_3.0.0.0.TEST
)
top_two "${files[@]}"
81.2.3
81.2.0
We can make it more robust by using null character instead of linefeed to delimit the filenames, and more useful by converting the result back into an array:
sorted_versions()
{
printf '%s\0' "${@#*_}" |
cut -z -d . -f 1-3 -s |
sort -z -Vr
}
readarray -d '' -t versions \
< <(sorted_versions "${files[@]}")
echo "The highest version is: ${versions[0]}"
echo "The second highest version is: ${versions[1]}"
The previous function returns the full array rather than just the two highest values. Given the intent to remove all versions except these two, the full array is useful, because it gives all the versions that should be cleaned:
cleanup "${versions[@]:2}"
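The cleanup function is not defined in this answer. A minimal sketch of what it might look like is below, assuming each argument is a three-component version and the folders are named PREFIX_version.rest in the current directory. It also assumes the version list has been de-duplicated (for example with sort -u); otherwise a duplicate of a kept version could appear among the arguments and be removed.

```bash
#!/usr/bin/env bash
# Hypothetical cleanup: remove every directory whose name contains
# _<version>. for each version passed as an argument.
cleanup() {
    local version dir
    for version; do
        # The quoted version is matched literally; the trailing slash
        # restricts the glob to directories
        for dir in *_"$version".*/; do
            [[ -d $dir ]] || continue   # the glob matched nothing
            echo "removing ${dir%/}"
            rm -rf -- "${dir%/}"
        done
    done
}
```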
You might want to retain the entire file name, rather than using ${@#*_} and cut to trim to just the version part. That's easy enough, too, if we tell sort to order by the part following _:
sorted_versions()
{
printf '%s\0' "$@" |
sort -z -t _ -k 2Vr
}
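As a usage sketch, the sorted full names can be split into the two to keep and the rest to remove. The array names names, keep and remove, and the shortened files list, are illustrative only. This assumes Bash 4.4 or later (for readarray -d) and GNU sort.

```bash
#!/usr/bin/env bash
# Same function as above, repeated so the example is self-contained
sorted_versions() {
    printf '%s\0' "$@" | sort -z -t _ -k 2Vr
}

# Shortened sample list for illustration
files=(AAA_6.6.4.12.TEST AAA_75.7.4.12.TEST CCC_81.2.0.0.TEST
       CCC_81.2.3.0.TEST DDD_3.0.0.0.TEST)

# Read the null-delimited result back into an array
readarray -d '' -t names < <(sorted_versions "${files[@]}")

keep=("${names[@]:0:2}")    # the two highest versions, as full names
remove=("${names[@]:2}")    # everything else, candidates for deletion

printf 'keep:   %s\n' "${keep[@]}"
printf 'remove: %s\n' "${remove[@]}"
```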
- Where is the cleanup function defined? Your solution suffers from the same problem as my old answer, so I'll repost my deleted comment here: try adding AAA_81.11.0.0.TEST and DDD_81.2.10.1.TEST to your files array and run it; these newly added files won't show up as the files with the highest and second highest versions by using sort -n. It's pretty hard to make a stupid and robust solution in POSIXly shell for this simple problem. (Gao, Oct 17, 2024 at 17:37)
- @Gao, the cleanup is for OP to write - that's not included in the code for review. And thank you for spotting that I used -n where I meant -V - now fixed. (Toby Speight, Oct 17, 2024 at 18:02)