I have hundreds of .mp3 files in a single directory, all with the same naming format, title_YYYY-MM-DD.mp3, and maybe 30 different titles.
Here is an example with two different titles:
vision_am_2015-08-04.mp3
vision_am_2015-08-03.mp3
vision_am_2015-07-31.mp3
vision_am_2015-07-30.mp3
lum_pro_2015-08-04.mp3
lum_pro_2015-08-03.mp3
lum_pro_2015-08-01.mp3
lum_pro_2015-07-31.mp3
lum_pro_2015-07-30.mp3
lum_pro_2015-07-29.mp3
lum_pro_2015-07-28.mp3
lum_pro_2015-07-27.mp3
I need to keep the X most recent files for each title. I figured that since the date format is YYYY-MM-DD, once I have built a data structure for the files I can sort them in descending order, iterate through them, and safely delete every file after the Xth iteration.
Here is my idea:
my $num_to_keep = 2; # or get from @ARGV
$num_to_keep = $num_to_keep - 1;
my $dir = "/home/mp3files";
opendir my $DH, "$dir" or die "$! not open";
my $dateRE = qr/\d{4}-\d{2}-\d{2}/;
my $fileRE = qr/^.+_$dateRE\.mp3$/; # only mp3s
my @files = sort grep {/$fileRE/ && -f "$dir/$_"} readdir($DH);
closedir $DH; # directory handles are closed with closedir, not close

# group file names by title, newest first
my %hash = ();
for my $file (reverse @files) {
    my ($fname) = $file =~ m/(.*)?_$dateRE/;
    push(@{ $hash{$fname} }, $file);
}

for my $fname (sort keys %hash) {
    my @files = @{$hash{$fname}};
    print "\n\n\nFILE: $fname<<\n";
    for my $i (0..$#files) {
        if ($i > $num_to_keep) {
            unlink "$dir/$files[$i]";
        } else {
            print "\t\t\t\tI will keep this file $files[$i]\n";
        }
    }
}
This is working as I expected, but since I am using it to delete large numbers of files regularly, I would like an expert take on it. I do not want to accidentally delete the wrong files. I am also interested in any general improvements or more elegant solutions.
1 Answer
You can reduce the number of loops, sorts, and regex matches, so this should run faster:
my $num_to_keep = 2;
my $dir = "/home/mp3files";
opendir my $DH, $dir or die "$! $dir";
# only mp3s; captures the title and the date
my $fileRE = qr/(.+) _ (\d{4}-\d{2}-\d{2}) \.mp3$/x;
my %count;
# files to delete
my @files =
    map {
        my $basename = $_->[1];
        (++$count{$basename} > $num_to_keep) ? $_->[0] : ();
    }
    sort {
        $b->[2] cmp $a->[2] # sort descending by date
    }
    map {
        my @match = /$fileRE/;
        # each record is [path, title, date]
        (@match && -f "$dir/$_") ? ["$dir/$_", @match] : ();
    }
    readdir($DH);
closedir $DH; # directory handles are closed with closedir, not close
unlink(@files);
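Since the concern is deleting the wrong files, one option is to separate selection from deletion so the list can be inspected before anything is unlinked. This is only a minimal sketch: files_to_delete is a hypothetical helper (not part of the code above), it works on plain file names rather than a real directory, and it skips the -f check.

```perl
use strict;
use warnings;

# Hypothetical helper: given the number to keep and a list of file names,
# return the names that fall outside the newest $num_to_keep per title.
sub files_to_delete {
    my ($num_to_keep, @names) = @_;
    my $fileRE = qr/^(.+)_(\d{4}-\d{2}-\d{2})\.mp3$/;
    my %count;
    return
        map {
            my $title = $_->[1];
            (++$count{$title} > $num_to_keep) ? $_->[0] : ();
        }
        sort { $b->[2] cmp $a->[2] }    # newest date first
        map {
            my @match = /$fileRE/;
            @match ? [ $_, @match ] : ();
        }
        @names;
}

# Made-up sample names; notes.txt does not match and is ignored.
my @sample = qw(
    vision_am_2015-08-04.mp3 vision_am_2015-08-03.mp3 vision_am_2015-07-31.mp3
    lum_pro_2015-08-04.mp3   lum_pro_2015-08-03.mp3   notes.txt
);

# Dry run: print instead of unlinking.
print "would delete: $_\n" for files_to_delete(2, @sample);
```

Once the printed list looks right, the same list can be passed to unlink after prefixing each name with the directory.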
map { (bool) ? EXP : () } acts as grep when bool is false, as () is the empty list. – mpapec, Aug 5, 2015 at 10:32
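A small sketch of the point in the comment above, using made-up file names and uc as a stand-in transformation: returning () from the map block drops the element, so a single map can filter and transform in one pass.

```perl
use strict;
use warnings;

# Made-up sample names for illustration only.
my @names = qw(a_2015-08-04.mp3 readme.txt b_2015-08-03.mp3);

# grep filters, then map transforms: two passes over the list.
my @two_pass = map { uc($_) } grep { /\.mp3$/ } @names;

# map alone: returning the empty list () drops the element entirely.
my @one_pass = map { /\.mp3$/ ? uc($_) : () } @names;

print "@one_pass\n";
```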
Very slick. This usage of map and sort is more advanced than I know. If I can ask a few questions: I would think that because of what is being returned in the first map, ["$dir/$_", @match], @files ends up being an array of arrays with index 0 as the path and file name and index 1 being the matched date and extension? But why is the sort using index 2? I would think that is out of range. – BryanK, Aug 5, 2015 at 22:05
Actually never mind, I see now that the regex pattern has two sets of parens. I will play with this a bit. Thanks. – BryanK, Aug 5, 2015 at 22:13
@BryanK you can also write my @files = map{..} readdir($DH); @files = sort {..} @files; if that is easier to read. After each step you can use Data::Dumper; print Dumper \@files; to inspect what values are in the array. – mpapec, Aug 6, 2015 at 8:03
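The step-by-step form suggested in the last comment might look like the sketch below. It assumes two made-up file names instead of a real readdir, and it leaves out the -f check so that it runs without touching the disk. The Dumper output shows why the sort uses index 2: each record is [path, title, date].

```perl
use strict;
use warnings;
use Data::Dumper;

my $dir    = "/home/mp3files";   # only used to build paths; nothing is read from disk
my $fileRE = qr/(.+) _ (\d{4}-\d{2}-\d{2}) \.mp3$/x;
my @names  = qw(lum_pro_2015-08-03.mp3 lum_pro_2015-08-04.mp3);   # made-up sample

# Step 1: build [path, title, date] records; the two regex captures
# fill indexes 1 and 2.
my @files = map {
    my @match = /$fileRE/;
    @match ? [ "$dir/$_", @match ] : ();
} @names;
print Dumper \@files;

# Step 2: sort newest date first, comparing on index 2 (the date).
@files = sort { $b->[2] cmp $a->[2] } @files;
print Dumper \@files;
```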