Deleting most recent files by parsing filename

Question 1

I have hundreds of .mp3 files in a single directory, all in the same naming format, title_YYYY-MM-DD.mp3, with maybe 30 different titles. Here is an example with two different titles:

vision_am_2015-08-04.mp3
vision_am_2015-08-03.mp3
vision_am_2015-07-31.mp3
vision_am_2015-07-30.mp3
lum_pro_2015-08-04.mp3
lum_pro_2015-08-03.mp3
lum_pro_2015-08-01.mp3
lum_pro_2015-07-31.mp3
lum_pro_2015-07-30.mp3
lum_pro_2015-07-29.mp3
lum_pro_2015-07-28.mp3
lum_pro_2015-07-27.mp3

I need to keep the X most recent files for each title. Since the date format is YYYY-MM-DD, a plain string sort also orders the names chronologically. So after building a data structure that groups the files by title, I can make sure each group is sorted in descending order, iterate through it, and delete every file after the Xth iteration.

Here is my idea:

my $num_to_keep = 2; # or get from @ARGV
$num_to_keep = $num_to_keep - 1; # turn the count into the last zero-based index to keep
my $dir = "/home/mp3files";
opendir my $DH, "$dir" or die "$! not open";
my $dateRE = qr/\d{4}-\d{2}-\d{2}/;
my $fileRE = qr/^.+_$dateRE\.mp3$/; # only mp3s
my @files = sort grep { /$fileRE/ && -f "$dir/$_" } readdir($DH);
closedir $DH; # a directory handle is closed with closedir, not close
my %hash = ();
# group by title; reversing the sorted list puts the newest file of each title first
for my $file (reverse @files) {
    my ($fname) = $file =~ m/(.*)?_$dateRE/;
    push(@{ $hash{$fname} }, $file);
}
for my $fname (sort keys %hash) {
    my @files = @{ $hash{$fname} };
    print "\n\n\nFILE: $fname<<\n";
    for my $i (0 .. $#files) {
        if ($i > $num_to_keep) {
            unlink "$dir/$files[$i]";
        } else {
            print "\t\t\t\tI will keep this file $files[$i]\n";
        }
    }
}

This is working as I expected, but since I am using it to delete large numbers of files regularly, I would like an expert take on it. I do not want to accidentally delete the wrong files. I am also interested in any general improvements or more elegant solutions.


Answer (accepted) by mpapec, 2015-08-05 10:20:17Z


You can reduce the number of loops, sorts, and matches, so this should perform faster:

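my $num_to_keep = 2;
my $dir = "/home/mp3files";
opendir my $DH, $dir or die "$! $dir";
# only mp3s; captures the title and the date
my $fileRE = qr/(.+) _ (\d{4}-\d{2}-\d{2}) \.mp3$/x;
my %count;
# files to delete (the chain runs bottom-up: readdir, map, sort, map)
my @files = map {
    my $basename = $_->[1];
    # past the newest $num_to_keep for this title: pass the path on for deletion
    (++$count{$basename} > $num_to_keep) ? $_->[0] : ();
}
sort {
    $b->[2] cmp $a->[2] # sort descending by date
}
map {
    # build a [path, title, date] record for each regular file that matches
    my @match = /$fileRE/;
    (@match && -f "$dir/$_") ? ["$dir/$_", @match] : ();
}
readdir($DH);
closedir $DH; # a directory handle is closed with closedir, not close
unlink(@files);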
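
Since the worry in the question is deleting the wrong files, one option is to pull the selection step into a function that only looks at names and never touches the disk, so it can be checked against a known list before unlink runs. This is a minimal sketch of that idea, not part of the answer above: the helper name select_old_files and the name tie-break in the sort are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: given a keep count and a list of file names,
# return the names that fall outside the newest $keep per title.
# It only inspects names; nothing is deleted here.
sub select_old_files {
    my ($keep, @names) = @_;
    my $fileRE = qr/^(.+)_(\d{4}-\d{2}-\d{2})\.mp3$/;
    my %count;
    return
        map  { ++$count{ $_->[1] } > $keep ? $_->[0] : () }
        sort { $b->[2] cmp $a->[2] or $a->[0] cmp $b->[0] }   # newest first, name as tie-break
        map  { my @m = /$fileRE/; @m ? [ $_, @m ] : () }
        @names;
}

my @doomed = select_old_files(2, qw(
    lum_pro_2015-08-04.mp3 lum_pro_2015-08-03.mp3 lum_pro_2015-08-01.mp3
    vision_am_2015-08-04.mp3 vision_am_2015-08-03.mp3 notes.txt
));
print "would delete: $_\n" for @doomed;   # would delete: lum_pro_2015-08-01.mp3
```

Printing the list first and only calling unlink once it looks right gives a cheap dry run.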


map { (bool) ? EXP : () } acts like grep: when bool is false the block returns (), the empty list, so that element is dropped.
– mpapec, 2015-08-05 10:32:08Z
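
To make that concrete, here is a small sketch (the sample names are invented for illustration) showing that a map block returning () for rejected elements gives the same result as a grep followed by a map:

```perl
use strict;
use warnings;

my @names = qw(a_2015-08-04.mp3 readme.txt b_2015-08-03.mp3);

# map alone: non-matching names contribute the empty list, so they vanish
my @via_map = map { /^(.+)_\d{4}-\d{2}-\d{2}\.mp3$/ ? $1 : () } @names;

# the same thing written as an explicit filter followed by a transform
my @via_grep = map  { (/^(.+)_/)[0] }
               grep { /_\d{4}-\d{2}-\d{2}\.mp3$/ } @names;

print "@via_map\n";    # a b
print "@via_grep\n";   # a b
```

The single map version runs the regex once per name, which is part of why the answer needs fewer matches.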

Very slick. This usage of map and sort is more advanced than what I know, so if I can ask a few questions: given what the first map returns, ["$dir/$_", @match], I would think that @files ends up being an array of arrays, with index 0 being the path and file name and index 1 being the matched date and extension. But why is the sort using index 2? I would think that is out of range.
– BryanK, 2015-08-05 22:05:29Z

Actually, never mind. I see now that the regex pattern has two sets of parens. I will play with this a bit. Thanks.
– BryanK, 2015-08-05 22:13:27Z
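
For anyone else puzzled by the indices: the regex has two capture groups, so the match returns two values, and each record built by the inner map has three elements. A quick sketch of one record, using the directory from the question and a sample name in the title_YYYY-MM-DD.mp3 format:

```perl
use strict;
use warnings;

my $dir    = "/home/mp3files";
my $fileRE = qr/(.+) _ (\d{4}-\d{2}-\d{2}) \.mp3$/x;

my $name   = "lum_pro_2015-08-04.mp3";
my @match  = $name =~ $fileRE;          # ("lum_pro", "2015-08-04")
my $record = [ "$dir/$name", @match ];  # same shape as the inner map returns

print "index 0 (path):  $record->[0]\n";
print "index 1 (title): $record->[1]\n";
print "index 2 (date):  $record->[2]\n";
```

So index 2 is the date, which is why the sort compares $a->[2] and $b->[2].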

@BryanK You can also write my @files = map {..} readdir($DH); @files = sort {..} @files; if that is easier to read. After each step you can use Data::Dumper; print Dumper \@files; to inspect what values are in the array.
– mpapec, 2015-08-06 08:03:32Z
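
Spelled out, that step-by-step version might look like the sketch below. To keep it runnable without a real directory, it assumes a hard-coded list of sample names in place of readdir($DH), and it leaves out the "$dir/" prefix and the -f test.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent = 1;   # more compact output

# Sample names stand in for readdir($DH) so this runs without a directory.
my @names  = qw(lum_pro_2015-08-03.mp3 lum_pro_2015-08-04.mp3 cover.jpg);
my $fileRE = qr/(.+) _ (\d{4}-\d{2}-\d{2}) \.mp3$/x;

# Step 1: turn matching names into [name, title, date] records
my @files = map { my @match = /$fileRE/; @match ? [ $_, @match ] : () } @names;
print Dumper \@files;

# Step 2: sort the records, newest date first
@files = sort { $b->[2] cmp $a->[2] } @files;
print Dumper \@files;
```

Dumping after each step makes it easy to see the records appear in step 1 and change order in step 2.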
