
I have hundreds of .mp3 files in a single directory, all with the same naming format, title_YYYY-MM-DD.mp3, and maybe 30 different titles. Here is an example with two different titles:

vision_am_2015-08-04.mp3
vision_am_2015-08-03.mp3
vision_am_2015-07-31.mp3
vision_am_2015-07-30.mp3
lum_pro_2015-08-04.mp3
lum_pro_2015-08-03.mp3
lum_pro_2015-08-01.mp3
lum_pro_2015-07-31.mp3
lum_pro_2015-07-30.mp3
lum_pro_2015-07-29.mp3
lum_pro_2015-07-28.mp3
lum_pro_2015-07-27.mp3

I need to keep the X most recent files for each title. Since the date format is YYYY-MM-DD, sorting the names as strings also sorts them by date, so after building a data structure for the files I can put each title's files in descending order, iterate through them, and safely delete every file after the Xth one.
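This plan depends on one property of the naming scheme: because the dates are zero-padded YYYY-MM-DD, comparing them as plain strings gives calendar order, even across a month boundary, so no date parsing is needed. A minimal Perl check, assuming names that all share one title prefix (newest_first is a hypothetical helper, not part of the original code):

```perl
use strict;
use warnings;

# Zero-padded ISO dates compare as strings in calendar order,
# even when the month changes:
print "ok\n" if '2015-08-01' gt '2015-07-31';

# Hypothetical helper: names of ONE title, newest first. Every name shares
# the same "title_" prefix, so comparing whole names compares the dates.
sub newest_first {
    my @names = @_;
    return sort { $b cmp $a } @names;
}

print "$_\n" for newest_first(
    'lum_pro_2015-07-30.mp3',
    'lum_pro_2015-08-01.mp3',
    'lum_pro_2015-07-31.mp3',
);
```

This prints the 2015-08-01 file first, then 2015-07-31, then 2015-07-30.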

Here is my idea:

use strict;
use warnings;

my $num_to_keep = 2; # or get from @ARGV
my $dir = "/home/mp3files";
opendir my $DH, $dir or die "Cannot open $dir: $!";
my $dateRE = qr/\d{4}-\d{2}-\d{2}/;
my $fileRE = qr/^(.+)_$dateRE\.mp3$/; # only mp3s; $1 is the title
my @files = sort grep { /$fileRE/ && -f "$dir/$_" } readdir($DH);
closedir $DH;   # directory handles are closed with closedir, not close

# Group by title; reverse puts the newest date first within each title.
my %hash = ();
for my $file (reverse @files) {
    my ($fname) = $file =~ $fileRE;
    push @{ $hash{$fname} }, $file;
}
for my $fname (sort keys %hash) {
    my @files = @{ $hash{$fname} };
    print "\n\n\nFILE: $fname<<\n";
    for my $i (0 .. $#files) {
        if ($i >= $num_to_keep) {   # indices 0 .. $num_to_keep-1 are kept
            unlink "$dir/$files[$i]" or warn "Could not delete $files[$i]: $!\n";
        } else {
            print "\t\t\t\tI will keep this file $files[$i]\n";
        }
    }
}

This works as I expected, but since I use it regularly to delete large numbers of files, I would like an expert opinion. I do not want to accidentally delete the wrong files. I am also interested in any general improvements or more elegant solutions.
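One way to act on the worry about deleting the wrong files is to compute the keep and delete lists first and only unlink once they look right. The sketch below uses the same group-by-title idea as the code above, but as a pure function over file names so it can be run and tested without touching the disk; plan_cleanup and its arguments are hypothetical, and it assumes the title_YYYY-MM-DD.mp3 scheme:

```perl
use strict;
use warnings;

# Hypothetical helper: split bare file names into (keep, delete) lists,
# keeping the $num_to_keep newest files of each title. Nothing on disk is
# touched, so the plan can be printed and checked before deleting anything.
sub plan_cleanup {
    my ($num_to_keep, @names) = @_;
    die "num_to_keep must be >= 0\n" if $num_to_keep < 0;
    my %by_title;
    # Descending string sort: within one title this is newest date first.
    for my $name (sort { $b cmp $a } @names) {
        my ($title) = $name =~ /^(.+)_\d{4}-\d{2}-\d{2}\.mp3\z/
            or next;    # skip anything that does not match the naming scheme
        push @{ $by_title{$title} }, $name;
    }
    my (@keep, @delete);
    for my $title (sort keys %by_title) {
        my @files = @{ $by_title{$title} };
        push @keep,   splice(@files, 0, $num_to_keep);   # the newest N
        push @delete, @files;                            # everything older
    }
    return (\@keep, \@delete);
}

my ($keep, $delete) = plan_cleanup(2, qw(
    vision_am_2015-08-04.mp3 vision_am_2015-08-03.mp3 vision_am_2015-07-31.mp3
    lum_pro_2015-08-04.mp3   lum_pro_2015-08-03.mp3   lum_pro_2015-08-01.mp3
));
print "keep:   $_\n" for @$keep;
print "delete: $_\n" for @$delete;
```

Once the printed lists look right, the delete list can be passed to unlink with the directory prefixed to each name. Because the sub never reads the filesystem, it can also be exercised with made-up names before being pointed at real files.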

200_success
asked Aug 4, 2015 at 19:52

1 Answer

You can reduce the number of loops, sorts, and matches, so this should perform faster:

use strict;
use warnings;

my $num_to_keep = 2;
my $dir = "/home/mp3files";
opendir my $DH, $dir or die "$! $dir";
# only mp3s
my $fileRE = qr/(.+) _ (\d{4}-\d{2}-\d{2}) \.mp3$/x;
my %count;
# files to delete (read the pipeline bottom-up: readdir, map, sort, map)
my @files = map {
    my $basename = $_->[1];    # $_ is [path, title, date]
    (++$count{$basename} > $num_to_keep) ? $_->[0] : ();
}
sort {
    $b->[2] cmp $a->[2]        # sort descending by date
}
map {
    my @match = /$fileRE/;
    (@match && -f "$dir/$_") ? ["$dir/$_", @match] : ();
}
readdir($DH);
closedir $DH;
unlink(@files);
answered Aug 5, 2015 at 10:20
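To see which files this pipeline selects without deleting anything, the same map/sort/map chain can be wrapped in a sub that takes bare names. This is a sketch, not the answerer's code: names_to_delete is a hypothetical name, and the -f check is dropped because nothing is read from disk:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the answer's map/sort/map pipeline. It takes
# bare file names instead of calling readdir (so the -f check is dropped)
# and returns the names that would be deleted, without deleting anything.
sub names_to_delete {
    my ($num_to_keep, @names) = @_;
    my $fileRE = qr/(.+) _ (\d{4}-\d{2}-\d{2}) \.mp3$/x;
    my %count;
    # Read bottom-up: match -> sort by date -> count per title.
    # In the top map, $_ is [name, title, date]; anything past the
    # Nth file of its title is returned for deletion.
    return map {
        ++$count{ $_->[1] } > $num_to_keep ? $_->[0] : ();
    }
    sort { $b->[2] cmp $a->[2] }          # newest date first, all titles mixed
    map {
        my @match = /$fileRE/;            # (title, date), or () if no match
        @match ? [ $_, @match ] : ();     # () drops the name, like grep would
    } @names;
}

print "$_\n" for names_to_delete(2, qw(
    vision_am_2015-08-04.mp3 vision_am_2015-08-03.mp3 vision_am_2015-07-31.mp3
    lum_pro_2015-08-04.mp3   lum_pro_2015-08-03.mp3   lum_pro_2015-08-01.mp3
));
```

With two files kept per title, this prints lum_pro_2015-08-01.mp3 and then vision_am_2015-07-31.mp3, the third-newest file of each title. A single sort by date across all titles is enough because %count tracks each title separately.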

  • map { (bool) ? EXP : () } acts like grep: when bool is false it returns (), the empty list, so that element is dropped from the result. Commented Aug 5, 2015 at 10:32
  • Very slick. This usage of map and sort is more advanced than I know, so if I can ask a few questions... Because the first map returns ["$dir/$_", @match], I would think that @files ends up being an array of arrays, with index 0 being the path and file name and index 1 being the matched date and extension. But why is the sort using index 2? I would think that is out of range. Commented Aug 5, 2015 at 22:05
  • Actually, never mind: I see now that the regex pattern has two sets of parentheses. I will play with this a bit. Thanks. Commented Aug 5, 2015 at 22:13
  • @BryanK You can also write my @files = map {..} readdir($DH); @files = sort {..} @files; if that is easier to read. After each step you can use Data::Dumper; print Dumper \@files; to inspect what values are in the array. Commented Aug 6, 2015 at 8:03
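Following the last suggestion, the pipeline can be split into separate steps with each intermediate array dumped. A sketch with a hard-coded, hypothetical list standing in for readdir (and without the -f check), showing why the sort can use index 2:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent = 1;    # one array element per line, small indent

my $dir    = '/home/mp3files';    # only used to build paths here
my $fileRE = qr/(.+) _ (\d{4}-\d{2}-\d{2}) \.mp3$/x;

# Hypothetical stand-in for readdir($DH), so the example runs anywhere.
my @names = qw(lum_pro_2015-07-30.mp3 vision_am_2015-08-04.mp3 notes.txt);

# Step 1: the pattern has two capture groups, so a successful match returns
# (title, date) and each record becomes [path, title, date].
my @records = map {
    my @match = /$fileRE/;
    @match ? [ "$dir/$_", @match ] : ();
} @names;
print Dumper(\@records);

# Step 2: index 2 is the date; sort newest first.
@records = sort { $b->[2] cmp $a->[2] } @records;
print Dumper(\@records);
```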
