I know a module like this already exists on CPAN, but I wanted to write my own simplified version that accepts wildcards in the input.
I don't write many Perl modules so I figured I would post the code here and see if anyone could give me advice on how to make this code more efficient or user friendly.
I also included some POD documentation so you can use perldoc for usage information.
Please let me know what you think and leave some feedback on how this code could be improved!
package File::Copy::Recursive;
# Use statements
use File::Basename;
use File::Copy;
use File::Path qw(make_path remove_tree);
use strict;
use warnings;
# Link the version number with CVS
our ($VERSION) = (q$Revision: 1.1 $ =~ m{([0-9.]+)});
# Exporter
require Exporter;
our @ISA = qw(
Exporter
);
# Declare exportable routines
our @EXPORT_OK = qw(
recursive_copy rcopy
recursive_move rmove
);
# Aliases to subroutines
sub rcopy { recursive_copy(@_); }
sub rmove { recursive_move(@_); }
# Recursive copy function
sub recursive_copy {
my ($source, $destination) = @_;
for my $path (glob $source) {
if (-d $path) {
make_path($destination.'/'.basename($path));
recursive_copy($path.'/*', $destination.'/'.basename($path));
}
else {
copy($path, $destination);
}
}
}
# Recursive move function
sub recursive_move {
my ($source, $destination) = @_;
for my $path (glob $source) {
if (-d $path) {
make_path($destination.'/'.basename($path));
recursive_move($path.'/*', $destination.'/'.basename($path));
remove_tree($path);
}
else {
move($path, $destination);
}
}
}
1;
__END__
=head1 NAME
File::Copy::Recursive - Recursive copy and move functions.
=head1 VERSION
$Revision: 1.1 $
=head1 SYNOPSIS
use File::Copy::Recursive;
# Recursively copy a directory
recursive_copy('input', 'output');
# Recursively copy all contents of a directory
recursive_copy('input/*', 'output');
# Recursively move a directory
recursive_move('input', 'output');
# Recursively move all contents of a directory
recursive_move('input/*', 'output');
=head1 DESCRIPTION
File::Copy::Recursive is designed to provide simple functions to recursively copy and move files.
These functions will accept wildcards (*) in the input.
=head1 SUBROUTINES/METHODS
=over
=item C<recursive_copy> or C<rcopy>
Recursive copy function.
# Recursively copy a directory
recursive_copy('input', 'output');
# Recursively copy all contents of a directory
recursive_copy('input/*', 'output');
=item C<recursive_move> or <rmove>
Recursive move function.
# Recursively move a directory
recursive_move('input', 'output');
# Recursively move all contents of a directory
recursive_move('input/*', 'output');
=back
2 Answers 2
You've got a security vulnerability here just waiting to be exploited based on the fact you use the same interface for "The human" and "The Filesystem".
All you need is for somebody to create a file with:
open my $fh, '>', 'path/*'
And that will get glob-expansion during traversal.
Which means you DON'T want to use glob at a level other than the user-facing function call for the sake of convenience.
The user facing function should probably use use Text::Glob::glob_to_regex
so you can generate a "match rule" and re-use it for subsequent operations,
instead of relying on glob
to return a list of files.
When the function calls itself, it should use opendir
+ readdir
which is not subject to glob expansion.
Your code also assumes a lot about OS File Path semantics.
So for Filesystem manipulation, I would probably suggest lots of File::Spec
or Path::Tiny
, as they're far more battle tested against Path handling, and the later provides a few helper functions that will make tree traversal and iteration much more obvious. ( And importantly, much safer )
As for the Exporter call, this style is more recommended:
use Exporter 5.57 qw( import );
This simply creates the import
sub on your package instead of fussing with @ISA
and thus eliminates the need for Exporter
to be part of your inheritance tree.
If you want to support versions of Exporter
older that 5.57, this is equally effective and can be expected to work on a stock 5.6 perl:
use Exporter ();
*import = \&Exporter::import;
Example
The following code constructs a diabolical directory that will result in graph traversal happening not once, but several times.
#!/usr/bin/env perl
use strict;
use warnings;
## Creating an example "Evil" directory to traverse.
use Test::TempDir::Tiny qw( tempdir );
use Path::Tiny qw( path );
my $dir = tempdir();
# Problem directories
path( $dir, "evil/*" )->mkpath;
path( $dir, "evil/*/starfile" )->touch;
path( $dir, "evil/**" )->mkpath;
path( $dir, "evil/**/doublestarfile" )->touch;
# Normal Files
path( $dir, "evil/child/deep" )->mkpath;
path( $dir, "evil/child" )->mkpath;
path( $dir, "evil/child/a" )->touch;
path( $dir, "evil/child/b" )->touch;
path( $dir, "evil/child/deep/a" )->touch;
path( $dir, "evil/child/deep/b" )->touch;
## Evil directory now constructed, emulated simplified recursion
## using glob
our $DEPTH = 0;
our $PAD = '';
sub list_files_recursively {
my ( $path ) = @_;
STDERR->print("${PAD}?: $path\n");
local $DEPTH = $DEPTH + 1;
local $PAD = ' ' x $DEPTH;
my @out;
###### DONT DO THIS ######
for my $leaf ( glob $path ) {
###### DONT DO THIS ######
if ( -d $leaf ) {
STDERR->print("${PAD}d: $leaf\n");
push @out, list_files_recursively( $leaf . '/*' );
}
else {
STDERR->print("${PAD}f: $leaf\n");
push @out, $leaf;
}
}
return @out;
}
for my $entry ( list_files_recursively( "$dir/*" ) ) {
print $entry . "\n";
}
print "Expected layout:\n";
system("find",$dir,"-type", "f" );
Executing this code will reveal list_files_recursively "visits" the same files multiple times.
Full output here omitted for brevity
The most important part is to note what happens when it tries to list the children of the directory called **
d: /tmp/7LYkL6zdrB/test_pl/default_1/evil/**
?: /tmp/7LYkL6zdrB/test_pl/default_1/evil/**/*
f: /tmp/7LYkL6zdrB/test_pl/default_1/evil/**/doublestarfile
f: /tmp/7LYkL6zdrB/test_pl/default_1/evil/*/starfile
f: /tmp/7LYkL6zdrB/test_pl/default_1/evil/child/a
f: /tmp/7LYkL6zdrB/test_pl/default_1/evil/child/b
d: /tmp/7LYkL6zdrB/test_pl/default_1/evil/child/deep
Yes, that's thinking "Hey, it looks like 'child/deep' is a child of **
".
Which it really really isn't.
And there are probably plenty more glob rules that can be exploited here, *
is just the most dangerous.
For instance, glob interprets spaces as token delimiters, so glob "a b" -> 2 entries, not 1.
-
\$\begingroup\$ Couldn't someone just say
open my $fh, '>', 'path/*'
regardless of whether they're using this module or not? \$\endgroup\$tjwrona– tjwrona2016年04月26日 14:14:21 +00:00Commented Apr 26, 2016 at 14:14 -
\$\begingroup\$ That's not the problem. The problem is when the authors code finds a file that was created that way, and passes said filename into
glob
.my $filename = "path/*" # from the filesystem for my $filename ( glob $filename ) {
# Expanded here even though it wasn't a user parameter \$\endgroup\$Kent Fredric– Kent Fredric2016年04月27日 01:30:07 +00:00Commented Apr 27, 2016 at 1:30 -
\$\begingroup\$ I can try to write an example test case using your code if it helps understand, but it requires a bit of effort. \$\endgroup\$Kent Fredric– Kent Fredric2016年04月27日 01:31:05 +00:00Commented Apr 27, 2016 at 1:31
-
\$\begingroup\$ So basically the problem can occur if the file name itself contains an asterisk? Is that even possible? If it is that is good to know! \$\endgroup\$tjwrona– tjwrona2016年04月27日 01:40:53 +00:00Commented Apr 27, 2016 at 1:40
-
\$\begingroup\$ The point is "You're using glob for 2 different things". The first is for asking the user for a list of glob expressions which can map to multiple files. The second is where you use it internally as a cheat for directory iteration. The problem is the latter. Because filenames can contain arbitrary characters that are part of the glob syntax, any character literals in the literal filename that are also globs result in "glob expansion", so instead of "copying/moving file X", you're "copying/moving file(s) based on the expansion of the filename X". \$\endgroup\$Kent Fredric– Kent Fredric2016年04月27日 01:46:27 +00:00Commented Apr 27, 2016 at 1:46
If make_path
fails, then statements are still attempted. Perhaps ensure path creation succeeds, quickly failing rather than continuing to attempt I/O.
recursive_copy
, recursive_move
arguments are the same in the two sub
. Could consolidate those values, storing them in temporary variables used by either. String literals, concatenation would instead be in only one line of code.
Could consolidate the two sub
into one with the goal of reducing duplicate code, loop logic. Could use an argument providing the move, copy option, perhaps with copy as a reasonable default. Suggest ternary conditionals based on this argument or flag.
Explore related questions
See similar questions with these tags.
mv
/cp
functions with the-r
recursive flag set. \$\endgroup\$