I wanted to make a script that would parse a main file (the one with int main()), look at its #include "..." local headers and, if they were not in the current directory, find those headers and their source files and pass them to g++ as the implementation. In other words, I wanted a helper script that would watch for dependencies. I think I made it, using Perl. I would like to get some reviews:
#!/usr/bin/perl
use autodie;
use Cwd qw[getcwd abs_path];
use Getopt::Long qw[GetOptions];
use File::Find qw[find];
#global arrays
@src; #source files -> .cpp
@hed; #headers files -> .hpp
@dep; #dependencies -> .hpp + .cpp
$command;
GetOptions(
"s" => \$opt_s, #headers the same as source files
"h" => \$opt_h, #help message
"o=s" => \$opt_o, #output filename
"i=s" => \%opt_i, #dependencies
"debug" => \$opt_debug #output the command
) or die "command options\n";
if($opt_h){
print "usage: exe [-h][--debug][-s][-o output_file][-i dir=directory target=source]... sources...\n";
exit 1;
}
die "no args" if !($out=$ARGV[0]);
$out = $opt_o if $opt_o;
#-------------------------------------------------
sub diff {
my $file = shift;
$file = "$file.cpp";
open MAIN, $file;
opendir CWD, getcwd;
my @file_dep = map { /#include "([^"]+)"/ ? abs_path(1ドル) : () } <MAIN>;
my %local = map { abs_path($_) => 1 } grep { !/^\./ } readdir CWD;
#headers found in the main file
my @tmp;
for(@file_dep){
push @tmp, $_ if ! $local{$_};
}
@tmp = map {/.+\/(.+)/} @tmp;
#finding absolute path for those files
my @ret;
for my $i (@tmp){
find( sub {
return unless -f;
return unless /$i/;
push @ret, $File::Find::name;
}, '/home/shepherd/Desktop');
}
@ret = map { "$_.cpp" } map {/(.+)\./} @ret;
return \@ret;
}
sub dependencies{
my $dir=shift; my $target=shift;
my @ar, my %local;
#get full names of target files
find( sub {
return unless -f;
push @ar, $File::Find::name;
}, $dir);
%local = map { $_ => 1 } @ar;
#and compare them againts the file from MAIN
for(@{diff($target)}){
push @dep, $_ if $local{$_};
}
}
sub debug{
print "final output:\n$command\n\nDependencies:\n";
print "$_\n" for @dep;
exit 1;
}
#------------------------------------------------------
#providing source and headers
if($opt_s){
@src = map { "$_.cpp" } @ARGV;
@hed = map { !/$out/ and "$_.hpp" } @ARGV;
} else {
@src = map { !/_h/ and "$_.cpp"} @ARGV;
@hed = map { /_h/ and s/^(.+)_.+/1ドル/ and "$_.hpp" } @ARGV;
}
if(%opt_i){
my @dirs; my @targets;
for(keys %opt_i){
push @dirs, $opt_i{$_} if $_ eq "dir";
push @targets, $opt_i{$_} if $_ eq "target";
}
if(@dirs!=@targets){
print "you have to specify both target and directory. Not all targets have their directories\n";
exit -1;
}
my %h;
@h{@dirs} = @targets;
dependencies($_, $h{$_}) for keys %h;
$command = "g++ ";
$command .= "-I $_ " for keys %h;
$command .= "-o $out.out @hed @dep @src";
debug if $opt_debug;
system $command;
exec "./$out.out";
} else {
$command = "g++ -o $out.out @hed @src";
debug() if $opt_debug;
system $command;
exec "./$out.out";
}
Now an example:
$pwd
/home/user/Desktop/bin/2
$ls
main.cpp student2.cpp student2.hpp
student2.cpp has some dependencies (it uses a struct defined in student.cpp and a function defined in grade.cpp). With the script you can see what it gives you (the script is in /usr/local/bin/exe):
$exe -h
usage: exe [-h][--debug][-s][-o output_file][-i dir=directory target=source]... sources...
$exe --debug -i target=student2 -i dir=/home/user/Desktop/bin/1 main student2
final output:
g++ -I /home/user/Desktop/bin/1 -o main.out /home/user/Desktop/bin/1/grade.cpp /home/user/Desktop/bin/1/student.cpp main.cpp student2.cpp
Dependencies:
/home/user/Desktop/bin/1/grade.cpp
/home/user/Desktop/bin/1/student.cpp
As you can see, the script found the dependencies of student2.cpp, which were in another directory, and included them in the final command. You just have to specify the source files without extension (just the file base names). In short: for each target file (which may have dependencies in its #include "dependency.hpp" lines), I provide the directory where the dependency (dependency = header + source [implementation]) is, and that's it. The script does all the rest.
Answer
It is not easy to get a clear picture of what the program is doing and why it is doing it. I think adding more documentation and comments would help, as would coding in a way that is easy to read. That means choosing function and variable names carefully to enhance readability. Avoid compact or clever constructs if they are not easy to read; prefer more verbose code if it improves readability and maintainability.
It is not clear why you did not want to use make or cmake to handle dependencies in a more efficient way.
Another issue is that the purpose of the command line switches is unclear. It would help to provide more documentation and background for their usage.
Automatic compilation of dependencies is usually done with make or cmake. But this requires you to write a Makefile or a CMakeLists.txt file that specifies the dependencies. Another option that avoids this is to use g++ -MMD -MP -MF, as mentioned by @MartinYork in the comments. Also note that make and cmake have the added benefit of only recompiling the source files that have changed (i.e. those that are newer than the target file). This can markedly speed up compilation for a large project. The Perl script, on the other hand, recompiles every dependency into a single executable each time, whether any of the dependencies have changed or not.
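To give an idea of what the -MMD/-MP route produces, here is a minimal Perl sketch that reads such a dependency file. It assumes the usual Make syntax that g++ writes (a rule like main.o: main.cpp student2.hpp, possibly continued with backslashes, followed by the empty header rules that -MP adds). The sub name parse_depfile and the sample text are hypothetical, and file names containing escaped spaces are not handled.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the prerequisites of the first rule in a
# Make-style dependency file, as written by g++ -MMD -MP -MF <file>.d.
# Limitation: escaped spaces in file names are not handled.
sub parse_depfile {
    my ($text) = @_;
    $text =~ s/\\\n/ /g;                    # join backslash-continued lines
    my ($first_rule) = split /\n/, $text;   # -MP phony rules follow on later lines
    my (undef, $prereqs) = split /:/, $first_rule, 2;
    return grep { length } split ' ', $prereqs // '';
}

# Made-up sample in the format g++ writes.
my $sample = "main.o: main.cpp student2.hpp \\\n /home/user/Desktop/bin/1/student.hpp\n"
           . "student2.hpp:\n/home/user/Desktop/bin/1/student.hpp:\n";
my @prereqs = parse_depfile($sample);
print "$_\n" for @prereqs;
```

Each header in the list could then be mapped to its .cpp file, which is roughly what the script does by hand today.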
On the other hand, an advantage of using the Perl script can be that it avoids writing the Makefile (though I would recommend learning to write a Makefile or a CMakeLists.txt, as that is the common way of doing it).
The script also automatically runs the executable after compilation, though it does not check whether the compilation failed (if the compilation fails, it does not make sense to run the executable).
Another advantage can be that it does not generate multiple .o files (as make and cmake do, to enable recompiling only the changed files).
The Perl script, which you named exe (I will rename it to exe.pl for clarity), can be used in many ways. From reading the source code, here is what I found:
Firstly, it can be used to compile specific files in the current directory (and then run the generated executable). For example:
$ exe.pl main student2
This will run g++ -o main.out main.cpp student2.cpp. The -o option can be used to specify another name for the executable (but the suffix will always be .out):
$ exe.pl -o prog main student2
runs g++ -o prog.out main.cpp student2.cpp. The -s option can be used to add headers to the compilation (though I could not see why this is useful, as headers are commonly included from within a .cpp file and are therefore pulled in automatically by the g++ preprocessor):
$ exe.pl -s main student2
runs g++ -o main.out student2.hpp main.cpp student2.cpp. Note that main.hpp is not added. The script considers the first name on the command line (here main, unless -o is given) as the "main" program, and the -s option will not add a header file for it. (Please consider clarifying why this is done!)
Headers can still be added without using the -s option by supplying names that contain "_h":
$ exe.pl main student2 student2_h
runs g++ -o main.out student2.hpp main.cpp student2.cpp.
Next, the -i switch is used to handle dependencies. A dependency is a .cpp file in another directory, let's call it DD, different from the main directory, DM, where the script is run from. If the dependency includes header files, the script checks whether those header files are located in DM; if so, they are excluded from the later compilation (please consider clarifying why this is done).
For example, consider DM=/home/user/Desktop/bin/2. DM is located below the parent directory DT=/home/user/Desktop, which the script uses as the top of the source tree (in the script this directory is hardcoded in the find() call in diff()). Now suppose the dependency directory is DD=/home/user/Desktop/bin/1 and the dependency file is student.cpp, which contains the include statement #include "grade.hpp". The script first checks whether grade.hpp already exists in DM. If it does, it is excluded from the later g++ compilation command (please consider explaining why this is done). Next, the script tries to find student.cpp in DT or any of its subdirectories recursively using File::Find. If it finds the file (or more than one file) and it turns out that the file is in DD (and not in some other directory below DT), it is assumed that there also exists a .cpp file with the same name in DD, and the absolute path of this .cpp file is included in the later g++ compilation command. Also, the absolute path of DD is added as an include search path (-I option) to the g++ command.
I would recommend that the motivation behind the above logic (which is not at all clear to me) be explained carefully in the source code as comments.
To summarize, the above example corresponds to the following command line:
$ exe.pl -i target=student -i dir=/home/user/Desktop/bin/1 main student2
and the script will then produce the following g++ command:
g++ -I /home/user/Desktop/bin/1 -o main.out /home/user/Desktop/bin/1/student.cpp main.cpp student2.cpp
Logical issues
The -i option does not work with more than one pair of (target, dir)
Currently, the -i option does not work for more than one target. For example, for the command line:
$ exe.pl -i target=student2 -i dir=/home/user/Desktop/bin/1 -i target=student3 -i dir=/home/user/Desktop/bin/3
GetOptions() will fill the hash %opt_i, corresponding to the option specification "i=s" => \%opt_i, as follows:
%opt_i = (target => "student3", dir => "/home/user/Desktop/bin/3")
Notice that the first target, student2, is missing (and so is the first directory). This is because both targets use the same hash key, target. To fix this, you can use arrays instead of hashes as destinations in GetOptions(). For example:
"target=s" => \@opt_t,
"dir=s" => \@opt_d,
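A small sketch of the difference, using GetOptionsFromArray (exported by Getopt::Long on request) so that it runs on a fixed argument list instead of @ARGV. The argument lists are made up for illustration. With array destinations, targets and dirs are paired by position, which assumes the user gives them in matching order.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical argument list, standing in for @ARGV.
my @argv_hash = ('-i', 'target=student2', '-i', 'dir=/home/user/Desktop/bin/1',
                 '-i', 'target=student3', '-i', 'dir=/home/user/Desktop/bin/3',
                 'main', 'student2');

# Hash destination (as in the original script): repeated keys overwrite.
my %opt_i;
GetOptionsFromArray(\@argv_hash, 'i=s' => \%opt_i) or die "Failed to parse options\n";
print "hash keeps only: target=$opt_i{target}, dir=$opt_i{dir}\n";

# Array destinations: every occurrence is kept, in command-line order.
my @argv_arrays = ('--target', 'student2', '--dir', '/home/user/Desktop/bin/1',
                   '--target', 'student3', '--dir', '/home/user/Desktop/bin/3',
                   'main', 'student2');
my (@targets, @dirs);
GetOptionsFromArray(\@argv_arrays, 'target=s' => \@targets, 'dir=s' => \@dirs)
    or die "Failed to parse options\n";
die "Number of targets differs from number of dirs\n" if @targets != @dirs;

# Pair each target with the directory given in the same position.
for my $i (0 .. $#targets) {
    print "target $targets[$i] -> dir $dirs[$i]\n";
}
# Non-option arguments are left in the array.
print "sources: @argv_arrays\n";
```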
Dependencies in subdirectories are not checked
As mentioned above, the code tries to exclude dependencies that are present in the main directory. But if a dependency is in a subdirectory of that directory, it will not be found. This is due to the usage of readdir():
my %local = map { abs_path($_) => 1 } grep { !/^\./ } readdir CWD;
Here, readdir() only returns the entries in CWD, not those in any subdirectory below it.
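One way to also see files in subdirectories is to walk the tree with File::Find, which the script already uses elsewhere. This is only a sketch: the helper name local_files is hypothetical, the demo builds a small tree in a temporary directory, and unlike the original it does not skip dotfiles.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Find qw(find);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: absolute paths of all regular files below $root,
# including those in subdirectories (readdir only sees the top level).
sub local_files {
    my ($root) = @_;
    my %files;
    find({
        no_chdir => 1,    # $File::Find::name is then usable directly
        wanted   => sub { $files{ abs_path($File::Find::name) } = 1 if -f $File::Find::name },
    }, $root);
    return \%files;
}

# Demo tree: top.hpp at the top level, sub/nested.hpp one level down.
my $root = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($root, 'sub'));
for my $rel ('top.hpp', File::Spec->catfile('sub', 'nested.hpp')) {
    my $path = File::Spec->catfile($root, $rel);
    open(my $fh, '>', $path) or die "Cannot create '$path': $!";
    close $fh;
}
my $local = local_files($root);
print "$_\n" for sort keys %$local;
```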
Account for multiple versions of the same dependency file
Currently, the code uses the file in the main directory if there are multiple versions of the same file name. Let's say the dependency file /home/user/Desktop/bin/1/student.hpp contains:
#include "grade.hpp"
and there exist two versions of the corresponding .cpp file: one in the dependency directory /home/user/Desktop/bin/1/
/home/user/Desktop/bin/1/grade.cpp
and one in the CWD (where the script is run from)
/home/user/Desktop/bin/2/grade.cpp
Which is the correct file? The script should at least give a warning.
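A minimal way to produce such a warning, assuming the candidate paths have already been collected (for example by File::Find): group them by file name with File::Basename and report any name that appears more than once. The helper name warn_about_duplicates is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: group candidate paths by file name and warn when
# the same name was found in more than one directory.
sub warn_about_duplicates {
    my @paths = @_;
    my %paths_for;
    push @{ $paths_for{ basename($_) } }, $_ for @paths;
    my @ambiguous = grep { @{ $paths_for{$_} } > 1 } keys %paths_for;
    @ambiguous = sort @ambiguous;
    for my $name (@ambiguous) {
        warn "Warning: '$name' found in several places:\n",
             map { "  $_\n" } @{ $paths_for{$name} };
    }
    return @ambiguous;
}

my @ambiguous = warn_about_duplicates(
    '/home/user/Desktop/bin/1/grade.cpp',
    '/home/user/Desktop/bin/2/grade.cpp',
    '/home/user/Desktop/bin/1/student.cpp',
);
```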
Not checking recursively for dependencies
Let's say student.hpp has an #include "grade.hpp" and grade.hpp has an #include "calc.hpp". Then the script will not find and compile calc.cpp, because it only looks at the include statements of the target file itself.
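A recursive walk with a %seen hash would fix this. The sketch below is a simplification under stated assumptions: all headers live directly in one directory, a header foo.hpp (or foo.h) is implemented in foo.cpp next to it, and only quoted includes are followed. The helper names are hypothetical; the demo builds the student/grade/calc chain from above in a temporary directory.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Quoted #include names in a file (system <...> includes are ignored).
sub quoted_includes {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Could not open '$path': $!";
    my @names = map { /^\s*#\s*include\s*"([^"]+)"/ ? 1ドル : () } <$fh>;
    close $fh;
    return @names;
}

# Hypothetical helper: starting from $header in $dir, follow quoted includes
# recursively and collect each existing .cpp with the same base name as a
# visited header. %$seen prevents endless recursion on include cycles.
sub collect_sources {
    my ($dir, $header, $seen) = @_;
    return () if $seen->{$header}++;
    my $path = File::Spec->catfile($dir, $header);
    return () unless -e $path;
    my @sources;
    my @to_scan = ($path);
    (my $cpp = $path) =~ s/\.(?:hpp|h)$/.cpp/;
    if ($cpp ne $path && -e $cpp) {
        push @sources, $cpp;
        push @to_scan, $cpp;    # the .cpp may include further headers
    }
    for my $file (@to_scan) {
        push @sources, collect_sources($dir, $_, $seen) for quoted_includes($file);
    }
    return @sources;
}

# Demo: student.hpp -> grade.hpp -> calc.hpp, each with a matching .cpp.
my $dir = tempdir(CLEANUP => 1);
my %content = (
    'student.hpp' => qq{#include "grade.hpp"\n},
    'grade.hpp'   => qq{# include "calc.hpp"\n},
    'calc.hpp'    => "int calc();\n",
    'student.cpp' => qq{#include "student.hpp"\n},
    'grade.cpp'   => qq{#include "grade.hpp"\n},
    'calc.cpp'    => qq{#include "calc.hpp"\n},
);
for my $name (keys %content) {
    my $path = File::Spec->catfile($dir, $name);
    open(my $fh, '>', $path) or die "Cannot write '$path': $!";
    print {$fh} $content{$name};
    close $fh;
}
my @sources = collect_sources($dir, 'student.hpp', {});
print "$_\n" for @sources;
```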
The _h command line trick does not work correctly
The following code is used to check for header files on the command line:
@hed = map { /_h/ and s/^(.+)_.+/1ドル/ and "$_.hpp" } @ARGV;
Notice that the first regex, /_h/, matches any name with _h anywhere in it, for example sah_handler. I think you need to add an end-of-string anchor to the regex: /_h$/.
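With the anchor in place, the partition of the arguments can also be written without modifying $_ (the s/// in the original edits the elements of @ARGV in place, since map aliases $_ to them). A small sketch with a made-up argument list:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up arguments: "student2_h" asks for student2.hpp, while
# "sah_handler" merely contains "_h" and must stay a source file.
my @args = ('main', 'student2', 'student2_h', 'sah_handler');

# Anchored at the end, so only a trailing "_h" marks a header request.
my @src = map { /_h$/ ? () : "$_.cpp" } @args;
my @hed = map { /^(.+)_h$/ ? "1ドル.hpp" : () } @args;

print "sources: @src\nheaders: @hed\n";
```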
Matching #include file names in a dependency file
The code uses
my @file_dep = map { /#include "([^"]+)"/ ? abs_path(1ドル) : () } <MAIN>;
to extract the dependencies from a dependency file. Note that this requires that there is no space between # and include. But that assumption is not correct: spaces are in fact allowed there (and before the #), for example
# include "student.hpp"
is a legal C++ include statement.
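A pattern that accepts the optional whitespace could look like the sketch below. It is still only a line-based heuristic: an include inside a /* ... */ block comment or behind a false #if would be picked up too.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Allow optional whitespace before '#', between '#' and 'include', and
# before the opening quote; capture the quoted file name.
my $include_re = qr/^\s*#\s*include\s*"([^"]+)"/;

my @lines = (
    qq{#include "student.hpp"\n},
    qq{# include "grade.hpp"\n},
    qq{   #  include "calc.hpp"\n},
    qq{#include <vector>\n},            # system header: not captured
    qq{// #include "old.hpp"\n},        # line comment: not captured
);
my @names = map { /$include_re/ ? 1ドル : () } @lines;
print "$_\n" for @names;
```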
Language related issues
Use strict, warnings
It is recommended to include use strict; use warnings at the top of your program. This will help you catch errors at an early stage.
Try to limit the use of global variables
Extensive use of global variables makes it harder to reason about a program. It is crucial that a program is easy to read (and understand) in order to maintain and extend it effectively (at a later point). It also makes it easier to track down bugs.
Note that if you add use strict at the top of the program, global variables need to be declared, just like lexical variables. You declare a global (package) variable with our.
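A short illustration of both points: the global array is declared with our, and a misspelled variable name is rejected at compile time under strict (shown with a string eval so the example itself still compiles). The misspelling @dpe is made up for the demo.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Under strict, a package (global) variable must be declared with "our".
our @dep;                 # was "@dep;" in the original script
push @dep, '/home/user/Desktop/bin/1/grade.cpp';

# strict also catches typos: the misspelled @dpe is a compile-time error,
# demonstrated by compiling the snippet in a string eval.
my $ok  = eval q{ use strict; push @dpe, 'x'; 1 };
my $msg = $ok ? "compiled\n" : "rejected: $@";
print $msg;
```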
Old style open() and opendir()
Modern Perl uses the three-argument form of open and avoids global bareword filehandle names. Instead, use lexical filehandles. So instead of this:
open MAIN, $file;
do this (assuming no autodie):
open (my $MAIN, '<', $file) or die "could not open $file: $!";
See Three-arg open() from the book "Modern Perl" for more information.
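The same idea applies to opendir: the bareword CWD handle in diff() can be replaced by a lexical one. Here is a sketch that works in a throwaway temporary directory (the file names are only for illustration). Lexical handles are also closed automatically when they go out of scope, but closing them explicitly keeps the intent clear.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
for my $name ('main.cpp', 'student2.hpp', '.hidden') {
    my $path = File::Spec->catfile($dir, $name);
    # Three-argument open with a lexical filehandle, here for writing.
    open(my $out, '>', $path) or die "Could not create '$path': $!";
    print {$out} "// $name\n";
    close $out;
}

# Lexical directory handle instead of the global bareword CWD.
opendir(my $dh, $dir) or die "Could not open directory '$dir': $!";
my @entries = grep { !/^\./ } readdir $dh;   # skip ".", ".." and dotfiles
closedir $dh;
@entries = sort @entries;

# Reading a file back, again with a lexical handle.
my $file = File::Spec->catfile($dir, 'main.cpp');
open(my $in, '<', $file) or die "Could not open '$file': $!";
my $first_line = <$in>;
close $in;
print "@entries\n$first_line";
```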
Shebang
See this blog for more information.
Consider replacing #!/usr/bin/perl with #!/usr/bin/env perl. Most systems have /usr/bin/env, which runs the first perl found in your PATH. This also allows your script to run with the perl you selected if you, e.g., have multiple perls on your system, for example if you are using perlbrew.
Clever use of map()
The code uses map to produce very concise code, but such code can be difficult to understand and can make your code harder to maintain in the future.
Also note that returning false from the map {} code block, as in
@src = map { !/_h/ and "$_.cpp"} @ARGV;
produces an empty string element in @src. If you do not want to produce an element, you must return an empty list () instead of false:
@src = map { !/_h/ ? "$_.cpp" : () } @ARGV;
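To see the difference concretely, here is a tiny comparison on a made-up argument list:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @args = ('main', 'student2', 'student2_h');

# "and" returns its false left operand, so an empty string is kept:
my @with_false = map { !/_h/ and "$_.cpp" } @args;
# Returning an empty list drops the element entirely:
my @with_empty_list = map { !/_h/ ? "$_.cpp" : () } @args;

printf "%d vs %d elements\n", scalar @with_false, scalar @with_empty_list;
```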
Use good descriptive names for the subs.
The sub diff() is supposed to find dependency files that are not present in the current directory, but the name diff() does not make clear what the sub is doing. On the other hand, the following name might be too verbose:
find_abs_path_of_dep_files_that_do_not_exist_in_curdir()
but it is at least easier to understand.
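Use positive return values with exit
The exit status of a Linux process is an integer from 0 to 255, where zero indicates success; a negative value such as exit -1 is reported as 255. Values above 125 are best avoided because shells give them special meanings (126 and 127 for a command that cannot be executed or found, 128+N for a process killed by signal N); see this answer for more information.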
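You can see the wrap-around from Perl itself. The sketch runs a child perl ($^X is the path of the running interpreter) that calls exit(-1) and decodes the status from $?, where the exit code is stored in the high byte; on Linux this reports 255.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Run a child perl that calls exit(-1), then decode the exit code from $?
# (the high byte of $? holds the child's exit status).
system($^X, '-e', 'exit(-1)');
my $code = $? >> 8;
print "exit(-1) was reported as $code\n";
```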
Check the return value of system $command
You should check the return value of the system() call for g++. The compilation may fail, and then the exit code will be nonzero. In that case, there is no point in running the executable after the compilation has finished.
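A minimal sketch of such a check, assuming the command is passed as a list (so no shell is involved). The helper name run_ok is hypothetical, and perl itself stands in for g++ so the example runs without a compiler; in the script, the exec would only happen when run_ok returns true.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list (no shell involved)
# and return true only if it exited normally with status 0.
sub run_ok {
    my @command = @_;
    system(@command);
    if ($? == -1) {
        warn "Failed to execute $command[0]: $!\n";
        return 0;
    }
    if ($? & 127) {
        warn sprintf "%s died with signal %d\n", $command[0], $? & 127;
        return 0;
    }
    my $status = $? >> 8;
    warn "$command[0] exited with status $status\n" if $status != 0;
    return $status == 0;
}

# Stand-ins for "g++ ..." so the sketch runs without a compiler:
my $good = run_ok($^X, '-e', 'exit 0');
my $bad  = run_ok($^X, '-e', 'exit 1');
# In the script this could read (@gpp_command is hypothetical):
#   run_ok(@gpp_command) or exit 1;
#   exec "./$out.out";
```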
Use say instead of print
You can avoid typing a final newline character in print statements by using say instead of print. The say function was introduced in Perl 5.10 and is made available by adding use v5.10 or use feature qw(say) to the top of your script.
Example code
Here is an example of how you can write the code, following some of the principles I discussed above. I use an object oriented approach to avoid passing too many variables around in the parameter lists of the subs. It also avoids using global variables.
#! /usr/bin/env perl
package Main;
use feature qw(say);
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;
use Getopt::Long ();
use POSIX ();
{ # <--- Introduce scope so lexical variables do not "leak" into the subs below..
my $self = Main->new( rundir => getcwd() );
$self->parse_command_line_options();
$self->parse_command_line_arguments();
$self->find_dependencies();
$self->compile();
$self->run();
}
# ---------------------------------------
# Methods, alphabetically
# ---------------------------------------
sub check_run_cmd_result {
my ( $self, $res ) = @_;
my $signal = $res & 0x7F;
if ( $res == -1 ) {
die "Failed to execute command: $!";
}
elsif ( $signal ) {
if ( $signal == POSIX::SIGINT ) {
die "Aborted by user.";
}
else {
die sprintf(
"Command died with signal %d, %s coredump.",
$signal, ( $res & 128 ) ? 'with' : 'without'
);
}
}
else {
$res >>= 8;
die "Compilation failed.\n" if $res != 0;
}
}
sub compile {
my ( $self ) = @_;
my @command = ('g++');
push @command, ("-I", $_) for @{$self->{inc}};
push @command, "-o", "$self->{out}.out";
push @command, @{$self->{hed}}, @{$self->{deps}}, @{$self->{src}};
$self->debug( "@command" ) if $self->{opt_debug};
my $res = system @command;
$self->check_run_cmd_result( $res );
}
sub debug{
my ( $self, $cmd ) = @_;
say "final output:\n$cmd\n\nDependencies:";
say for @{$self->{deps}};
exit 1;
}
sub find_dependency {
my ( $self, $target, $dir ) = @_;
$target .= '.cpp';
my $fn = File::Spec->catfile($dir, $target);
open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
my @include_args = map { /^\s*#\s*include\s*"([^"]+)"/ ? 1ドル : () } <$fh>;
close $fh;
my @deps;
for (@include_args) {
my $fn = File::Spec->catfile( $dir, $_ );
# TODO: In your program you checked if file also existed in
# $self->{rundir}, and excluded it if so. Do you really need to check that?
if (-e $fn) { # the file exists in target dir
my ($temp_fn, $ext) = remove_file_extension( $fn );
if (defined $ext) {
check_valid_header_file_extension( $ext, $fn );
push @deps, "$temp_fn.cpp";
# TODO: Here you could call $self->find_dependency() recursively
# on basename($temp_fn)
}
}
}
if (@deps) {
push @{$self->{deps}}, @deps;
push @{$self->{inc}}, $dir;
}
}
sub find_dependencies {
my ( $self ) = @_;
$self->{deps} = [];
$self->{inc} = [];
my $targets = $self->{opt_t};
my $dirs = $self->{opt_d};
for my $i (0..$#$targets) {
my $target = $targets->[$i];
my $dir = $dirs->[$i];
$self->find_dependency( $target, $dir );
}
}
sub parse_command_line_arguments {
my ( $self ) = @_;
check_that_name_does_not_contain_suffix($_) for @ARGV;
# TODO: Describe the purpose of -s option here!!
if($self->{opt_s}){
$self->{src} = [ map { "$_.cpp" } @ARGV ];
# NOTE: exclude header file for main program name ($self->{out})
# So if main program name is "main", we include main.cpp, but not main.hpp
# TODO: describe why it is excluded
$self->{hed} = [ map { $_ ne $self->{out} ? "$_.hpp" : () } @ARGV];
}
else {
# TODO: Describe what is the purpose of "_h" here!!
$self->{src} = [ map { !/_h$/ ? "$_.cpp" : () } @ARGV ];
$self->{hed} = [ map { /^(.+)_h$/ ? "1ドル.hpp" : () } @ARGV ];
}
}
sub parse_command_line_options {
my ( $self ) = @_;
Getopt::Long::GetOptions(
"s" => \$self->{opt_s}, # headers the same as source files
"h" => \$self->{opt_h}, # help message
"o=s" => \$self->{opt_o}, # output filename
"target=s" => \@{$self->{opt_t}}, # target name for dependency
"dir=s" => \@{$self->{opt_d}}, # target dir for dependency
"debug" => \$self->{opt_debug} # output the generated command
) or die "Failed to parse options\n";
usage() if $self->{opt_h};
usage("Bad arguments") if @ARGV==0;
$self->{out} = $self->{opt_o} // $ARGV[0];
check_that_name_does_not_contain_suffix( $self->{out} );
$self->validate_target_and_dir_arrays();
}
sub run {
my ( $self ) = @_;
exec "./$self->{out}.out";
}
sub validate_target_and_dir_arrays {
my ( $self ) = @_;
my $target_len = scalar @{$self->{opt_t}};
my $dir_len = scalar @{$self->{opt_d}};
die "Number of targets is different from number of target dirs!\n"
if $target_len != $dir_len;
$_ = make_include_dir_name_absolute($_) for @{$self->{opt_d}};
}
#-----------------------------------------------
# Helper routines not dependent on $self
#-----------------------------------------------
sub check_that_name_does_not_contain_suffix {
my ($name) = @_;
if ($name =~ /\.(?:hpp|cpp)$/ ) {
die "Argument $name not accepted: Arguments should be without extension\n";
}
}
sub check_valid_header_file_extension {
my ( $ext, $fn ) = @_;
warn "Unknown header file extension '$ext' for file '$fn'"
if $ext !~ /^(?:hpp|h)$/;
}
sub make_include_dir_name_absolute {
my ($path ) = @_;
if ( !File::Spec->file_name_is_absolute( $path )) {
warn "Warning: Converting include path '$path' to absolute path: \n";
$path = Cwd::abs_path( $path );
warn " $path\n";
}
return $path;
}
sub new {
my ( $class, %args ) = @_;
return bless \%args, $class;
}
sub remove_file_extension {
my ( $fn ) = @_;
if ( $fn =~ s/\.([^.\/]*)$//) { # do not match a dot in a directory name
return ($fn, 1ドル);
}
else {
warn "Missing file extension for file '$fn'";
return ($fn, undef);
}
}
sub usage {
say $_[0] if defined $_[0];
say "usage: exe.pl [-h][--debug][-s][-o output_file][[-dir=directory -target=source]] <main source> <other sources>...";
# TODO: Please add more explanation of the options here!!
exit( defined $_[0] ? 1 : 0 ); # nonzero status when called because of an error
}
- Many thanks for the reply. There are many idioms I did not even know Perl is capable of. It helped a lot. (milanHrabos, commented Aug 10, 2020 at 21:16)
- I think this is the kind of script where Perl is still better than Python, for tasks like these (finding files in the system, calling system commands, regexes, etc.). (milanHrabos, commented Aug 10, 2020 at 21:25)
- ... cmake or make instead of rolling your own solution
- > g++ -MMD -MP -MF <FileName>.d <FileName>.cpp; cat <FileName>.d
Normally you do this by adding appropriate definitions to your make file.