I am new to Perl and I wrote this Perl script to calculate bytes of code and lines of code in a directory. I want to know if there are any improvements that can be made to it.
#!/usr/bin/perl
# codestats.pl
# A perl script to calculate the number of lines of code in a project
# and what type of lines those code are, such as comments, code, or
# newlines. This script is specific to C/C++, but it should work with any
# language that has the same commenting scheme, so it should probably work
# with java too.
#
# Henry J Schmale
# April 30, 2015
use strict;
use warnings;
my @srcext = ("*.cpp", "*.c", "*.h", "*.hpp", "*.ino", "*.cxx", "*.cc");
my $commLines = 0; # Lines that start with a comment symbol
my $bothLines = 0; # Lines that have a comment and code
my $newLines = 0; # Lines that are just whitespace
my $codeLines = 0; # Lines of code - lines that don't fit in another space
my $totLines = 0; # Total lines of code
my $srcBytes = 0; # Total Number of bytes in src code
my $fCount = 0; # Number of files read
my $files;
for($a = 0; $a < scalar(@srcext); $a++){
$files .= `find . -name "$srcext[$a]"`;
}
my @inputs = split("\n", $files);
for($a = 0; $a < scalar(@inputs); $a++){
my $prev = $totLines;
countLines($inputs[$a]);
printf("Read %d ln. In %s\n", $totLines - $prev, $inputs[$a]);
$fCount++;
}
printResults();
# Count the lines in the given file
# The first param is the file to open for counting
sub countLines{
my ($srcfile) = @_;
open(FILE, "<$srcfile") or die "Couldn't open file: $!\n";
my @lines = <FILE>;
for($b = 0; $b < scalar(@lines); $b++){
$srcBytes += length($lines[$b]);
$totLines++;
if($lines[$b] =~ /^\s$/){ # is only whitespace ==> newLine
$newLines++;
next;
}
if(($lines[$b] =~ /^\s*\/\//) || # comments only lines
($lines[$b] =~ /^\s*\/\*/) ||
($lines[$b] =~ /^\s*\*/)){
$commLines++;
next;
}
# code + comments
if(($lines[$b] =~ /\/\//) ||
($lines[$b] =~ /\/\*.*\*\//)){
$bothLines++;
next;
}
$codeLines++;
}
close FILE;
}
sub calcPercent{
return ($_[0] / $_[1]) * 100.0;
}
sub printResults{
# print out the results
printf("Read %d Files\n", $fCount);
printf("Average Lines Per File: %d\n", $totLines / $fCount);
printf("Code : %09d ln. %06.3f", $codeLines, calcPercent($codeLines, $totLines));
print "%\n";
printf("Comment : %09d ln. %06.3f", $commLines, calcPercent($commLines, $totLines));
print "%\n";
printf("Blank : %09d ln. %06.3f", $newLines, calcPercent($newLines, $totLines));
print "%\n";
printf("Both : %09d ln. %06.3f", $bothLines, calcPercent($bothLines, $totLines));
print "%\n";
printf("Total : %09d ln.\n", $totLines);
printf("CodeSize: %09d bytes\n", $srcBytes);
}
3 Answers 3
It's easy to write very ugly Perl code, but this is pretty nicely written, so congrats.
Using barewords for file handles like this is discouraged:
open(FILE, "<$srcfile") or die "Couldn't open file: $!\n"; my @lines = <FILE>; # ... close FILE;
The recommended way:
open(my $fh, "<$srcfile") or die "Couldn't open file: $!\n";
my @lines = <$fh>;
# ...
close $fh;
Some simplifications are possible. For example instead of this:
my @srcext = ("*.cpp", "*.c");
I find this way a lot easier to type:
my @srcext = qw/*.cpp *.c/;
Another, bigger simplification is in the loops. Instead of this:
my @lines = <FILE>; for($b = 0; $b < scalar(@lines); $b++){ $srcBytes += length($lines[$b]); $totLines++; if($lines[$b] =~ /^\s$/){ # is only whitespace ==> newLine $newLines++; next; } if(($lines[$b] =~ /^\s*\/\//) || # comments only lines ($lines[$b] =~ /^\s*\/\*/) || ($lines[$b] =~ /^\s*\*/)){
You can iterate using for (<$fh>) { ... },
and instead of $lines[$b], benefit from the auto-variable $_,
which can be omitted from $lines[$b] =~ /.../ statements,
like this:
for (<$fh>) {
$srcBytes += length($_);
$totLines++;
if(/^\s$/){ # is only whitespace ==> newLine
$newLines++;
next;
}
if(/^\s*\/\// || # comments only lines
/^\s*\/\*/ ||
/^\s*\*/){
Regexes
Your /^\s$/ regex, intended to count the number of "blank" lines, only matches lines that are completely empty or that contain just a space or tab. If there are two spaces, it won't match.
Use m!regex! (or any other convenient delimiter) instead of /regex/ when your regex itself contains slashes, to avoid "leaning toothpick syndrome".
Of course, crude regex matching is not completely accurate. If a // appears within a literal string, for example, it would be misinterpreted as the start of a comment. Conversely, in multi-line /* comments */, only the first and last line are being recognized as comments.
File::Find
Instead of shelling out to find, use File::Find.
Data-directed programming
There is a proliferation of global variables, which countLines() manipulates as a side-effect, and which printResults() consumes implicitly. I recommend putting them in some kind of data structure so that they can be passed around easily. For example, using subtraction in
my $prev = $totLines; countLines($inputs[$a]); printf("Read %d ln. In %s\n", $totLines - $prev;
to retrieve information about the most recently processed file feels like a bit of a hack.
The countLines() function is a bit tedious, with multiple if statements. It would be more elegant to declare, up front, the regexes for all of the types of lines you are interested in. This technique is useful for counting the number of times a regex matches.
Printing
If you omit the parentheses when calling print, then omit them when calling printf as well.
Use printf to print the percent signs as well. The trick is to use %% to represent a literal percent character.
Suggested solution
use File::Find;
use strict;
use warnings;
my @src_exts = qw(.cpp .c .h .hpp .ino .cxx .cc);
my %categories = (
'Comment' => qr!^\s*(?://|/\*|\*/\s*$)!, # Lines that start with a comment symbol
'Blank' => qr!^\s*$!, # Lines that are just whitespace
'Both' => qr!\S.*/[/*].*|\*/.*\S.*/!, # Lines that have a comment and code
'Bytes' => qr!.!s, # Total bytes
'Total' => qr!^!, # Total lines
);
sub count_lines {
my ($srcfile) = @_;
my %stats = map { $_ => 0 } keys %categories;
open FILE, '<', $srcfile or die "Could not open file $srcfile: $!";
for my $line (<FILE>) {
while (my ($cat, $regex) = each %categories) {
$stats{$cat} += scalar(() = $line =~ m/$regex/g);
}
}
close FILE;
return \%stats;
}
sub add_stats {
my $sum = { %{shift()} };
for my $file_stats (@_) {
for my $cat (keys $file_stats) {
$sum->{$cat} += $file_stats->{$cat};
}
}
return $sum;
}
sub percent {
return (shift() / shift()) * 100.0;
}
sub print_results {
my (@file_stats) = @_;
my $stats = add_stats(@file_stats);
my $total_lines = $stats->{'Total'};
my $code_lines = $total_lines - $stats->{'Comment'} - $stats->{'Blank'};
printf "Read %d Files\n", scalar @file_stats;
printf "Average Lines Per File: %d\n", $stats->{'Total'} / scalar @file_stats;
printf "Code : %9d ln. %6.3f%%\n", $code_lines, percent($code_lines, $total_lines);
printf "Comment : %9d ln. %6.3f%%\n", $stats->{'Comment'}, percent($stats->{'Comment'}, $total_lines);
printf "Blank : %9d ln. %6.3f%%\n", $stats->{'Blank'}, percent($stats->{'Blank'}, $total_lines);
printf "Both : %9d ln. %6.3f%%\n", $stats->{'Both'}, percent($stats->{'Both'}, $total_lines);
printf "Total : %9d ln.\n", $total_lines;
printf "CodeSize: %9d bytes\n", $stats->{'Bytes'};
}
my @stats = ();
File::Find::find({ wanted => sub {
for my $ext (@src_exts) {
if (/\Q$ext\E$/) {
my $file_stats = count_lines($_);
printf("Read %d ln. In %s\n", $file_stats->{'Total'}, $_);
push @stats, $file_stats;
}
}
}}, '.');
print_results(@stats);
I'll focus on important issues regarding your code,
Don't use $a and $b global variables as these are used by sort() and are exempt from strict check, nor it is recommended to use them as lexical (my) variables.
C style for loops can be error prone, and are more verbose than perl for(each), ie.
for my $i (0 .. $#srcext) {
$files .= `find . -name "$srcext[$i]"`;
}
split() uses regex as delimiter even when you pass the string to it which is good to remember when using regex metachars, thus
my @inputs = split(/\n/, $files);
Globtypes are discouraged to use as filehandles as they are package globals and should be localized prior to usage (ie. local *FILE).
Also three argument open() is better for security reasons, thus
open(my $file, "<", $srcfile) or die "$! $srcfile\n";
Since you're processing file line by line, it's better to read from it in the same way instead of reading whole file into memory at once (@lines), thus
while (my $line = <$file>) { .. }