Calculating bytes of code

Question 1

I am new to Perl and I wrote this Perl script to calculate bytes of code and lines of code in a directory. I want to know if there are any improvements that can be made to it.

#!/usr/bin/perl
# codestats.pl
# A perl script to calculate the number of lines of code in a project
# and what type of lines those code are, such as comments, code, or
# newlines. This script is specific to C/C++, but it should work with any
# language that has the same commenting scheme, so it should probably work
# with java too.
#
# Henry J Schmale
# April 30, 2015
use strict;
use warnings;
my @srcext = ("*.cpp", "*.c", "*.h", "*.hpp", "*.ino", "*.cxx", "*.cc");
my $commLines = 0; # Lines that start with a comment symbol
my $bothLines = 0; # Lines that have a comment and code
my $newLines = 0; # Lines that are just whitespace
my $codeLines = 0; # Lines of code - lines that don't fit in another space
my $totLines = 0; # Total lines of code
my $srcBytes = 0; # Total Number of bytes in src code
my $fCount = 0; # Number of files read
my $files;
for($a = 0; $a < scalar(@srcext); $a++){
 $files .= `find . -name "$srcext[$a]"`;
}
my @inputs = split("\n", $files);
for($a = 0; $a < scalar(@inputs); $a++){
 my $prev = $totLines;
 countLines($inputs[$a]);
 printf("Read %d ln. In %s\n", $totLines - $prev, $inputs[$a]);
 $fCount++;
}
printResults();
# Count the lines in the given file
# The first param is the file to open for counting
sub countLines{
 my ($srcfile) = @_;
 open(FILE, "<$srcfile") or die "Couldn't open file: $!\n";
 my @lines = <FILE>;
 for($b = 0; $b < scalar(@lines); $b++){
 $srcBytes += length($lines[$b]);
 $totLines++;
 if($lines[$b] =~ /^\s$/){ # is only whitespace ==> newLine
 $newLines++;
 next;
 }
 if(($lines[$b] =~ /^\s*\/\//) || # comments only lines
 ($lines[$b] =~ /^\s*\/\*/) ||
 ($lines[$b] =~ /^\s*\*/)){
 $commLines++;
 next;
 }
 # code + comments
 if(($lines[$b] =~ /\/\//) ||
 ($lines[$b] =~ /\/\*.*\*\//)){
 $bothLines++;
 next;
 }
 $codeLines++;
 }
 close FILE;
}
sub calcPercent{
 return ($_[0] / $_[1]) * 100.0;
}
sub printResults{
 # print out the results
 printf("Read %d Files\n", $fCount);
 printf("Average Lines Per File: %d\n", $totLines / $fCount);
 printf("Code : %09d ln. %06.3f", $codeLines, calcPercent($codeLines, $totLines));
 print "%\n";
 printf("Comment : %09d ln. %06.3f", $commLines, calcPercent($commLines, $totLines));
 print "%\n";
 printf("Blank : %09d ln. %06.3f", $newLines, calcPercent($newLines, $totLines));
 print "%\n";
 printf("Both : %09d ln. %06.3f", $bothLines, calcPercent($bothLines, $totLines));
 print "%\n";
 printf("Total : %09d ln.\n", $totLines);
 printf("CodeSize: %09d bytes\n", $srcBytes);
}

Question 2

It's easy to write very ugly Perl code, but this is pretty nicely written, so congrats.

Using barewords for file handles like this is discouraged:

open(FILE, "<$srcfile") or die "Couldn't open file: $!\n";
my @lines = <FILE>;
# ...
close FILE;

The recommended way:

open(my $fh, "<$srcfile") or die "Couldn't open file: $!\n";
my @lines = <$fh>;
# ...
close $fh;

Some simplifications are possible. For example instead of this:

my @srcext = ("*.cpp", "*.c");

I find this way a lot easier to type:

my @srcext = qw/*.cpp *.c/;

Another, bigger simplification is in the loops. Instead of this:

my @lines = <FILE>;
for($b = 0; $b < scalar(@lines); $b++){
 $srcBytes += length($lines[$b]);
 $totLines++;
 if($lines[$b] =~ /^\s$/){ # is only whitespace ==> newLine
 $newLines++;
 next;
 }
 if(($lines[$b] =~ /^\s*\/\//) || # comments only lines
 ($lines[$b] =~ /^\s*\/\*/) ||
 ($lines[$b] =~ /^\s*\*/)){

You can iterate using for (<$fh>) { ... }, and instead of $lines[$b], benefit from the auto-variable $_, which can be omitted from $lines[$b] =~ /.../ statements, like this:

for (<$fh>) {
 $srcBytes += length($_);
 $totLines++;
 if(/^\s$/){ # is only whitespace ==> newLine
 $newLines++;
 next;
 }
 if(/^\s*\/\// || # comments only lines
 /^\s*\/\*/ ||
 /^\s*\*/){

Question 3

Regexes

Your /^\s$/ regex, intended to count the number of "blank" lines, only matches lines that are completely empty or that contain just a space or tab. If there are two spaces, it won't match.

Use m!regex! (or any other convenient delimiter) instead of /regex/ when your regex itself contains slashes, to avoid "leaning toothpick syndrome".

Of course, crude regex matching is not completely accurate. If a // appears within a literal string, for example, it would be misinterpreted as the start of a comment. Conversely, in multi-line /* comments */, only the first and last line are being recognized as comments.

`File::Find`

Instead of shelling out to find, use File::Find.

Data-directed programming

There is a proliferation of global variables, which countLines() manipulates as a side-effect, and which printResults() consumes implicitly. I recommend putting them in some kind of data structure so that they can be passed around easily. For example, using subtraction in

my $prev = $totLines;
countLines($inputs[$a]);
printf("Read %d ln. In %s\n", $totLines - $prev;

to retrieve information about the most recently processed file feels like a bit of a hack.

The countLines() function is a bit tedious, with multiple if statements. It would be more elegant to declare, up front, the regexes for all of the types of lines you are interested in. This technique is useful for counting the number of times a regex matches.

Printing

If you omit the parentheses when calling print, then omit them when calling printf as well.

Use printf to print the percent signs as well. The trick is to use %% to represent a literal percent character.

Suggested solution

use File::Find;
use strict;
use warnings;
my @src_exts = qw(.cpp .c .h .hpp .ino .cxx .cc);
my %categories = (
 'Comment' => qr!^\s*(?://|/\*|\*/\s*$)!, # Lines that start with a comment symbol
 'Blank' => qr!^\s*$!, # Lines that are just whitespace
 'Both' => qr!\S.*/[/*].*|\*/.*\S.*/!, # Lines that have a comment and code
 'Bytes' => qr!.!s, # Total bytes
 'Total' => qr!^!, # Total lines
);
sub count_lines {
 my ($srcfile) = @_;
 my %stats = map { $_ => 0 } keys %categories;
 open FILE, '<', $srcfile or die "Could not open file $srcfile: $!";
 for my $line (<FILE>) {
 while (my ($cat, $regex) = each %categories) {
 $stats{$cat} += scalar(() = $line =~ m/$regex/g);
 }
 }
 close FILE;
 return \%stats;
}
sub add_stats {
 my $sum = { %{shift()} };
 for my $file_stats (@_) {
 for my $cat (keys $file_stats) {
 $sum->{$cat} += $file_stats->{$cat};
 }
 }
 return $sum;
}
sub percent {
 return (shift() / shift()) * 100.0;
}
sub print_results {
 my (@file_stats) = @_;
 my $stats = add_stats(@file_stats);
 my $total_lines = $stats->{'Total'};
 my $code_lines = $total_lines - $stats->{'Comment'} - $stats->{'Blank'};
 printf "Read %d Files\n", scalar @file_stats;
 printf "Average Lines Per File: %d\n", $stats->{'Total'} / scalar @file_stats;
 printf "Code : %9d ln. %6.3f%%\n", $code_lines, percent($code_lines, $total_lines);
 printf "Comment : %9d ln. %6.3f%%\n", $stats->{'Comment'}, percent($stats->{'Comment'}, $total_lines);
 printf "Blank : %9d ln. %6.3f%%\n", $stats->{'Blank'}, percent($stats->{'Blank'}, $total_lines);
 printf "Both : %9d ln. %6.3f%%\n", $stats->{'Both'}, percent($stats->{'Both'}, $total_lines);
 printf "Total : %9d ln.\n", $total_lines;
 printf "CodeSize: %9d bytes\n", $stats->{'Bytes'};
}
my @stats = ();
File::Find::find({ wanted => sub {
 for my $ext (@src_exts) {
 if (/\Q$ext\E$/) {
 my $file_stats = count_lines($_);
 printf("Read %d ln. In %s\n", $file_stats->{'Total'}, $_);
 push @stats, $file_stats;
 }
 }
}}, '.');
print_results(@stats);

Question 4

I'll focus on important issues regarding your code,

Don't use $a and $b global variables as these are used by sort() and are exempt from strict check, nor it is recommended to use them as lexical (my) variables.

C style for loops can be error prone, and are more verbose than perl for(each), ie.

for my $i (0 .. $#srcext) {
 $files .= `find . -name "$srcext[$i]"`;
}

split() uses regex as delimiter even when you pass the string to it which is good to remember when using regex metachars, thus

my @inputs = split(/\n/, $files);

Globtypes are discouraged to use as filehandles as they are package globals and should be localized prior to usage (ie. local *FILE). Also three argument open() is better for security reasons, thus

open(my $file, "<", $srcfile) or die "$! $srcfile\n";

Since you're processing file line by line, it's better to read from it in the same way instead of reading whole file into memory at once (@lines), thus

while (my $line = <$file>) { .. }

janos 113k15 gold badges154 silver badges396 bronze badges · Accepted Answer · 2015-06-24 21:25:40Z

It's easy to write very ugly Perl code, but this is pretty nicely written, so congrats.

Using barewords for file handles like this is discouraged:

open(FILE, "<$srcfile") or die "Couldn't open file: $!\n";
my @lines = <FILE>;
# ...
close FILE;

The recommended way:

open(my $fh, "<$srcfile") or die "Couldn't open file: $!\n";
my @lines = <$fh>;
# ...
close $fh;

Some simplifications are possible. For example instead of this:

my @srcext = ("*.cpp", "*.c");

I find this way a lot easier to type:

my @srcext = qw/*.cpp *.c/;

Another, bigger simplification is in the loops. Instead of this:

my @lines = <FILE>;
for($b = 0; $b < scalar(@lines); $b++){
 $srcBytes += length($lines[$b]);
 $totLines++;
 if($lines[$b] =~ /^\s$/){ # is only whitespace ==> newLine
 $newLines++;
 next;
 }
 if(($lines[$b] =~ /^\s*\/\//) || # comments only lines
 ($lines[$b] =~ /^\s*\/\*/) ||
 ($lines[$b] =~ /^\s*\*/)){

You can iterate using for (<$fh>) { ... }, and instead of $lines[$b], benefit from the auto-variable $_, which can be omitted from $lines[$b] =~ /.../ statements, like this:

for (<$fh>) {
 $srcBytes += length($_);
 $totLines++;
 if(/^\s$/){ # is only whitespace ==> newLine
 $newLines++;
 next;
 }
 if(/^\s*\/\// || # comments only lines
 /^\s*\/\*/ ||
 /^\s*\*/){

Stack Exchange Network

Calculating bytes of code

3 Answers 3

Regexes

`File::Find`

Data-directed programming

Printing

Suggested solution

You must log in to answer this question.

Hot Network Questions

Calculating bytes of code

3 Answers 3

Regexes

File::Find

Data-directed programming

Printing

Suggested solution

You must log in to answer this question.

Related

Hot Network Questions

`File::Find`