1
\$\begingroup\$

I have been working on this code for quite a while and just want to make sure it is up to good standards. I know many of you will have questions, so as they come up, I will edit my initial question to provide you answers. So any ways to make this code more concise would be great. Be gentle on me, I am only a beginner.

(If you are curious about this projects requirements or the source file I have inputted, if anyone knows how, I can perhaps maybe post a download link to the code or the assignment, let me know if you're interested!)

#!/usr/bin/perl -w
# Final project
use strict;
use warnings;
use diagnostics;
#opens txt file: read mode
open MYFILE, '<', 'source_file.txt' or die $!;
#opens output txt file: write mode
open OUT, '>', 'Summary_Report.txt' or die $!;
#open output txt file: write mode
#used to store header 'split' info
open OUTFILE, '>', 'Header.txt' or die $!;
my $i = 0;
$| = 1; #disable output buffering
my $start_time = undef; #undefined to avoid recycling through other time stamps
my $end_time;
my $user = 0;
my $pass = 0;
my $packet_size = 0; #goes with length#
my @header;
my @source_ip;
my @source_port;
my $src_port;
my @src_port;
my @dest_ip;
my @dest_port;
my $destination_port;
my @destination_port;
while (<MYFILE>) { #loops through every line in file
 chomp; #break new line
 #separate pieces of information from TCPDUMP into list
 @header = split (/\s+/, $_);
 print OUTFILE "$.: @header\n\n";
 if (/^\d+:\d+/) {
##############################T I M E###################################
 #defining first 'line & time' as 'special'
 if (/^22:28/ && !defined($start_time)) {
 $start_time = $header[0];
 #print "$start_time\n"; ###used as a check###
 } 
 #Used recycling of time stamps to find last one available
 if (/22:28/) {
 $end_time = $header[0];
 } 
############################S O U R C E##################################
 #categorizing each section of ip's from source
 @source_ip = split ('\.', $header[2]);
 #adding ip's together, joining in concatenation by '.'
 $source_ip[$i] = $source_ip[0] . '.' . $source_ip[1] . '.' . $source_ip[2] . '.' . $source_ip[3];
 #print $source_ip[$i]; (check)
 @source_port = split (':', $source_ip[4]);
 $src_port[$i] = $source_port[0];
#########################D E S T I N A T I O N###########################
 #categorizing each section of ip's from destination
 @dest_ip = split ('\.', $header[4]);
 #adding ip's together, joining in concatenation by '.'
 $dest_ip[$i] = $dest_ip[0] . '.' . $dest_ip[1] . '.' . $dest_ip[2] . '.' . $dest_ip[3];
 #print $dest_ip[$i]; (check)
 @dest_port = split (':', $dest_ip[4]);
 $destination_port[$i] = $dest_port[0];
#############################L E N G T H#################################
 #-1 represents length
 #transferring $header[-1] into new variable for ease of use
 $packet_size = $packet_size + $header[-1];
 #print $packet_size; (check)
 $i++;
 }
}
############################INTERESTING TEXT#############################
my @total_timesplit;
my @s_timesplit = split (':', $start_time);
#print @start_split; check
my @e_timesplit = split (':', $end_time);
#print @end_split; check
$total_timesplit[0] = $e_timesplit[0] - $s_timesplit[0];
$total_timesplit[1] = $e_timesplit[1] - $s_timesplit[1];
$total_timesplit[2] = $e_timesplit[2] - $s_timesplit[2];
#print @total_split; check
#yields average packet size
my $avg_length = $packet_size/$i;
#print $mean; check
##############################SOURCEIP###################################
my %ip_source;
my @ip_source;
%ip_source = map {$_, 1} @ip_source;
my @distinct_src_ip;
@distinct_src_ip = keys %ip_source;
my $sourceip_amt;
$sourceip_amt = @distinct_src_ip;
#print $sourceipnum;
#############################SOURCEPORT##################################
my %port_source;
my @port_source; 
%port_source = map {$_, 1} @port_source;
my @distinct_src_port;
@distinct_src_port = keys %port_source;
my $srcport_amt;
$srcport_amt = @distinct_src_port;
#print $srcportnum;
#############################DESTINATIONIP###############################
my %ip_dest;
my @ip_dest;
%ip_dest = map {$_, 1} @ip_dest;
my @distinct_dest_ip;
@distinct_dest_ip = keys %ip_dest;
my $destip_amt;
$destip_amt = @distinct_dest_ip;
#print $destipnum;
#############################DESTINATIONPORT#############################
my %port_dest;
my @port_dest;
%port_dest = map {$_, 1} @port_dest;
my @distinct_dest_port;
@distinct_dest_port = keys %port_dest;
my $destport_amt;
$destport_amt = @distinct_dest_port;
#print $destportnum;
close MYFILE;
#########################D A T A S E C T I O N###########################
open MYFILE, '<', 'source_file.txt' or die $!;
#I am separating loop to reset values#
while (<MYFILE>) { 
 #finds all instances of USER
 $user++ if /USER/ig;
 #print "user" (use as check)
 #finds all instances of PASS
 $pass++ if /PASS/ig;
 #print "pass" (use as check)
}
#Output summary to new file: overwrite file
print OUT "SUMMARY REPORT:\n\n";
print OUT "# of total lines in the file = $.\n\n\n";
print OUT "Range of time the file encompasses:\n\n
 Starting Time = $start_time\n\n
 Ending Time = $end_time\n\n
 Total Time = @total_timesplit\n\n\n";
print OUT "Total # of distinct SOURCE ip addresses = \n\n\n";
print OUT "Total # of distinct DESTINATION ip addresses = \n\n\n";
print OUT "Listing of distinct SOURCE ip addresses = \n\n\n";
print OUT "Listing of distinct DESTINATION ip addresses = \n\n\n";
print OUT "Total # of distinct SOURCE TCP ports = \n\n\n";
print OUT "Total # of distinct DESTINATION TCP ports = \n\n\n";
print OUT "Listing of distinct SOURCE TCP ports = \n\n\n";
print OUT "Listing of distinct DESTINATION TCP ports = \n\n\n";
print OUT "Total # of times phrases were used:\n\n
USER (variations thereof) = $user\n\n
PASS (variations thereof) = $pass\n\n\n";
print OUT "DETAIL SECTION:\n\n\n";
print OUT "SOURCE IP address activity by port over time:\n\n
 Mean packet size for above = \n\n
 Median packet size for above = \n\n\n";
print OUT "Detail IP address activity by port over time:\n\n
 Mean packet size for above = \n\n
 Median packet size for above = \n\n\n";
print OUT "Any and all interesting text w/in the DATA section of the file:\n\n
 Packet total = $packet_size\n\n
 Number of header lines = $i\n\n
 Average packet size = $avg_length\n\n";
close OUT; #[ ]
close OUTFILE; #[close remaining files ]
close MYFILE; #[ ]
my $j = 0;
if ($j == 0) { 
 sleep (1); #dramatic pause!
 print "\n\n File has successfully run. Please open your Local Disk C: folder to reveal\n 
 'Summary_Report' file\n\n";
}
asked Dec 11, 2013 at 23:03
\$\endgroup\$
1
  • \$\begingroup\$ Not a question, but I'd appreciate it if you could put the code's intent as the title. That would help us and help you attract more reviewers. \$\endgroup\$ Commented Dec 11, 2013 at 23:12

1 Answer 1

3
\$\begingroup\$

This got a bit longer than expected.

  • Don't use -w in the shebang line. it serves no purpose if you already use warnings.

  • Instead of doing error handling for open etc. yourself, you should use autodie. This is a Core module since perl 5.10.1.

  • Don't create "bareword filehandles", because those are actually global variables. Instead: open my $out, '>', 'some_file'. Now $out would be the filehandle which you can print to as you're used to: print $out "stuff" or better: print { $out } "stuff".

  • $| = 1 disables buffering for the currently selected filehandle (by default, STDOUT). Only do this if you need to, as it degrades performance. Also, you should restrict this to the tightest scope neccessary, e.g. if ($foo) { local $| = 1; print "stuff" }. But as you have commented out all your debug statements, there is no need for unbuffered output anyway.

  • Don't declare all your variables up front – declare them in the tightest scope possible. For example @header can be declared inside the loop.

  • split /\s+/, $_ is almost equivalent to the default behavior of split (which skips over leading whitespace). You probably want to use that instead: my @header = split;.

  • Instead of while (...) { ... if ($cond) { STUFF }}, write:

    while (...) {
     ...
     next if not $cond;
     STUFF;
    }
    

    this avoids an unnecessary level of indentation.

  • if (/^22:28/ && !defined($start_time)) { A } if (/22:28/) { B } – are you sure the second regex shouldn't be anchored at the beginning as well? In that case, the code could be simplified to:

    if (/^22:28/) {
     if (not defined $start_time) {
     A;
     }
     B;
    }
    
  • This line

    $source_ip[$i] = $source_ip[0] . '.' . $source_ip[1] . '.' . $source_ip[2] . '.' . $source_ip[3]
    

    would be better written as $source_ip[$i] = join '.', @source_ip[0 .. 3]. This uses an array slice and the join builtin.

  • I am not sure about this piece of code:

    $source_ip[$i] = $source_ip[0] . '.' . $source_ip[1] . '.' . $source_ip[2] . '.' . $source_ip[3];
    @source_port = split (':', $source_ip[4]);
    

    You write into the $i-entry of @source_ip, and $i starts with zero. This means that $source_ip[4] is likely undef, and should emit some warning. Are you sure you didn't mean $source_ip[4] = ...;? But why don't you use another scalar variable for that? my $ip_string = .... Actually, we don't need any extra variable here, and could just write:

    @source_port = split /:/, join '.', @source_ip[0 .. 3];
    
  • I am not sure what that line intends to do, I guess you are trying to remove the port part of an IP address? A regex would do a better job than splitting and concatenating strings in order to parse the input.

  • $packet_size = $packet_size + $header[-1] would usually be written as:

    $packet_size += $header[-1];
    
  • "Three or more, use a for":

    for my $i (0 .. 2) {
     $total_timesplit[$i] = $e_timesplit[$i] - $s_timesplit[$i];
    }
    
  • This is weird:

    my %ip_source;
    my @ip_source;
    %ip_source = map {$_, 1} @ip_source;
    

    Because @ip_source was just declared, it is empty. Thus %ip_source does not contain any keys. You can basically remove that code, as it does nothing.

  • To find unique elements in an array, you can use the trick with the hash. Instead of writing that code yourself every time, you can use List::Util qw< uniq >, and then find the unique elements like

    my @unique_items = uniq @items_with duplicates;
    
  • ... if /USER/ig – please do not use the /g flag in scalar context, as that generally turns the match into an iterator over the current string – but not in this special case, for more complex reasons (which I can explain if you're really interested). But most of the time, this is a bug.

  • Instead of multiple prints, of double-quoted strings,

    print OUT "SUMMARY REPORT:\n\n";
    print OUT "# of total lines in the file = $.\n\n\n";
    

    you could just use a here-doc:

    print { $out } <<"END";
    SUMMARY REPORT:
    # of total lines in the file = $.
    Range of time the file encompasses:
     Starting Time = $start_time
     Ending Time = $end_time
     Total Time = @total_timesplit
    ...
    END
    

    (but are you really sure that all those newlines in between actually make the output easier to read?)

    A heredoc is introduced with <<TERMINATOR where TERMINATOR is either a double-quoted "string" (in which case the here-doc behaves like a double quoted string), or a single-quoted 'string'. The terminator must occur on a line of its own. Heredocs cannot be indented. They are better than ordinary strings if you have a multi-line template (as is the case here).

  • You might as well remove that condition, as it is always true:

    my $j = 0;
    if ($j == 0) { ... }
    
  • I do not think that code like sleep (1); #dramatic pause! makes for a good user experience.

  • If you find yourself writing the same code again and again (or worse: copy&pasting some code), then you will likely want to write a subroutine. See the introduction in the Modern Perl Book or in the official docs to learn how to write your own subroutines.

answered Dec 12, 2013 at 0:03
\$\endgroup\$
1
  • \$\begingroup\$ wow thats a lot thank you! But as you can see, I need some help with the hashes. Do you kind of see what I am trying to do with them at all? \$\endgroup\$ Commented Dec 12, 2013 at 21:27

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.