I have been working on this code for quite a while and just want to make sure it is up to good standards. I know many of you will have questions, so as they come up, I will edit my initial question to provide you answers. So any ways to make this code more concise would be great. Be gentle on me, I am only a beginner.
(If you are curious about this projects requirements or the source file I have inputted, if anyone knows how, I can perhaps maybe post a download link to the code or the assignment, let me know if you're interested!)
#!/usr/bin/perl -w
# Final project
use strict;
use warnings;
use diagnostics;
#opens txt file: read mode
open MYFILE, '<', 'source_file.txt' or die $!;
#opens output txt file: write mode
open OUT, '>', 'Summary_Report.txt' or die $!;
#open output txt file: write mode
#used to store header 'split' info
open OUTFILE, '>', 'Header.txt' or die $!;
my $i = 0;
$| = 1; #disable output buffering
my $start_time = undef; #undefined to avoid recycling through other time stamps
my $end_time;
my $user = 0;
my $pass = 0;
my $packet_size = 0; #goes with length#
my @header;
my @source_ip;
my @source_port;
my $src_port;
my @src_port;
my @dest_ip;
my @dest_port;
my $destination_port;
my @destination_port;
while (<MYFILE>) { #loops through every line in file
chomp; #break new line
#separate pieces of information from TCPDUMP into list
@header = split (/\s+/, $_);
print OUTFILE "$.: @header\n\n";
if (/^\d+:\d+/) {
##############################T I M E###################################
#defining first 'line & time' as 'special'
if (/^22:28/ && !defined($start_time)) {
$start_time = $header[0];
#print "$start_time\n"; ###used as a check###
}
#Used recycling of time stamps to find last one available
if (/22:28/) {
$end_time = $header[0];
}
############################S O U R C E##################################
#categorizing each section of ip's from source
@source_ip = split ('\.', $header[2]);
#adding ip's together, joining in concatenation by '.'
$source_ip[$i] = $source_ip[0] . '.' . $source_ip[1] . '.' . $source_ip[2] . '.' . $source_ip[3];
#print $source_ip[$i]; (check)
@source_port = split (':', $source_ip[4]);
$src_port[$i] = $source_port[0];
#########################D E S T I N A T I O N###########################
#categorizing each section of ip's from destination
@dest_ip = split ('\.', $header[4]);
#adding ip's together, joining in concatenation by '.'
$dest_ip[$i] = $dest_ip[0] . '.' . $dest_ip[1] . '.' . $dest_ip[2] . '.' . $dest_ip[3];
#print $dest_ip[$i]; (check)
@dest_port = split (':', $dest_ip[4]);
$destination_port[$i] = $dest_port[0];
#############################L E N G T H#################################
#-1 represents length
#transferring $header[-1] into new variable for ease of use
$packet_size = $packet_size + $header[-1];
#print $packet_size; (check)
$i++;
}
}
############################INTERESTING TEXT#############################
my @total_timesplit;
my @s_timesplit = split (':', $start_time);
#print @start_split; check
my @e_timesplit = split (':', $end_time);
#print @end_split; check
$total_timesplit[0] = $e_timesplit[0] - $s_timesplit[0];
$total_timesplit[1] = $e_timesplit[1] - $s_timesplit[1];
$total_timesplit[2] = $e_timesplit[2] - $s_timesplit[2];
#print @total_split; check
#yields average packet size
my $avg_length = $packet_size/$i;
#print $mean; check
##############################SOURCEIP###################################
my %ip_source;
my @ip_source;
%ip_source = map {$_, 1} @ip_source;
my @distinct_src_ip;
@distinct_src_ip = keys %ip_source;
my $sourceip_amt;
$sourceip_amt = @distinct_src_ip;
#print $sourceipnum;
#############################SOURCEPORT##################################
my %port_source;
my @port_source;
%port_source = map {$_, 1} @port_source;
my @distinct_src_port;
@distinct_src_port = keys %port_source;
my $srcport_amt;
$srcport_amt = @distinct_src_port;
#print $srcportnum;
#############################DESTINATIONIP###############################
my %ip_dest;
my @ip_dest;
%ip_dest = map {$_, 1} @ip_dest;
my @distinct_dest_ip;
@distinct_dest_ip = keys %ip_dest;
my $destip_amt;
$destip_amt = @distinct_dest_ip;
#print $destipnum;
#############################DESTINATIONPORT#############################
my %port_dest;
my @port_dest;
%port_dest = map {$_, 1} @port_dest;
my @distinct_dest_port;
@distinct_dest_port = keys %port_dest;
my $destport_amt;
$destport_amt = @distinct_dest_port;
#print $destportnum;
close MYFILE;
#########################D A T A S E C T I O N###########################
open MYFILE, '<', 'source_file.txt' or die $!;
#I am separating loop to reset values#
while (<MYFILE>) {
#finds all instances of USER
$user++ if /USER/ig;
#print "user" (use as check)
#finds all instances of PASS
$pass++ if /PASS/ig;
#print "pass" (use as check)
}
#Output summary to new file: overwrite file
print OUT "SUMMARY REPORT:\n\n";
print OUT "# of total lines in the file = $.\n\n\n";
print OUT "Range of time the file encompasses:\n\n
Starting Time = $start_time\n\n
Ending Time = $end_time\n\n
Total Time = @total_timesplit\n\n\n";
print OUT "Total # of distinct SOURCE ip addresses = \n\n\n";
print OUT "Total # of distinct DESTINATION ip addresses = \n\n\n";
print OUT "Listing of distinct SOURCE ip addresses = \n\n\n";
print OUT "Listing of distinct DESTINATION ip addresses = \n\n\n";
print OUT "Total # of distinct SOURCE TCP ports = \n\n\n";
print OUT "Total # of distinct DESTINATION TCP ports = \n\n\n";
print OUT "Listing of distinct SOURCE TCP ports = \n\n\n";
print OUT "Listing of distinct DESTINATION TCP ports = \n\n\n";
print OUT "Total # of times phrases were used:\n\n
USER (variations thereof) = $user\n\n
PASS (variations thereof) = $pass\n\n\n";
print OUT "DETAIL SECTION:\n\n\n";
print OUT "SOURCE IP address activity by port over time:\n\n
Mean packet size for above = \n\n
Median packet size for above = \n\n\n";
print OUT "Detail IP address activity by port over time:\n\n
Mean packet size for above = \n\n
Median packet size for above = \n\n\n";
print OUT "Any and all interesting text w/in the DATA section of the file:\n\n
Packet total = $packet_size\n\n
Number of header lines = $i\n\n
Average packet size = $avg_length\n\n";
close OUT; #[ ]
close OUTFILE; #[close remaining files ]
close MYFILE; #[ ]
my $j = 0;
if ($j == 0) {
sleep (1); #dramatic pause!
print "\n\n File has successfully run. Please open your Local Disk C: folder to reveal\n
'Summary_Report' file\n\n";
}
-
\$\begingroup\$ Not a question, but I'd appreciate it if you could put the code's intent as the title. That would help us and help you attract more reviewers. \$\endgroup\$Jamal– Jamal2013年12月11日 23:12:23 +00:00Commented Dec 11, 2013 at 23:12
1 Answer 1
This got a bit longer than expected.
Don't use
-w
in the shebang line. it serves no purpose if you alreadyuse warnings
.Instead of doing error handling for
open
etc. yourself, you shoulduse autodie
. This is a Core module since perl 5.10.1.Don't create "bareword filehandles", because those are actually global variables. Instead:
open my $out, '>', 'some_file'
. Now$out
would be the filehandle which you can print to as you're used to:print $out "stuff"
or better:print { $out } "stuff"
.$| = 1
disables buffering for the currently selected filehandle (by default,STDOUT
). Only do this if you need to, as it degrades performance. Also, you should restrict this to the tightest scope neccessary, e.g.if ($foo) { local $| = 1; print "stuff" }
. But as you have commented out all your debug statements, there is no need for unbuffered output anyway.Don't declare all your variables up front – declare them in the tightest scope possible. For example
@header
can be declared inside the loop.split /\s+/, $_
is almost equivalent to the default behavior ofsplit
(which skips over leading whitespace). You probably want to use that instead:my @header = split;
.Instead of
while (...) { ... if ($cond) { STUFF }}
, write:while (...) { ... next if not $cond; STUFF; }
this avoids an unnecessary level of indentation.
if (/^22:28/ && !defined($start_time)) { A } if (/22:28/) { B }
– are you sure the second regex shouldn't be anchored at the beginning as well? In that case, the code could be simplified to:if (/^22:28/) { if (not defined $start_time) { A; } B; }
This line
$source_ip[$i] = $source_ip[0] . '.' . $source_ip[1] . '.' . $source_ip[2] . '.' . $source_ip[3]
would be better written as
$source_ip[$i] = join '.', @source_ip[0 .. 3]
. This uses an array slice and thejoin
builtin.I am not sure about this piece of code:
$source_ip[$i] = $source_ip[0] . '.' . $source_ip[1] . '.' . $source_ip[2] . '.' . $source_ip[3]; @source_port = split (':', $source_ip[4]);
You write into the
$i
-entry of@source_ip
, and$i
starts with zero. This means that$source_ip[4]
is likelyundef
, and should emit some warning. Are you sure you didn't mean$source_ip[4] = ...;
? But why don't you use another scalar variable for that?my $ip_string = ...
. Actually, we don't need any extra variable here, and could just write:@source_port = split /:/, join '.', @source_ip[0 .. 3];
I am not sure what that line intends to do, I guess you are trying to remove the port part of an IP address? A regex would do a better job than
split
ting and concatenating strings in order to parse the input.$packet_size = $packet_size + $header[-1]
would usually be written as:$packet_size += $header[-1];
"Three or more, use a
for
":for my $i (0 .. 2) { $total_timesplit[$i] = $e_timesplit[$i] - $s_timesplit[$i]; }
This is weird:
my %ip_source; my @ip_source; %ip_source = map {$_, 1} @ip_source;
Because
@ip_source
was just declared, it is empty. Thus%ip_source
does not contain any keys. You can basically remove that code, as it does nothing.To find unique elements in an array, you can use the trick with the hash. Instead of writing that code yourself every time, you can
use List::Util qw< uniq >
, and then find the unique elements likemy @unique_items = uniq @items_with duplicates;
... if /USER/ig
– please do not use the/g
flag in scalar context, as that generally turns the match into an iterator over the current string – but not in this special case, for more complex reasons (which I can explain if you're really interested). But most of the time, this is a bug.Instead of multiple
print
s, of double-quoted strings,print OUT "SUMMARY REPORT:\n\n"; print OUT "# of total lines in the file = $.\n\n\n";
you could just use a here-doc:
print { $out } <<"END"; SUMMARY REPORT: # of total lines in the file = $. Range of time the file encompasses: Starting Time = $start_time Ending Time = $end_time Total Time = @total_timesplit ... END
(but are you really sure that all those newlines in between actually make the output easier to read?)
A heredoc is introduced with
<<TERMINATOR
whereTERMINATOR
is either a double-quoted"string"
(in which case the here-doc behaves like a double quoted string), or a single-quoted'string'
. The terminator must occur on a line of its own. Heredocs cannot be indented. They are better than ordinary strings if you have a multi-line template (as is the case here).You might as well remove that condition, as it is always true:
my $j = 0; if ($j == 0) { ... }
I do not think that code like
sleep (1); #dramatic pause!
makes for a good user experience.If you find yourself writing the same code again and again (or worse: copy&pasting some code), then you will likely want to write a subroutine. See the introduction in the Modern Perl Book or in the official docs to learn how to write your own subroutines.
-
\$\begingroup\$ wow thats a lot thank you! But as you can see, I need some help with the hashes. Do you kind of see what I am trying to do with them at all? \$\endgroup\$user2288– user22882013年12月12日 21:27:32 +00:00Commented Dec 12, 2013 at 21:27