Parsing text from reports

Question 1

I want to parse reports from multiple devices. A report looks like this:

VR            Destination      Mac                Age  Static  VLAN          VID   Port
VR-Default    192.168.11.13    90:e2:ba:3c:95:c0  2    NO      intra1        350   49
VR-Default    192.168.1.1      00:0e:a6:f7:b6:b5  0    NO      main          602   1
VR-Default    192.168.1.2      00:0d:88:63:bf:d1  3    NO      main          602   1
VR-Default    192.168.1.14     00:1c:f0:c7:d2:52  4    NO      main          602   1
etc...
Dynamic Entries : 19 Static Entries : 0
Pending Entries : 1
In Request : 3888802 In Response : 4531
and some more data...
Rx Error : 0 Dup IP Addr : 0.0.0.0
and some more...

I need only the vr, destination, mac, age, static, vlan, vid and port fields. I can parse them with split and regexes, but split fails if a field (e.g. Age) is empty. perldoc says I can use unpack:

my $template = 'A13xA16xA18xA4xA7xA13xA5xA*'; 
for my $line ( split /\n/, $data ) {
 chomp $line;
 my ($vr, $destination, $mac, $age, $static, $vlan, $vid, $port) = unpack $template, $line;
...
}

But unpack dies on lines shorter than 84 characters, so I have to check the string length every time (or would wrapping unpack in eval be better?). I also still have to use regexes or index to find the end of the main table and to skip the header. The code would look like this:

#!/usr/bin/perl
use strict;
use warnings;
my $arp = <<'ARP';
VR            Destination      Mac                Age  Static  VLAN          VID   Port
VR-Default    192.168.11.13    90:e2:ba:3c:95:c0  2    NO      intra1        350   49
VR-Default    192.168.1.1      00:0e:a6:f7:b6:b5  0    NO      main          602   1
VR-Default    192.168.1.2      00:0d:88:63:bf:d1  3    NO      main          602   1
VR-Default    192.168.1.14     00:1c:f0:c7:d2:52  4    NO      main          602   1
Dynamic Entries : 19 Static Entries : 0
Pending Entries : 1
In Request : 3888802 In Response : 4531
Rx Error : 0 Dup IP Addr : 0.0.0.0
ARP
my $template = 'A13xA16xA18xA4xA7xA13xA5xA*';
for my $line ( split /\n/, $arp ) {
 last if index( $line, 'Dynamic E' ) == 0;
 next if length $line < 84;
 chomp $line;
 my ($vr, $destination, $mac, $age, $static, $vlan, $vid, $port) = unpack $template, $line;
 next if $mac eq 'Mac';
 print "$mac - $destination\n";
}

My question is: how would you parse this data? split, regexes, unpack, substr or something else?
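The two failure modes described above can be reproduced in a few lines. This is only a sketch: the sample row is rebuilt by hand from the column widths in the template, and `parse_row` is a hypothetical helper name, not part of the original code. It pads each line to the template's minimum width before calling unpack, which is one possible alternative to checking the length or wrapping unpack in eval.

```perl
use strict;
use warnings;

# Column widths taken from the template in the question.
my $template  = 'A13xA16xA18xA4xA7xA13xA5xA*';
my $min_width = 83;    # 14 + 17 + 19 + 5 + 8 + 14 + 6 characters before Port

# Hypothetical helper: pad short lines with spaces so that the 'x' skips
# in the template never run past the end of the string.
sub parse_row {
    my ($line) = @_;
    my $padded = sprintf( '%-*s', $min_width, $line );
    return unpack( $template, $padded );
}

# A row with an empty Age column, rebuilt by hand for illustration.
my $row = sprintf( '%-14s%-17s%-19s%-5s%-8s%-14s%-6s%s',
    'VR-Default', '192.168.1.1', '00:0e:a6:f7:b6:b5', '', 'NO', 'main', 602, 1 );

# split drops the empty column, so every later field shifts one place left.
my @by_split = split ' ', $row;
print scalar(@by_split), " fields from split\n";    # 7, not 8

# unpack keeps the columns aligned; the empty Age comes back as ''.
my @by_unpack = parse_row($row);
print "age='$by_unpack[3]' static='$by_unpack[4]'\n";

# A short line (such as a footer) no longer dies; it still has to be
# filtered out separately, because it unpacks into meaningless fields.
my @short = parse_row('Pending Entries : 1');
print scalar(@short), " fields from a short line\n";
```

Padding only removes the exception. Detecting where the table ends (for example with the index check on 'Dynamic E') is still needed.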

Question 2

This question does not deserve to be closed — the second program works.

Question 3

perhaps CSV module?

Question 4

@mpapec I thought a CSV module would fail when some fields are empty.

Question 5

If a problem has been encountered before by someone else, chances are that there is a CPAN module for that (DataExtract::FixedWidth).

If you don't want to use a CPAN module, then my next choice would be to use regular expressions.

use strict;
use warnings;
# Strips leading and trailing whitespace from all parameters
sub strip {
    for (@_) { s/^\s+//; s/\s+$//; }
    @_;
}
# Extracts data from lines of text in tabular format.
#
# First parameter is a regular expression for capturing fixed-width fields.
#
# Subsequent parameters are the lines of tabular data, the first of which holds
# the column headings. Any line that does not match the regular expression,
# as well as subsequent lines, are discarded.
#
# Returns a list (one element per input line) of hash references (keyed by
# column names).
sub extract_table {
    my ($fmt, $first_line) = (shift, shift);
    my (@headers) = strip($first_line =~ $fmt);
    my @table;
    for my $line (@_) {
        my (@fields) = $line =~ $fmt;
        last unless @fields;
        my %data;
        @data{@headers} = strip(@fields);
        push @table, \%data;
    }
    return @table;
}
my $fmt = qr/^(.{14})(.{17})(.{19})(.{5})(.{8})(.{14})(.{6})(.*)/;
# Take lines of input from a reasonable source (STDIN or a filename
# argument on the command line)
my @table = extract_table($fmt, <>);
use Data::Dumper;
print Dumper(\@table);

Note that chomp() is unnecessary since we're stripping whitespace characters anyway.
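The widths in the regular expression are the same ones encoded in the unpack template from the question, so the pattern could also be generated from a list of widths instead of being written by hand. A minimal sketch, assuming a hypothetical helper named `fmt_from_widths` and a sample line rebuilt by hand from those widths:

```perl
use strict;
use warnings;

# Hypothetical helper: build a capturing regex from a list of column widths.
# The last column is open-ended, as in the hand-written pattern.
sub fmt_from_widths {
    my @widths  = @_;
    my $pattern = join '', map { sprintf '(.{%d})', $_ } @widths;
    return qr/^$pattern(.*)/;
}

my $fmt = fmt_from_widths( 14, 17, 19, 5, 8, 14, 6 );

# Sample line rebuilt by hand for illustration.
my $line = sprintf( '%-14s%-17s%-19s%-5s%-8s%-14s%-6s%s',
    'VR-Default', '192.168.1.2', '00:0d:88:63:bf:d1', 3, 'NO', 'main', 602, 1 );

# Capture the fixed-width fields, then trim the padding from each one.
my @fields = $line =~ $fmt;
s/^\s+|\s+$//g for @fields;

print join( '|', @fields ), "\n";
# prints VR-Default|192.168.1.2|00:0d:88:63:bf:d1|3|NO|main|602|1
```

The resulting `$fmt` can be passed to extract_table in place of the literal pattern.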

Question 6

It might be worthwhile mentioning that strip(@fields); also alters @fields array.
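The side effect mentioned in this comment comes from @_ aliasing the caller's variables. The sketch below illustrates it and shows one possible non-mutating variant. The name `stripped` is hypothetical and not part of the answer.

```perl
use strict;
use warnings;

# Same strip as in the answer: @_ aliases the caller's variables,
# so the substitutions change the caller's array.
sub strip {
    for (@_) { s/^\s+//; s/\s+$//; }
    @_;
}

# Hypothetical non-mutating variant: work on a copy of the arguments.
sub stripped {
    my @copy = @_;
    for (@copy) { s/^\s+//; s/\s+$//; }
    return @copy;
}

my @fields = ( '  main  ', ' 602 ' );

my @clean  = stripped(@fields);
my $before = $fields[0];
print "after stripped: <$fields[0]>\n";    # still padded

strip(@fields);
print "after strip:    <$fields[0]>\n";    # trimmed in place
```

In extract_table the in-place change is harmless, because @fields is not used again after the hash slice assignment.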

200_success · Accepted Answer · 2013-11-06 02:09:09Z

