I'm wondering if there is a simpler regex I could use in my code to remove the leading and trailing character of a line. Maybe some of the regexes could be combined? In this instance, it's a comma at the beginning and end of a line. My output should be the fields separated into CSV format.
#!/usr/local/bin/perl
use strict;
use warnings;
parse_DPCRS();
sub parse_DPCRS {
open ( FILEIN, '<', 'txt_files/AKR_DPCRS.txt' ) or die "Cannot open input file: $!";
open ( FILEOUT, '>', 'txt_files/AKR_DPCRS.csv' ) or die "Cannot open output file: $!";
while (<FILEIN>) {
next if /^(\s)*$/; #skip blank lines
next if /^\>/; #skip command lines that start with >
next if /^\s+POINT\sCODE/; #skip header
next if /^\s+NODE\sNAME/; #skip header
next if /^\s+\=+/; #skip header
next if /^\s+CCS\sDPCRS/; #skip pagination footer
chomp; #removing trailing newline character
s/\s+/,/g; #replace white space with a comma
s/^,//; #replace beginning comma with empty
s/,$//; #replace ending comma with empty
my (
$nodeName, $pointCodeDec ) = split( "," );
print FILEOUT ($nodeName . "," . $pointCodeDec . "\n");
#print "$_\n";
}
};
close (FILEOUT);
close (FILEIN);
exit;
Here's a slice of the text file I'm parsing:
>DISP CCS DPCRS ALL 0
POINT CODE POINT CODE TYPE OF ROUTESET NOTIFY NODE
NODE NAME DECIMAL HEX ROUTE MASTER SCCP LOCATION
=========== =========== ========== ======= ======== ====== ========
PBVJPRCO01T 1-1-1 010101 FULL PC 119 NO NON-ADJ
ROCHNYXA06T 1-6-1 010601 FULL PC 58 NO NON-ADJ
NYCNNYDRW17 1-6-2 010602 FULL PC 58 NO NON-ADJ
SYRCNYSW01T 1-6-3 010603 FULL PC 22 NO NON-ADJ
SYRCNYSWDS0 1-6-15 01060F FULL PC 58 NO NON-ADJ
ROCHNYFEDS0 1-6-17 010611 FULL PC 58 NO NON-ADJ
NYCMNYHD01T 1-9-11 01090B FULL PC 22 NO NON-ADJ
NWRKNJ1001T 1-9-14 01090E FULL PC 22 NO NON-ADJ
BSTNMABL01T 1-9-16 010910 FULL PC 22 NO NON-ADJ
4 Answers
Here is a single regex that removes any commas at the beginning or at the end of a string:
$str =~ s/^,+|,+$//g;
and here is a benchmark that compares this single regex with the two-regex version:
use Benchmark qw(:all);
my $str = q/,a,b,c,d,/;
my $count = -3;
cmpthese($count, {
'two regex' => sub {
$str =~ s/^,+//;
$str =~ s/,+$//;
},
'one regex' => sub {
$str =~ s/^,+|,+$//g;
},
});
The result:
Rate one regex two regex
one regex 597559/s -- -58%
two regex 1410348/s 136% --
We can see that two separate regexes are considerably faster (136% in this run) than the one regex that combines them.
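One thing to keep in mind with the benchmark above: both subs modify the same $str in place, so after the very first call there are no commas left to strip and every later iteration only measures a failed match. A variant that works on a fresh copy each time might look like the sketch below. The helper names trim_two and trim_one are made up for illustration, and timings depend on the Perl version and machine, so no numbers are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $original = ',a,b,c,d,';

# Two anchored substitutions, applied to a copy of the input.
sub trim_two {
    my ($s) = @_;
    $s =~ s/^,+//;
    $s =~ s/,+$//;
    return $s;
}

# One alternation with /g, applied to a copy of the input.
sub trim_one {
    my ($s) = @_;
    $s =~ s/^,+|,+$//g;
    return $s;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    'two regex' => sub { trim_two($original) },
    'one regex' => sub { trim_one($original) },
});
```

Because each call receives its own copy, $original still has its leading and trailing commas on every iteration, so both variants do real work each time.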
I don't know of a tricky regexp, but wouldn't you be better served by creating the string using a combination of split and join?
http://perldoc.perl.org/functions/split.html
The examples in the middle show different combinations to split based on word boundaries. There are also examples that show how to get matching splits for the last and first character.
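As a rough sketch of that idea (the helper name first_two_fields is made up for illustration, and it assumes the first two whitespace-separated fields are the ones wanted): split with the special pattern ' ' skips leading whitespace and drops trailing empty fields, so there is no leading or trailing comma to strip in the first place.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one whitespace-delimited data row into
# "node name,point code (decimal)".
sub first_two_fields {
    my ($line) = @_;
    # split ' ' (a single-space string, not a regex) skips leading
    # whitespace, splits on runs of whitespace, and drops trailing
    # empty fields, including the one left by the newline.
    my ($node_name, $point_code_dec) = split ' ', $line;
    return join ',', $node_name, $point_code_dec;
}

# prints PBVJPRCO01T,1-1-1
print first_two_fields("  PBVJPRCO01T  1-1-1   010101  FULL PC  119  NO  NON-ADJ \n"), "\n";
```

The header and blank-line checks from the question would still be needed before calling this on each line.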
Hope this helps.
I tried to run your code, but I don't understand what output you want.
I only see the first two columns; if that is what you want, you need to use print "$1,$2\n"
Try to use this:
#!/usr/bin/perl
while (<>) {
print "$1,$2,$3,$4,$5,$6,$7\n" if /(\w{11})\s+(\d+-\d+-\d+)\s+(\w{6})\s+(\w.*?)\s+(\d+)\s+(\w.*?)\s+(\w.*?)$/gs
}
usage:
txt2csv.pl txt_files/AKR_DPCRS.txt > txt_files/AKR_DPCRS.csv
Here is an example of how I might do that.
#!/usr/local/bin/perl
use strict;
use warnings;
use 5.10.1;
my $input_filename = 'test.in';
my $output_filename = 'test.out';
open my $input, '<', $input_filename or die "Cannot open $input_filename: $!";
my $pack = '';
# throw out everything before "=====" line
while( <$input> ){
if( /^\s*=[\s=]+$/ ){
# use "=====" line to calculate lengths and offsets
# split on \s [=] boundary keeping everything
my @elem = split /(\s+)/;
my $pos = 0;
for (@elem){
my $length = length $_;
if( /=/ ){
$pack .= '@'.$pos.'A'.$length;
}
$pos += $length;
}
last; # stop skipping lines
}
}
# at this point the iterator for $input is after "=====" line
{
open my $output, '>', $output_filename or die "Cannot open $output_filename: $!";
while( my $line = <$input> ){
say {$output} join ',', unpack $pack, $line;
}
close $output;
}
close $input;
This code will continue to work if the width of the columns or the number of columns changes.
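To make the template step more concrete, here is a small self-contained sketch of the same idea on a three-column example. The helper name template_from_ruler and the sample row are assumptions for illustration only; the row is built with sprintf so that its columns line up with the ruler.

```perl
use strict;
use warnings;

# Hypothetical helper: build an unpack template from a ruler line of "=" runs.
sub template_from_ruler {
    my ($ruler) = @_;
    my ($template, $pos) = ('', 0);
    # The capturing group makes split return the whitespace runs too,
    # so every character of the ruler counts toward the offset.
    for my $piece ( split /(\s+)/, $ruler ) {
        my $length = length $piece;
        $template .= '@' . $pos . 'A' . $length if $piece =~ /=/;
        $pos += $length;
    }
    return $template;
}

# Ruler with column widths 11, 11 and 6, separated by single spaces.
my $ruler    = join ' ', '=' x 11, '=' x 11, '=' x 6;
# Sample row padded to the same widths.
my $row      = sprintf '%-11s %-11s %-6s', 'PBVJPRCO01T', '1-1-1', '010101';
my $template = template_from_ruler($ruler);

print "$template\n";                                # @0A11@12A11@24A6
print join( ',', unpack($template, $row) ), "\n";   # PBVJPRCO01T,1-1-1,010101
```

Each @N jumps to an absolute offset and each A<len> reads a space-padded field, stripping the trailing padding, which is why no separate trimming step is needed.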