5
\$\begingroup\$

I'm wondering if there is a simpler regex I could use in my code to remove the first and last character of a line. Could some of the regexes be combined? In this instance, it's a comma at the beginning and end of a line. My output should be the fields separated into CSV format.

#!/usr/local/bin/perl
use strict;
use warnings;
parse_DPCRS();
sub parse_DPCRS {
    open ( FILEIN, 'txt_files/AKR_DPCRS.txt' );
    open ( FILEOUT, '>txt_files/AKR_DPCRS.csv' );
    while (<FILEIN>) {
        next if /^(\s)*$/; #skip blank lines
        next if /^\>/; #skip command lines that start with >
        next if /^\s+POINT\sCODE/; #skip header
        next if /^\s+NODE\sNAME/; #skip header
        next if /^\s+\=+/; #skip header
        next if /^\s+CCS\sDPCRS/; #skip pagination footer
        chomp; #remove trailing newline character
        s/\s+/,/g; #replace white space with a comma
        s/^,//; #replace beginning comma with empty
        s/,$//; #replace ending comma with empty
        my ( $nodeName, $pointCodeDec ) = split( "," );
        print FILEOUT ($nodeName . "," . $pointCodeDec . "\n");
        #print "$_\n";
    }
};
close (FILEOUT);
close (FILEIN);
exit;

Here's a slice of the text file I'm parsing:

>DISP CCS DPCRS ALL 0
 POINT CODE POINT CODE TYPE OF ROUTESET NOTIFY NODE
 NODE NAME DECIMAL HEX ROUTE MASTER SCCP LOCATION
 =========== =========== ========== ======= ======== ====== ========
 PBVJPRCO01T 1-1-1 010101 FULL PC 119 NO NON-ADJ
 ROCHNYXA06T 1-6-1 010601 FULL PC 58 NO NON-ADJ
 NYCNNYDRW17 1-6-2 010602 FULL PC 58 NO NON-ADJ
 SYRCNYSW01T 1-6-3 010603 FULL PC 22 NO NON-ADJ
 SYRCNYSWDS0 1-6-15 01060F FULL PC 58 NO NON-ADJ
 ROCHNYFEDS0 1-6-17 010611 FULL PC 58 NO NON-ADJ
 NYCMNYHD01T 1-9-11 01090B FULL PC 22 NO NON-ADJ
 NWRKNJ1001T 1-9-14 01090E FULL PC 22 NO NON-ADJ
 BSTNMABL01T 1-9-16 010910 FULL PC 22 NO NON-ADJ
asked Oct 7, 2011 at 16:33
\$\endgroup\$

4 Answers

2
\$\begingroup\$

Here is a single regex that removes commas at the beginning or at the end of a string:

$str =~ s/^,+|,+$//g;

and here is a benchmark that compares this regex with a double one:

use Benchmark qw(:all);
my $str = q/,a,b,c,d,/;
my $count = -3;
cmpthese($count, {
    'two regex' => sub {
        $str =~ s/^,+//;
        $str =~ s/,+$//;
    },
    'one regex' => sub {
        $str =~ s/^,+|,+$//g;
    },
});

The result:

               Rate one regex two regex
one regex  597559/s        --      -58%
two regex 1410348/s      136%        --

In this run, the two separate regexes are clearly faster than the single regex that combines them: roughly 2.4 times as many iterations per second.
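One caveat when reading these numbers: both subs modify $str in place, so only the very first call strips any commas, and every later call runs against an already clean string. A variant that works on a fresh copy each time might look like the sketch below. The %tests hash and the 1-second run time are choices made for this illustration, and timings will differ by machine and Perl version, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = q/,a,b,c,d,/;

# Each sub copies $str first, so every iteration strips real commas.
my %tests = (
    'two regex' => sub {
        my $s = $str;
        $s =~ s/^,+//;
        $s =~ s/,+$//;
        return $s;
    },
    'one regex' => sub {
        my $s = $str;
        $s =~ s/^,+|,+$//g;
        return $s;
    },
);

# Run each variant for at least 1 CPU second and print the comparison table.
cmpthese(-1, \%tests);
```

The copy adds the same overhead to both variants, so the comparison between them stays fair even though the absolute rates drop.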

answered Apr 1, 2013 at 12:15
\$\endgroup\$
1
\$\begingroup\$

I don't know of a tricky regexp, but wouldn't you be better served by creating the string using a combination of split and join?

http://perldoc.perl.org/functions/split.html

The examples in the middle show different combinations to split based on word boundaries. There are also examples that show how to get matching splits for the last and first character.

Hope this helps.
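As a minimal sketch of that idea (the helper name to_csv is hypothetical, not from the question): splitting on the special pattern ' ' skips leading whitespace, and split drops trailing empty fields, so the joined result never has a leading or trailing comma to remove.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a whitespace-separated line into a comma-separated one.
sub to_csv {
    my ($line) = @_;
    # split ' ' (a single-space string) ignores leading whitespace and
    # splits on runs of whitespace; trailing empty fields are dropped.
    return join ',', split ' ', $line;
}

print to_csv(" PBVJPRCO01T 1-1-1 010101 FULL PC 119 NO NON-ADJ \n"), "\n";
# prints: PBVJPRCO01T,1-1-1,010101,FULL,PC,119,NO,NON-ADJ
```

To keep only the first two fields, as the question's code does, a list slice such as (split ' ', $line)[0, 1] should work in place of the full split.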

answered Oct 8, 2011 at 14:50
\$\endgroup\$
1
\$\begingroup\$

I tried running your code, and I don't understand what output you want. I see only the first two columns; if that is what you want, you need to use print "$1,$2\n"

Try to use this:

#!/usr/bin/perl
while (<>) {
    print "$1,$2,$3,$4,$5,$6,$7\n" if /(\w{11})\s+(\d+-\d+-\d+)\s+(\w{6})\s+(\w.*?)\s+(\d+)\s+(\w.*?)\s+(\w.*?)$/gs
}

usage:

txt2csv.pl txt_files/AKR_DPCRS.txt > txt_files/AKR_DPCRS.csv
answered Oct 18, 2011 at 14:02
\$\endgroup\$
1
\$\begingroup\$

Here is an example of how I might do that.

#!/usr/local/bin/perl
use strict;
use warnings;
use 5.10.1;
my $input_filename = 'test.in';
my $output_filename = 'test.out';
open my $input, '<', $input_filename
    or die "Cannot open $input_filename: $!";
my $pack = '';
# throw out everything before "=====" line
while( <$input> ){
    if( /^\s*=[\s=]+$/ ){
        # use "=====" line to calculate lengths and offsets
        # split on \s [=] boundary keeping everything
        my @elem = split /(\s+)/;
        my $pos = 0;
        for (@elem){
            my $length = length $_;
            if( /=/ ){
                $pack .= '@'.$pos.'A'.$length;
            }
            $pos += $length;
        }
        last; # stop skipping lines
    }
}
# at this point the iterator for $input is after "=====" line
{
    open my $output, '>', $output_filename
        or die "Cannot open $output_filename: $!";
    # read one line at a time instead of slurping the whole file
    while( my $line = <$input> ){
        say {$output} join ',', unpack $pack, $line;
    }
    close $output;
}
close $input;

This code will continue to work if the widths of the columns or the number of columns change.
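To make the template idea concrete, here is a small self-contained sketch. The helper name template_from_ruler and the sample strings are made up for illustration (the spacing is not that of the real file), and it finds the columns with a global match rather than the split used above.

```perl
use strict;
use warnings;

# Hypothetical helper: build an unpack template from a ruler line of "=" runs.
sub template_from_ruler {
    my ($ruler) = @_;
    my $template = '';
    # Each run of "=" marks one column: @<offset> jumps to the column start,
    # A<length> reads that many characters and strips trailing spaces.
    while ( $ruler =~ /(=+)/g ) {
        my $length = length $1;
        my $offset = pos($ruler) - $length;
        $template .= '@' . $offset . 'A' . $length;
    }
    return $template;
}

# Made-up fixed-width sample data (assumed spacing, three columns only).
my $ruler = ' ===========  ========  ======';
my $line  = ' PBVJPRCO01T  1-1-1     010101';

my $template = template_from_ruler($ruler);
print "$template\n";
# prints: @1A11@14A8@24A6
print join( ',', unpack $template, $line ), "\n";
# prints: PBVJPRCO01T,1-1-1,010101
```

Because the offsets come from the ruler line itself, a value that contains a space (such as "FULL PC" in the real data) stays in a single field, which whitespace splitting cannot guarantee.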

answered Dec 16, 2011 at 21:42
\$\endgroup\$
