Best way to replace a beginning and end character in Perl using Regular Expression?

Question

I'm wondering if there is a simpler regex I could use in my code to remove the beginning and ending character of a line. Maybe I could combine some of the regexes? In this instance, it's a comma at the beginning and end of the line. My output should be the fields separated into CSV format.

#!/usr/local/bin/perl
use strict;
use warnings;

parse_DPCRS();

sub parse_DPCRS {
    open( my $in,  '<', 'txt_files/AKR_DPCRS.txt' )
        or die "Cannot open txt_files/AKR_DPCRS.txt: $!";
    open( my $out, '>', 'txt_files/AKR_DPCRS.csv' )
        or die "Cannot open txt_files/AKR_DPCRS.csv: $!";
    while (<$in>) {
        next if /^\s*$/;              # skip blank lines
        next if /^>/;                 # skip command lines that start with >
        next if /^\s+POINT\sCODE/;    # skip header
        next if /^\s+NODE\sNAME/;     # skip header
        next if /^\s+=+/;             # skip header underline
        next if /^\s+CCS\sDPCRS/;     # skip pagination footer
        chomp;                        # remove the trailing newline
        s/\s+/,/g;                    # replace each run of whitespace with a comma
        s/^,//;                       # remove the leading comma
        s/,$//;                       # remove the trailing comma
        my ( $nodeName, $pointCodeDec ) = split /,/;
        print {$out} "$nodeName,$pointCodeDec\n";
        #print "$_\n";
    }
    close($out);
    close($in);
}

Here's a sample of the text file I'm parsing:

>DISP CCS DPCRS ALL 0
 POINT CODE POINT CODE TYPE OF ROUTESET NOTIFY NODE
 NODE NAME DECIMAL HEX ROUTE MASTER SCCP LOCATION
 =========== =========== ========== ======= ======== ====== ========
 PBVJPRCO01T 1-1-1 010101 FULL PC 119 NO NON-ADJ
 ROCHNYXA06T 1-6-1 010601 FULL PC 58 NO NON-ADJ
 NYCNNYDRW17 1-6-2 010602 FULL PC 58 NO NON-ADJ
 SYRCNYSW01T 1-6-3 010603 FULL PC 22 NO NON-ADJ
 SYRCNYSWDS0 1-6-15 01060F FULL PC 58 NO NON-ADJ
 ROCHNYFEDS0 1-6-17 010611 FULL PC 58 NO NON-ADJ
 NYCMNYHD01T 1-9-11 01090B FULL PC 22 NO NON-ADJ
 NWRKNJ1001T 1-9-14 01090E FULL PC 22 NO NON-ADJ
 BSTNMABL01T 1-9-16 010910 FULL PC 22 NO NON-ADJ

Answer 1

Here is a single regex that removes commas at the beginning or end of a string:

$str =~ s/^,+|,+$//g;
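
A quick check of how the alternation behaves (a sketch; strip_commas and strip_first_only are illustrative helper names, not part of the answer). The /g flag is what lets one substitution handle both ends: without it, only the first alternative that matches, the leading commas, is removed.

```perl
use strict;
use warnings;

# Single substitution with /g: removes leading AND trailing commas.
sub strip_commas {
    my ($s) = @_;
    $s =~ s/^,+|,+$//g;
    return $s;
}

# Same pattern without /g: stops after the first match (the leading commas).
sub strip_first_only {
    my ($s) = @_;
    $s =~ s/^,+|,+$//;
    return $s;
}

print strip_commas(',a,b,c,d,'), "\n";        # a,b,c,d
print strip_first_only(',a,b,c,d,'), "\n";    # a,b,c,d,
print strip_commas(',,,x,,'), "\n";           # x
```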

Here is a benchmark that compares this regex with two separate substitutions:

use strict;
use warnings;
use Benchmark qw(:all);

my $str   = q/,a,b,c,d,/;
my $count = -3;
cmpthese( $count, {
    'two regex' => sub {
        $str =~ s/^,+//;
        $str =~ s/,+$//;
    },
    'one regex' => sub {
        $str =~ s/^,+|,+$//g;
    },
} );

The result:

               Rate one regex two regex
one regex  597559/s        --      -58%
two regex 1410348/s      136%        --

So the two separate substitutions are considerably faster than the single regex that combines them (about 2.4 times as fast in this run).
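
One caveat about this benchmark: both subs modify the shared $str in place, so only the very first call actually removes anything, and every later iteration times a regex that finds nothing to strip in "a,b,c,d". A sketch that copies the string inside each sub, so every iteration does the real work, might look like this. The %subs hash, the per-call copy and the shorter -1 run time are additions, not part of the original answer, and no timings are claimed here since they depend on the machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = q/,a,b,c,d,/;

# Each sub strips its own copy, so every iteration has commas to remove.
my %subs = (
    'two regex' => sub {
        my $s = $str;
        $s =~ s/^,+//;
        $s =~ s/,+$//;
        return $s;
    },
    'one regex' => sub {
        my $s = $str;
        $s =~ s/^,+|,+$//g;
        return $s;
    },
);

# Sanity check: both variants must produce the same result.
for my $name ( sort keys %subs ) {
    die "$name gave a wrong result\n" unless $subs{$name}->() eq 'a,b,c,d';
}

# Run each for at least 1 CPU second (the answer used -3).
cmpthese( -1, \%subs );
```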

Answer 2

I don't know of a tricky regexp, but wouldn't you be better served by creating the string using a combination of split and join?

http://perldoc.perl.org/functions/split.html

The examples in the middle of that page show different ways to split on word boundaries. There are also examples showing how split handles a separator at the first or last character of the string.

Hope this helps.
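
A minimal sketch of the split-and-join idea, assuming whitespace-separated input like the sample above. It relies on the special case split ' ' (a single-space string, not the regex / /), which splits on runs of whitespace and ignores leading whitespace; split also discards trailing empty fields, so no leading or trailing comma ever appears. fields_to_csv is a hypothetical helper name. Like the s/\s+/,/g approach in the question, it splits the "PC 119" value into two fields.

```perl
use strict;
use warnings;

# Awk-style split on whitespace, then join the fields with commas.
sub fields_to_csv {
    my ($line) = @_;
    return join ',', split ' ', $line;
}

my $line = " PBVJPRCO01T 1-1-1 010101 FULL PC 119 NO NON-ADJ\n";
print fields_to_csv($line), "\n";
# PBVJPRCO01T,1-1-1,010101,FULL,PC,119,NO,NON-ADJ

# Keeping only the first two columns, as the question's code does:
my ( $nodeName, $pointCodeDec ) = split ' ', $line;
print "$nodeName,$pointCodeDec\n";    # PBVJPRCO01T,1-1-1
```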

Answer 3

I tried to run your code, and I don't understand what output you want. Your code prints only the first two columns; if that is all you want, change the print below to print "1,ドル2ドル\n".

Try this:

#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
    print "1,ドル2,ドル3,ドル4,ドル5,ドル6,ドル7ドル\n"
        if /(\w{11})\s+(\d+-\d+-\d+)\s+(\w{6})\s+(\w.*?)\s+(\d+)\s+(\w.*?)\s+(\w.*?)$/;
}

Usage:

txt2csv.pl txt_files/AKR_DPCRS.txt > txt_files/AKR_DPCRS.csv
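
To see what each capture group picks up, here is the same regex wrapped in a small helper (line_to_csv is a hypothetical name) and run on three lines from the sample. Command, header and underline lines do not match, so they are skipped. Because the lazy (\w.*?) in group 4 is followed by \s+(\d+), the route type and the "PC" prefix end up together in one field, and the number 119 in the next.

```perl
use strict;
use warnings;

# Returns the seven captured fields joined with commas, or undef
# when the line does not look like a data row.
sub line_to_csv {
    my ($line) = @_;
    return unless $line =~
        /(\w{11})\s+(\d+-\d+-\d+)\s+(\w{6})\s+(\w.*?)\s+(\d+)\s+(\w.*?)\s+(\w.*?)$/;
    return join ',', 1,ドル 2,ドル 3,ドル 4,ドル 5,ドル 6,ドル 7ドル;
}

for my $line (
    ">DISP CCS DPCRS ALL 0\n",
    " NODE NAME DECIMAL HEX ROUTE MASTER SCCP LOCATION\n",
    " PBVJPRCO01T 1-1-1 010101 FULL PC 119 NO NON-ADJ\n",
) {
    my $csv = line_to_csv($line);
    print "$csv\n" if defined $csv;
}
# prints: PBVJPRCO01T,1-1-1,010101,FULL PC,119,NO,NON-ADJ
```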

Answer 4

Here is an example of how I might do that.

#!/usr/local/bin/perl
use strict;
use warnings;
use 5.10.1;    # enables say

my $input_filename  = 'test.in';
my $output_filename = 'test.out';

open my $input, '<', $input_filename
    or die "Cannot open $input_filename: $!";

my $pack = '';

# throw out everything before the "=====" line
while ( <$input> ) {
    if ( /^\s*=[\s=]+$/ ) {
        # use the "=====" line to calculate column offsets and widths;
        # splitting on (\s+) keeps the whitespace runs as list elements
        my @elem = split /(\s+)/;
        my $pos  = 0;
        for (@elem) {
            my $length = length $_;
            if ( /=/ ) {
                $pack .= '@' . $pos . 'A' . $length;
            }
            $pos += $length;
        }
        last;    # stop skipping lines
    }
}

# at this point the iterator for $input is just after the "=====" line
{
    open my $output, '>', $output_filename
        or die "Cannot open $output_filename: $!";
    while ( my $line = <$input> ) {
        # skip blank lines: an '@' offset past the end of the line is a fatal unpack error
        next unless $line =~ /\S/;
        say {$output} join ',', unpack $pack, $line;
    }
    close $output;
}
close $input;

This code will continue to work if the width or the number of columns changes, because the unpack template is built from the ===== line rather than hard-coded.
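
A self-contained sketch of the template-building step, run on a ruler with the column widths from the question's ===== line (11, 11, 10, 7, 8, 6 and 8 characters) and on one data line padded to that grid with sprintf. The padding is an assumption: the sample as pasted appears to have lost its original alignment. ruler_to_template is a hypothetical helper name. Because fields are cut by position rather than by whitespace, a value containing a space, such as "PC 119", stays in one field.

```perl
use strict;
use warnings;

# Turn a "=====" ruler into an unpack template of @offset A width pairs.
sub ruler_to_template {
    my ($ruler)  = @_;
    my $template = '';
    my $pos      = 0;
    for my $piece ( split /(\s+)/, $ruler ) {
        $template .= '@' . $pos . 'A' . length($piece) if $piece =~ /=/;
        $pos += length $piece;
    }
    return $template;
}

my $ruler    = ' ' . join ' ', map { '=' x $_ } 11, 11, 10, 7, 8, 6, 8;
my $template = ruler_to_template($ruler);
print "$template\n";    # @1A11@13A11@25A10@36A7@44A8@53A6@60A8

# A data line laid out on the same fixed-width grid.
my $line = sprintf " %-11s %-11s %-10s %-7s %-8s %-6s %-8s",
    'PBVJPRCO01T', '1-1-1', '010101', 'FULL', 'PC 119', 'NO', 'NON-ADJ';

# 'A' strips trailing spaces, so each field comes back trimmed.
print join( ',', unpack( $template, $line ) ), "\n";
# PBVJPRCO01T,1-1-1,010101,FULL,PC 119,NO,NON-ADJ
```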

Accepted answer: the single-regex answer with the benchmark, by Toto, posted 2013-04-01 12:15:50Z.
