I want a Perl one-liner that checks whether the first field of each line in an input file is the file's name and, if it isn't, adds the file name as the first column on every line.
Example written in shell:
for f in *file*.csv;
do
file_column=`cat ${f} | awk -F',' '{print $1}'`
if [ $file_column != ${f} ]
then
sed -i "s/^/$f,/" $f 2>/dev/null;
fi
done
But the approach above, which checks whether the file name is present in the first column and adds it if it isn't, takes about 3 hours for 4 lakh (400,000) files. I understand that Perl is faster for file operations.
The Perl command I tried:
perl -p -i -e 's/^/Welcome to Hell,/' file*.csv
Please help me add the logic to check whether the field exists already and only change if it doesn't.
Input : file1.csv
col1,col2,col3
data1,data2,data3
Output: file1.csv
file1.csv,col1,col2,col3
file1.csv,data1,data2,data3
Or, if there is a faster way, please suggest it. I'd like a Perl one-liner because this is part of another shell script, so a small call would be better, I guess (suggestions welcome).
-
Can you give some sample input/output? Also: why is a one-liner desirable? – Sobrique, Jun 24, 2015 at 10:40
-
I would offer: embedding Perl into another script isn't as useful as just writing the script in Perl. – Sobrique, Jun 24, 2015 at 10:55
-
@Sobrique, I would be happy if you offered a Perl script for the above problem. – William R, Jun 24, 2015 at 11:58
5 Answers
Here's your perl one-liner: it works with multiple file arguments
perl -i -pe '/^\Q$ARGV\E,/ or print "$ARGV,"' file1 file2 ...
$ARGV is the magic variable that holds the name of the file currently being read.
See http://perldoc.perl.org/perlvar.html#Variables-related-to-filehandles
The field separator (comma) is hardcoded. You can decide if that's a problem.
Small performance improvement:
perl -i -pe 'index($_, "$ARGV,") == 0 or print "$ARGV,"' file1 file2 ...
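A quick way to convince yourself that the check works is to run the one-liner twice on the sample data from the question and confirm that the second run changes nothing. This is only a sketch: the scratch directory and the variable names first_run and second_run are made up for the demonstration, and it assumes perl is on the PATH.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Sample data from the question.
printf 'col1,col2,col3\ndata1,data2,data3\n' > file1.csv

# First run: every line gets the file name prepended.
perl -i -pe 'index($_, "$ARGV,") == 0 or print "$ARGV,"' file1.csv
first_run=$(cat file1.csv)

# Second run: every line already starts with "file1.csv,", so the
# file should come out unchanged.
perl -i -pe 'index($_, "$ARGV,") == 0 or print "$ARGV,"' file1.csv
second_run=$(cat file1.csv)

# Show the final contents for inspection.
printf '%s\n' "$second_run"
```

If the two captured contents are identical, the one-liner is safe to re-run over files that have already been processed.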
Before turning to Perl for speed, try speeding up your own script:
for f in *file*.csv;
do
sed -i "/^$f,/! s/^/$f,/" "$f"
done
-
Apart from the first line, it removed the rest of the line in the file. I need the file name as the first column, followed by the data. – William R, Jun 24, 2015 at 11:00
-
@WilliamR I have already edited it. – Costas, Jun 24, 2015 at 11:03
-
Thanks a lot, @Costas. It took 20 seconds for 30K files; let me check with 4 lakh files. – William R, Jun 24, 2015 at 11:05
-
It's taking a long time to complete for 4 lakh files. For a smaller number, like 30K files, it takes about 5 to 10 seconds. Do you have any suggestions? – William R, Jun 24, 2015 at 11:56
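On the 4 lakh file case raised in the comments above: a loop like this starts one sed process per file, and that start-up cost may well be where much of the time goes. The following is a hedged sketch, not part of the original answer, that hands many files to each perl process instead. It assumes an xargs that supports -0 (GNU and BSD both do), a shell where printf is a builtin, and it reuses the index-based one-liner from the earlier answer; the sample file names and data are made up.

```shell
#!/bin/sh
# Scratch directory with a few sample files (hypothetical names and data).
dir=$(mktemp -d)
cd "$dir" || exit 1
for n in 1 2 3; do
    printf 'col1,col2,col3\ndata1,data2,data3\n' > "file$n.csv"
done

# With printf as a shell builtin, the expanded glob is not passed to an
# external command and so is not subject to the argument-length limit.
# xargs splits the NUL-separated names into batches and starts one perl
# per batch rather than one process per file.
printf '%s\0' *file*.csv |
    xargs -0 perl -i -pe 'index($_, "$ARGV,") == 0 or print "$ARGV,"'

# Show one of the results for inspection.
head -n 1 file2.csv
```

Whether this is actually faster on a given machine is something to measure, for example on the 30K-file subset first.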
While you can actually do this with Perl, the syntax is not the simplest (or at least, it isn't with the best I can come up with). It will probably be both simpler and faster to use other tools. For example,
gawk (4.1.0 or later, which added the -i inplace extension)
for f in file*csv; do awk -i inplace -F, '{ if($1==FILENAME){print} else{print FILENAME","$0} }' "$f"; done
OK, the problem with a 'perl one liner' as you note:
perl -p -i -e 's/^/Welcome to Hell,/' file*.csv
This applies a transform to the file right enough, but perl 'handles' opening the file(s) and streaming them through the magic ARGV filehandle automagically. Which means the file name never appears in your code; it is only reachable through the special variable $ARGV.
The in-place edit option (-i) is a convenience, but it becomes rather more difficult to actually use effectively, since you're potentially opening a file for reading and writing concurrently.
Anyway, I'd approach your problem like this:
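#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# eol is set so that print() terminates each output row with a newline.
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );

foreach my $filename ( glob("*.csv") ) {
    # Skip output files left over from an earlier run.
    next if $filename =~ /^new\./;
    # glob() already returns the full name, including the .csv extension.
    open( my $input,  "<", $filename )       or die "$filename: $!";
    open( my $output, ">", "new.$filename" ) or die "new.$filename: $!";
    while ( my $row = $csv->getline($input) ) {
        # Prepend the file name unless it is already the first field.
        if ( !defined( $row->[0] ) || $row->[0] ne $filename ) {
            unshift( @{$row}, $filename );
        }
        $csv->print( $output, $row );
    }
    close($input);
    close($output) or die "new.$filename: $!";
}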
It uses the Text::CSV module, because actually CSV is often more complicated than just "split on comma" (think multi-line fields, and commas in text).
Can't manage a one-liner, but here's a Perl script. Put it in a file and make it executable. Then give it the *.csv filenames as args. It creates *.new files. If you are confident it works, uncomment the rename command at the end.
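#!/usr/bin/perl
use strict;
use warnings;

foreach my $file (@ARGV) {
    open( my $in, '<', $file ) or die "$file: $!";
    my $first = <$in>;
    # Skip empty files and files whose first line already starts with the file name.
    next if !defined($first) || index( $first, "$file," ) == 0;
    open( my $out, '>', "$file.new" ) or die "$file.new: $!";
    my $add = "$file,";
    print $out $add, $first;
    while ( my $line = <$in> ) {
        print $out $add, $line;
    }
    close $out or die "$file.new: $!";
    close $in;
    #rename( "$file.new", $file ) or die "rename $file.new: $!";
}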