I have a huge text file with the columns below:
col1 col2 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
abc dec 10 20 30 40 50 60 70 80 90 11 12 13
The output I am looking for is the sum of all the months in a new column, FullYear:
col1 col2 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec FullYear
abc dec 10 20 30 40 50 60 70 80 90 11 12 13 486
I tried using an awk command; however, my data has high-precision numbers, and the command below gives wrong output.
awk -F ' ' '{print $1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9" "$10" "$11" "$12" "$13" "$14" "$3+$4+$5+$6+$7+$8+$9+$10+$11+$12+$13+$14}' inputfile.txt > outputfile.txt
I need to write a Perl script to get this done.
4 Answers
This is fairly easy to do in Perl, even as a one-liner:
perl -MList::Util=sum -anE 'if (1 == $.) { say join(q{ }, @F, q{FullYear}) } else { say join(q{ }, @F, sum(@F[2..13])) }' «YOUR-FILE»
Explanation:
-MList::Util=sum loads the List::Util module and imports the sum function. This is the same as use List::Util qw(sum).
-n tells Perl to process the input file line by line, running the script for each line. (Actually redundant, as the next option implicitly turns this on.)
-a turns on autosplit mode, so we get an array @F with one entry per field.
-E means we're going to provide a script as a command-line argument, using current Perl features (for "say" in this case).
Full details for those options can be found in the perlrun manpage/podfile.
Then, here is the script, with spacing added, and comments explaining:
if (1 == $.) { # $. is the line number. Line 1 is header line.
say join(' ', @F, q{FullYear}); # print out the header + FullYear
}
else {
# print out rows + sum of columns 2..13. Remember Perl counts from 0 in arrays,
# so column 2 is the 3rd column (the number for January).
say join(' ', @F, sum(@F[2..13]));
}
BTW: You can ask Perl to help understand one-liners (at least ones you trust; this is not safe with untrusted scripts) with -MO=Deparse, which gives output like this:
command:
perl -MO=Deparse -MList::Util=sum -anE 'if (1 == $.) { say join(q{ }, @F, q{FullYear}) } else { say join(q{ }, @F, sum(@F[2..13])) }' t-file
output:
use List::Util (split(/,/, 'sum', 0));
use feature 'current_sub', 'bitwise', 'evalbytes', 'fc', 'postderef_qq', 'say', 'state', 'switch', 'unicode_strings', 'unicode_eval';
LINE: while (defined($_ = readline ARGV)) {
our @F = split(' ', $_, 0);
if (1 == $.) {
say join(' ', @F, 'FullYear');
}
else {
say join(' ', @F, &sum(@F[2..13]));
}
}
-e syntax OK
So you can see the List::Util load, the -n going line-by-line, and -a adding the split.
This works if my numbers are up to 5 digits. However, it gives incorrect sums if my numbers have 7 to 8 digits. – Parix, Jan 24, 2019 at 17:49
Would Math::BigFloat do for your "huge precision"?
perl -MMath::BigFloat -ape 'my $s=0; $s += new Math::BigFloat($_) for @F[2..$#F]; s/$/ $s/'
abc dec 7.5 8.5
abc dec 7.5 8.5 16
You could also use List::Util::sum with Math::BigFloat; it's quite pointless, though:
perl -MMath::BigFloat -MList::Util=sum -ape 's/$/" ".sum map new Math::BigFloat($_), @F[2..$#F]/e'
Tried this approach. However, the result I am getting is NaN. Not sure if I am missing something here. perl -MMath::BigFloat -ape 'my $s=0; $s += new Math::BigFloat($_) for @F[4..15]; s/$/ $s/' input – Parix, Jan 24, 2019 at 17:52
That means that one of the fields from the 5th to the 16th is not a number. Notice that in perl array indexes start from 0, as in C, not from 1 as in awk or Fortran. – user313992, Jan 24, 2019 at 18:05
@Parix In your awk script, you have $3+...+$14; that should translate to @F[2..13] in perl, not to @F[4..15]. – user313992, Jan 25, 2019 at 2:26
It's not perl, but this seems to get the job done:
awk 'NR==1 {$(NF+1) = "FullYear"; print} NR>1 {subtotal=0; for(f=3;f<=NF; f++) {subtotal+=$f}; $(NF+1)=subtotal; printf( "%s %s %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15 ) }' inputfile
Just a variant of @derobert's answer:
perl -MList::Util=sum -nlE 'say "$_ ", sum((split)[2..13])||"FullYear"' input
or using -a
perl -MList::Util=sum -nalE 'say "$_ ", sum(@F[2..13])||"FullYear"' input
Tried this approach too. This works if my numbers are up to 5 digits. However, it gives incorrect sums if my numbers have 7 to 8 digits. – Parix, Jan 24, 2019 at 17:50
@Parix, could you please show me an example of such a situation? – JJoao, Jan 24, 2019 at 18:43
Output for 2 records is as below:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec FullYear
col11 col12 2298846.53 2328664.3 2326527.39 2385298.77 2400046.08 2404192.36 2394351.11 2415755.8 2383387.25 2410001.65 2388574.37 2387894.37 26135645.61
col21 col22 13438.40828 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 176043.1485
The #1 record <-- gives incorrect added value
The #2 record <-- gives correct added value
– Parix, Jan 24, 2019 at 19:23
@Parix, with both variants, on my machine I get not "26135645.61" but "28523539.98". – JJoao, Jan 25, 2019 at 17:01
Hi. It seems my leading data columns are causing the issue. The calculation works where such a column is a single word. However, the last column is not included when it is more than a single word, even though my entire file is tab-delimited, with double quotes (") as the text qualifier. – Parix, Mar 14, 2019 at 17:28
awk script not working as expected? Use printf to specify the format in which you want the results to be printed. It's also helpful in your example input to use data which demonstrates the problem you're having. By default it's probably using exponential notation; if you want fixed-point notation you can do something like printf( "%5.10f", $6 ) to get a minimum field width of five characters and ten digits after the decimal point.
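A minimal sketch of that difference, using the first two values from the sample record posted in the comments. The variable names are illustrative, and the two-decimal format is an assumption about what the output should look like:

```shell
# print formats a non-integer number with OFMT, which defaults to "%.6g".
default=$(awk 'BEGIN { x = 2298846.53 + 2328664.3; print x }')

# printf with an explicit fixed-point format keeps the decimals.
fixed=$(awk 'BEGIN { x = 2298846.53 + 2328664.3; printf "%.2f\n", x }')

printf 'print:  %s\nprintf: %s\n' "$default" "$fixed"
```

The printf variant gives 4627510.83, whereas plain print is limited to six significant digits, which is consistent with sums looking right for 5-digit values and wrong for 7- or 8-digit ones.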