I have a huge text file with the columns below:

col1 col2 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
abc dec 10 20 30 40 50 60 70 80 90 11 12 13

The output I am looking for is the sum of all months in a new column, FullYear:

col1 col2 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec FullYear
abc dec 10 20 30 40 50 60 70 80 90 11 12 13 486

I tried using the awk command below; however, my data contains high-precision numbers, and the command gives wrong output.

awk -F ' ' '{print 1ドル" "2ドル" "3ドル" "4ドル" "5ドル" "6ドル" "7ドル" "8ドル" "9ドル" "10ドル" "11ドル" "12ドル" "13ドル" "14ドル" "3ドル+4ドル+5ドル+6ドル+7ドル+8ドル+9ドル+10ドル+11ドル+12ドル+13ドル+14ドル}' inputfile.txt > outputfile.txt

I need to write a Perl script to get this done.

Siva
asked Jan 23, 2019 at 21:11
  • Define "wrong". How is the awk script not working as expected? Commented Jan 23, 2019 at 21:36
  • The actual numbers in the file have huge precision: 13438.40828455529 14782.24911301082 14782.24911301082 14782.24911301082 14782.24911301082 14782.24911301082 14782.24911301082 14782.24911301082 14782.24911301082 14782.24911301082 14782.24911301082 14782.24911301082. When these numbers are added, the sum is printed in an unusual format. Commented Jan 23, 2019 at 21:38
  • That's not a problem; use printf to specify the format in which you want the results to be printed. It's also helpful in your example input to use data that demonstrates the problem you're having. By default it's probably using exponential notation; if you want fixed-point notation you can do something like printf( "%5.10f", 6ドル ) to get a minimum field width of five and ten places after the decimal point. Commented Jan 23, 2019 at 21:40
  • I want to add a new column to the existing file. I tried awk -F ' ' '{printf "%s %s %d %d %d %d %d %d %d %d %d %d %10.10f", 1ドル" "2ドル" "3ドル" "4ドル" "5ドル" "6ドル" "7ドル" "8ドル" "9ドル" "10ドル" "11ドル" "12ドル" "13ドル" "14ドル" "3ドル+4ドル+5ドル+6ドル+7ドル+8ドル+9ドル+10ドル+11ドル+12ドル+13ドル+14ドル}' inputfile.txt > outputfile.txt; however, it doesn't work. Commented Jan 23, 2019 at 22:00
  • Again, define "doesn't work". How does it not function as expected or intended? Please answer this question not by adding an additional comment, but by editing your question to include the relevant information. Commented Jan 23, 2019 at 22:06

4 Answers


This is fairly easy to do in Perl, even as a one-liner:

perl -MList::Util=sum -anE 'if (1 == $.) { say join(q{ }, @F, q{FullYear}) } else { say join(q{ }, @F, sum(@F[2..13])) }' «YOUR-FILE»

Explanation:

-MList::Util=sum loads the List::Util module and imports the sum function. This is the same as use List::Util qw(sum).

-n tells Perl to process the input file line-by-line, running the script for each line. (Actually redundant, as the next option implicitly turns this on). -a turns on autosplit mode, so we get an array @F with one entry per field. -E means we're going to provide a script as a command-line argument, using current Perl features (for "say" in this case).

Full details for those options can be found in the perlrun manpage/podfile.
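
For a quick feel for what -a gives you, here is a throwaway demo (the echoed text is made up, not from the question's data):

echo 'one two three' | perl -anE 'say scalar @F; say $F[2]'
3
three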

Then, here is the script with spacing added and comments explaining each part:

if (1 == $.) { # $. is the line number. Line 1 is header line.
 say join(' ', @F, q{FullYear}); # print out the header + FullYear
}
else {
 # print out rows + sum of columns 2..13. Remember Perl counts from 0 in arrays,
 # so column 2 is the 3rd column (the number for January).
 say join(' ', @F, sum(@F[2..13]));
}
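
Applied to the two sample lines from the question (using the question's inputfile.txt as the filename), the one-liner should print:

perl -MList::Util=sum -anE 'if (1 == $.) { say join(q{ }, @F, q{FullYear}) } else { say join(q{ }, @F, sum(@F[2..13])) }' inputfile.txt
col1 col2 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec FullYear
abc dec 10 20 30 40 50 60 70 80 90 11 12 13 486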

BTW: You can ask Perl to help understand one-liners (at least ones you trust — this is not safe with untrusted scripts) with -MO=Deparse, which gives output like this:

command:

perl -MO=Deparse -MList::Util=sum -anE 'if (1 == $.) { say join(q{ }, @F, q{FullYear}) } else { say join(q{ }, @F, sum(@F[2..13])) }' t-file 

output:

use List::Util (split(/,/, 'sum', 0));
use feature 'current_sub', 'bitwise', 'evalbytes', 'fc', 'postderef_qq', 'say', 'state', 'switch', 'unicode_strings', 'unicode_eval';
LINE: while (defined($_ = readline ARGV)) {
 our @F = split(' ', $_, 0);
 if (1 == $.) {
 say join(' ', @F, 'FullYear');
 }
 else {
 say join(' ', @F, &sum(@F[2..13]));
 }
}
-e syntax OK

So you can see the List::Util load, the -n going line-by-line, and -a adding the split.

answered Jan 23, 2019 at 21:45
  • This works if my numbers are up to 5 digits. However, it gives incorrect added numbers if my numbers are 7 to 8 digits long. Commented Jan 24, 2019 at 17:49

Would Math::BigFloat do for your "huge precision"?

perl -MMath::BigFloat -ape 'my $s=0; $s += new Math::BigFloat($_) for @F[2..$#F]; s/$/ $s/'
abc dec 7.5 8.5
abc dec 7.5 8.5 16

You could also use List::Util::sum with Math::BigFloat; it's quite pointless, though:

perl -MMath::BigFloat -MList::Util=sum -ape 's/$/" ".sum map new Math::BigFloat($_), @F[2..$#F]/e'
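
If you also want to pin down how the total is printed, a standalone script leaves more room than a one-liner. Here is a minimal sketch along the same Math::BigFloat lines; the script name (addyear.pl) and the invocation perl addyear.pl inputfile.txt > outputfile.txt are just illustrative, and the column range 2..13 is assumed from the question's sample layout:

#!/usr/bin/perl
use strict;
use warnings;
use Math::BigFloat;

# Print the header line with the extra column name appended.
my $header = <>;
chomp $header;
print "$header FullYear\n";

# For every data row, sum the twelve month columns (indexes 2..13,
# assumed from the question's layout) with Math::BigFloat and append
# the total to the original line.
while (my $line = <>) {
 chomp $line;
 my @fields = split ' ', $line;
 my $total = Math::BigFloat->bzero();
 $total->badd($_) for @fields[2 .. 13];
 print "$line ", $total->bstr(), "\n";
}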
answered Jan 24, 2019 at 2:57
  • Tried this approach. However, the result I am getting is NaN. Not sure if I am missing something here. perl -MMath::BigFloat -ape 'my $s=0; $s += new Math::BigFloat($_) for @F[4..15]; s/$/ $s/' input Commented Jan 24, 2019 at 17:52
  • That means that at least one of the fields from the 5th to the 16th is not a number. Notice that in Perl array indexes start from 0, as in C, not from 1 as in awk or Fortran. Commented Jan 24, 2019 at 18:05
  • @Parix In your awk script, you have 3ドル+...+14ドル; that should translate to @F[2..13] in perl, not to @F[4..15]. Commented Jan 25, 2019 at 2:26

It's not Perl, but this seems to get the job done:

awk 'NR==1 {$(NF+1) = "FullYear"; print} NR>1 {subtotal=0; for(f=3; f<=NF; f++) {subtotal+=$f}; $(NF+1)=subtotal; printf( "%s %s %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f %5.10f\n", 1,ドル 2,ドル 3,ドル 4,ドル 5,ドル 6,ドル 7,ドル 8,ドル 9,ドル 10,ドル 11,ドル 12,ドル 13,ドル 14,ドル 15ドル ) }' inputfile
answered Jan 23, 2019 at 21:45

Just a variant of @derobert:

perl -MList::Util=sum -nlE 'say "$_ ", sum((split)[2..13])||"FullYear"' input

or using -a

perl -MList::Util=sum -nalE 'say "$_ ", sum(@F[2..13])||"FullYear"' input
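
If the sums only look wrong because of how Perl formats large totals, a further variant forces fixed-point output with sprintf (the two decimal places here are just an assumption; adjust to taste):

perl -MList::Util=sum -nalE 'say "$_ ", $. == 1 ? "FullYear" : sprintf("%.2f", sum(@F[2..13]))' input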
answered Jan 23, 2019 at 21:59
  • Tried this approach too. This works if my numbers are up to 5 digits. However, it gives incorrect added numbers if my numbers are 7 to 8 digits long. Commented Jan 24, 2019 at 17:50
  • @Parix, could you please show me an example of such a situation? Commented Jan 24, 2019 at 18:43
  • Output for 2 records is as below:
 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec FullYear
 col11 col12 2298846.53 2328664.3 2326527.39 2385298.77 2400046.08 2404192.36 2394351.11 2415755.8 2383387.25 2410001.65 2388574.37 2387894.37 26135645.61
 col21 col22 13438.40828 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 14782.24911 176043.1485
 Record #1 gives an incorrect added value; record #2 gives the correct added value. Commented Jan 24, 2019 at 19:23
  • @Parix, with both variants, on my machine I get not "26135645.61" but "28523539.98" Commented Jan 25, 2019 at 17:01
  • Hi, it seems my starting data columns are causing the issue. The calculation works where a column holds a single word; however, it does not handle the last column when its value is more than a single word, even though my entire file is tab-delimited with double quotes as the text qualifier. Commented Mar 14, 2019 at 17:28
