What general tips do you have for golfing in AWK? I'm looking for ideas that can be applied to code golf problems in general that are at least somewhat specific to AWK. Please post one tip per answer.
These are in no particular order. Some of them may apply to other languages too, but I don't think many do.
You can sometimes use awk's simple string concatenation to assemble numbers: z=x*10+y and a=b*10 become z=x y and a=b 0 (this works when x is a non-negative integer, y is a single digit and b is an integer).
You can use the ~ operator (which matches the left operand against the regex on the right) to compare numbers or strings if you know that the two operands meet certain conditions. For instance, if a and b are non-negative integers and a will always be less than or equal to b, you can replace a==b with a~b. Or, if you want to check for an empty string s (which means that s isn't allowed to be "0" either), and you haven't changed the variable X before, so that it is still undefined, you could use X~s instead of s=="".
If you want to check whether i is an integer, you can use i!~/\./ (i doesn't contain a dot) instead of i==int(i).
You can swap two numbers in a single statement: t=a;a=b;b=t becomes a+=b-(b=a), saving one character.
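The BEGIN-only sketch below checks the number tricks above. The values are arbitrary examples. The swap line assumes the old value of b is read before (b=a) runs, as gawk and mawk do. Treat that order as implementation-dependent on other awks.

```shell
#!/bin/sh
# Sketch: exercises the number tricks above; all values are arbitrary examples.
out=$(awk 'BEGIN {
  # concatenation builds x*10+y (y a single digit) and b*10
  x = 4; y = 7; b = 3
  z = x y; a = b 0
  print z + 0, a + 0

  # a~b as a==b: with 0 <= a <= b (integers), regex b matches a only if equal
  a = 5; b = 15; r1 = a ~ b
  b = 5;         r2 = a ~ b
  print r1, r2

  # integer test: the string form of i contains no dot
  i = 2.5; r1 = i !~ /\./
  i = 3;   r2 = i !~ /\./
  print r1, r2

  # swap: relies on the old b being read before (b = a) assigns to it
  a = 1; b = 2
  a += b - (b = a)
  print a, b
}')
printf '%s\n' "$out"
```

Each output line corresponds to one trick: 47 30, then 0 1 for the ~ comparison, 0 1 for the integer test, and 2 1 after the swap.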
If you don't need them as input anymore, you can use the input variables 1,ドル 2,ドル 3,ドル ... as an array. So instead of writing a[n] you can just write $n. Sometimes even while reading the input from these you can use the already handled ones as a stack for something.
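Here is a sketch of fields used as array slots. The one-liner fills 1ドル..10ドル of an empty input line with the first ten Fibonacci numbers and then prints the rebuilt record. The Fibonacci task is only an illustration.

```shell
#!/bin/sh
# Sketch: the fields 1,ドル 2,ドル ... act as a free array. Assigning past NF
# extends the record, and the trailing pattern 1 prints the rebuilt 0ドル.
fib=$(echo | awk '{$1=$2=1;for(i=3;i<=10;i++)$i=$(i-1)+$(i-2)}1')
echo "$fib"   # 1 1 2 3 5 8 13 21 34 55
```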
You can make good use of the built-in separators FS (a space, unless you change it) and RS (a newline) when concatenating strings: a=1ドル" "2ドル becomes a=1ドルFS2ドル, or, even better, a=1ドル"\n"2ドル becomes a=1ドルRS2ドル.
If there is no input and you use the BEGIN block, you can try to use the END block instead. Whether this works depends on the judge. If you test it interactively, you have to press Ctrl-D (end of input) to reach the END block.
If you want to skip the first line, you can use C++ instead of NR>1, provided you don't use C anywhere else in your program.
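The three tips above can be checked with the following sketch. The inputs are arbitrary. Note that END (3 characters) is also shorter than BEGIN (5), and with empty input it still runs.

```shell
#!/bin/sh
# Sketch of the FS/RS, END and C++ tips; the inputs are arbitrary.

# FS is " " and RS is "\n" by default, so they can replace the string literals.
sep=$(echo 'a b' | awk '{print 1ドルFS2ドル; print 1ドルRS2ドル}')

# With empty input, an END block still runs.
end=$(awk 'END{print 6*7}' </dev/null)

# C++ is 0 (falsey) only on the first record, so the header line is skipped.
body=$(printf 'header\nrow1\nrow2\n' | awk 'C++')

printf '%s\n' "$sep" "$end" "$body"
```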
Techniques
Thanks to Peter Taylor for pointing out the language-generic version of this question for which I did not bother to search. I'll try to limit this answer to things that are not available in other C-like languages.
The frequently accessed built-in variables are 2 characters long. Find the break-even point for assigning one to a temporary variable: a=1ドル; is five characters, and every later use of a instead of 1ドル saves one, so it breaks even after five uses of 1ドル.
Remember that the default action for a matching pattern is print 0ドル, so if you get what you want printed into 0ドル, just use 1 as the pattern.
Getting creative with operator ordering can save you some parentheses or extra statements:
for(i=1;i<=NF;i++)print$i;vs
for(;i<NF;)print$++i;
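A small sketch of the last two tips, with arbitrary inputs. The golfed loop starts from an uninitialized i (0) and never resets it. On later records it therefore starts where the previous record stopped, so it only matches the long form on the first record.

```shell
#!/bin/sh
# Sketch: the bare pattern 1 is always true, so the default action prints 0ドル.
echo_lines=$(printf 'a\nb\n' | awk 1)

# Both loops print each field on its own line for a single-line input.
long=$(echo 'x y z' | awk '{for(i=1;i<=NF;i++)print$i}')
short=$(echo 'x y z' | awk '{for(;i<NF;)print$++i}')
printf '%s\n' "$echo_lines" "$long" "$short"
```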
Examples
A solution to Do We Sink or Swim? in 70 characters:
n{for(;NF;NF--)s+=$NF;n--}NR==1{n=1ドル;p=3ドル}END{print p<s?"Swim":"Sink"}
A solution to The Floating Horde in 44 characters:
{for(i=NF;i>1;){n=int($i/2);$i%=2;$--i+=n}}1
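The trace below runs the 44-character Floating Horde solution on a single line. The input 0 0 5 is arbitrary and not taken from the challenge. Walking leftward from the last field, each field keeps its value mod 2 and adds half of its old value, rounded down, to its left neighbor.

```shell
#!/bin/sh
# Sketch: run the solution above on an arbitrary line.
# 5 keeps 1 and passes 2 left (0 2 1); then 2 keeps 0 and passes 1 (1 0 1).
horde=$(echo '0 0 5' | awk '{for(i=NF;i>1;){n=int($i/2);$i%=2;$--i+=n}}1')
echo "$horde"   # 1 0 1
```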
Summary
awk will probably never outgolf APL, Golfscript, J, or K, but you can quite consistently beat other high level languages.
The question asks for tips which are somewhat specific to Awk. The ternary operator is included in the tips for all languages. (Peter Taylor, Mar 17, 2014)
@laindir: for(;i<NF;)print$++i; can become NF=NF OFS='\n' (RARE Kpop Manifesto, Nov 1, 2022)
@laindir: '{for(_=NF;1<_;)$--_+=($_-($_%=2))/2}_' I got the Floating Horde one down to 37 bytes. (RARE Kpop Manifesto, May 6, 2024)
As for the sequential printing one, I think '{while(_++<NF)print$_}' is the same number of bytes but more elegant. (RARE Kpop Manifesto, May 6, 2024)
for(;i<NF;)print$++i; is kind of verbose for basically doing awk 'NF+=OFS=RS' (RARE Kpop Manifesto, Mar 29, 2025)
One of the great things about awk's $ sigil is that string concatenation can be condensed even further than by using the built-ins as buffer zones. Say you want to prepend a zero to the whole row:
_=0 $_=_$_
jot -c 8 75 | gawk '$_=+_$_'
0K
0L
0M
0N
0O
0P
0Q
0R
And if you want to make patterns out of it?
jot -c 8 75 | gawk '$_=_++$_' # integers
0K
L 1
M 2
N 3
O 4
P 5
Q 6
R 7
jot -c 8 75 | gawk '$_=_++$++_' # even numbers
K 0
L 2
M 4
N 6
O 8
P 10
Q 12
R 14
jot -c 8 75 | gawk '$_=++_$++_' # odd numbers
K 1
L 3
M 5
N 7
O 9
P 11
Q 13
R 15
And honestly, what language can repeat strings THIS easily:
jot 20 | mawk 'NF=OFS=$_'
1
22
333
4444
55555
666666
7777777
88888888
999999999
10101010101010101010
1111111111111111111111
121212121212121212121212
13131313131313131313131313
1414141414141414141414141414
151515151515151515151515151515
16161616161616161616161616161616
1717171717171717171717171717171717
181818181818181818181818181818181818
19191919191919191919191919191919191919
2020202020202020202020202020202020202020
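Here is how NF=OFS=$_ works, at least in mawk and gawk. The assignments run right to left, so OFS becomes the line itself and NF becomes its numeric value n. Setting NF to n keeps 1ドル, pads with n-1 empty fields and rebuilds 0ドル with OFS between them. That yields n copies of the line. The sketch below is a portable variant using printf in place of jot.

```shell
#!/bin/sh
# Sketch: repeat each line n times, where n is the line's own value.
# OFS=$_ makes the line the separator; NF=... pads with empty fields,
# and rebuilding 0ドル puts one separator between each pair of fields.
rep=$(printf '1\n2\n3\n4\n' | awk 'NF=OFS=$_')
printf '%s\n' "$rep"
```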
Or which one can decode arbitrary-precision hex with so few keystrokes:
echo 0xEDCFAB12EDCFAB127659438976594389EDCFAB |
gawk -nM '$_=+$_'
5303367068685265828195859270035065456131166123
To read and process a number on each line:
{
n=1ドル;
print(n*n);
# OR: printf("%d\n",n*n);
}
Compressed form (Length = 14):
{print(1ドル*1ドル)} # thanks to @manatwork
Shorter code (Length = 7):
1,0ドル^=2 # thanks to @llhuii
When run in gawk with the inputs:
1
2
3
Output:
1
4
9
Why the variable? And why the parentheses? {print1ドル*1ドル} is shorter. (manatwork, Oct 26, 2013)
1,0ドル^=2 is shorter (llhuii, Dec 8, 2013)
For any input different from 0 or null, 0ドル^=2 does the same. Careful: when using 1,0ドル^=2, blank lines return 0. (Pedro Maimere, Dec 26, 2020)
@PedroMaimere: this solves it: echo "1\n2\n3\n\n4\n5" | mawk '!NF || ($!_^=2)^_' prints 1 4 9 16 25 .... or, without writing any numbers at all, mawk '!NF || ($!_*=$!_)^_' (RARE Kpop Manifesto, Nov 1, 2022)
@PedroMaimere: better yet: echo "0\n1\n2\n3\n\n4\n5" | mawk '!NF || ($!_^=++NF)^_' prints 0 1 4 9 16 25. Actual input lines containing a zero will print zero squared, while blank lines remain as such. You can directly square hexes with mawk: echo 0x4F0FFF9 | mawk '($!_^=++NF)^_' CONVFMT='%.f' prints 6872912880599089 (RARE Kpop Manifesto, Nov 1, 2022)
Regarding string concatenation freebies in awk, there are 5 scenarios where gapless concatenation is guaranteed to be safe (the first 4 examples attempt to either prepend or append some arbitrary string held in the awk variable __):
- Immediately after numbers: e.g. 367 __ → 367__. Ditto for fields referenced by digits, e.g. 19ドル __ → 19ドル__
- Immediately before fields/sigils: e.g. __ $NF → __$NF. To perform ++$NF or --$NF, write (__)++$NF instead.
- Immediately after array cells: e.g. ___[_] __ → ___[_]__. This is mostly useful when joining an array with separators.
- Immediately after a closing parenthesis ) (as in grouping pairs or function calls): e.g. split(...) __ → split(...)__, or (sumN - cntN) __ → (sumN - cntN)__. The primary use case is concatenating an empty string after the grouping, which forces a numeric zero (0) to become the string zero ("0"), so it evaluates to true in any boolean context or pattern. The alternative would be extra logic and verbosity to handle that edge case.
- The strangest one: conjuring up an arbitrary chain of digits by concatenating the same variable repeatedly while altering its value along the way:
gawk -p- -be 'BEGIN { print _++_!_--_!_++_++_^_^_^_, _ }'
01001165536 2
# gawk profile, created Fri Mar 28 21:29:37 2025
# BEGIN rule(s)
BEGIN {
1 print _++ _ !_-- _ !_++ _++ _^_^_^_, _
}
By the end of the sequence, _ only has a measly value of 2, since it never stored 65,536 back into itself.
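The sketch below exercises the gapless concatenations listed above, with __ holding the arbitrary string X. The last two commands show the parenthesis case: split("",arr) returns 0, which is falsey as a pattern, while split("",arr)"" is the string "0", which is truthy.

```shell
#!/bin/sh
# Sketch: gapless concatenation after a number, after a digit-referenced
# field, before a field, and after an array cell; __ holds an arbitrary string.
cat_out=$(echo 'a b' | awk '{__="X"; c[1]="c"; print 367__, 1ドル__, __$NF, c[1]__}')

# split("",arr) returns 0 (falsey); appending "" makes it the string "0" (truthy).
plain=$(echo hi | awk 'split("",arr)')
forced=$(echo hi | awk 'split("",arr)""')
printf '%s\n' "$cat_out" "[$plain]" "[$forced]"
```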
Truthy and falsey values
Boolean evaluation is somewhat flexible in AWK, and this is awesome. Remember: AWK's basis is pattern{action}; when pattern is true, it executes action.
Besides doing their job, some built-in functions return values, e.g., split() and gsub(). They are also useful as patterns when manipulating the input.
Examples of truthy and falsey variables
I tried to come up with valuable examples of how to exploit truthy/falsey variables. This is non-exhaustive. Anyone is encouraged to share more examples, and I would add them to this list.
"strings are truthy"{print 1} # truthy; strings are always truthy, except for the null string
-0xF3e10{print 2} # truthy; numbers different from zero are truthy
0{print 3} # falsey; zeroes are false: 0, -0, +0, 0x0, 0000... except for "0", which is a string
0 b{print 4} # truthy; now the number 0 is concatenated to a null string (a variable still not assigned), thus converting it to "0"
""{print 5} # falsey; null string
b{print 6} # falsey; b is null (a variable not assigned)
a=@/x/{print 7} # truthy; defining a strongly typed regex constant. Different from a /x/ pattern
/x/{print 8} # falsey; /x/ does not match the input
c["test"]{print 9} # falsey; undefined item of array, null
c["test"]++{print 10} # falsey; variable is evaluated before increment
c["test"]{print 11} # truthy; now, c["test"] equals 1
c["test2"]{print 12} # falsey; although the c array now exists, the "test2" element does not
Result:
1
2
4
7
11
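Several entries in the list above are gawk-specific (hex constants, @/x/). The sketch below shows a few portable patterns that rely on the same rules, using arbitrary inputs. Because gsub() returns its substitution count, it keeps only the lines it changed. The post-increment c[k]++ is what makes the familiar duplicate filter work. Finally, 0 versus 0"" shows the difference between the number and the string.

```shell
#!/bin/sh
# gsub() returns the number of substitutions, so unchanged lines are falsey.
subbed=$(printf 'ax\nb\nxx\n' | awk 'gsub(/x/,"y")')

# s[0ドル]++ is 0 (falsey) the first time a line is seen, so !s[0ドル]++ drops repeats.
uniq=$(printf 'a\nb\na\nc\nb\n' | awk '!s[0ドル]++')

# The number 0 is falsey; 0 concatenated with "" is the string "0", which is truthy.
zero=$(echo | awk '{ if (0) print "number"; if (0 "") print "string" }')
printf '%s\n' "$subbed" "$uniq" "$zero"
```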