Parsing a string into an array using IFS with non-whitespace values creates empty elements.
Even using tr -s to shrink multiple delimiters to a single delimiter isn't enough.
An example may explain the issue more clearly.
Is there a way to achieve "normal" results by tweaking IFS? That is, is there an associated setting that changes IFS's behaviour so that it acts the same as the default whitespace IFS?
var=" abc def ghi "
echo "============== IFS=<default>"
arr=($var)
for x in ${!arr[*]} ; do
echo "# arr[$x] \"${arr[x]}\""
done
#
sfi="$IFS" ; IFS=':'
set -f # Disable file name generation (globbing)
# (This data won't "glob", but unless globbing
# is actually needed, turn it off, because
# unusual/unexpected combinations of data can glob,
# and they can do it in the most obscure ways...
# With IFS, "you're not in Kansas any more"! :)
var=":abc::def:::ghi::::"
echo "============== IFS=$IFS"
arr=($var)
for x in ${!arr[*]} ; do
echo "# arr[$x] \"${arr[x]}\""
done
echo "============== IFS=$IFS and tr"
arr=($(echo -n "$var"|tr -s "$IFS"))
for x in ${!arr[*]} ; do
echo "# arr[$x] \"${arr[x]}\""
done
set +f # enable globbing
IFS="$sfi" # re-instate original IFS val
echo "============== IFS=<default>"
Here is the output
============== IFS=<default>
# arr[0] "abc"
# arr[1] "def"
# arr[2] "ghi"
============== IFS=:
# arr[0] ""
# arr[1] "abc"
# arr[2] ""
# arr[3] "def"
# arr[4] ""
# arr[5] ""
# arr[6] "ghi"
# arr[7] ""
# arr[8] ""
# arr[9] ""
============== IFS=: and tr
# arr[0] ""
# arr[1] "abc"
# arr[2] "def"
# arr[3] "ghi"
============== IFS=<default>
-
There is a better (I think) answer to the same question: stackoverflow.com/a/14789518/1765658 – F. Hauri - Give Up GitHub, Commented Feb 10, 2013 at 17:35
4 Answers
From the bash manpage:
Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter.
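A minimal sketch of that rule, assuming bash and a made-up test string (the names `ws`, `colon` and `mixed` are only for illustration). It splits the same string three ways so the difference between IFS whitespace and other IFS characters is visible:

```shell
#!/bin/bash
# The test string and array names are made up for illustration.
var='a : b::c'
old_ifs=$IFS
set -f                      # keep the unquoted expansions from globbing

IFS=' '  ; ws=($var)        # IFS whitespace only
IFS=':'  ; colon=($var)     # non-whitespace only
IFS=': ' ; mixed=($var)     # both kinds together

IFS=$old_ifs ; set +f

printf 'ws    (%d):' "${#ws[@]}"    ; printf ' <%s>' "${ws[@]}"    ; echo
printf 'colon (%d):' "${#colon[@]}" ; printf ' <%s>' "${colon[@]}" ; echo
printf 'mixed (%d):' "${#mixed[@]}" ; printf ' <%s>' "${mixed[@]}" ; echo
# ws    (3): <a> <:> <b::c>
# colon (4): <a > < b> <> <c>
# mixed (4): <a> <b> <> <c>
```

In the `mixed` case the " : " run counts as one delimiter (a colon plus its adjacent whitespace), while "::" still produces an empty field.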
It means that IFS whitespace (space, tab and newline) is not treated like the other separators. If you want to get exactly the same behaviour with an alternative separator, you can do some separator swapping with the help of tr or sed:
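var=":abc::def:::ghi::::"
# Protect embedded spaces with a placeholder, then turn ':' into spaces
arr=($(printf '%s' "$var" | sed 's/ /%#%#%#%#%/g;s/:/ /g'))
for x in "${!arr[@]}" ; do
# Restore the protected spaces in the current element (not just arr[0])
el=$(printf '%s' "${arr[x]}" | sed 's/%#%#%#%#%/ /g')
echo "# arr[$x] \"$el\""
done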
The %#%#%#%#% thing is a magic value to replace the possible spaces inside the fields; it is expected to be "unique" (or very unlikely to occur in the data). If you are sure that no space will ever be in the fields, just drop this part.
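If the fields are known to contain no whitespace, the same swap can be sketched without tr or sed by using a bash pattern substitution. This is an assumption-laden variant, not part of the answer above: it relies on IFS still having its default value and on the data having no spaces, tabs or newlines inside fields.

```shell
#!/bin/bash
# Assumes default IFS and no whitespace inside the fields.
var=":abc::def:::ghi::::"
set -f                       # no globbing while the expansion is unquoted
arr=(${var//:/ })            # every ':' becomes a space; default IFS squeezes the runs
set +f
for x in "${!arr[@]}" ; do
    echo "# arr[$x] \"${arr[x]}\""
done
# # arr[0] "abc"
# # arr[1] "def"
# # arr[2] "ghi"
```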
-
@FussyS... Thanks (see the modification in my question)... You may have given me the answer to my intended question, and that answer may be (probably is) "There is no way to get IFS to behave in the manner I want"... I intended the tr examples to show the problem... I want to avoid a system call, so I'll look at a bash option beyond the ${var##:} which I mentioned in my comment to glen's answer... I'll wait for a time; maybe there is a way to coax IFS, otherwise the first part of your answer is what I was after. – Peter.O, Commented Feb 23, 2011 at 17:31
-
That treatment of IFS is the same in all Bourne-style shells; it's specified in POSIX. – Gilles 'SO- stop being evil', Commented Feb 23, 2011 at 21:25
-
4-plus years since I asked this question, I found @nazad's answer (posted over a year ago) to be the simplest way to juggle IFS to create an array with any number and combination of IFS chars as delimiter-string. My question was best answered by jon_d, but @nazad's answer shows a nifty way to use IFS with no loops and no utility apps. – Peter.O, Commented May 7, 2015 at 2:48
To remove multiple (non-space) consecutive delimiter chars, two (string/array) parameter expansions can be used. The trick is to set the IFS variable to the empty string for the array parameter expansion.
This is documented in man bash under Word Splitting:
Unquoted implicit null arguments, resulting from the expansion of parameters that have no values, are removed.
(
set -f
str=':abc::def:::ghi::::'
IFS=':'
arr=(${str})
IFS=""
arr=(${arr[@]})
echo ${!arr[*]}
for ((i=0; i < ${#arr[@]}; i++)); do
echo "${i}: '${arr[${i}]}'"
done
)
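The comments below disagree on whether embedded whitespace survives this method. A small sketch for checking it, wrapping the two expansions in a function: `split_nonempty` and the global array `fields` are hypothetical names introduced here, and the behaviour shown assumes bash, where an empty IFS disables word splitting altogether.

```shell
#!/bin/bash
# split_nonempty and 'fields' are hypothetical names, not from the answer above.
# Usage: split_nonempty STRING DELIMITER-CHARS   (result lands in the global array 'fields')
split_nonempty() {
    local str="$1" IFS="$2" restore_glob=
    [[ $- == *f* ]] || restore_glob=1   # remember whether globbing was on
    set -f
    fields=($str)            # step 1: split on the delimiters, empty fields kept
    IFS=
    fields=(${fields[@]})    # step 2: no splitting with empty IFS, null words dropped
    [[ -n $restore_glob ]] && set +f
    return 0
}

split_nonempty ':a bc::def:::ghi::::' ':'
printf '<%s>\n' "${fields[@]}"
# <a bc>
# <def>
# <ghi>
```

Because IFS is declared local, the caller's IFS is left untouched once the function returns.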
-
Good! A simple and effective method, with no need for a bash loop and no need to call a utility app. BTW, as you mentioned "(non-space)", I'd point out, for clarity, that it works fine with any combination of delimiter chars, including space. – Peter.O, Commented May 22, 2015 at 18:37
-
In my tests setting IFS=' ' (i.e. a whitespace) behaves the same. I find this less confusing than an explicit null argument ("" or '') of IFS. – Micha Wiedenmann, Commented Sep 22, 2015 at 15:13
-
That's kind of a terrible solution if your data contains embedded whitespace. Thus, if your data was 'a bc' instead of 'abc', IFS="" would split 'a' into a separate element from 'bc'. – Dejay Clayton, Commented Sep 24, 2015 at 15:19
-
@DejayClayton - I do not understand the solution thoroughly, but I tested it with data containing whitespace on bash and POSIX sh, and both seem to work. Whitespace is preserved. – midnite, Commented Apr 19, 2022 at 23:51
You can do it with gawk too, but it's not pretty:
var=":abc::def:::ghi::::"
out=$( gawk -F ':+' '
{
# strip delimiters from the ends of the line
sub("^"FS,"")
sub(FS"$","")
# then output in a bash-friendly format
for (i=1;i<=NF;i++) printf("\"%s\" ", $i)
print ""
}
' <<< "$var" )
eval arr=($out)
for x in ${!arr[*]} ; do
echo "# arr[$x] \"${arr[x]}\""
done
outputs
# arr[0] "abc"
# arr[1] "def"
# arr[2] "ghi"
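One caveat with the eval step is that it re-parses the awk output as shell code, so quote characters in the data are interpreted rather than kept. A hedged sketch that avoids eval by reading one field per line with mapfile (bash 4 or later); it uses plain awk instead of gawk and assumes the fields contain no newlines:

```shell
#!/bin/bash
# Assumes bash >= 4 (mapfile) and fields without embedded newlines.
var="The \"X\" factor:::A single '\"' crashes:::\"One Two\""
mapfile -t arr < <(
    # print each non-empty field on its own line
    awk -F ':+' '{ for (i = 1; i <= NF; i++) if ($i != "") print $i }' <<< "$var"
)
for x in "${!arr[@]}" ; do
    printf '# arr[%d] "%s"\n' "$x" "${arr[x]}"
done
```

Since nothing is passed through eval, the quotes inside the fields arrive in the array unchanged.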
-
Thanks... I seem to have not been clear in my main request (modified question)... It's easy enough to do it by just changing my $var to ${var##:}... I was really after a way to tweak IFS itself. I want to do this without an external call (I have a feeling that bash can do this more efficiently than any external can, so I'll keep on that track)... Your method works (+1)... As far as modifying the input goes, I'd prefer to try it with bash, rather than awk or tr (it would avoid a system call), but I'm really hanging out for an IFS tweak... – Peter.O, Commented Feb 23, 2011 at 17:57
-
@fred, as mentioned, IFS only slurps up multiple consecutive delimiters for the default whitespace value. Otherwise, consecutive delimiters result in extraneous empty fields. I expect one or two external calls are exceedingly unlikely to impact performance in any real way. – glenn jackman, Commented Feb 23, 2011 at 20:23
-
@glen.. (You said your answer is not "pretty".. I think it is! :) However, I have put together an all-bash version (vs. an external call), and based on 10000 iterations of just building the array (no I/O):
bash 1.276s
call (awk) 0m32.210s
call (tr) 0m32.178s
Do that a few times and you might think bash is slow! ... Is awk easier in this case? ... Not if you've already got the snippet :) ... I'll post it later; must go now. – Peter.O, Commented Feb 24, 2011 at 3:00
-
Just by the way, re your gawk script... I've basically not used awk before, so I've been looking at it (and others) in detail... I can't pick why, but I'll mention the issue anyhow: when given quoted data, it loses the quotes and splits at spaces between the quotes, and it crashes for odd numbers of quotes... Here's the test data:
var="The \"X\" factor:::A single '\"' crashes:::\"One Two\""
– Peter.O, Commented Feb 24, 2011 at 16:28
As bash IFS does not provide a built-in way to treat consecutive delimiter chars as a single delimiter (for non-whitespace delimiters), I have put together an all-bash version (vs. using an external call, e.g. tr, awk, sed).
It can handle multi-char IFS.
Here are its execution-time results, along with similar tests for the tr and awk options shown on this Q/A page. The tests are based on 10000 iterations of just building the array (with no I/O):
pure bash 3.174s (28 char IFS)
call (awk) 0m32.210s (1 char IFS)
call (tr) 0m32.178s (1 char IFS)
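The harness behind these figures is not shown. As a rough sketch of how such a comparison could be set up, the loop below times a pure-bash split against a split that forks tr. The `iterations` variable is an assumption, set far lower than 10000 to keep the run short, and the pure-bash branch uses the two-expansion IFS method from another answer on this page rather than the script below, so the numbers it prints are not comparable to the ones above.

```shell
#!/bin/bash
# Hypothetical timing harness; 'iterations' is an assumed, reduced count.
iterations=200
var=':abc::def:::ghi::::'
old_ifs=$IFS
set -f

echo "pure bash:"
time for ((n = 0; n < iterations; n++)); do
    IFS=':' ; arr=($var)          # split, empty fields kept
    IFS=''  ; arr=(${arr[@]})     # drop the empty fields
done

echo "call (tr):"
IFS=':'
time for ((n = 0; n < iterations; n++)); do
    arr=($(printf '%s' "$var" | tr -s ':'))   # forks a subshell and tr each time
done

IFS=$old_ifs
set +f
```

The timings are written to stderr by the `time` keyword and depend on the machine; most of the gap typically comes from the fork and exec in the second loop.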
Here is the output
# dlm_str = :.~!@#$%^&()_+-=`}{][ ";></,
# original = :abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'single*quote?'..123:
# unified = :abc::::def::::::::::::::::::::::::::::'single*quote?'::123:
# max-w 2^ = ::::::::::::::::
# shrunk.. = :abc:def:'single*quote?':123:
# arr[0] "abc"
# arr[1] "def"
# arr[2] "'single*quote?'"
# arr[3] "123"
Here is the script
#!/bin/bash
# Note: This script modifies the source string.
# so work with a copy, if you need the original.
# also: Use the name varG (Global) it's required by 'shrink_repeat_chars'
#
# NOTE: * asterisk in IFS causes a regex(?) issue, but * is ok in data.
# NOTE: ? Question-mark in IFS causes a regex(?) issue, but ? is ok in data.
# NOTE: 0..9 digits in IFS causes empty/wacky elements, but they're ok in data.
# NOTE: ' single quote in IFS; don't know yet, but ' is ok in data.
#
function shrink_repeat_chars () # A 'tr -s' analog
{
# Shrink repeating occurrences of char
#
# $1: A string of delimiters which, when consecutively repeated, are
# considered as a shrinkable group. An example is: " " whitespace delimiter.
#
# $varG A global var which contains the string to be "shrunk".
#
# echo "# dlm_str = $1"
# echo "# original = $varG"
dlms="$1" # arg delimiter string
dlm1=${dlms:0:1} # 1st delimiter char
dlmw=$dlm1 # work delimiter
# More than one delimiter char
# ============================
# When a delimiter contains more than one char (i.e. different byte values),
# make all delimiter-chars in string $varG the same as the 1st delimiter char.
ix=1;xx=${#dlms};
while ((ix<xx)) ; do # Where more than one delim char, make all the same in varG
varG="${varG//${dlms:$ix:1}/$dlm1}"
ix=$((ix+1))
done
# echo "# unified = $varG"
#
# Binary shrink
# =============
# Find the longest required "power of 2" group needed for a binary shrink
while [[ "$varG" =~ .*$dlmw$dlmw.* ]] ; do dlmw=$dlmw$dlmw; done # double its length
# echo "# max-w 2^ = $dlmw"
#
# Shrink groups of delims to a single char
while [[ ! "$dlmw" == "$dlm1" ]] ; do
varG=${varG//${dlmw}$dlm1/$dlm1}
dlmw=${dlmw:$((${#dlmw}/2))}
done
varG=${varG//${dlmw}$dlm1/$dlm1}
# echo "# shrunk.. = $varG"
}
# Main
varG=':abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'\''single*quote?'\''..123:'
sfi="$IFS"; IFS=':.~!@#$%^&()_+-=`}{][ ";></,' # save original IFS and set new multi-char IFS
set -f # disable globbing
shrink_repeat_chars "$IFS" # The source string name must be $varG
arr=(${varG:1}) # Strip leading dlim; a single trailing dlim is ok (strangely)
for ix in ${!arr[*]} ; do # Dump the array
echo "# arr[$ix] \"${arr[ix]}\""
done
set +f # re-enable globbing
IFS="$sfi" # re-instate the original IFS
#
exit
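For comparison, a shorter sketch of the same "shrink" step using bash's extended globbing, where `+(...)` matches one or more repetitions. This is an alternative technique, not the author's method; the delimiter set `.,;:` and the test string are made up for illustration, and a delimiter set containing characters that are special inside a bracket expression would need extra care.

```shell
#!/bin/bash
shopt -s extglob                   # enables the +(...) pattern
# Made-up test string and delimiter set (. , ; :)
str=':abc:..def:,;:ghi..123:'
squeezed=${str//+([.,;:])/:}       # each run of delimiter chars -> one ':'
squeezed=${squeezed#:}             # drop a single leading delimiter
old_ifs=$IFS
set -f
IFS=':' ; arr=($squeezed)          # a trailing ':' does not create an empty field
IFS=$old_ifs ; set +f
for ix in "${!arr[@]}" ; do
    echo "# arr[$ix] \"${arr[ix]}\""
done
# # arr[0] "abc"
# # arr[1] "def"
# # arr[2] "ghi"
# # arr[3] "123"
```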
-
Great work, interesting +1! – F. Hauri - Give Up GitHub, Commented Feb 10, 2013 at 14:43