12

Splitting a string into an array using IFS with non-whitespace values creates empty elements.
Even using tr -s to shrink multiple delims to a single delim isn't enough.
An example may explain the issue more clearly.
Is there a way to achieve "normal" results by tweaking IFS? That is, is there an associated setting that changes IFS's behaviour so that it acts the same as the default whitespace IFS?

var=" abc def ghi "
echo "============== IFS=<default>"
arr=($var)
for x in ${!arr[*]} ; do
 echo "# arr[$x] \"${arr[x]}\""
done
#
sfi="$IFS" ; IFS=':'
set -f # Disable file name generation (globbing)
 # (This data won't "glob", but unless globbing
 # is actually needed, turn it off, because
 # unusual/unexpected combinations of data can glob,
 # and they can do it in the most obscure ways...
 # With IFS, "you're not in Kansas any more"! :)
var=":abc::def:::ghi::::"
echo "============== IFS=$IFS"
arr=($var)
for x in ${!arr[*]} ; do
 echo "# arr[$x] \"${arr[x]}\""
done
echo "============== IFS=$IFS and tr"
arr=($(echo -n "$var"|tr -s "$IFS"))
for x in ${!arr[*]} ; do
 echo "# arr[$x] \"${arr[x]}\""
done
set +f # enable globbing 
IFS="$sfi" # re-instate original IFS val
echo "============== IFS=<default>"

Here is the output


============== IFS=<default>
# arr[0] "abc"
# arr[1] "def"
# arr[2] "ghi"
============== IFS=:
# arr[0] ""
# arr[1] "abc"
# arr[2] ""
# arr[3] "def"
# arr[4] ""
# arr[5] ""
# arr[6] "ghi"
# arr[7] ""
# arr[8] ""
# arr[9] ""
============== IFS=: and tr
# arr[0] ""
# arr[1] "abc"
# arr[2] "def"
# arr[3] "ghi"
============== IFS=<default>
asked Feb 23, 2011 at 15:12
1

4 Answers

5

From the bash manpage:

Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter.

It means that IFS whitespace (space, tab and newline) is not treated like the other separators. If you want to get exactly the same behaviour with an alternative separator, you can do some separator swapping with the help of tr or sed:

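var=":abc::def:::ghi::::"
# Protect embedded spaces with the magic value, then turn each ':' into a space
arr=($(echo -n "$var" | sed 's/ /%#%#%#%#%/g;s/:/ /g'))
for x in ${!arr[*]} ; do
 # Restore the protected spaces in the current element (not just arr[0])
 el=$(echo -n "${arr[x]}" | sed 's/%#%#%#%#%/ /g')
 echo "# arr[$x] \"$el\""
done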

The %#%#%#%#% string is a magic value that stands in for any spaces inside the fields; it is expected to be "unique" (or at least very unlikely to occur in the data). If you are sure that no space will ever be in the fields, just drop this part.
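
The manpage rule quoted above can be checked with a few one-line splits. This is only a minimal sketch; split_count is a hypothetical helper name introduced here for illustration, not something from the answer.

```bash
#!/usr/bin/env bash
# Sketch: compare how whitespace and non-whitespace IFS characters split a string.
set -f                       # keep globbing out of the unquoted expansions

split_count() {              # hypothetical helper: $1 = IFS value, $2 = string
  local IFS="$1"
  local -a parts=($2)        # unquoted on purpose: this is where splitting happens
  echo "${#parts[@]}"
}

split_count ' '  'a   b'     # prints 2: a run of IFS whitespace is one delimiter
split_count ':'  'a:::b'     # prints 4: every colon delimits a field (two are empty)
split_count ' :' 'a : b'     # prints 2: a colon plus adjacent IFS whitespace is one delimiter
set +f
```

The third call is the case the manpage sentence describes: the non-whitespace character "along with any adjacent IFS whitespace characters" counts as a single delimiter.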

answered Feb 23, 2011 at 15:49
3
  • @FussyS... Thanks (see modification in my question) ... You may have given me the answer to my intended question, and that answer may be (probably is) "There is no way to get IFS to behave in the manner I want"... I intended the tr examples to show the problem... I want to avoid a system call, so I'll look at a bash option beyond the ${var##:} which I mentioned in my comment to glen's answer... I'll wait for a time; maybe there is a way to coax IFS, otherwise the first part of your answer is what I was after. Commented Feb 23, 2011 at 17:31
  • That treatment of IFS is the same in all Bourne-style shells, it's specified in POSIX. Commented Feb 23, 2011 at 21:25
  • 4-plus years since I asked this question - I found @nazad's answer (posted over a year ago) to be the simplest way to juggle IFS to create an array with any number and combination of IFS chars as delimiter-string. My question was best answered by jon_d, but @nazad's answer shows a nifty way to use IFS with no loops and no utility apps. Commented May 7, 2015 at 2:48
4

To remove multiple (non-space) consecutive delimiter chars, two (string/array) parameter expansions can be used. The trick is to set the IFS variable to the empty string for the array parameter expansion.

This is documented in man bash under Word Splitting:

Unquoted implicit null arguments, resulting from the expansion of parameters that have no values, are removed.

(
set -f
str=':abc::def:::ghi::::'
IFS=':'
arr=(${str})
IFS=""
arr=(${arr[@]})
echo ${!arr[*]}
for ((i=0; i < ${#arr[@]}; i++)); do 
 echo "${i}: '${arr[${i}]}'"
done
)
answered Jan 24, 2014 at 23:14
4
  • Good! A simple and effective method - with no need for a bash loop and no need to call a utility app — BTW. As you mentioned "(non-space)", I'd point out, for clarity, that it works fine with any combination of delimiter chars, including space. Commented May 22, 2015 at 18:37
  • In my tests setting IFS=' ' (i.e. a whitespace) behaves the same. I find this less confusing than an explicit null argument ("" or '') of IFS. Commented Sep 22, 2015 at 15:13
  • That's kind of a terrible solution if your data contains embedded whitespace. Thus, if your data was 'a bc' instead of 'abc', IFS="" would split 'a' into a separate element from 'bc'. Commented Sep 24, 2015 at 15:19
  • @DejayClayton - I do not understand the solution thoroughly. But I tested with data containing white spaces on bash and POSIX sh, and both seem to work. White spaces are preserved. Commented Apr 19, 2022 at 23:51
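
The embedded-whitespace question raised in these comments can be checked directly. Below is a minimal sketch of the two-pass idea from the answer, wrapped in a hypothetical function (split_drop_empty and the global result array are names made up for this illustration). It assumes a reasonably recent bash; very old versions may treat an unquoted array expansion with an empty IFS differently.

```bash
#!/usr/bin/env bash
# Sketch: two-pass split that drops empty fields but keeps embedded spaces.
split_drop_empty() {          # hypothetical helper: $1 = delimiter chars, $2 = string
  local IFS="$1"
  set -f                      # no globbing during the unquoted expansions
  result=($2)                 # pass 1: split; consecutive delimiters yield empty elements
  IFS=''
  result=(${result[@]})       # pass 2: no splitting at all; unquoted empty words are dropped
  set +f                      # simplification: assumes the caller wants globbing back on
}

split_drop_empty ':' ':a bc::def:::ghi::::'
printf '[%s]\n' "${result[@]}"
```

With IFS empty, the second expansion performs no field splitting, so 'a bc' should stay in one element; only the empty elements disappear.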
1

You can do it with gawk too, but it's not pretty:

var=":abc::def:::ghi::::"
out=$( gawk -F ':+' '
 {
 # strip delimiters from the ends of the line
 sub("^"FS,"")
 sub(FS"$","")
 # then output in a bash-friendly format
 for (i=1;i<=NF;i++) printf("\"%s\" ", $i)
 print ""
 }
' <<< "$var" )
eval arr=($out)
for x in ${!arr[*]} ; do
 echo "# arr[$x] \"${arr[x]}\""
done

outputs

# arr[0] "abc"
# arr[1] "def"
# arr[2] "ghi"
answered Feb 23, 2011 at 16:44
4
  • Thanks... I seem to have not been clear in my main request (modified question)... It's easy enough to do it by just changing my $var to ${var##:} ... I was really after a way to tweak IFS itself.. I want to do this without an external call (I have a feeling that bash can do this more efficiently than any external can.. so I'll keep on that track)... your method works (+1).... As far as modifying the input goes, I'd prefer to try it with bash, rather than awk or tr (it would avoid a system call), but I'm really hanging out for an IFS tweak... Commented Feb 23, 2011 at 17:57
  • @fred, as mentioned, IFS only slurps up multiple consecutive delimiters for the default whitespace value. Otherwise, consecutive delimiters result in extraneous empty fields. I expect one or two external calls are exceedingly unlikely to impact performance in any real way. Commented Feb 23, 2011 at 20:23
  • @glen.. (You said your answer is not "pretty".. I think it is! :) However, I have put together an all-bash version (vs an external call) and, based on 10000 iterations of just building the array (no I/O)... bash 1.276s ... call (awk) 0m32.210s ... call (tr) 0m32.178s ... Do that a few times and you might think bash is slow! ... Is awk easier in this case? ... not if you've already got the snippet :) ... I'll post it later; must go now. Commented Feb 24, 2011 at 3:00
  • Just by the way, re your gawk script... I've basically not used awk before, so I've been looking at it (and others) in detail... I can't pick why, but I'll mention the issue anyhow.. When given quoted data, it loses the quotes, and splits at spaces between the quotes.. and crashes for odd numbers of quotes... Here's the test data: var="The \"X\" factor:::A single '\"' crashes:::\"One Two\"" Commented Feb 24, 2011 at 16:28
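
The quoting trouble in the last comment comes from eval re-parsing the awk output: a double quote inside a field ends the quoting that the printf wrapped around it. One possible way around it, sketched here as an assumption rather than as the answerer's method, is to have awk print one field per line and read the lines with mapfile (bash 4 or later), so nothing is ever re-parsed by the shell. Fields that themselves contain newlines would not survive this.

```bash
#!/usr/bin/env bash
# Sketch: awk does the splitting, mapfile builds the array, no eval involved.
var="The \"X\" factor:::A single '\"' crashes:::\"One Two\""

mapfile -t arr < <(
  # ':+' treats any run of colons as one separator; empty edge fields are skipped
  awk -F ':+' '{ for (i = 1; i <= NF; i++) if ($i != "") print $i }' <<< "$var"
)

for x in "${!arr[@]}"; do
  printf '# arr[%s] "%s"\n' "$x" "${arr[x]}"
done
```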
1

As bash IFS does not provide a built-in way to treat consecutive delimiter chars as a single delimiter (for non-whitespace delimiters), I have put together an all-bash version (vs. using an external call, e.g. tr, awk, sed).

It can handle a multi-char IFS.

Here are its execution-time results, along with similar tests for the tr and awk options shown on this Q/A page. The tests are based on 10000 iterations of just building the array (with no I/O):

pure bash 3.174s (28 char IFS)
call (awk) 0m32.210s (1 char IFS) 
call (tr) 0m32.178s (1 char IFS) 
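
For anyone wanting to repeat this kind of comparison, here is a rough sketch of a timing harness. It is not the script that produced the numbers above: the function names, the test string and the small iteration count are all assumptions for illustration, and the pure-bash variant timed here is the simple two-pass IFS split rather than the longer script below.

```bash
#!/usr/bin/env bash
# Sketch of a timing harness; names, data and iteration count are illustrative only.
n=200                          # raise this (e.g. to 10000) for a more meaningful run
var=':abc::def:::ghi::::'
set -f                         # no globbing while splitting

time_pure_bash() {             # hypothetical: two-pass split done entirely in bash
  local i IFS
  for ((i = 0; i < n; i++)); do
    IFS=':'; arr=($var)
    IFS='';  arr=(${arr[@]})
  done
}

time_tr() {                    # hypothetical: one external tr call per iteration
  local i IFS=':'
  for ((i = 0; i < n; i++)); do
    arr=($(printf '%s' "${var#:}" | tr -s ':'))
  done
}

time time_pure_bash            # the bash 'time' keyword reports on stderr
time time_tr
set +f
```

Most of the difference such a run shows is likely to be process creation for the pipeline, which is the cost the external-call rows in the table are paying.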

Here is the output

# dlm_str = :.~!@#$%^&()_+-=`}{][ ";></,
# original = :abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'single*quote?'..123:
# unified = :abc::::def::::::::::::::::::::::::::::'single*quote?'::123:
# max-w 2^ = ::::::::::::::::
# shrunk.. = :abc:def:'single*quote?':123:
# arr[0] "abc"
# arr[1] "def"
# arr[2] "'single*quote?'"
# arr[3] "123"

Here is the script

#!/bin/bash
# Note: This script modifies the source string. 
# so work with a copy, if you need the original. 
# also: Use the name varG (Global) it's required by 'shrink_repeat_chars'
#
# NOTE: * asterisk in IFS causes a regex(?) issue, but * is ok in data. 
# NOTE: ? Question-mark in IFS causes a regex(?) issue, but ? is ok in data. 
# NOTE: 0..9 digits in IFS causes empty/wacky elements, but they're ok in data.
# NOTE: ' single quote in IFS; don't know yet, but ' is ok in data.
# 
function shrink_repeat_chars () # A 'tr -s' analog
{
 # Shrink repeating occurrences of char
 #
 # $1: A string of delimiter chars which, when consecutively repeated, are
 # treated as one shrinkable group. An example is: " " whitespace delimiter.
 #
 # $varG A global var which contains the string to be "shrunk".
 #
# echo "# dlm_str = $1" 
# echo "# original = $varG" 
 dlms="$1" # arg delimiter string
 dlm1=${dlms:0:1} # 1st delimiter char 
 dlmw=$dlm1 # work delimiter 
 # More than one delimiter char
 # ============================
 # When a delimiter contains more than one char (i.e. different byte values),
 # make all delimiter-chars in string $varG the same as the 1st delimiter char.
 ix=1;xx=${#dlms}; 
 while ((ix<xx)) ; do # Where more than one delim char, make all the same in varG 
 varG="${varG//${dlms:$ix:1}/$dlm1}"
 ix=$((ix+1))
 done
# echo "# unified = $varG" 
 #
 # Binary shrink
 # =============
 # Find the longest required "power of 2" group needed for a binary shrink
 while [[ "$varG" =~ .*$dlmw$dlmw.* ]] ; do dlmw=$dlmw$dlmw; done # double its length
# echo "# max-w 2^ = $dlmw"
 #
 # Shrink groups of delims to a single char
 while [[ ! "$dlmw" == "$dlm1" ]] ; do
 varG=${varG//${dlmw}$dlm1/$dlm1}
 dlmw=${dlmw:$((${#dlmw}/2))}
 done
 varG=${varG//${dlmw}$dlm1/$dlm1}
# echo "# shrunk.. = $varG"
}
# Main
 varG=':abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'\''single*quote?'\''..123:' 
 sfi="$IFS"; IFS=':.~!@#$%^&()_+-=`}{][ ";></,' # save original IFS and set new multi-char IFS
 set -f # disable globbing
 shrink_repeat_chars "$IFS" # The source string name must be $varG
 arr=(${varG:1}) # Strip leading dlim; a single trailing dlim is OK (it terminates the last field)
 for ix in ${!arr[*]} ; do # Dump the array
 echo "# arr[$ix] \"${arr[ix]}\""
 done
 set +f # re-enable globbing 
 IFS="$sfi" # re-instate the original IFS
 #
exit
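
The "binary shrink" step of the script can be looked at in isolation for a single delimiter character. The sketch below is a stripped-down illustration, not a replacement for the script: binary_shrink and the global shrunk are hypothetical names, and the patterns are quoted so that the delimiter is always matched literally (an assumption that differs from the regex test used above).

```bash
#!/usr/bin/env bash
# Sketch: collapse runs of one delimiter char using doubling, then halving.
binary_shrink() {              # hypothetical: $1 = string, $2 = one delimiter char
  local s=$1 d=$2 w=$2
  # Double the work string while s still contains a run twice its length.
  while [[ $s == *"$w$w"* ]]; do w=$w$w; done
  # Replace "work string + one delimiter" by one delimiter, halving each round.
  while [[ $w != "$d" ]]; do
    s=${s//"$w$d"/$d}
    w=${w:$(( ${#w} / 2 ))}
  done
  s=${s//"$d$d"/$d}            # final pass removes any remaining pairs
  shrunk=$s                    # result is returned in the global 'shrunk'
}

binary_shrink ':abc::def:::ghi::::' ':'
echo "$shrunk"                 # prints :abc:def:ghi:
```

Each round at most halves the longest remaining run, so the number of substitution passes grows with the logarithm of the longest run rather than with its length.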
answered Feb 24, 2011 at 10:38
1
  • Great work, interesting +1! Commented Feb 10, 2013 at 14:43
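
Another all-bash possibility, not mentioned in the answers and offered here only as a hedged sketch, is the extglob pattern +(...) inside a parameter expansion, which collapses a run of delimiters in one substitution. squeeze_and_split is a hypothetical helper name; it assumes a single delimiter character and data without embedded newlines (read stops at the first newline).

```bash
#!/usr/bin/env bash
shopt -s extglob               # enables +(...) patterns in parameter expansion

squeeze_and_split() {          # hypothetical: $1 = string, $2 = one delimiter char
  local s=$1 d=$2
  s=${s//+("$d")/$d}           # collapse every run of the delimiter to one occurrence
  s=${s#"$d"}                  # drop a single leading delimiter, if any
  s=${s%"$d"}                  # drop a single trailing delimiter, if any
  IFS=$d read -r -a arr <<< "$s"   # split without globbing; result in global 'arr'
}

squeeze_and_split ':abc::def:::ghi::::' ':'
for x in "${!arr[@]}"; do
  echo "# arr[$x] \"${arr[x]}\""
done
```

Because IFS is set only for the read command, the caller's IFS and globbing settings are left untouched.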
