Can IFS (Internal Field Separator) function as a single separator for multiple consecutive delimiter chars?

Question 1

Parsing an array using IFS with non-whitespace values creates empty elements.
Even using tr -s to squeeze multiple delimiters into a single one isn't enough.
An example may explain the issue more clearly.
Is there a way to achieve "normal" results by tweaking IFS, i.e. is there an associated setting that makes a non-whitespace IFS behave the same as the default whitespace IFS?

var=" abc def ghi "
echo "============== IFS=<default>"
arr=($var)
for x in ${!arr[*]} ; do
 echo "# arr[$x] \"${arr[x]}\""
done
#
sfi="$IFS" ; IFS=':'
set -f # Disable file name generation (globbing)
 # (This data won't "glob", but unless globbing
 # is actually needed, turn it off, because
 # unusual/unexpected combinations of data can glob,
 # and they can do it in the most obscure ways...)
 # With IFS, "you're not in Kansas any more!" :)
var=":abc::def:::ghi::::"
echo "============== IFS=$IFS"
arr=($var)
for x in ${!arr[*]} ; do
 echo "# arr[$x] \"${arr[x]}\""
done
echo "============== IFS=$IFS and tr"
arr=($(echo -n "$var"|tr -s "$IFS"))
for x in ${!arr[*]} ; do
 echo "# arr[$x] \"${arr[x]}\""
done
set +f # enable globbing 
IFS="$sfi" # re-instate original IFS val
echo "============== IFS=<default>"

Here is the output

============== IFS=<default>
# arr[0] "abc"
# arr[1] "def"
# arr[2] "ghi"
============== IFS=:
# arr[0] ""
# arr[1] "abc"
# arr[2] ""
# arr[3] "def"
# arr[4] ""
# arr[5] ""
# arr[6] "ghi"
# arr[7] ""
# arr[8] ""
# arr[9] ""
============== IFS=: and tr
# arr[0] ""
# arr[1] "abc"
# arr[2] "def"
# arr[3] "ghi"
============== IFS=<default>

Question 2

There is a better (I think) answer to the same question: stackoverflow.com/a/14789518/1765658

Question 3

From bash manpage :

Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter.

This means that IFS whitespace (space, tab and newline) is not treated like the other separators. If you want exactly the same behaviour with an alternative separator, you can swap separators with the help of tr or sed:

var=":abc::def:::ghi::::"
arr=($(echo -n "$var" | sed 's/ /%#%#%#%#%/g;s/:/ /g'))
for x in ${!arr[*]} ; do
 el=$(echo -n "${arr[x]}" | sed 's/%#%#%#%#%/ /g')
 echo "# arr[$x] \"$el\""
done

The %#%#%#%#% thing is a magic value that replaces any spaces inside the fields; it is expected to be "unique" (or at least very unlikely to occur in the data). If you are sure that no space will ever be in the fields, just drop this part.
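If the fields are known to contain no whitespace, the same separator swap can be sketched in pure bash with a parameter expansion instead of sed. This is only an illustration of the idea above, not part of the original answer; it assumes bash arrays and the default IFS, which it sets explicitly.

```shell
var=":abc::def:::ghi::::"
set -f                  # no globbing while the expansion is unquoted
IFS=$' \t\n'            # make sure the default (whitespace) IFS is in effect
arr=(${var//:/ })       # every ":" becomes a space; runs of spaces split as one
set +f
for x in "${!arr[@]}"; do
    echo "# arr[$x] \"${arr[x]}\""
done
# prints:
# # arr[0] "abc"
# # arr[1] "def"
# # arr[2] "ghi"
```

Because the colons are turned into IFS whitespace, the shell collapses the runs itself and no external command is needed.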

Question 4

@FussyS... Thanks (see the modification in my question). You may have given me the answer to my intended question, and that answer may be (probably is) "There is no way to get IFS to behave in the manner I want". I intended the tr examples to show the problem. I want to avoid a system call, so I'll look at a bash option beyond the ${var##:} which I mentioned in my comment to glen's answer. I'll wait for a time; maybe there is a way to coax IFS, otherwise the first part of your answer is what I was after.

Question 5

That treatment of IFS is the same in all Bourne-style shells; it's specified in POSIX.
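The rule quoted from the manpage can be checked with a small sketch. The helper name count_fields is made up for this illustration; it simply reports how many fields word splitting produces for a given IFS value.

```shell
# count_fields IFS_VALUE STRING  (hypothetical helper, for illustration only)
# Prints the number of fields that word splitting yields.
count_fields() {
    local IFS="$1"
    local -a fields
    set -f              # keep globbing out of the picture
    fields=($2)         # unquoted on purpose: this is the word splitting under test
    set +f
    echo "${#fields[@]}"
}

count_fields ' '  'a   b'    # prints 2: a run of IFS whitespace is one delimiter
count_fields ':'  'a:::b'    # prints 4: every colon delimits, giving two empty fields
count_fields ': ' 'a : b'    # prints 2: a colon plus adjacent IFS whitespace is one delimiter
```

Note that the helper ends with set +f, so it re-enables globbing even if the caller had disabled it.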

Question 6

4-plus years since I asked this question, I found @nazad's answer (posted over a year ago) to be the simplest way to juggle IFS to create an array with any number and combination of IFS chars as the delimiter string. My question was best answered by jon_d, but @nazad's answer shows a nifty way to use IFS with no loops and no utility apps.

Question 7

To remove multiple (non-space) consecutive delimiter chars, two (string/array) parameter expansions can be used. The trick is to set the IFS variable to the empty string for the array parameter expansion.

This is documented in man bash under Word Splitting:

Unquoted implicit null arguments, resulting from the expansion of parameters that have no values, are removed.

(
set -f
str=':abc::def:::ghi::::'
IFS=':'
arr=(${str})
IFS=""
arr=(${arr[@]})
echo ${!arr[*]}
for ((i=0; i < ${#arr[@]}; i++)); do 
 echo "${i}: '${arr[${i}]}'"
done
)

Question 8

Good! A simple and effective method, with no need for a bash loop and no need to call a utility app. BTW, as you mentioned "(non-space)", I'd point out, for clarity, that it works fine with any combination of delimiter chars, including space.

Question 9

In my tests, setting IFS=' ' (i.e. a single space) behaves the same. I find this less confusing than an explicit null argument ("" or '') for IFS.

Question 10

That's kind of a terrible solution if your data contains embedded whitespace. Thus, if your data was 'a bc' instead of 'abc', IFS="" would split 'a' into a separate element from 'bc'.

Question 11

@DejayClayton - I do not understand the solution thoroughly, but I tested it with data containing whitespace on bash and POSIX sh, and both seem to work. Whitespace is preserved.
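The disagreement about embedded whitespace can be checked directly. The sketch below reuses the two-step expansion from the answer on data containing 'a bc' and compares an empty IFS with a single-space IFS in the second step. The array names first, kept and split are made up for the illustration, and the behaviour shown is what I would expect from a reasonably recent bash.

```shell
str=':a bc::def:::ghi::::'
set -f
IFS=':'
first=(${str})          # split on colons; empty elements are still present
IFS=''
kept=(${first[@]})      # empty IFS: no further splitting, null words are dropped
IFS=' '
split=(${first[@]})     # space IFS: null words are dropped, but 'a bc' is split too
set +f
IFS=$' \t\n'            # restore the default IFS

echo "IFS='' : ${#kept[@]} elements, first is '${kept[0]}'"
echo "IFS=' ': ${#split[@]} elements, first is '${split[0]}'"
# prints:
# IFS='' : 3 elements, first is 'a bc'
# IFS=' ': 4 elements, first is 'a'
```

So the empty IFS keeps embedded spaces intact, while IFS=' ' only gives the same result when the fields themselves contain no spaces.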

Question 12

You can do it with gawk too, but it's not pretty:

var=":abc::def:::ghi::::"
out=$( gawk -F ':+' '
 {
 # strip delimiters from the ends of the line
 sub("^"FS,"")
 sub(FS"$","")
 # then output in a bash-friendly format
 for (i=1;i<=NF;i++) printf("\"%s\" ", $i)
 print ""
 }
' <<< "$var" )
eval arr=($out)
for x in ${!arr[*]} ; do
 echo "# arr[$x] \"${arr[x]}\""
done

outputs

# arr[0] "abc"
# arr[1] "def"
# arr[2] "ghi"

Question 13

Thanks... I seem not to have been clear in my main request (see the modified question). It's easy enough to do it by just changing my $var to ${var##:}. I was really after a way to tweak IFS itself. I want to do this without an external call (I have a feeling that bash can do this more efficiently than any external program, so I'll keep on that track). Your method works (+1). As far as modifying the input goes, I'd prefer to try it with bash rather than awk or tr (it would avoid a system call), but I'm really hanging out for an IFS tweak.

Question 14

@fred, as mentioned, IFS only slurps up multiple consecutive delimiters for the default whitespace value. Otherwise, consecutive delimiters result in extraneous empty fields. I expect that one or two external calls are exceedingly unlikely to impact performance in any real way.
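Since the extra empty fields are the only difference, one straightforward pure-bash workaround is to split first and then filter them out. This is a minimal sketch, not something proposed in the thread; the array names raw and arr are just for illustration.

```shell
var=":abc::def:::ghi::::"
# Split on ":" only; read -a does no globbing, so set -f is not needed here.
IFS=':' read -r -a raw <<< "$var"

arr=()
for field in "${raw[@]}"; do
    if [[ -n $field ]]; then    # keep only the non-empty fields
        arr+=("$field")
    fi
done

for x in "${!arr[@]}"; do
    echo "# arr[$x] \"${arr[x]}\""
done
# prints:
# # arr[0] "abc"
# # arr[1] "def"
# # arr[2] "ghi"
```

It does use a loop, but no external command, and fields with embedded spaces survive unchanged.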

Question 15

@glen.. (You said your answer is not "pretty". I think it is! :) However, I have put together an all-bash version (vs. an external call), and based on 10000 iterations of just building the array (no I/O): bash 1.276s, call (awk) 0m32.210s, call (tr) 0m32.178s. Do that a few times and you might think bash is slow! Is awk easier in this case? Not if you've already got the snippet :) I'll post it later; must go now.

Question 16

Just by the way, re your gawk script: I've basically not used awk before, so I've been looking at it (and others) in detail. I can't work out why, but I'll mention the issue anyhow. When given quoted data, it loses the quotes and splits at spaces between the quotes, and it crashes for odd numbers of quotes. Here's the test data: var="The \"X\" factor:::A single '\"' crashes:::\"One Two\""
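The quoting problem most likely comes from eval re-parsing the awk output as shell code, so quotes in the data are treated as shell quotes. One way around it, sketched here under the assumption that bash 4 or later (for mapfile) and an awk that accepts a regex field separator are available, is to have awk print one field per line and read the lines straight into the array. It assumes the fields contain no newlines.

```shell
var="The \"X\" factor:::A single '\"' crashes:::\"One Two\""

# awk prints each non-empty field on its own line; mapfile stores the lines
# as array elements without any further shell parsing of their contents.
mapfile -t arr < <(awk -F ':+' '{
    for (i = 1; i <= NF; i++)
        if ($i != "") print $i
}' <<< "$var")

for x in "${!arr[@]}"; do
    printf '# arr[%d] <%s>\n' "$x" "${arr[x]}"
done
# prints:
# # arr[0] <The "X" factor>
# # arr[1] <A single '"' crashes>
# # arr[2] <"One Two">
```

This still costs one external call per string, so it does not address the performance concern raised in the comments.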

Question 17

As bash IFS does not provide a built-in way to treat consecutive delimiter chars as a single delimiter (for non-whitespace delimiters), I have put together an all-bash version (vs. using an external call, e.g. tr, awk, sed).

It can handle a multi-char IFS.

Here are its execution-time results, along with similar tests for the tr and awk options shown on this Q/A page. The tests are based on 10000 iterations of just building the array (with no I/O).

pure bash 3.174s (28 char IFS)
call (awk) 0m32.210s (1 char IFS) 
call (tr) 0m32.178s (1 char IFS)

Here is the output

# dlm_str = :.~!@#$%^&()_+-=`}{][ ";></,
# original = :abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'single*quote?'..123:
# unified = :abc::::def::::::::::::::::::::::::::::'single*quote?'::123:
# max-w 2^ = ::::::::::::::::
# shrunk.. = :abc:def:'single*quote?':123:
# arr[0] "abc"
# arr[1] "def"
# arr[2] "'single*quote?'"
# arr[3] "123"

Here is the script

#!/bin/bash
# Note: This script modifies the source string. 
# so work with a copy, if you need the original. 
# also: Use the name varG (Global) it's required by 'shrink_repeat_chars'
#
# NOTE: * asterisk in IFS causes a regex(?) issue, but * is ok in data. 
# NOTE: ? Question-mark in IFS causes a regex(?) issue, but ? is ok in data. 
# NOTE: 0..9 digits in IFS causes empty/wacky elements, but they're ok in data.
# NOTE: ' single quote in IFS; don't know yet, but ' is ok in data.
# 
function shrink_repeat_chars () # A 'tr -s' analog
{
 # Shrink repeating occurrences of char
 #
 # $1: A string of delimiters which, when consecutively repeated, are
 # considered a shrinkable group. An example is: " " whitespace delimiter.
 #
 # $varG A global var which contains the string to be "shrunk".
 #
# echo "# dlm_str = $1" 
# echo "# original = $varG" 
 dlms="$1" # arg delimiter string
 dlm1=${dlms:0:1} # 1st delimiter char 
 dlmw=$dlm1 # work delimiter 
 # More than one delimiter char
 # ============================
 # When a delimiter contains more than one char, i.e. (different byte values),
 # make all delimiter-chars in string $varG the same as the 1st delimiter char.
 ix=1;xx=${#dlms}; 
 while ((ix<xx)) ; do # Where more than one delim char, make all the same in varG 
 varG="${varG//${dlms:$ix:1}/$dlm1}"
 ix=$((ix+1))
 done
# echo "# unified = $varG" 
 #
 # Binary shrink
 # =============
 # Find the longest required "power of 2" group needed for a binary shrink
 while [[ "$varG" =~ .*$dlmw$dlmw.* ]] ; do dlmw=$dlmw$dlmw; done # double its length
# echo "# max-w 2^ = $dlmw"
 #
 # Shrink groups of delims to a single char
 while [[ ! "$dlmw" == "$dlm1" ]] ; do
 varG=${varG//${dlmw}$dlm1/$dlm1}
 dlmw=${dlmw:$((${#dlmw}/2))}
 done
 varG=${varG//${dlmw}$dlm1/$dlm1}
# echo "# shrunk.. = $varG"
}
# Main
 varG=':abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'\''single*quote?'\''..123:' 
 sfi="$IFS"; IFS=':.~!@#$%^&()_+-=`}{][ ";></,' # save original IFS and set new multi-char IFS
 set -f # disable globbing
 shrink_repeat_chars "$IFS" # The source string name must be $varG
 arr=(${varG:1}) # Strip leading dlim; a single trailing dlim is ok (strangely)
 for ix in ${!arr[*]} ; do # Dump the array
 echo "# arr[$ix] \"${arr[ix]}\""
 done
 set +f # re-enable globbing 
 IFS="$sfi" # re-instate the original IFS
 #
exit
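For the common case of a single delimiter character, the shrinking step can be sketched much more compactly. This is a simplified variant, not the script above: it repeatedly replaces doubled delimiters instead of doing the power-of-two shrink, and the function name squeeze_delim and the global variable squeezed are invented for the illustration. The pattern is quoted so that the delimiter is matched literally rather than as a glob.

```shell
# squeeze_delim DELIM STRING  (hypothetical helper)
# Collapses runs of the single character DELIM in STRING to one DELIM.
# The result is left in the global variable "squeezed" to avoid a subshell.
squeeze_delim() {
    local d="$1" s="$2"
    while [[ $s == *"$d$d"* ]]; do      # any doubled delimiter left?
        s=${s//"$d$d"/"$d"}             # each pass roughly halves every run
    done
    squeezed=$s
}

squeeze_delim ':' ':abc::def:::ghi::::'
echo "# shrunk = $squeezed"             # prints: # shrunk = :abc:def:ghi:

# Drop one leading delimiter, then split; read -a does no globbing.
IFS=':' read -r -a arr <<< "${squeezed#:}"
for ix in "${!arr[@]}"; do
    echo "# arr[$ix] \"${arr[ix]}\""
done
# prints:
# # arr[0] "abc"
# # arr[1] "def"
# # arr[2] "ghi"
```

A multi-char delimiter set would first need the "unify" step from the script above, which rewrites every delimiter char to the first one.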

Question 18

Great work, interesting +1!

jon_d · Accepted Answer · 2011-02-23 15:49:51Z