Find indices of white space characters in a string

Question

This function returns the indices of all whitespace characters as an array of Int. It works fine with a small string:

```swift
func whiteSpacesIndices(value: String) -> Array<Int> {
    var indices: Array<Int> = []
    for (index, char) in value.enumerated() {
        if char == " " || char == "\n" || char == "\t" {
            indices.append(index)
        }
    }
    return indices
}
```

However, when the string is very long it could be slow, because it loops over every character.

Is there a better way to do it?

Comment

How long is the string you tested? Could you please give the character count and the number of spaces?

Comment

With a string that has 8447 characters and 1307 spaces, your code takes 0.5 ms on my machine.
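To reproduce a measurement like this, a rough timing sketch could look like the following. The input is a hypothetical synthetic string (a sentence repeated 200 times), not the string used in the comment above, and DispatchTime is just one convenient clock. A single run is noisy, so treat the printed number as indicative only.

```swift
import Dispatch

// The function from the question, unchanged in behavior.
func whiteSpacesIndices(value: String) -> [Int] {
    var indices: [Int] = []
    for (index, char) in value.enumerated() {
        if char == " " || char == "\n" || char == "\t" {
            indices.append(index)
        }
    }
    return indices
}

// Hypothetical test input: 40 characters (8 of them whitespace), repeated 200 times.
let sample = String(repeating: "The quick brown fox\tjumps over\nthe dog. ", count: 200)

let start = DispatchTime.now().uptimeNanoseconds
let result = whiteSpacesIndices(value: sample)
let elapsed = DispatchTime.now().uptimeNanoseconds - start

print("characters: \(sample.count), whitespace: \(result.count)") // characters: 8000, whitespace: 1600
print("elapsed: \(Double(elapsed) / 1_000_000) ms")
```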

Comment

This returns integer positions of the whitespace characters, but they are not indices per se; that is, they cannot be used to index back into the string.
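To illustrate the point: an Int offset has to be converted into a String.Index before it can subscript the string, and index(_:offsetBy:) walks from the start of the string, so each conversion costs time proportional to the offset. A minimal sketch with an arbitrary example string:

```swift
let text = "Hello wide world"
let offsets = [5, 10] // Character offsets, as the question's function returns them

// text[5] does not compile: String has no Int subscript.
// Each offset is converted to a String.Index first; this walks the
// string from startIndex, so it is O(offset) per lookup.
let characters = offsets.map { offset -> Character in
    let index = text.index(text.startIndex, offsetBy: offset)
    return text[index]
}
print(characters) // [" ", " "]
```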

Answer

General remarks

value, as a parameter name, isn't very descriptive.
The code assumes that whitespace can only be " ", "\n" or "\t". This is a performance optimization and presupposes knowledge of the string's contents. More generally, you could make the check this way:
```
if char.isWhitespace {
 indices.append(index)
}
```
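Put together with a more descriptive parameter name, the function might look like this. The name whitespaceOffsets(in:) is only a suggestion; the result is still a list of Character offsets, not String.Index values.

```swift
// Returns the Character offsets of every whitespace character.
// Character.isWhitespace also covers newlines, tabs and other Unicode
// whitespace such as U+00A0 NO-BREAK SPACE.
func whitespaceOffsets(in text: String) -> [Int] {
    var offsets: [Int] = []
    for (offset, char) in text.enumerated() where char.isWhitespace {
        offsets.append(offset)
    }
    return offsets
}

print(whitespaceOffsets(in: "a b\tc\nd")) // [1, 3, 5]
```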
Array<Int> is not the same as [String.Index] or IndexSet. A String is traversed using String.Index, not Int.
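If the goal is to index back into the string, one option (a sketch, not the only possible design) is to return String.Index values directly. They can be used as subscripts without conversion, and they are only valid for the string they came from:

```swift
// Returns the String.Index of every whitespace Character in `text`.
func whitespaceIndices(in text: String) -> [String.Index] {
    return text.indices.filter { text[$0].isWhitespace }
}

let text = "🇩🇪 ä x"
let indices = whitespaceIndices(in: text)

// The indices subscript the string directly:
print(indices.allSatisfy { text[$0] == " " }) // true

// An Int offset can still be recovered when needed:
print(indices.map { text.distance(from: text.startIndex, to: $0) }) // [1, 3]
```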

Performance

The following code is twice as fast in my tests, but it doesn't work with emoji (it counts Unicode scalars, not Characters):

```swift
func whiteSpacesIndices(in str: String) -> Array<Int> {
    var indices: Array<Int> = []
    let blanks: [UInt32] = [32, 10, 9] // space, newline and tab, respectively
    for (index, scalar) in str.unicodeScalars.enumerated() {
        if blanks.contains(scalar.value) {
            indices.append(index)
        }
    }
    return indices
}
```
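Along the same lines, the str.utf8 view can be scanned byte by byte. This sketch has not been benchmarked here; note that it returns UTF-8 byte offsets, which match Character offsets only for plain ASCII text (and not even then if it contains "\r\n", which is a single Character).

```swift
// Returns the UTF-8 byte offsets of space (32), newline (10) and tab (9).
// These are ASCII bytes, which never occur inside a multi-byte UTF-8
// sequence, so comparing single bytes cannot produce false matches.
func whitespaceByteOffsets(in str: String) -> [Int] {
    var offsets: [Int] = []
    for (offset, byte) in str.utf8.enumerated() where byte == 32 || byte == 10 || byte == 9 {
        offsets.append(offset)
    }
    return offsets
}

print(whitespaceByteOffsets(in: "Hello world!"))          // [5]
print(whitespaceByteOffsets(in: "\u{E4} \u{F6} \u{FC}")) // [2, 5]: each of ä, ö, ü is 2 bytes
```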

You can learn more about the Unicode scalar representation here.

Free function or instance method?

The whiteSpacesIndices function fits more naturally as a property of String: it is reasonable for a String to know the positions of the whitespace (and newline) characters it contains:

```swift
extension String {
    var whiteSpaceIndices: [Int] {
        var indices = [Int]()
        let blanks: [UInt32] = [32, 10, 9] // space, newline and tab
        for (index, scalar) in self.unicodeScalars.enumerated() {
            if blanks.contains(scalar.value) {
                indices.append(index)
            }
        }
        return indices
    }
}
```

It could then be used like so:

```swift
"Hello world!".whiteSpaceIndices // [5]
"ä ö ü".whiteSpaceIndices        // [1, 3]
```

Comment

Note that your method returns different results from the original code. For the string "ä ö ü" the original code returns [1, 3] and your code returns [2, 5].
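One likely explanation for the [2, 5] result (an assumption about how that test string was typed) is Unicode normalization: "ä" can be a single precomposed scalar (U+00E4) or "a" followed by U+0308 COMBINING DIAERESIS. Both spellings form one Character, so only the scalar-based count changes. The helper names below are hypothetical:

```swift
// Two spellings of "ä ö ü" that Swift considers equal (canonical equivalence).
let precomposed = "\u{E4} \u{F6} \u{FC}"      // each letter is one scalar
let decomposed = "a\u{308} o\u{308} u\u{308}" // base letter + combining diaeresis

// Character offsets, as in the question's code
func characterOffsets(_ s: String) -> [Int] {
    return s.enumerated().filter { $0.element == " " }.map { $0.offset }
}

// Unicode scalar offsets, as in the answer's code
func scalarOffsets(_ s: String) -> [Int] {
    return s.unicodeScalars.enumerated().filter { $0.element == " " }.map { $0.offset }
}

print(precomposed == decomposed)                                 // true
print(characterOffsets(precomposed), scalarOffsets(precomposed)) // [1, 3] [1, 3]
print(characterOffsets(decomposed), scalarOffsets(decomposed))   // [1, 3] [2, 5]
```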

Comment

Btw, char.isWhitespace returns true for newline characters, so the || char.isNewline is not needed.
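A quick check of that claim on a few sample Characters (the sample list is arbitrary): every one that reports isNewline also reports isWhitespace, so adding || char.isNewline changes nothing.

```swift
// Space, tab, line feed, U+2028 LINE SEPARATOR, and a non-whitespace letter.
let samples: [Character] = [" ", "\t", "\n", "\u{2028}", "x"]

for c in samples {
    let code = String(c.unicodeScalars.first!.value, radix: 16, uppercase: true)
    print("U+\(code)", "isWhitespace:", c.isWhitespace, "isNewline:", c.isNewline)
}

// True when no sample is a newline without also being whitespace.
let redundant = samples.allSatisfy { !$0.isNewline || $0.isWhitespace }
print(redundant) // true
```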

Comment

Still different results for "🇩🇪 👨🏼‍⚖️ x"

Comment

@MartinR From my limited knowledge, I think that if we'd like to take emoji into account, we'll have to revert to the code in the question. Do share any trick if there is one.

Comment

The characters, utf8 and unicodeScalars views are different and have different offsets. But I did not mean to say that your code "does not work"; I just wanted to make you aware of the difference. It may not be relevant for the OP (who did not provide more information about the text), but it should be mentioned.
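To make the difference concrete, here is the string "🇩🇪 👨🏼‍⚖️ x" from the comments, spelled with explicit scalar escapes (assuming the judge emoji includes the U+FE0F variation selector), measured in each of the three views. The generic helper is hypothetical:

```swift
// Flag = two regional indicators; judge = man + skin tone + ZWJ + scales + VS16.
let s = "\u{1F1E9}\u{1F1EA} \u{1F468}\u{1F3FC}\u{200D}\u{2696}\u{FE0F} x"

// Offsets of `space` within any view of the string.
func spaceOffsets<C: Collection>(_ view: C, space: C.Element) -> [Int] where C.Element: Equatable {
    return view.enumerated().filter { $0.element == space }.map { $0.offset }
}

let byCharacter = spaceOffsets(s, space: " ")                  // grapheme clusters
let byScalar = spaceOffsets(s.unicodeScalars, space: " ")      // Unicode scalars
let byUTF8 = spaceOffsets(s.utf8, space: UInt8(ascii: " "))    // UTF-8 bytes

print(byCharacter) // [1, 3]
print(byScalar)    // [2, 8]
print(byUTF8)      // [8, 26]
```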

Accepted answer by ielyamani · 2019-06-08 19:29:40Z
