This function returns the indices of all whitespace characters as an array of Int. It works fine with a small string:
func whiteSpacesIndices(value: String) -> Array<Int> {
    var indices: Array<Int> = []
    for (index, char) in value.enumerated() {
        if char == " " || char == "\n" || char == "\t" {
            indices.append(index)
        }
    }
    return indices
}
However, when the string is very long, it can be slow, because it loops over every character.
Is there a better way of doing it?
1 Answer
General remarks
value, as a parameter name, isn't very descriptive.

The code assumes that white space can only be " ", "\n", or "\t". This is a performance optimization that presupposes prior knowledge of the contents of the string. More generally, you could make the check this way:
if char.isWhitespace { indices.append(index) }

Array<Int> is not the same as [String.Index] or IndexSet. A String is traversed using String.Index, not Int.
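To illustrate the last two remarks, here is a minimal sketch that combines isWhitespace with String.Index. The function name whitespaceIndices(in:) and the sample string are made up for this example; they are not from the question.

```swift
// Hypothetical helper (name chosen for this sketch): returns String.Index
// values instead of Int offsets, so results can be used to subscript the string.
func whitespaceIndices(in text: String) -> [String.Index] {
    // Character.isWhitespace covers spaces, tabs, and newlines alike.
    return text.indices.filter { text[$0].isWhitespace }
}

let sample = "a b\tc\nd"
let found = whitespaceIndices(in: sample)

// Convert to Int offsets only if a caller really needs them;
// each distance(from:to:) call walks the string, so this is not free.
let offsets = found.map { sample.distance(from: sample.startIndex, to: $0) }
print(offsets) // [1, 3, 5]
```

This still visits every character, so it is not a speed-up; it only generalizes the check and returns indices that String accepts directly.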
Performance
The following code is about twice as fast in my tests, but it doesn't work with emoji, because it counts Unicode scalars rather than characters:
func whiteSpacesIndices(in str: String) -> Array<Int> {
    var indices: Array<Int> = []
    let blanks: [UInt32] = [32, 10, 9] // These values correspond to space, newline, and tab, respectively.
    for (index, scalar) in str.unicodeScalars.enumerated() {
        if blanks.contains(scalar.value) {
            indices.append(index)
        }
    }
    return indices
}
You can learn more about the Unicode scalar representation here.
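As a rough illustration of why the results differ with emoji, the sketch below finds the same space in the three views of a string. The sample text (a flag emoji written as two regional-indicator scalars, a space, and "x") and the variable names are chosen for this example only.

```swift
// A flag emoji is one Character but two Unicode scalars (and eight UTF-8 bytes).
let text = "\u{1F1E9}\u{1F1EA} x"

// Offsets counted in Characters, as in the question's code.
let characterOffsets = text.enumerated()
    .filter { $0.element == " " }
    .map { $0.offset }

// Offsets counted in Unicode scalars, as in the faster code above.
let scalarOffsets = text.unicodeScalars.enumerated()
    .filter { $0.element.value == 32 }
    .map { $0.offset }

// Offsets counted in UTF-8 code units.
let utf8Offsets = text.utf8.enumerated()
    .filter { $0.element == 32 }
    .map { $0.offset }

print(characterOffsets) // [1]
print(scalarOffsets)    // [2]
print(utf8Offsets)      // [8]
```

Whichever view is used to find the offsets should also be the view used to consume them later; mixing them gives wrong positions as soon as the text leaves plain ASCII.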
Free function or instance method?
The whiteSpacesIndices function seems more like a property on strings. It is appropriate for a String to know about the indices of white spaces (and new lines) within itself:
extension String {
    var whiteSpaceIndices: [Int] {
        var indices = [Int]()
        let blanks: [UInt32] = [32, 10, 9] // space, newline, tab
        for (index, scalar) in self.unicodeScalars.enumerated() {
            if blanks.contains(scalar.value) {
                indices.append(index)
            }
        }
        return indices
    }
}
It could be used like so:
"Hello world!".whiteSpaceIndices // [5]
"ä ö ü".whiteSpaceIndices // [1, 3]
Note that your method returns different results from the original code. For the string "ä ö ü" the original code returns [1, 3] and your code returns [2, 5]. – Martin R, Jun 8, 2019 at 19:49
Btw, char.isWhitespace returns true for newline characters, the || char.isNewline is not needed. – Martin R, Jun 8, 2019 at 19:54
Still different results for "🇩🇪 👨🏼⚖️ x" – Martin R, Jun 9, 2019 at 4:37
@MartinR From my limited knowledge, I think that if we'd like to take emoji into consideration, we'll have to revert to the code in the question. Do share a trick if there is one. – ielyamani, Jun 9, 2019 at 4:56
The character/utf8/unicodeScalars views are different and have different offsets. But it was not my intention to say that your code "does not work." I just wanted to make you aware of the difference. It may not be relevant for OP (who did not provide more information about the text), but should be mentioned. – Martin R, Jun 9, 2019 at 5:08
8447 characters and 1307 spaces, your code takes 0.5ms on my machine