I see some things that might help:
`index(_:offsetBy:)` is O(n), where n is the distance you're offsetting, so you can squeeze out a bit of time by calculating the substring's end index as an offset from its start index instead of walking from the beginning of the string again. This helps most if you're getting substrings from near the end of the string.
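As a rough sketch of that idea (the question's original function isn't reproduced here, so the name `slice(from:to:)` and its half-open range semantics are assumptions for illustration):

```swift
extension String {
    /// Hypothetical helper: returns the characters in the half-open range from..<to,
    /// or an empty string when the range cannot be satisfied.
    func slice(from: Int, to: Int) -> String {
        guard !isEmpty, from >= 0, from <= to else { return "" }
        // Walk from the start of the string to `from`.
        guard let start = index(startIndex, offsetBy: from, limitedBy: endIndex) else {
            return ""
        }
        // Walk only the remaining (to - from) steps from `start`
        // instead of starting over at the beginning of the string.
        let end = index(start, offsetBy: to - from, limitedBy: endIndex) ?? endIndex
        return String(self[start..<end])
    }
}

print("hello tests".slice(from: 1, to: 10))  // prints "ello test"
```

Using the `limitedBy:` variant keeps an out-of-range offset from trapping; here an end offset past the end of the string is simply clamped.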
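Another option is to build the indexes from the string's UTF16 view. Whether the UTF16 approach is appropriate depends on whether your text contains multi-width characters (emoji, for example, can take more than one UTF-16 code unit, so character offsets and UTF16 offsets no longer line up) and on how you're getting the actual start and end values for the substring. Here's a post that I found useful for getting my head around these options: https://oleb.net/blog/2016/08/swift-3-strings/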
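To make the mismatch concrete, here is a small illustration (the sample strings are made up for this example) comparing the number of characters with the number of UTF-16 code units:

```swift
let plain = "hello"
let withEmoji = "hi 👍"

// For plain ASCII text the two counts agree.
print(plain.count, plain.utf16.count)          // prints "5 5"

// The thumbs-up emoji is one Character but two UTF-16 code units.
print(withEmoji.count, withEmoji.utf16.count)  // prints "4 5"
```

If your start and end values come from an API that already counts UTF-16 code units, the UTF16 view matches them directly; if they count user-visible characters, the offsets can drift once text like this shows up.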
When you check whether the string is empty, you're checking whether the character count is zero. `self.characters.count == 0` is O(n), where n is the number of characters; you can get a performance increase here by using `self.isEmpty`, which is O(1).
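A minimal side-by-side of the two checks, written against current Swift where `count` replaces the older `characters.count` (the variable names are just for illustration):

```swift
let text = String(repeating: "a", count: 1_000)

// Counting has to visit every Character before it can compare against zero.
let emptyByCount = text.count == 0

// isEmpty only needs to compare the start and end indexes.
let emptyByFlag = text.isEmpty

print(emptyByCount, emptyByFlag)  // prints "false false"
```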
edit: added 4th option that returns String?
Finally, with the UTF16 option you need to convert the result back to a `String` and force unwrap it if you want to return the type `String`. An alternative could be to return `String?` and use `nil` as your early exit.
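Here is a sketch of what the optional-returning UTF16 variant might look like in current Swift. The name `utf16Slice(from:to:)` is hypothetical, and the offsets are assumed to be UTF-16 code unit offsets in a half-open range:

```swift
extension String {
    /// Hypothetical helper: slices by UTF-16 code unit offsets (from..<to)
    /// and returns nil instead of an empty string as the early exit.
    func utf16Slice(from: Int, to: Int) -> String? {
        guard !isEmpty, from >= 0, from <= to else { return nil }
        let units = utf16
        guard let start = units.index(units.startIndex, offsetBy: from,
                                      limitedBy: units.endIndex) else {
            return nil
        }
        let end = units.index(start, offsetBy: to - from,
                              limitedBy: units.endIndex) ?? units.endIndex
        // Converting a UTF-16 slice back to String is itself failable,
        // so the optional return type avoids a force unwrap here.
        return String(units[start..<end])
    }
}

print("hello tests".utf16Slice(from: 1, to: 10) ?? "nil")  // prints "ello test"
```

Callers then decide what a missing result means, for example with `?? ""` where an empty string is the right fallback.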
I ran a quick profile in Xcode comparing those four options:
- Baseline: the approach in the question
- Alternative: use `isEmpty` and calculate the end index from the start index
- UTF16: use `isEmpty` and create the UTF16 index directly from an Int
- UTF16 nil: use `isEmpty`, create the UTF16 index from an Int, and return `String?`
Benchmark substring using "hello tests".substring(1,10)
- baseline -> 1.151s (2% STDEV)
- alternative -> 0.633s (1% STDEV)
- UTF16 -> 0.408s (2% STDEV)
- UTF16 nil -> 0.404s (1% STDEV)
Benchmark early exit using "".substring(1,10)
- baseline -> 0.074s (4% STDEV)
- alternative -> 0.024s (12% STDEV)
- UTF16 -> 0.024s (11% STDEV)
- UTF16 nil -> 0.019s (12% STDEV)
Here's a gist of the test I used for full transparency: https://gist.github.com/mathewsanders/c4c43915c5e1c13e8fe3b912bf4c27d1
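The gist has the actual test. Purely as an illustration of how numbers in this shape (a mean time plus a relative standard deviation) can be produced, here is a hypothetical harness; the `benchmark` function, its parameters, and the sample workload are assumptions for this sketch and are not taken from the gist:

```swift
import Foundation

/// Hypothetical harness: times `iterations` calls of `body` per sample and
/// returns the mean duration in seconds plus the standard deviation as a
/// percentage of the mean.
func benchmark(samples: Int = 10,
               iterations: Int = 1_000_000,
               _ body: () -> Void) -> (mean: Double, stdevPercent: Double) {
    precondition(samples > 0, "need at least one sample")
    var times: [Double] = []
    for _ in 0..<samples {
        let start = Date()
        for _ in 0..<iterations { body() }
        times.append(Date().timeIntervalSince(start))
    }
    let mean = times.reduce(0, +) / Double(times.count)
    let variance = times
        .map { ($0 - mean) * ($0 - mean) }
        .reduce(0, +) / Double(times.count)
    let stdevPercent = mean > 0 ? variance.squareRoot() / mean * 100 : 0
    return (mean, stdevPercent)
}

// Example workload: the early-exit check on an empty string.
let result = benchmark(samples: 5, iterations: 100_000) {
    _ = "".isEmpty
}
print("mean: \(result.mean)s, stdev: \(result.stdevPercent)%")
```

An optimizing build may remove work whose result is discarded, so treat timings from a loop like this as rough.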
So absolutely use `isEmpty` instead of counting characters, and maybe consider using the UTF16 view if it's appropriate for the text you'll be making substrings from.
It also looks like returning `nil` instead of an empty string gives a bit of a boost, but you might lose that small advantage elsewhere in your code depending on how you handle the `nil` return. Also, my benchmark ran 1 million iterations and the difference was tiny, so I can't imagine an application where this would make a practical difference anyway.