I have been probing the Swift standard library's sort() function for its Array type. To my surprise, I have noticed that it performs poorly on already-sorted data.
Sorting a shuffled array of Int seems to be about 5x faster than sorting that very same array when it is already sorted. Sorting an array of shuffled objects is about 4x faster than sorting the very same array already in sorted order (I am sure that sorting an object array and sorting an Int array use different algorithms, so I sorted both to eliminate bias).
These are the results:
Shuffled Int array sort time: 1.3961209654808
Shuffled ColorObject array sort time: 3.14633798599243
NOnshuffled Int array sort time: 7.34714204072952
NOnshuffled ColorObject array sort time: 10.9310839772224
For reference below is my code:
import UIKit

// Measures wall-clock time from creation until stop() is called.
class ElapsedTimer {
    let startTime: CFAbsoluteTime
    var endTime: CFAbsoluteTime?

    init() {
        startTime = CFAbsoluteTimeGetCurrent()
    }

    func stop() -> CFAbsoluteTime {
        endTime = CFAbsoluteTimeGetCurrent()
        return duration!
    }

    // nil until stop() has been called.
    var duration: CFAbsoluteTime? {
        if let endTime = endTime {
            return endTime - startTime
        } else {
            return nil
        }
    }
}

public class CountedColor {
    public private(set) var count: Int
    public private(set) var color: UIColor

    public init(color: UIColor, colorCount: Int) {
        self.count = colorCount
        self.color = color
    }
}
var distributedIntArray = [Int]()
for value in 1..<1000000 {
    distributedIntArray.append(value)
}
var distributedCountedColorArray = distributedIntArray.map { CountedColor(color: UIColor.white, colorCount: $0) }
distributedCountedColorArray.shuffle()
distributedIntArray.shuffle()

// First pass: both arrays are shuffled.
var timer = ElapsedTimer()
distributedIntArray.sort()
print("Shuffled Int array sort time: \(timer.stop())")
timer = ElapsedTimer()
distributedCountedColorArray.sort { return $0.count < $1.count }
print("Shuffled Color array sort time: \(timer.stop())")

// Second pass: both arrays are already sorted by the first pass.
timer = ElapsedTimer()
distributedIntArray.sort()
print("NOnshuffled Int array sort time: \(timer.stop())")
timer = ElapsedTimer()
distributedCountedColorArray.sort { return $0.count < $1.count }
print("Non shuffled Color array sort time: \(timer.stop())")
My array shuffle() method was pulled from this post. My ElapsedTimer simply wraps and uses CACurrentMediaTime() functions.
My question is: why am I seeing this behavior? Especially when I am sorting the object array, which should surely be using a general-purpose sort. What kind of general-purpose sorting algorithm is Swift using? It surely can't be one where the worst case and average case are the same, like merge sort.
3 Answers
Swift uses Introsort. Looking at the source code, we see that the chosen pivot is the first element. The Wikipedia page on Introsort says:
(...), one of the critical operations is choosing the pivot: the element around which the list is partitioned. The simplest pivot selection algorithm is to take the first or the last element of the list as the pivot, causing poor behavior for the case of sorted or nearly sorted input.
Thus it is entirely predictable, given the implementation choice, that Swift's sorting performance is worst for sorted inputs.
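To make the effect concrete, here is a minimal sketch of a quicksort that always takes the first element as its pivot and counts element comparisons. It is not Swift's actual implementation; the names `partitionFirstPivot`, `quicksortFirstPivot` and `comparisonsWithFirstPivot` are made up for this illustration, and the input size of 1000 is kept small so the recursion on sorted input stays shallow enough.

```swift
// Illustrative only: NOT the standard library's code.
// Lomuto-style partition that uses a[lo] as the pivot.
func partitionFirstPivot(_ a: inout [Int], _ lo: Int, _ hi: Int,
                         _ comparisons: inout Int) -> Int {
    let pivot = a[lo]
    var i = lo
    for j in (lo + 1)...hi {
        comparisons += 1
        if a[j] < pivot {
            i += 1
            a.swapAt(i, j)
        }
    }
    a.swapAt(lo, i)   // move the pivot to its final position
    return i
}

func quicksortFirstPivot(_ a: inout [Int], _ lo: Int, _ hi: Int,
                         _ comparisons: inout Int) {
    guard lo < hi else { return }
    let p = partitionFirstPivot(&a, lo, hi, &comparisons)
    quicksortFirstPivot(&a, lo, p - 1, &comparisons)
    quicksortFirstPivot(&a, p + 1, hi, &comparisons)
}

// Sorts a copy of `input` and returns how many comparisons were needed.
func comparisonsWithFirstPivot(_ input: [Int]) -> Int {
    var copy = input
    var comparisons = 0
    if copy.count > 1 {
        quicksortFirstPivot(&copy, 0, copy.count - 1, &comparisons)
    }
    precondition(copy == input.sorted(), "sketch produced unsorted output")
    return comparisons
}

let sortedInput = Array(1...1000)
// On sorted input every partition is maximally unbalanced,
// so the count is n(n-1)/2 = 499500 for n = 1000.
print("sorted:   \(comparisonsWithFirstPivot(sortedInput))")
print("shuffled: \(comparisonsWithFirstPivot(sortedInput.shuffled()))")
```

With sorted input the pivot is always the minimum, so each partition step removes only one element; the shuffled run should need far fewer comparisons.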
I have built a complete benchmark for people who want to easily reproduce the OP's claims: https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/extra/swift/sort
For reference, the GNU ISO C++ standard library uses a median-of-3 pivot (as per the stl_algo.h header).
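For comparison, a median-of-3 pivot can be sketched as follows. This is only an illustration of the idea, not a port of the stl_algo.h code; `medianOfThreeIndex`, `quicksortMedianOfThree` and `comparisonsWithMedianOfThree` are hypothetical names, and only the comparisons made while partitioning are counted.

```swift
// Illustrative only: returns the index (lo, mid, or hi) holding the
// median of a[lo], a[mid], a[hi].
func medianOfThreeIndex(_ a: [Int], _ lo: Int, _ hi: Int) -> Int {
    let mid = lo + (hi - lo) / 2
    let (x, y, z) = (a[lo], a[mid], a[hi])
    if (x <= y && y <= z) || (z <= y && y <= x) { return mid }
    if (y <= x && x <= z) || (z <= x && x <= y) { return lo }
    return hi
}

func quicksortMedianOfThree(_ a: inout [Int], _ lo: Int, _ hi: Int,
                            _ comparisons: inout Int) {
    guard lo < hi else { return }
    // Move the median-of-3 element to the front, then partition around it.
    let m = medianOfThreeIndex(a, lo, hi)
    a.swapAt(lo, m)
    let pivot = a[lo]
    var i = lo
    for j in (lo + 1)...hi {
        comparisons += 1
        if a[j] < pivot {
            i += 1
            a.swapAt(i, j)
        }
    }
    a.swapAt(lo, i)
    quicksortMedianOfThree(&a, lo, i - 1, &comparisons)
    quicksortMedianOfThree(&a, i + 1, hi, &comparisons)
}

// Sorts a copy of `input` and returns the number of partition comparisons.
func comparisonsWithMedianOfThree(_ input: [Int]) -> Int {
    var copy = input
    var comparisons = 0
    if copy.count > 1 {
        quicksortMedianOfThree(&copy, 0, copy.count - 1, &comparisons)
    }
    precondition(copy == input.sorted(), "sketch produced unsorted output")
    return comparisons
}

print("sorted, median-of-3: \(comparisonsWithMedianOfThree(Array(1...1000)))")
```

On already-sorted input the median of the first, middle and last elements is the middle one, so each partition splits the range roughly in half and the comparison count stays near n log n rather than n squared. Other patterned inputs can still defeat this pivot rule.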
-
Follow-up: if choosing the first item as the pivot is a bad choice, what would be a better implementation? Wikipedia mentions median-of-3, with some caveats. It seems to me that it would be harder to come up with a worst case for median-of-3 than for a first-item pivot, though. – Emmanuel Oga, Dec 10, 2016 at 1:22
-
No longer true. Swift switched to Timsort around 2020. See the other answers. – aehlke, Dec 14, 2024 at 9:07
In Swift 5, the introsort algorithm was replaced in the sort() method with a modified version of Timsort (first implemented in 2002 by Tim Peters for Python): https://github.com/apple/swift/blob/master/stdlib/public/core/Sort.swift
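One way to check which behavior a given toolchain shows is to count the comparisons that sort(by:) makes on sorted versus shuffled input. The sketch below uses only the standard library; `comparisonsUsedBySort` is a name made up for this example. The expectation that sorted input needs fewer comparisons assumes a Swift 5 or later toolchain with the Timsort-style implementation; on older toolchains the result may be the opposite.

```swift
// Counts how many times the standard library's sort calls the predicate.
func comparisonsUsedBySort(_ input: [Int]) -> Int {
    var copy = input
    var comparisons = 0
    copy.sort { lhs, rhs in
        comparisons += 1
        return lhs < rhs
    }
    return comparisons
}

let ascending = Array(1...10_000)
let sortedCount = comparisonsUsedBySort(ascending)
let shuffledCount = comparisonsUsedBySort(ascending.shuffled())

print("comparisons on sorted input:   \(sortedCount)")
print("comparisons on shuffled input: \(shuffledCount)")
```

Counting comparisons instead of timing keeps the check independent of machine load and optimization settings.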
-
Great to know! If it's like Python's Timsort, it's much faster for data that's already almost sorted! That's fantastic because it simplifies a lot of common use cases. – aehlke, Dec 14, 2024 at 9:07
This seems to be a duplicate of this question: Swift sorting algorithm implementation
Furthermore, the reason it performs so much better when shuffled is probably just that, on shuffled input, its performance is not hitting the upper bound of N log N. Sorting the sorted array probably gets closer to that upper limit, so all in all it is still the same sort. But I don't know; this is just a theory.
-
This is incorrect. The performance of Swift's sort on an already-sorted array larger than a few dozen elements is O(n**2). On a shuffled array it comes down to O(n*log(n)). – user6451264, Jan 5, 2017 at 17:16
-
By O(n**2) I assume you mean N squared. How can that be? Swift's sort is an introsort, and introsort's worst-case complexity is N log N. – AyBayBay, Jan 5, 2017 at 17:18
-
As evidenced by the OP's question, it appears Swift's sort does not behave like a proper introsort, or else it would not be vulnerable to ordered input. Perhaps the depth threshold is set so high that it predominantly exhibits quicksort behavior. – user6451264, Jan 5, 2017 at 17:24
-
Potentially, or the pivot selection is poor. – AyBayBay, Jan 5, 2017 at 17:29
-
Pivot selection only changes which kind of patterned input the sort is vulnerable to. Though last I heard, Swift was using the first element, which seems like an extraordinarily bad choice considering how common simple-sorted input is... – user6451264, Jan 5, 2017 at 17:33
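For readers following the depth-threshold discussion in these comments, the sketch below shows how a textbook introsort caps the damage of a bad pivot: once the recursion depth budget is used up, it falls back to heapsort on the remaining range. This is a generic illustration, not the Swift standard library's code. All function names are hypothetical, and the depth limit of 2 * floor(log2(n)) is a common textbook choice assumed here.

```swift
// Illustrative introsort with a first-element pivot and a heapsort fallback.
func siftDown(_ a: inout [Int], _ base: Int, _ root: Int, _ count: Int,
              _ comparisons: inout Int) {
    var parent = root
    while true {
        let left = 2 * parent + 1
        if left >= count { return }
        var largest = left
        let right = left + 1
        if right < count {
            comparisons += 1
            if a[base + right] > a[base + left] { largest = right }
        }
        comparisons += 1
        if a[base + largest] <= a[base + parent] { return }
        a.swapAt(base + parent, base + largest)
        parent = largest
    }
}

// Heapsort restricted to the range lo...hi.
func heapsortRange(_ a: inout [Int], _ lo: Int, _ hi: Int,
                   _ comparisons: inout Int) {
    let count = hi - lo + 1
    for root in stride(from: count / 2 - 1, through: 0, by: -1) {
        siftDown(&a, lo, root, count, &comparisons)
    }
    for end in stride(from: count - 1, to: 0, by: -1) {
        a.swapAt(lo, lo + end)   // move the current maximum to its final slot
        siftDown(&a, lo, 0, end, &comparisons)
    }
}

func introsort(_ a: inout [Int], _ lo: Int, _ hi: Int, _ depthLeft: Int,
               _ comparisons: inout Int) {
    guard lo < hi else { return }
    if depthLeft == 0 {
        // Depth budget exhausted: switch to heapsort for this range.
        heapsortRange(&a, lo, hi, &comparisons)
        return
    }
    let pivot = a[lo]   // deliberately naive first-element pivot
    var i = lo
    for j in (lo + 1)...hi {
        comparisons += 1
        if a[j] < pivot {
            i += 1
            a.swapAt(i, j)
        }
    }
    a.swapAt(lo, i)
    introsort(&a, lo, i - 1, depthLeft - 1, &comparisons)
    introsort(&a, i + 1, hi, depthLeft - 1, &comparisons)
}

// Sorts a copy of `input` and returns the number of comparisons made.
func introsortComparisons(_ input: [Int]) -> Int {
    var copy = input
    var comparisons = 0
    let n = copy.count
    if n > 1 {
        // 2 * floor(log2(n)), computed without Foundation.
        let depthLimit = 2 * (Int.bitWidth - 1 - n.leadingZeroBitCount)
        introsort(&copy, 0, n - 1, depthLimit, &comparisons)
    }
    precondition(copy == input.sorted(), "sketch produced unsorted output")
    return comparisons
}

print("introsort sketch, sorted input: \(introsortComparisons(Array(1...1000)))")
```

Even with the naive pivot, the fallback keeps sorted input of 1000 elements well below the 499500 comparisons a plain first-pivot quicksort would need, which is why a very high depth threshold (or none) would explain quadratic behavior.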