Aug 26, 2017 at 10:10pm UTC
If i had to guess, there are far fewer cache misses with a vector. In a vector, everything is stored next to each other in memory, and searching through the vector runs through it sequentially.
To examine an item in a vector, as the lower_bound function does, requires fetching that item from memory. Fetching one item in a vector will, if the items are small enough, also fetch into the processor many other items next to it (because memory is fetched in big lumps, and since in a vector the items are all next to each other, when the processor fetches one item in the vector, that big lump of memory will contain the items right next to it). If the next item wanted has already been loading into the processor (into the processor's memory cache) there is no need for another expensive fetch all the way out to main memory.
Having to go all the way to main memory because the item needed is not already cached on the processor is known as a "cache miss"; the item needed is not already in local cache, and the processor has to fetch it all the way from main memory, which takes a long time.
To examine an item in a set, as the lower_bound function does, requires fetching that item from memory. A typical set will not store items next to each other in memory, so as you iterate through the set, every item looked at has to be fetched from main memory, which is very expensive (i.e. takes a long time).
What you've stumbled onto here is that to get performance, it's not enough to understand C++; you also need to understand the hardware it's running on.
You could test this hypothesis by iterating through the vector and outputting the memory locations of each element, and repeat with the set; see if they are contiguous in the vector and not contiguous in the set.
Last edited on Aug 26, 2017 at 10:30pm UTC