A colleague came up with the following search algorithm:
- We have an array of sorted, distinct integers [$i_0$ < $i_2$ < ... < $i_n$]
- We are looking for the index of $i_k$ (for simplicity suppose that $i_k$ is in the array)
- We suppose that the elements are pretty uniformly distributed between $i_1$ and $i_n$
The algorithm:
- let "position" be initially be 0
- if $i_k$ == $i_{position},ドル stop
- otherwise:
- calculate distance = $i_k$ - $i_{position}$
- if we have just switched direction (eg. the previous distance was positive and this one is negative or vice-versa), make sure that abs(distance) < abs(previous distance) by adding/subtracting 1 from distance such that it gets closer to 0. Note: if distance is already is 1 or -1, it isn't adjusted.
- let position be position + distance (see note below for possible optimization)
- if position is positive, let position be min(position, n) (ie. don't overshoot the end)
- if position is negative, let position be max(position, 0) (ie. don't undershoot the beginning)
I would be interested if somebody studied this algorithm and if so, what's its name? The best I could come up with is that it's a variant of "interpolation search", however I expect the above algorithm to be faster on x86 hardware because it only uses addition/subtraction which is faster than the division/multiplication used by "standard" interpolation search.
Note:
Instead of updating position by adding the distance (ie. position = position + distance) we can use the "average distance between consecutive elements" as a scaling factor: $$position \leftarrow position + distance / {i_n-i_0 \over n+1}$$
This should give a more precise estimate about the element position.
A disadvantage of the above is that it involves a division which is slow(er) on x86/x64 platforms. Perhaps we can get away with finding the closes power of two to the scaling factor (average distance between consecutive elements) and use shift to perform the division (ie. if 2ドル^t$ is the power of two closes to the scaling factor we can do $position \leftarrow position + distance >> k$.
Update:
- as people have pointed out the initial element needs to be $i_0$
- added steps to avoid oscillating indefinitely between undershooting / overshooting the target
- discovered the "galloping search" from Timsort which seems related but not the same
-
2$\begingroup$ This algorithm doesn't seem to work correctly. On the array $\{1,3,5\},ドル searching for 3ドル$ it will jump between 1ドル$ and 5ドル$ forever. $\endgroup$Tom van der Zanden– Tom van der Zanden2015年02月24日 14:00:22 +00:00Commented Feb 24, 2015 at 14:00
-
1$\begingroup$ Unless your array indices start at 0, the first time you do step 2 you'll refer to a non-existent array location. $\endgroup$Rick Decker– Rick Decker2015年02月24日 14:12:58 +00:00Commented Feb 24, 2015 at 14:12
-
$\begingroup$ @RickDecker - correct, I updated the text to reflect the fact that the array starts at 0. $\endgroup$Grey Panther– Grey Panther2015年02月24日 20:01:16 +00:00Commented Feb 24, 2015 at 20:01
-
$\begingroup$ @TomvanderZanden - correct. I updated the algorithm to avoid this "oscillating indefinitely" situation. $\endgroup$Grey Panther– Grey Panther2015年02月24日 20:01:52 +00:00Commented Feb 24, 2015 at 20:01
-
2$\begingroup$ @Cd-Man That resolves my concern with it oscillating indefinitely. However, it is still a pretty inefficient search algorithm since on an array $\{2,4,\ldots,2k-2,2k,2k+2,\ldots,4k-2,4k\},ドル searching for 2ドルk$ it needs linear time since it triggers your "stop it jumping" thing every step. $\endgroup$Tom van der Zanden– Tom van der Zanden2015年02月24日 20:49:12 +00:00Commented Feb 24, 2015 at 20:49
1 Answer 1
You may be interested to a related algorithm, known as Interpolation Search, which achieves on average the complexity of $O(\log \log n)$ for searching $n$ keys. This is the usual search you do when you scan a phone book for example.
This paper discusses the algorithm, which was originally introduced earlier (1957) by Peterson.
Related data structures: dynamic interpolation search and interpolation search for non independent data.
-
3$\begingroup$ @Cd-MaN is aware of interpolation search, he even mentions it in his question. He hopes his search algorithm improves upon interpolation search by removing the (relatively expensive on many processors) division operation. $\endgroup$Tom van der Zanden– Tom van der Zanden2015年02月25日 11:51:59 +00:00Commented Feb 25, 2015 at 11:51
Explore related questions
See similar questions with these tags.