6
$\begingroup$

A colleague came up with the following search algorithm:

  • We have an array of sorted, distinct integers [$i_0$ < $i_2$ < ... < $i_n$]
  • We are looking for the index of $i_k$ (for simplicity suppose that $i_k$ is in the array)
  • We suppose that the elements are pretty uniformly distributed between $i_1$ and $i_n$

The algorithm:

  1. let "position" be initially be 0
  2. if $i_k$ == $i_{position},ドル stop
  3. otherwise:
    • calculate distance = $i_k$ - $i_{position}$
    • if we have just switched direction (eg. the previous distance was positive and this one is negative or vice-versa), make sure that abs(distance) < abs(previous distance) by adding/subtracting 1 from distance such that it gets closer to 0. Note: if distance is already is 1 or -1, it isn't adjusted.
    • let position be position + distance (see note below for possible optimization)
    • if position is positive, let position be min(position, n) (ie. don't overshoot the end)
    • if position is negative, let position be max(position, 0) (ie. don't undershoot the beginning)

I would be interested if somebody studied this algorithm and if so, what's its name? The best I could come up with is that it's a variant of "interpolation search", however I expect the above algorithm to be faster on x86 hardware because it only uses addition/subtraction which is faster than the division/multiplication used by "standard" interpolation search.

Note:

Instead of updating position by adding the distance (ie. position = position + distance) we can use the "average distance between consecutive elements" as a scaling factor: $$position \leftarrow position + distance / {i_n-i_0 \over n+1}$$

This should give a more precise estimate about the element position.

A disadvantage of the above is that it involves a division which is slow(er) on x86/x64 platforms. Perhaps we can get away with finding the closes power of two to the scaling factor (average distance between consecutive elements) and use shift to perform the division (ie. if 2ドル^t$ is the power of two closes to the scaling factor we can do $position \leftarrow position + distance >> k$.

Update:

  • as people have pointed out the initial element needs to be $i_0$
  • added steps to avoid oscillating indefinitely between undershooting / overshooting the target
  • discovered the "galloping search" from Timsort which seems related but not the same
asked Feb 24, 2015 at 13:20
$\endgroup$
6
  • 2
    $\begingroup$ This algorithm doesn't seem to work correctly. On the array $\{1,3,5\},ドル searching for 3ドル$ it will jump between 1ドル$ and 5ドル$ forever. $\endgroup$ Commented Feb 24, 2015 at 14:00
  • 1
    $\begingroup$ Unless your array indices start at 0, the first time you do step 2 you'll refer to a non-existent array location. $\endgroup$ Commented Feb 24, 2015 at 14:12
  • $\begingroup$ @RickDecker - correct, I updated the text to reflect the fact that the array starts at 0. $\endgroup$ Commented Feb 24, 2015 at 20:01
  • $\begingroup$ @TomvanderZanden - correct. I updated the algorithm to avoid this "oscillating indefinitely" situation. $\endgroup$ Commented Feb 24, 2015 at 20:01
  • 2
    $\begingroup$ @Cd-Man That resolves my concern with it oscillating indefinitely. However, it is still a pretty inefficient search algorithm since on an array $\{2,4,\ldots,2k-2,2k,2k+2,\ldots,4k-2,4k\},ドル searching for 2ドルk$ it needs linear time since it triggers your "stop it jumping" thing every step. $\endgroup$ Commented Feb 24, 2015 at 20:49

1 Answer 1

1
$\begingroup$

You may be interested to a related algorithm, known as Interpolation Search, which achieves on average the complexity of $O(\log \log n)$ for searching $n$ keys. This is the usual search you do when you scan a phone book for example.

This paper discusses the algorithm, which was originally introduced earlier (1957) by Peterson.

Related data structures: dynamic interpolation search and interpolation search for non independent data.

answered Feb 25, 2015 at 10:11
$\endgroup$
1
  • 3
    $\begingroup$ @Cd-MaN is aware of interpolation search, he even mentions it in his question. He hopes his search algorithm improves upon interpolation search by removing the (relatively expensive on many processors) division operation. $\endgroup$ Commented Feb 25, 2015 at 11:51

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.