Dynamic programming based range minimum query

Question 1

I am implementing a DP-based range minimum query data structure and algorithm in Python 2.7. Any advice on algorithm time complexity improvements, code bugs or code style issues are appreciated.

More information about range minimum query.

from collections import defaultdict
import random
def build_rmq(numbers):
 rmq_map = defaultdict(int) # key: tuple of (start, end) index, value: min value
 j = 0
 while 2 ** j - 1 < len(numbers):
 j += 1
 j -= 1
 # print j
 for step in range(0,j+1):
 for i in range(0, len(numbers)):
 if step == 0:
 rmq_map[(i,step)] = numbers[i]
 continue
 if i + 2 ** (step-1) < len(numbers):
 rmq_map[(i,step)] = min(rmq_map[(i, step-1)], rmq_map[(i + 2 ** (step-1), step-1)])
 else:
 rmq_map[(i,step)] = rmq_map[(i, step-1)]
 return rmq_map
def rmq_query(start, end, rmq_map):
 j = 0
 while start + 2 ** j - 1 <= end:
 j += 1
 j -= 1
 x = rmq_map[(start, j)]
 y = rmq_map[(end + 1 - 2 ** j, j)]
 return min(x,y)
if __name__ == "__main__":
 numbers = [0,5,2,5,4,3,1,6,3]
 rmq_map = build_rmq(numbers)
 #print rmq_map
 start = random.randint(0, len(numbers)-1)
 end = random.randint(0, len(numbers)-1)
 if start > end:
 start, end = end, start
 print start, end, numbers[start:end+1]
 print rmq_query(start, end, rmq_map)

Question 2

1. Review

There are no docstrings. What do these functions do? What do they return? How do I use them?
The function build_rmq constructs a persistent data structure which can then be queried by calling the function rmq_query. When you have a persistent data structure and functions that operate on it, this is good candidate for making into a class.
The comment on this line is not quite right:
```
rmq_map = defaultdict(int) # key: tuple of (start, end) index, value: min value
```
the second element of the key is a step, not an end index.
There is no need to use a defaultdict here, since the code always fills in the entry before looking it up. An ordinary dictionary would do fine.
This code:
```
j = 0
while 2 ** j - 1 < len(numbers):
 j += 1
```
computes the smallest power of 2 that is greater than len(numbers). It would be simpler to use int.bit_length:
```
j = len(numbers).bit_length()
```
(In ChatterOne's answer it is suggested that you use int(log(len(numbers), 2)). I think this is a dangerous suggestion, because log makes approximations that can lead to errors. For example,
```
int(log(2**48 - 1, 2))
```
evaluates to 48, but the correct result is 47. In order to check that these errors can't actually cause your code to fail, you need to do numerical analysis. This seems complicated and error-prone to me, compared to sticking with integer arithmetic where you know the results are exact.)

Similarly, this loop:

j = 0
while start + 2 ** j - 1 <= end:
 j += 1

becomes:

j = (end - start + 1).bit_length()

This code subtracts 1 from j and then adds it back again, which is not necessary:
```
j -= 1
for step in range(0,j+1):
```
range(0, x) can be abbreviated to range(x).
rmq_map[(i,step)] does not need the parentheses: rmq_map[i, step] is the same.
The two loops over step and i can be combined into a single loop using itertools.product
I would avoid the special case if step == 0 by having a separate loop to handle that case:
```
rmq_map = {(i, 0): m for i, m in enumerate(numbers)}
```
and then looping over step starting from 1.
In Python it is usual to represent a range in half-open fashion (exclusive at the upper end). It would be convenient if rmq_query took its arguments in the same way.
The top-level code runs a single range query but does not check that the result is correct. It would be easy to check the result of rmq_query(start, end) against min(numbers[start:end+1]). It would then be easy to turn this into a unit test using the features in the unittest module.

2. Revised code

from itertools import product
class RangeMinimum(object):
 "Data structure providing efficient range-minimum queries."
 def __init__(self, numbers):
 "Build a RangeMinimum object for the given sequence of numbers."
 # Mapping from (start, step) to min(numbers[start:start + 2**step])
 self._rmq = rmq = {(i, 0): m for i, m in enumerate(numbers)}
 n = len(numbers)
 for step, i in product(range(1, n.bit_length()), range(n)):
 j = i + 2 ** (step-1)
 if j < n:
 rmq[i, step] = min(rmq[i, step-1], rmq[j, step-1])
 else:
 rmq[i, step] = rmq[i, step-1]
 def query(self, start, stop):
 "Return min(numbers[start:stop])."
 j = (stop - start).bit_length() - 1
 x = self._rmq[start, j]
 y = self._rmq[stop - 2 ** j, j]
 return min(x, y)
import unittest
import random
class TestRangeMinimum(unittest.TestCase):
 def test_range_minimum(self):
 n = 1000
 numbers = [random.randrange(n) for _ in range(n)]
 rmq = RangeMinimum(numbers)
 for _ in range(n):
 start, stop = sorted(random.sample(range(n + 1), 2))
 self.assertEqual(rmq.query(start, stop), min(numbers[start:stop]))

3. Generalization

There's nothing special about the minimum here — exactly the same algorithm can be used for range-maximum queries, or for querying other kinds of statistic on a range (such as the longest common prefix, or the union of sets).

So that suggests this kind of generalization:

from itertools import product
class RangeQuery(object):
 "Data structure providing efficient range queries."
 def __init__(self, items, fn):
 """Build a RangeQuery object for a sequence of items.
 fn -- function taking two items and returning their query
 result, for example "min" to query the range-minimum. It must be
 associative, commutative, and idempotent.
 """
 # Mapping from (start, step) to reduce(fn, items[start:start + 2**step])
 self._rq = rq = {(i, 0): item for i, item in enumerate(items)}
 self._fn = fn
 n = len(items)
 for step, i in product(range(1, n.bit_length()), range(n)):
 j = i + 2 ** (step-1)
 if j < n:
 rq[i, step] = fn(rq[i, step-1], rq[j, step-1])
 else:
 rq[i, step] = rq[i, step-1]
 def query(self, start, stop):
 "Return reduce(fn, items[start:stop])."
 j = (stop - start).bit_length() - 1
 x = self._rq[start, j]
 y = self._rq[stop - 2 ** j, j]
 return self._fn(x, y)
import random
import unittest
class TestRangeQuery(unittest.TestCase):
 def test_range_minmax(self):
 n = 1000
 items = [random.randrange(n) for _ in range(n)]
 rqmin = RangeQuery(items, min)
 rqmax = RangeQuery(items, max)
 for _ in range(n):
 start, stop = sorted(random.sample(range(n + 1), 2))
 self.assertEqual(rqmin.query(start, stop), min(items[start:stop]))
 self.assertEqual(rqmax.query(start, stop), max(items[start:stop]))

Question 3

Thanks for the great comments, Gareth and vote up. 1. 'because log makes approximations that can lead to errors' -- agree. 2. 'the second element of the key is a step, not an end index', agree.

Question 4

BTW, do you think there are any bugs (I mean functional) in our logic/code (either my original code or your revised)? For functional bug, I mean in any (could be corner) case, our code does not return the right minimal value for a range query.

Question 5

Re-read your code and one more question, in your generalization version, why it is generalized? I see very similar to your "Revised code".

Question 6

@LinMa: Did you compare the two versions to see where they differ?

Question 7

Hi Gareth, I might be lost in your intention in your code, I think your code is far from resolving LCA problem, but I could be wrong, please feel free to correct me. I have write a version of using MRQ to resolve LCA problem => codereview.stackexchange.com/questions/154319/…, if you have advice there, appreciated!

Question 8

Just some really quick thoughts.

This:

j = 0
while 2 ** j - 1 < len(numbers):
 j += 1
j -= 1

can be written as:

from math import log
j = int(log(len_numbers, 2))

You're repeating step - 1 and 2 ** (step - 1), it's probably a good idea to use variables for those. You're also accessing an array by both index and value, that's what enumerate is for:

 for index, value in enumerate(numbers):
 if step == 0:
 rmq_map[(index, step)] = value
 continue
 prev_step = step - 1
 step_power = 2 ** prev_step
 if index + step_power < len_numbers:
 rmq_map[(index, step)] = min(rmq_map[(index, rev_step)], rmq_map[(index + step_power, prev_step)])
 else:
 rmq_map[(index, step)] = rmq_map[(index, prev_step)]

Also, 2 ** (step - 1) is being calculated repeatedly, so there may be a slight performance increase if you keep a dictionary to cache the results. Not entirely sure though, because the overhead of using a dictionary may be too much for such a simple operation. The same goes for j.

Question 9

Thanks ChatterOne, love your comments and vote up. Besides the comments, do you think any code (functional/logical) bugs in my code or revised version?

Question 10

BTW, how do you handle the issue @Gareth raised, for the math log issue?

Question 11

@LinMa I know about the limitations of using log, it all depends on which space you're working. @Gareth did not chose 2^48 randomly, because up to 2^47 there's no problem. Integer arithmetic also has limits, they're just higher. Also nowadays depending on cpu/architecture, floating point math can be either slower or faster than integer operations. So as to which one to use I'd say "it depends".

Question 12

A tiny detail not mentionned by ChatterOne's excellent answer.

 if step == 0:
 rmq_map[(i,step)] = numbers[i]
 continue
 if i + 2 ** (step-1) < len(numbers):
 rmq_map[(i,step)] = min(rmq_map[(i, step-1)], rmq_map[(i + 2 ** (step-1), step-1)])
 else:
 rmq_map[(i,step)] = rmq_map[(i, step-1)]

could be written with an elif and no continue.

 for i in range(len(numbers)):
 if step == 0:
 rmq_map[(i,step)] = numbers[i]
 elif i + 2 ** (step-1) < len(numbers):
 rmq_map[(i,step)] = min(rmq_map[(i, step-1)], rmq_map[(i + 2 ** (step-1), step-1)])
 else:
 rmq_map[(i,step)] = rmq_map[(i, step-1)]

Question 13

Thanks Josay, love your comments and vote up. Besides the comments, do you think any code (functional/logical) bugs in my code or revised version?

score 8 · Answer 1 · 2017-01-31 12:18:06Z

1. Review

There are no docstrings. What do these functions do? What do they return? How do I use them?
The function build_rmq constructs a persistent data structure which can then be queried by calling the function rmq_query. When you have a persistent data structure and functions that operate on it, this is good candidate for making into a class.
The comment on this line is not quite right:
```
rmq_map = defaultdict(int) # key: tuple of (start, end) index, value: min value
```
the second element of the key is a step, not an end index.
There is no need to use a defaultdict here, since the code always fills in the entry before looking it up. An ordinary dictionary would do fine.
This code:
```
j = 0
while 2 ** j - 1 < len(numbers):
 j += 1
```
computes the smallest power of 2 that is greater than len(numbers). It would be simpler to use int.bit_length:
```
j = len(numbers).bit_length()
```
(In ChatterOne's answer it is suggested that you use int(log(len(numbers), 2)). I think this is a dangerous suggestion, because log makes approximations that can lead to errors. For example,
```
int(log(2**48 - 1, 2))
```
evaluates to 48, but the correct result is 47. In order to check that these errors can't actually cause your code to fail, you need to do numerical analysis. This seems complicated and error-prone to me, compared to sticking with integer arithmetic where you know the results are exact.)

Similarly, this loop:

j = 0
while start + 2 ** j - 1 <= end:
 j += 1

becomes:

j = (end - start + 1).bit_length()

This code subtracts 1 from j and then adds it back again, which is not necessary:
```
j -= 1
for step in range(0,j+1):
```
range(0, x) can be abbreviated to range(x).
rmq_map[(i,step)] does not need the parentheses: rmq_map[i, step] is the same.
The two loops over step and i can be combined into a single loop using itertools.product
I would avoid the special case if step == 0 by having a separate loop to handle that case:
```
rmq_map = {(i, 0): m for i, m in enumerate(numbers)}
```
and then looping over step starting from 1.
In Python it is usual to represent a range in half-open fashion (exclusive at the upper end). It would be convenient if rmq_query took its arguments in the same way.
The top-level code runs a single range query but does not check that the result is correct. It would be easy to check the result of rmq_query(start, end) against min(numbers[start:end+1]). It would then be easy to turn this into a unit test using the features in the unittest module.

2. Revised code

from itertools import product
class RangeMinimum(object):
 "Data structure providing efficient range-minimum queries."
 def __init__(self, numbers):
 "Build a RangeMinimum object for the given sequence of numbers."
 # Mapping from (start, step) to min(numbers[start:start + 2**step])
 self._rmq = rmq = {(i, 0): m for i, m in enumerate(numbers)}
 n = len(numbers)
 for step, i in product(range(1, n.bit_length()), range(n)):
 j = i + 2 ** (step-1)
 if j < n:
 rmq[i, step] = min(rmq[i, step-1], rmq[j, step-1])
 else:
 rmq[i, step] = rmq[i, step-1]
 def query(self, start, stop):
 "Return min(numbers[start:stop])."
 j = (stop - start).bit_length() - 1
 x = self._rmq[start, j]
 y = self._rmq[stop - 2 ** j, j]
 return min(x, y)
import unittest
import random
class TestRangeMinimum(unittest.TestCase):
 def test_range_minimum(self):
 n = 1000
 numbers = [random.randrange(n) for _ in range(n)]
 rmq = RangeMinimum(numbers)
 for _ in range(n):
 start, stop = sorted(random.sample(range(n + 1), 2))
 self.assertEqual(rmq.query(start, stop), min(numbers[start:stop]))

3. Generalization

There's nothing special about the minimum here — exactly the same algorithm can be used for range-maximum queries, or for querying other kinds of statistic on a range (such as the longest common prefix, or the union of sets).

So that suggests this kind of generalization:

from itertools import product
class RangeQuery(object):
 "Data structure providing efficient range queries."
 def __init__(self, items, fn):
 """Build a RangeQuery object for a sequence of items.
 fn -- function taking two items and returning their query
 result, for example "min" to query the range-minimum. It must be
 associative, commutative, and idempotent.
 """
 # Mapping from (start, step) to reduce(fn, items[start:start + 2**step])
 self._rq = rq = {(i, 0): item for i, item in enumerate(items)}
 self._fn = fn
 n = len(items)
 for step, i in product(range(1, n.bit_length()), range(n)):
 j = i + 2 ** (step-1)
 if j < n:
 rq[i, step] = fn(rq[i, step-1], rq[j, step-1])
 else:
 rq[i, step] = rq[i, step-1]
 def query(self, start, stop):
 "Return reduce(fn, items[start:stop])."
 j = (stop - start).bit_length() - 1
 x = self._rq[start, j]
 y = self._rq[stop - 2 ** j, j]
 return self._fn(x, y)
import random
import unittest
class TestRangeQuery(unittest.TestCase):
 def test_range_minmax(self):
 n = 1000
 items = [random.randrange(n) for _ in range(n)]
 rqmin = RangeQuery(items, min)
 rqmax = RangeQuery(items, max)
 for _ in range(n):
 start, stop = sorted(random.sample(range(n + 1), 2))
 self.assertEqual(rqmin.query(start, stop), min(items[start:stop]))
 self.assertEqual(rqmax.query(start, stop), max(items[start:stop]))

Thanks for the great comments, Gareth and vote up. 1. 'because log makes approximations that can lead to errors' -- agree. 2. 'the second element of the key is a step, not an end index', agree.
BTW, do you think there are any bugs (I mean functional) in our logic/code (either my original code or your revised)? For functional bug, I mean in any (could be corner) case, our code does not return the right minimal value for a range query.
Re-read your code and one more question, in your generalization version, why it is generalized? I see very similar to your "Revised code".
@LinMa: Did you compare the two versions to see where they differ?
Hi Gareth, I might be lost in your intention in your code, I think your code is far from resolving LCA problem, but I could be wrong, please feel free to correct me. I have write a version of using MRQ to resolve LCA problem => codereview.stackexchange.com/questions/154319/…, if you have advice there, appreciated!

ChatterOne ChatterOneChatterOne 2,84512 silver badges18 bronze badges · Answer 2 · 2017-01-31 09:08:26Z

Just some really quick thoughts.

This:

j = 0
while 2 ** j - 1 < len(numbers):
 j += 1
j -= 1

can be written as:

from math import log
j = int(log(len_numbers, 2))

You're repeating step - 1 and 2 ** (step - 1), it's probably a good idea to use variables for those. You're also accessing an array by both index and value, that's what enumerate is for:

 for index, value in enumerate(numbers):
 if step == 0:
 rmq_map[(index, step)] = value
 continue
 prev_step = step - 1
 step_power = 2 ** prev_step
 if index + step_power < len_numbers:
 rmq_map[(index, step)] = min(rmq_map[(index, rev_step)], rmq_map[(index + step_power, prev_step)])
 else:
 rmq_map[(index, step)] = rmq_map[(index, prev_step)]

Also, 2 ** (step - 1) is being calculated repeatedly, so there may be a slight performance increase if you keep a dictionary to cache the results. Not entirely sure though, because the overhead of using a dictionary may be too much for such a simple operation. The same goes for j.

Thanks ChatterOne, love your comments and vote up. Besides the comments, do you think any code (functional/logical) bugs in my code or revised version?
BTW, how do you handle the issue @Gareth raised, for the math log issue?
@LinMa I know about the limitations of using log, it all depends on which space you're working. @Gareth did not chose 2^48 randomly, because up to 2^47 there's no problem. Integer arithmetic also has limits, they're just higher. Also nowadays depending on cpu/architecture, floating point math can be either slower or faster than integer operations. So as to which one to use I'd say "it depends".

SylvainD SylvainDSylvainD 29.7k1 gold badge49 silver badges93 bronze badges · Answer 3 · 2017-01-31 10:06:33Z

A tiny detail not mentionned by ChatterOne's excellent answer.

 if step == 0:
 rmq_map[(i,step)] = numbers[i]
 continue
 if i + 2 ** (step-1) < len(numbers):
 rmq_map[(i,step)] = min(rmq_map[(i, step-1)], rmq_map[(i + 2 ** (step-1), step-1)])
 else:
 rmq_map[(i,step)] = rmq_map[(i, step-1)]

could be written with an elif and no continue.

 for i in range(len(numbers)):
 if step == 0:
 rmq_map[(i,step)] = numbers[i]
 elif i + 2 ** (step-1) < len(numbers):
 rmq_map[(i,step)] = min(rmq_map[(i, step-1)], rmq_map[(i + 2 ** (step-1), step-1)])
 else:
 rmq_map[(i,step)] = rmq_map[(i, step-1)]

Thanks Josay, love your comments and vote up. Besides the comments, do you think any code (functional/logical) bugs in my code or revised version?

Stack Exchange Network

Dynamic programming based range minimum query

3 Answers 3

1. Review

2. Revised code

3. Generalization

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Dynamic programming based range minimum query

3 Answers 3

1. Review

2. Revised code

3. Generalization

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions