I have an unsorted array. I have queries in which I give a range, and the maximum value from that range has to be returned. For example:
array[]={23,17,9,45,78,2,4,6,90,1};
query(both inclusive): 2 6
answer: 78
Which algorithm or data structure should I construct to quickly retrieve the maximum value from any range? (There are a lot of queries.)
EDIT: This is indeed a simplified version of the actual problem. The array size can be as large as 100000 and there can be up to 100000 queries, so I definitely need some preprocessing that will facilitate fast query responses.
- Why is it unsorted? The problem is trivial if it's sorted, so the obvious approach is to sort it. – user7043, May 4, 2013 at 9:41
- @delnan Without some extra mechanism, you lose track of which values were originally in the range to be queried... – Thijs van Dien, May 4, 2013 at 9:54
- Specify your whole problem. If this knowledge (or any other information) matters, one has to know it in order to factor it into the solution. – user7043, May 4, 2013 at 9:56
- Am I missing something, or is this just a matter of visiting items 2 through 6 and finding the maximum value of those elements? – Blrfl, May 4, 2013 at 17:27
- @Blrfl: I don't think you're missing anything, except maybe the part about many queries. It's not really clear whether there's any point in building a structure that makes queries substantially cheaper than a sequential search. (Although there wouldn't be much point in asking the question here if that weren't the idea.) – Mike Sherrill 'Cat Recall', May 4, 2013 at 18:31
6 Answers
I think you could construct some kind of binary tree where each node holds the maximum value of its children:
78
45 78
23 45 78 6
23 17 9 45 78 2 4 6
Then you only need to find a way to determine which nodes you minimally need to check to find the maximum value in the queried range. In this example, to get the maximum value in the index range [2, 6] (inclusive) you would have max(45, 78, 4) instead of max(9, 45, 78, 2, 4). As the tree grows, the gain will be larger.
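Here is a rough sketch of the idea in Python (my own illustration; the helper names are made up). Each level of the tree is stored as a flat list, so the index range a node covers is implicit in its position, which also addresses the point raised in the comments below about knowing which indices a node covers:

def build_levels(a):
    """Level 0 is the original array; each higher level stores pairwise maxima."""
    levels = [list(a)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([max(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return levels

def range_max(levels, lo, hi):
    """Maximum of a[lo..hi] (both inclusive)."""
    best = float('-inf')
    level = 0
    while lo <= hi:
        if lo % 2 == 1:                      # lo is a right child: its parent also covers lo-1
            best = max(best, levels[level][lo])
            lo += 1
        if hi % 2 == 0:                      # hi is a left child: its parent also covers hi+1
            best = max(best, levels[level][hi])
            hi -= 1
        lo //= 2
        hi //= 2
        level += 1
    return best

# range_max(build_levels([23, 17, 9, 45, 78, 2, 4, 6, 90, 1]), 2, 6)  ->  78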
- For this to work, there's information missing from your example tree: each internal node must have both the maximum and the total number of child nodes it has. Otherwise the search has no way of knowing that (for example) it doesn't have to look at all the children of 78 (and skip the 2), because for all it knows index 6 is in that subtree. – Izkata, May 4, 2013 at 18:14
- Otherwise, +1 as I find this rather inventive. – Izkata, May 4, 2013 at 18:16
- +1: This is a powerful technique for answering queries about subranges of a list in log(N) time, usable whenever the data at the root node can be computed in constant time from the data at the children. – kevin cline, May 4, 2013 at 19:40
- This idea is awesome. It gives O(log n) query time. I think @Izkata made a good point too. We can augment the tree node with information about the left and right ranges it covers. So given a range, it knows how to split the problem into two. Space-wise, all the data are stored at the leaf level, so it requires 2*N space, which is O(N) to store. I don't know what a segment tree is, but is this the idea behind the segment tree? – Kay, Jan 26, 2018 at 20:43
- And in terms of preprocessing, it takes O(n) to construct the tree. – Kay, Jan 26, 2018 at 20:51
To complement ngoaho91's answer.
The best way to solve this problem is to use the segment tree data structure. It allows you to answer such queries in O(log n), which means the total complexity of your algorithm would be O(Q log n), where Q is the number of queries. With the naive algorithm, the total complexity would be O(Qn), which is obviously slower.
There is, however, a drawback to segment trees: they take up a lot of memory. But often you care less about memory than about speed.
I will briefly describe the algorithms used by this DS:
The segment tree is just a special case of a binary tree where every node holds the value (here, the maximum) of the range assigned to it. The root node is assigned the range [0, n-1]. Its left child is assigned the range [0, (n-1)/2] and its right child the range [(n-1)/2+1, n-1]. The tree is built this way all the way down.
Create Tree:
/*
   A[]    -> array of original values
   tree[] -> the segment tree data structure
   node   -> the node we are currently in: remember the left child is 2*node,
             the right child is 2*node+1
   a, b   -> the limits of the segment this node covers. This is needed because
             we are dealing with a recursive function.
*/
int tree[SIZE];

void build_tree(const vector<int>& A, int node, int a, int b) {
    if (a == b) {              // We got down to a single element
        tree[node] = A[a];     // This node stores the only value
    }
    else {
        int leftChild  = 2*node;
        int rightChild = 2*node + 1;   // Or leftChild + 1
        int middle     = (a + b) / 2;
        build_tree(A, leftChild, a, middle);        // Recursively build the tree in the left child
        build_tree(A, rightChild, middle + 1, b);   // Recursively build the tree in the right child
        tree[node] = max(tree[leftChild], tree[rightChild]);  // The value of this node
                                                              // is the max of both children.
    }
}
Query Tree
int query(int node, int a, int b, int p, int q) {
    if (b < p || a > q)           // The node's range is completely outside the query range
        return -INF;              // Return a big negative number. Can you figure out why?
    else if (a >= p && b <= q)    // The node's range is completely inside the query range
        return tree[node];
    int l = 2*node;
    int r = l + 1;
    int m = (a + b) / 2;
    return max(query(l, a, m, p, q), query(r, m+1, b, p, q));  // Return the max of querying both children.
}
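With this indexing, node 1 is the root, so for an array of n elements you build the tree once with build_tree(A, 1, 0, n-1) and answer each query over the inclusive range [p, q] with query(1, 0, n-1, p, q). Sizing tree[] with SIZE around 4*n is enough for this recursive layout.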
If you need further explanation, just let me know.
BTW, a segment tree also supports updating a single element, or a range of elements (with lazy propagation), in O(log n).
- What's the complexity of filling the tree? – Pieter B, Jun 20, 2015 at 7:08
- You have to go through all the elements, and it takes O(log(n)) for each element to be added to the tree. Therefore, the total complexity is O(n log(n)). – Andrés, Jun 21, 2015 at 2:53
The best algorithm would run in O(n) time, as below. Let start and end be the indices of the bounds of the range:
int findMax(int[] a, int start, int end) {
    int max = Integer.MIN_VALUE; // initialize to the smallest integer value
    for (int i = start; i <= end; i++)
        if (a[i] > max)
            max = a[i];
    return max;
}
- -1 for merely repeating the algorithm the OP was trying to improve on. – kevin cline, May 4, 2013 at 19:42
- +1 for posting a solution to the as-stated problem. This really is the only way to do it if you have an array and don't know what the bounds are going to be a priori. (Although I would initialize max to a[i] and start the for loop at i+1.) – Blrfl, May 4, 2013 at 20:36
- @kevincline It's not just restating - it's also saying "Yes, you already have the best algorithm for this task", with a minor improvement (jump to start, stop at end). And I agree, this is the best for a one-time lookup. @ThijsvanDien's answer is only better if the lookup is going to happen multiple times, since it takes longer to set up initially. – Izkata, May 6, 2013 at 17:03
- Granted, at the time of posting this answer, the question did not include the edit confirming that he'll be doing many queries over the same data. – Izkata, May 6, 2013 at 18:44
The binary tree/segment tree-based solutions are indeed pointing in the right direction. One might object that they require a lot of extra memory, however. There are two ways to address that concern:
- Use an implicit data structure instead of a binary tree
- Use an M-ary tree instead of a binary tree
The first point is that, because the tree is highly structured, you can use a heap-like layout to define the tree implicitly rather than representing it with nodes, left and right pointers, intervals, etc. That saves a lot of memory with essentially no performance hit; you just need to do a little more index arithmetic.
The second point is that, at the cost of a little more work during evaluation, you can use an M-ary tree rather than a binary tree. For instance, with a 3-ary tree you compute the max of 3 elements at a time, then 9 at a time, then 27, and so on. The extra storage required is then roughly N/(M-1), which you can show with the geometric series formula: N/M + N/M^2 + N/M^3 + ... = N/(M-1). If you choose M = 11, for example, you need about 1/10th the extra storage of the binary tree method.
You can verify that these naive and optimized implementations in Python give the same results:
class RangeQuerier(object):
    # The naive way
    def __init__(self):
        pass

    def set_array(self, arr):
        # Set, and preprocess
        self.arr = arr

    def query(self, l, r):
        # Maximum of arr[l:r] (r exclusive); None for an empty range
        try:
            return max(self.arr[l:r])
        except ValueError:
            return None
vs.
from math import ceil, floor

class RangeQuerierMultiLevel(object):
    def __init__(self):
        self.arrs = []
        self.sub_factor = 3
        self.len_ = 0

    def set_array(self, arr):
        # Set, and preprocess
        tgt = arr
        self.len_ = len(tgt)
        self.arrs.append(arr)
        while len(tgt) > 1:
            tgt = self.maxify_one_array(tgt)
            self.arrs.append(tgt)

    def maxify_one_array(self, arr):
        # Collapse each group of sub_factor elements into its maximum
        sub_arr = []
        themax = float('-inf')
        for i, el in enumerate(arr):
            themax = max(el, themax)
            if i % self.sub_factor == self.sub_factor - 1:
                sub_arr.append(themax)
                themax = float('-inf')
        return sub_arr

    def query(self, l, r, level=None):
        # Maximum of arr[l:r] (r exclusive); None for an empty range
        if level is None:
            if r <= l:
                return None
            level = len(self.arrs) - 1
        if r <= l:
            return float('-inf')  # empty sub-range inside the recursion
        int_size = self.sub_factor ** level
        lhs, mid, rhs = (float('-inf'), float('-inf'), float('-inf'))
        # Check if there's an imperfect match on the left hand side
        if l % int_size != 0:
            lnew = int(ceil(l / float(int_size))) * int_size
            lhs = self.query(l, min(lnew, r), level - 1)
            l = lnew
        # Check if there's an imperfect match on the right hand side
        if r % int_size != 0:
            rnew = int(floor(r / float(int_size))) * int_size
            rhs = self.query(max(rnew, l), r, level - 1)
            r = rnew
        if r > l:
            # Handle the middle elements, which align exactly with this level
            mid = max(self.arrs[level][l // int_size:r // int_size])
        return max(max(lhs, mid), rhs)
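A quick sanity check (the array size and query count here are arbitrary choices of mine) that both classes agree on the same half-open queries:

import random

data = [random.randint(0, 10**6) for _ in range(500)]
naive = RangeQuerier()
fast = RangeQuerierMultiLevel()
naive.set_array(data)
fast.set_array(data)

for _ in range(10000):
    l = random.randrange(len(data))
    r = random.randrange(l, len(data) + 1)   # r is exclusive and may equal l (empty range)
    assert naive.query(l, r) == fast.query(l, r)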
Try the "segment tree" data structure.
There are 2 steps:
build_tree() - O(n)
query(int min, int max) - O(log n) per query
http://en.wikipedia.org/wiki/Segment_tree
edit:
You guys just didn't read the wiki I linked!
This algorithm is:
- you traverse the array once to build the tree: O(n)
- the next 100000000+ times you want to know the max of any part of the array, you just call the query function: O(log n) for every query
- a C++ implementation is here: geeksforgeeks.org/segment-tree-set-1-range-minimum-query/
The old algorithm is: for every query, just traverse the selected range and find the max.
So if you are only going to process a single query, OK, it is slower than the old way.
But if you are going to process a huge number of queries (billions), it's very efficient.
You can generate a text file like this for testing:
line 1: 50000 random numbers from 0 to 1000000, separated by spaces (this is the array)
line 2: 2 random numbers from 1 to 50000, separated by a space (this is a query)
...
line 200000: like line 2, another random query
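For example, a quick generator for such a file could look like this (the file name and the choice to sort each query's endpoints are my own assumptions):

import random

with open("rmq_test.txt", "w") as f:
    # line 1: the array of 50000 values in [0, 1000000]
    f.write(" ".join(str(random.randint(0, 1000000)) for _ in range(50000)) + "\n")
    # lines 2..200000: random queries with endpoints in [1, 50000]
    for _ in range(199999):
        a, b = sorted(random.randint(1, 50000) for _ in range(2))
        f.write("%d %d\n" % (a, b))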
Here is an example problem (sorry, it's in Vietnamese):
http://vn.spoj.com/problems/NKLINEUP/
If you solve it the old way, you will never pass.
- I don't think that's relevant. An interval tree holds intervals, not integers, and the operations they permit look nothing like what OP asks for. You could, of course, generate all possible intervals and store them in an interval tree, but (1) there are exponentially many of them, so this doesn't scale, and (2) the operations still don't look like what OP asks for. – user7043, May 4, 2013 at 10:15
- My mistake, I meant segment tree, not interval tree. – ngoaho91, May 4, 2013 at 10:23
- Interesting, I think I've never come across this tree! IIUC this still requires storing all possible intervals, though. I think there are O(n^2) of those, which is rather expensive. (Also, shouldn't query be O(log n + k) for k results?) – user7043, May 4, 2013 at 10:30
- Yes, build_tree() must traverse the array and store the max (or min) value for every node. But in many cases, memory cost matters less than speed. – ngoaho91, May 4, 2013 at 10:43
- I can't imagine this being any faster than a plain O(n) search of the array, as described in tarun_telang's answer. First instinct is that O(log n + k) is faster than O(n), but the O(log n + k) is just retrieval of the sub-array - equivalent to O(1) array access given the start and end points. You would still need to traverse it to find the maximum. – Izkata, May 4, 2013 at 17:53
You can achieve O(1) per query (with O(n log n) construction time and space) using a data structure called a sparse table. For each power of 2, save the maximum of every segment of that length. Now, given a segment [l, r), you take the maximum of the maximums over [l, l+2^k) and [r-2^k, r) for the largest k such that 2^k <= r-l. The two segments may overlap, but that's OK, because taking an element into the max twice does no harm.
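A minimal sketch of this idea (the function names are just for illustration), using half-open queries [l, r) as above:

def build_sparse_table(a):
    """table[k][i] holds max(a[i : i + 2**k]); O(n log n) time and space."""
    n = len(a)
    table = [list(a)]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        table.append([max(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)])
        k += 1
    return table

def query_max(table, l, r):
    """Maximum of a[l:r] (r exclusive) in O(1); the two blocks may overlap."""
    k = (r - l).bit_length() - 1          # largest k with 2**k <= r - l
    return max(table[k][l], table[k][r - (1 << k)])

# query_max(build_sparse_table([23, 17, 9, 45, 78, 2, 4, 6, 90, 1]), 2, 7)  ->  78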