Find the second largest number in an array with at most n + log2 n - 2 comparisons

Question 1

This exercise is from Roughgarden's (excellent) course on algorithms on Coursera:

You are given as input an unsorted array of n distinct numbers, where n is a power of 2. Give an algorithm that identifies the second-largest number in the array, and that uses at most \$n + \log_2(n) - 2 \$ comparisons.

Note that it doesn't ask just ask to solve the problem in linear time. If we use \2ドル n\$ comparisons the "big-O" time complexity would be linear but greater than \$n + \log_2(n) - 2\$.

Any feedback on the algorithm, implementation, style, and Python is very welcome.

import random
import sys
from math import log
count = 0
def is_greater(a, b):
 global count
 count += 1
 return a > b
def _get_highest_two(arr0, arr1):
 if is_greater(arr0[0], arr1[0]):
 first = arr0[0]
 seconds = arr0[1]
 seconds.append(arr1[0])
 else:
 first = arr1[0]
 seconds = arr1[1]
 seconds.append(arr0[0])
 return (first, seconds)
def get_second(*args):
 if len(args) == 1:
 seconds = args[0][1]
 second_best = seconds.pop()
 for sb in seconds:
 if is_greater(sb, second_best):
 second_best = sb
 return second_best
 _pairs = zip(*[iter(args)]*2)
 _out = [_get_highest_two(*p) for p in _pairs]
 return get_second(*_out)
def main():
 n = int(sys.argv[1])
 arr = [random.randint(0,100000000) for _ in xrange(n)]
 print arr
 print n, sorted(arr)[-2]
 print count
 # is_greater(0,0)
 _pairs = zip(*[iter(arr)]*2)
 _pairs = [(a, [b]) if is_greater(a, b) else (b, [a]) for (a, b) in _pairs]
 print get_second(*_pairs)
 print count, n + log(n, 2) - 2
 return 0
if __name__ == '__main__':
 sys.exit(main())

Question 2

Algorithm and implementation

The algorithm is correct. Essentially it compares the elements pairwise, recursively until all pairs are eliminated, while at each comparison, tracking the compared elements in a list. At the end, the algorithm finds the maximum value, and a list of \$\log_2(n)-1\$ elements, which must contain the 2nd maximum value.

The implementation looks more complicated than it needs to be, and I find it hard to read. I suggest to organize and implement differently:

Create a function find_max_with_tracking that takes a list of values and returns a tuple of the maximum value and the list of values the maximum was compared to:
- Map the input list of values to a list of tuples: [(x, []) for x in input]
- Recursively reduce the list of tuples, comparing then pairwise, keeping the tuple with the higher value, and adding to its list the value of the other tuple. This could be implemented either with a recursive function, or with a loop.
- Return the single tuple that remains, containing the maximum element, and a list of \$\log_2(n)-1\$ values that must contain the second largest value.
This point is reached by n - 1 comparisons, and we have \$\log_2(n)\$ values to scan to find the second largest value. You can use a simple linear scan for this step.

This implementation should be simpler and easier to read.

Programming in general

Avoid global variables as much as possible. If you need to track something, such as the count in this example, then you can wrap the operation in an abstract data type that manages the state and operations (functions) on the state.

Excessive array creation in a loop or recursive function as in get_second is usually a bad sign, to investigate closely and try to eliminate. In this example this was an important hint that the implementation is probably inefficient.

Programming in Python

The implementation uses Python 2 features that are obsolete today, such as print without parentheses (the old keyword instead of the modern function) and xrange instead of range. I suggest to test your scripts using Python 3.

Question 3

One more question about using the "abstract data type" [rather than an evil global variable] to keep track of the number of time a function is called: are you suggesting to use a mutable object - passed as an additional argument - to keep track of that?

Question 4

I didn't dive into the algorithm, but from understanding and syntactical point of view, this is what I'd complain about in a professional code review (in order of priority):

No comments on what the script or any of its parts are about.
No error handling whatsoever.
Globals. If you need to keep state, pass it around or create a class to hold it. This also applies to small scripts like this one. You can never know whether you'll ever need to extend or reuse it.
Inconsistent parameter types. In one function you expect arrays (I guess you rather mean iterables), in another a variable number of parameters. Pls stick to either.
Some variables have non-descriptive names, e.g. what does count represent? count of what? What is seconds? Probably not time.
Why are some of the local variables underscore prefixed?
Missing shebang. It's not obvious which Python version you're using.
Some newlines missing.
Missing spaces around operators.
Some unnecessary newlines (e.g. in main())

The last four points can be corrected automatically by using a proper editor with PEP8 integration. Any solid linter would also complain about the missing comments and globals.

janos janos 113k15 gold badges154 silver badges396 bronze badges · Answer 1 · 2018-07-14 08:15:42Z

Algorithm and implementation

The algorithm is correct. Essentially it compares the elements pairwise, recursively until all pairs are eliminated, while at each comparison, tracking the compared elements in a list. At the end, the algorithm finds the maximum value, and a list of \$\log_2(n)-1\$ elements, which must contain the 2nd maximum value.

The implementation looks more complicated than it needs to be, and I find it hard to read. I suggest to organize and implement differently:

Create a function find_max_with_tracking that takes a list of values and returns a tuple of the maximum value and the list of values the maximum was compared to:
- Map the input list of values to a list of tuples: [(x, []) for x in input]
- Recursively reduce the list of tuples, comparing then pairwise, keeping the tuple with the higher value, and adding to its list the value of the other tuple. This could be implemented either with a recursive function, or with a loop.
- Return the single tuple that remains, containing the maximum element, and a list of \$\log_2(n)-1\$ values that must contain the second largest value.
This point is reached by n - 1 comparisons, and we have \$\log_2(n)\$ values to scan to find the second largest value. You can use a simple linear scan for this step.

This implementation should be simpler and easier to read.

Programming in general

Avoid global variables as much as possible. If you need to track something, such as the count in this example, then you can wrap the operation in an abstract data type that manages the state and operations (functions) on the state.

Excessive array creation in a loop or recursive function as in get_second is usually a bad sign, to investigate closely and try to eliminate. In this example this was an important hint that the implementation is probably inefficient.

Programming in Python

The implementation uses Python 2 features that are obsolete today, such as print without parentheses (the old keyword instead of the modern function) and xrange instead of range. I suggest to test your scripts using Python 3.

One more question about using the "abstract data type" [rather than an evil global variable] to keep track of the number of time a function is called: are you suggesting to use a mutable object - passed as an additional argument - to keep track of that?

Hugo G Hugo G 3741 silver badge10 bronze badges · Answer 2 · 2018-07-15 12:04:38Z

I didn't dive into the algorithm, but from understanding and syntactical point of view, this is what I'd complain about in a professional code review (in order of priority):

No comments on what the script or any of its parts are about.
No error handling whatsoever.
Globals. If you need to keep state, pass it around or create a class to hold it. This also applies to small scripts like this one. You can never know whether you'll ever need to extend or reuse it.
Inconsistent parameter types. In one function you expect arrays (I guess you rather mean iterables), in another a variable number of parameters. Pls stick to either.
Some variables have non-descriptive names, e.g. what does count represent? count of what? What is seconds? Probably not time.
Why are some of the local variables underscore prefixed?
Missing shebang. It's not obvious which Python version you're using.
Some newlines missing.
Missing spaces around operators.
Some unnecessary newlines (e.g. in main())

The last four points can be corrected automatically by using a proper editor with PEP8 integration. Any solid linter would also complain about the missing comments and globals.

Stack Exchange Network

Find the second largest number in an array with at most n + log2 n - 2 comparisons

2 Answers 2

Algorithm and implementation

Programming in general

Programming in Python

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Find the second largest number in an array with at most n + log2 n - 2 comparisons

2 Answers 2

Algorithm and implementation

Programming in general

Programming in Python

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions