This exercise is from Roughgarden's (excellent) course on algorithms on Coursera:
You are given as input an unsorted array of n distinct numbers, where n is a power of 2. Give an algorithm that identifies the second-largest number in the array, and that uses at most \$n + \log_2(n) - 2 \$ comparisons.
Note that it doesn't ask just ask to solve the problem in linear time. If we use \2ドル n\$ comparisons the "big-O" time complexity would be linear but greater than \$n + \log_2(n) - 2\$.
Any feedback on the algorithm, implementation, style, and Python is very welcome.
import random
import sys
from math import log
count = 0
def is_greater(a, b):
global count
count += 1
return a > b
def _get_highest_two(arr0, arr1):
if is_greater(arr0[0], arr1[0]):
first = arr0[0]
seconds = arr0[1]
seconds.append(arr1[0])
else:
first = arr1[0]
seconds = arr1[1]
seconds.append(arr0[0])
return (first, seconds)
def get_second(*args):
if len(args) == 1:
seconds = args[0][1]
second_best = seconds.pop()
for sb in seconds:
if is_greater(sb, second_best):
second_best = sb
return second_best
_pairs = zip(*[iter(args)]*2)
_out = [_get_highest_two(*p) for p in _pairs]
return get_second(*_out)
def main():
n = int(sys.argv[1])
arr = [random.randint(0,100000000) for _ in xrange(n)]
print arr
print n, sorted(arr)[-2]
print count
# is_greater(0,0)
_pairs = zip(*[iter(arr)]*2)
_pairs = [(a, [b]) if is_greater(a, b) else (b, [a]) for (a, b) in _pairs]
print get_second(*_pairs)
print count, n + log(n, 2) - 2
return 0
if __name__ == '__main__':
sys.exit(main())
2 Answers 2
Algorithm and implementation
The algorithm is correct. Essentially it compares the elements pairwise, recursively until all pairs are eliminated, while at each comparison, tracking the compared elements in a list. At the end, the algorithm finds the maximum value, and a list of \$\log_2(n)-1\$ elements, which must contain the 2nd maximum value.
The implementation looks more complicated than it needs to be, and I find it hard to read. I suggest to organize and implement differently:
Create a function
find_max_with_tracking
that takes a list of values and returns a tuple of the maximum value and the list of values the maximum was compared to:- Map the input list of values to a list of tuples:
[(x, []) for x in input]
- Recursively reduce the list of tuples, comparing then pairwise, keeping the tuple with the higher value, and adding to its list the value of the other tuple. This could be implemented either with a recursive function, or with a loop.
- Return the single tuple that remains, containing the maximum element, and a list of \$\log_2(n)-1\$ values that must contain the second largest value.
- Map the input list of values to a list of tuples:
- This point is reached by
n
- 1 comparisons, and we have \$\log_2(n)\$ values to scan to find the second largest value. You can use a simple linear scan for this step.
This implementation should be simpler and easier to read.
Programming in general
Avoid global variables as much as possible. If you need to track something, such as the count in this example, then you can wrap the operation in an abstract data type that manages the state and operations (functions) on the state.
Excessive array creation in a loop or recursive function as in get_second
is usually a bad sign, to investigate closely and try to eliminate. In this example this was an important hint that the implementation is probably inefficient.
Programming in Python
The implementation uses Python 2 features that are obsolete today, such as print
without parentheses (the old keyword instead of the modern function) and xrange
instead of range
. I suggest to test your scripts using Python 3.
-
\$\begingroup\$ One more question about using the "abstract data type" [rather than an evil global variable] to keep track of the number of time a function is called: are you suggesting to use a mutable object - passed as an additional argument - to keep track of that? \$\endgroup\$ajeje– ajeje2018年07月14日 11:20:41 +00:00Commented Jul 14, 2018 at 11:20
I didn't dive into the algorithm, but from understanding and syntactical point of view, this is what I'd complain about in a professional code review (in order of priority):
- No comments on what the script or any of its parts are about.
- No error handling whatsoever.
- Globals. If you need to keep state, pass it around or create a class to hold it. This also applies to small scripts like this one. You can never know whether you'll ever need to extend or reuse it.
- Inconsistent parameter types. In one function you expect arrays (I guess you rather mean iterables), in another a variable number of parameters. Pls stick to either.
- Some variables have non-descriptive names, e.g. what does
count
represent? count of what? What isseconds
? Probably not time. - Why are some of the local variables underscore prefixed?
- Missing shebang. It's not obvious which Python version you're using.
- Some newlines missing.
- Missing spaces around operators.
- Some unnecessary newlines (e.g. in
main()
)
The last four points can be corrected automatically by using a proper editor with PEP8 integration. Any solid linter would also complain about the missing comments and globals.
Explore related questions
See similar questions with these tags.