6
\$\begingroup\$

This exercise is from Roughgarden's (excellent) course on algorithms on Coursera:

You are given as input an unsorted array of n distinct numbers, where n is a power of 2. Give an algorithm that identifies the second-largest number in the array, and that uses at most \$n + \log_2(n) - 2 \$ comparisons.

Note that it doesn't ask just ask to solve the problem in linear time. If we use \2ドル n\$ comparisons the "big-O" time complexity would be linear but greater than \$n + \log_2(n) - 2\$.

Any feedback on the algorithm, implementation, style, and Python is very welcome.

import random
import sys
from math import log
count = 0
def is_greater(a, b):
 global count
 count += 1
 return a > b
def _get_highest_two(arr0, arr1):
 if is_greater(arr0[0], arr1[0]):
 first = arr0[0]
 seconds = arr0[1]
 seconds.append(arr1[0])
 else:
 first = arr1[0]
 seconds = arr1[1]
 seconds.append(arr0[0])
 return (first, seconds)
def get_second(*args):
 if len(args) == 1:
 seconds = args[0][1]
 second_best = seconds.pop()
 for sb in seconds:
 if is_greater(sb, second_best):
 second_best = sb
 return second_best
 _pairs = zip(*[iter(args)]*2)
 _out = [_get_highest_two(*p) for p in _pairs]
 return get_second(*_out)
def main():
 n = int(sys.argv[1])
 arr = [random.randint(0,100000000) for _ in xrange(n)]
 print arr
 print n, sorted(arr)[-2]
 print count
 # is_greater(0,0)
 _pairs = zip(*[iter(arr)]*2)
 _pairs = [(a, [b]) if is_greater(a, b) else (b, [a]) for (a, b) in _pairs]
 print get_second(*_pairs)
 print count, n + log(n, 2) - 2
 return 0
if __name__ == '__main__':
 sys.exit(main())
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Jul 13, 2018 at 16:22
\$\endgroup\$

2 Answers 2

5
\$\begingroup\$

Algorithm and implementation

The algorithm is correct. Essentially it compares the elements pairwise, recursively until all pairs are eliminated, while at each comparison, tracking the compared elements in a list. At the end, the algorithm finds the maximum value, and a list of \$\log_2(n)-1\$ elements, which must contain the 2nd maximum value.

The implementation looks more complicated than it needs to be, and I find it hard to read. I suggest to organize and implement differently:

  • Create a function find_max_with_tracking that takes a list of values and returns a tuple of the maximum value and the list of values the maximum was compared to:

    • Map the input list of values to a list of tuples: [(x, []) for x in input]
    • Recursively reduce the list of tuples, comparing then pairwise, keeping the tuple with the higher value, and adding to its list the value of the other tuple. This could be implemented either with a recursive function, or with a loop.
    • Return the single tuple that remains, containing the maximum element, and a list of \$\log_2(n)-1\$ values that must contain the second largest value.
  • This point is reached by n - 1 comparisons, and we have \$\log_2(n)\$ values to scan to find the second largest value. You can use a simple linear scan for this step.

This implementation should be simpler and easier to read.

Programming in general

Avoid global variables as much as possible. If you need to track something, such as the count in this example, then you can wrap the operation in an abstract data type that manages the state and operations (functions) on the state.

Excessive array creation in a loop or recursive function as in get_second is usually a bad sign, to investigate closely and try to eliminate. In this example this was an important hint that the implementation is probably inefficient.

Programming in Python

The implementation uses Python 2 features that are obsolete today, such as print without parentheses (the old keyword instead of the modern function) and xrange instead of range. I suggest to test your scripts using Python 3.

answered Jul 14, 2018 at 8:15
\$\endgroup\$
1
  • \$\begingroup\$ One more question about using the "abstract data type" [rather than an evil global variable] to keep track of the number of time a function is called: are you suggesting to use a mutable object - passed as an additional argument - to keep track of that? \$\endgroup\$ Commented Jul 14, 2018 at 11:20
2
\$\begingroup\$

I didn't dive into the algorithm, but from understanding and syntactical point of view, this is what I'd complain about in a professional code review (in order of priority):

  • No comments on what the script or any of its parts are about.
  • No error handling whatsoever.
  • Globals. If you need to keep state, pass it around or create a class to hold it. This also applies to small scripts like this one. You can never know whether you'll ever need to extend or reuse it.
  • Inconsistent parameter types. In one function you expect arrays (I guess you rather mean iterables), in another a variable number of parameters. Pls stick to either.
  • Some variables have non-descriptive names, e.g. what does count represent? count of what? What is seconds? Probably not time.
  • Why are some of the local variables underscore prefixed?
  • Missing shebang. It's not obvious which Python version you're using.
  • Some newlines missing.
  • Missing spaces around operators.
  • Some unnecessary newlines (e.g. in main())

The last four points can be corrected automatically by using a proper editor with PEP8 integration. Any solid linter would also complain about the missing comments and globals.

answered Jul 15, 2018 at 12:04
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.