1
$\begingroup$

Question: Suppose you are a given a csv with N lines of the form StudentID, score. Multiple lines can have same student-ID. Find the student with maximum average score

I can't think of any way to make it faster than O(N*logN).

  1. Maintain a hash-table that keeps track of STudentID = Key and Value being an object with sum of scores and number of scores seen with that studentID --> O(N)

  2. Create an array with values in hash table, maybe convert each entry into average while doing so

  3. Sort it worst-case N(logN), all entries belong to same student

Is there any way to do it faster?

Raphael
73.3k31 gold badges183 silver badges403 bronze badges
asked Sep 25, 2016 at 22:50
$\endgroup$
2
  • $\begingroup$ Note that hashtables don't usually have $O(1)$ worst-case performance. $\endgroup$ Commented Sep 28, 2016 at 6:14
  • 4
    $\begingroup$ Finding maximums is trivially in $O(n)$. I don't get what your problem is. $\endgroup$ Commented Sep 28, 2016 at 6:14

2 Answers 2

5
$\begingroup$

You are on a good path, but seem to be confused for no reason. After step 2 you have created an array with average scores. This is a worst case O(N) size array. What is the complexity of finding the highest average score now? Do you need to sort the array?

answered Sep 25, 2016 at 23:11
$\endgroup$
1
  • $\begingroup$ What's the complexity if you insist on sorting the array, for no good reason, and what is the complexity if you focus on the actual problem, which is finding the highest score? $\endgroup$ Commented Sep 28, 2016 at 10:05
0
$\begingroup$

@megas has the right approach. All I want to add is another, yet more involved solution:

  • you can calculate averages online, instead of computing it at the end. You need to maintain the count and average. For every score update and before updating the count, new_avg = (old_avg*count + score)/(count + 1). Then update the count.
  • now that you have all student averages at any point in time, you can take advantage of order-maintenance solutions to keep your set of student scores ordered at all times.

Note that by doing this, you're paying the time tax on keeping all student scores sorted. That's why @megas has the better solution because you're interested in the max score.

answered Sep 28, 2016 at 3:08
$\endgroup$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.