I have an algorithmic problem.
Given an array (or multiset) $T$ of $n$ nonnegative integers, find a maximum-size subset $S$ of $T$ such that for all $a\in S,ドル $a\geqslant |S|$.
For example:
- If $T$=[1, 3, 4, 1, 3, 6], then $S$ can be [3, 3, 6] or [3, 4, 6] or [4, 3, 6].
- If $T$=[7, 5, 1, 1, 7, 4], then $S$ is [7, 5, 7, 4].
I have tried this recursive function:
function(T):
    if minimum(T) >= length(T):
        return T
    else:
        return function(T\minimum(T))
Is there any non-recursive algorithm? (I did not check my recursive algorithm, so it could have some flaws.)
5 Answers
Sort T in decreasing order. Then take elements while T[i] >= i+1.
For example, sorted(T) = [6, 4, 3, 3, 1, 1]. Then T[0] = 6 >= 1, T[1] = 4 >= 2, T[2] = 3 >= 3, and finally T[3] = 3 < 4, so we have S = [T[0], T[1], T[2]].
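For concreteness, here is a minimal Haskell sketch of this sort-and-take approach (the function name largestSubset is my own; this is an illustration, not code from the answer):
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Sort in decreasing order, then keep T[i] as long as T[i] >= i+1
-- (the index i is 1-based here, so the test is simply x >= i).
largestSubset :: [Int] -> [Int]
largestSubset t = map snd (takeWhile (\(i, x) -> x >= i) (zip [1 ..] sorted))
  where
    sorted = sortOn Down t
For the first example in the question, largestSubset [1, 3, 4, 1, 3, 6] yields [6, 4, 3].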
- This, of course, misses the other solution $\{6,3,3\},ドル but it appears the OP was looking for any solution, rather than all solutions. – Rick Decker, Nov 14, 2016 at 20:57
- It gets the number of elements right. We know we have solutions with 3 elements but not with 4; in this case we have 4 elements ≥ 3, so we know we can pick any 3 of those for a solution. – gnasher729, Nov 14, 2016 at 21:32
- I'd appreciate an argument of correctness. – Raphael, Nov 14, 2016 at 22:31
- I think you could probably do it in O(n) time with a variant of introselect. – user2357112, Nov 15, 2016 at 1:03
From my comment originally: This is closely related to a quantity ubiquitous in academic productivity assessment, the Hirsch index, better known as the $h$-index. In short, it is defined as the number of publications $h$ one has such that each of them has at least $h$ citations (the largest such $h$).
The only way your problem differs is that you would be interested not only in how many publications satisfy the criterion but also in what their citation counts are, but that's a trivial modification: the data is already there, the original algorithm just drops it.
The generally implemented calculation is rather straightforward and agrees with Karolis Juodelė's answer.
Update: Depending on the size and character of your data, it may be worth exploring methods which partially sort the array by filtering data above and below a pivotal point (quicksort comes to mind). Then, depending on whether that yields too few or too many elements, adjust the pivot and redo the work on the subset that contains it, and so on. You don't need an order between elements higher than $h,ドル and certainly not between those lower than that. So, for example, once you have found all elements greater than or equal to $h_1$ and there are fewer than $h_1$ of them, you don't need to touch that subset again, just add to it. This converts the recursion inherent to quicksort into a tail recursion, which can therefore be rewritten as a loop.
My Haskell is a bit rusty, but this should do what I described above and seems to work. I hope it can be understood to some degree; I am happy to provide further explanation.
-- just a utility function (it simply concatenates two lists)
merge :: [a] -> [a] -> [a]
merge [] ys = ys
merge (x:xs) ys = x : merge xs ys

-- the actual implementation
topImpl :: [Int] -> [Int] -> [Int]
topImpl [] granted = granted
topImpl (x:xs) granted
  -- x, the elements >= x, and what was already granted fit exactly: done
  | x == (1 + lGreater + lGranted) = x : merge greater granted
  -- x is larger than needed: grant x and the larger elements, look for more among the smaller ones
  | x > (1 + lGreater + lGranted) = topImpl smaller (x : merge greater granted)
  -- x is too small for a set of this size: drop it and retry with the larger elements only
  | otherwise = topImpl greater granted
  where smaller  = [y | y <- xs, y < x]
        greater  = [y | y <- xs, y >= x]
        lGreater = length greater
        lGranted = length granted

-- starting point is: top of whole array, granted is empty
top :: [Int] -> [Int]
top arr = topImpl arr []
The idea is to collect in granted what you know will definitely participate in the result, and not sort it any further. If greater together with x fits, we're lucky; otherwise we need to try with a smaller subset. (The pivot x is simply whatever happened to be the first item of the sublist that's currently considered.) Note that the significant advantage over taking the largest elements one by one is that we work on blocks of average size $remaining/2$ and don't need to sort them further.
Example:
Let's take your set [1, 3, 4, 1, 3, 6].
- x = 1, granted = [], greater = [3, 4, 1, 3, 6]. Ouch, we hit a pathological case where the pivot is too small (actually so small that smaller is empty) right in the first step. Luckily our algorithm is ready for that. It discards x and tries again with greater alone.
- x = 3, granted = [], greater = [4, 3, 6]. Together they would form an array of length 4, but its elements are only bounded from below by 3, so that's too many. Repeat on greater alone.
- x = 4, granted = [], greater = [6]. This gives an array of 2 elements ≥ 4 each, so it seems we might have use for some more of them. Keep this and repeat on smaller = [3].
- x = 3, granted = [4, 6], greater = []. This together gives an array of 3 elements ≥ 3 each, so we have our solution [3, 4, 6] and we can return. (Note that the permutation may vary depending on the ordering of the input, but it will always contain the highest possible terms, never [3, 3, 6] or [3, 3, 4] for your example.)
(Btw., note that the recursion indeed just collapsed into a loop.) The complexity is somewhat better than quicksort because of the many saved comparisons:
- Best case (e.g. [2, 2, 1, 1, 1, 1]): a single step, $n-1$ comparisons
- Average case: $O(\log n)$ steps, $O(n)$ comparisons in total
- Worst case (e.g. [1, 1, 1, 1, 1, 1]): $n$ steps, $O(n^2)$ comparisons in total
There are a few needless comparisons in the code above, like calculating smaller whether we need it or not; they can easily be removed. (I think lazy evaluation will take care of that, though.)
There is nothing wrong with your algorithm, and of course most recursive algorithms can be converted into loops. Here is a loop version of your recursive code (note the comparison must be strict, since the recursion stops as soon as minimum(T) >= length(T)):
function(T):
    while minimum(T) < length(T):
        remove minimum(T) from T
    return T
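The same loop can also be written as a short, runnable Haskell function (the name iterativeTop and the use of Data.List.delete are my own choices, not part of the answer; the null guard is an added safeguard for inputs such as [0, 0] that would otherwise empty the list):
import Data.List (delete)

-- Loop rendering of the pseudocode above: repeatedly drop one occurrence
-- of the minimum while it is smaller than the number of remaining elements.
-- The null check is an added guard so the loop also terminates if the
-- list has been emptied.
iterativeTop :: [Int] -> [Int]
iterativeTop = until done step
  where
    done t = null t || minimum t >= length t
    step t = delete (minimum t) t
For example, iterativeTop [7, 5, 1, 1, 7, 4] gives [7, 5, 7, 4], matching the second example in the question.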
- All recursive algorithms can be converted to loops. After all, a Turing machine knows nothing about recursion. – David Richerby, Nov 15, 2016 at 13:48
Any recursive algorithm can be rewritten to use iteration. After all, a Turing machine knows nothing about recursion but can implement any algorithm. In principle, you can rewrite your recursive function by writing your own stack manipulation code to remember the values of the function's parameters and any local variables it might have. In this particular case, your function is tail-recursive (once a recursive call returns, the thing that called it immediately returns, too) so you don't even need to maintain the stack.
Use a min-heap to do a partial heapsort, since you don't need the whole array to be sorted.
Keep popping elements greedily until you exceed the threshold given.
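One possible reading of this answer (my interpretation and my names, not the answerer's code): build a min-heap of T and keep popping the minimum while it is smaller than the number of elements still in the heap; whatever remains is S. The sketch below hand-rolls a tiny skew heap so it stays dependency-free:
-- A minimal skew heap standing in for a real heap library.
data Heap = Empty | Node Int Heap Heap

union :: Heap -> Heap -> Heap
union Empty h = h
union h Empty = h
union l@(Node x ll lr) r@(Node y _ _)
  | x <= y    = Node x (union lr r) ll  -- standard skew-heap merge
  | otherwise = union r l

insert :: Int -> Heap -> Heap
insert x = union (Node x Empty Empty)

popMin :: Heap -> Maybe (Int, Heap)
popMin Empty        = Nothing
popMin (Node x l r) = Just (x, union l r)

toList :: Heap -> [Int]
toList h = case popMin h of
  Nothing      -> []
  Just (x, h') -> x : toList h'

-- Pop minima while they are smaller than the current element count;
-- everything left in the heap then forms a valid answer.
heapTop :: [Int] -> [Int]
heapTop t = go (foldr insert Empty t) (length t)
  where
    go h k = case popMin h of
      Just (m, h') | m < k -> go h' (k - 1)
      _                    -> toList h
With a proper heapify the build is $O(n)$ and each pop costs $O(\log n),ドル so only the discarded elements pay for ordering work; here heapTop [1, 3, 4, 1, 3, 6] returns [3, 4, 6].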
- Here also, I'd appreciate an idea of correctness. – Raphael, Nov 15, 2016 at 21:17