I've made a container which functions like a Binary Heap in terms of insertions and popping the root element. The key difference is that the comparison steps are all replaced with a random call for deciding when to go left/right. (The underlying data structure is a Vector, written as an expandable array, so you don't need to know how many elements the heap will contain in advance.) The idea is, you can insert items, and when you pop them, you'll get any element at random. You can also iterate through the whole collection, after which the heap is reshuffled (by simply popping and reinserting N times).
Insertion and removal are both O(log N); reshuffling is O(N log N). Its main use is for games — think a deck of cards or a Bingo tumbler. It isn't really a true Heap in that there's no guarantee that elements will be less than or greater than their children. But insertion and removal are both done Heap-style, just at random. (Also, unlike a regular Heap, the items do not need to be comparable to each other.)
Is there a standard name for this sort of thing? Or maybe a better way of achieving the same results? I'm calling it a "GrabBag" for now, since that seems to describe it pretty well, but I was just wondering if there was a more widely-recognized name for it.
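To make the idea concrete, here is a minimal sketch of the structure described above. The class and method names are my own, and the exact percolation scheme (coin-flip sift-up on insert, random sift-down on pop) is one plausible reading of "comparisons replaced with a random call":

```python
import random

class GrabBag:
    """Heap-shaped container where comparisons are replaced by coin flips.
    A sketch of the idea described above; details are assumptions."""

    def __init__(self):
        self._items = []  # implicit binary tree in an expandable array

    def __len__(self):
        return len(self._items)

    def insert(self, item):
        items = self._items
        items.append(item)  # new items start as leaves
        i = len(items) - 1
        # Randomly sift up: at each level, flip a coin to either swap
        # with the parent or stop (the heap comparison, randomized).
        while i > 0 and random.random() < 0.5:
            parent = (i - 1) // 2
            items[i], items[parent] = items[parent], items[i]
            i = parent

    def pop(self):
        # Remove the root, move the last leaf up, then sink it by
        # repeatedly swapping with a randomly chosen child.
        items = self._items
        root = items[0]
        last = items.pop()
        if items:
            items[0] = last
            i = 0
            while True:
                children = [c for c in (2 * i + 1, 2 * i + 2)
                            if c < len(items)]
                if not children:
                    break
                j = random.choice(children)
                items[i], items[j] = items[j], items[i]
                i = j
        return root
```

Both operations touch one root-to-leaf path, which gives the O(log N) bounds mentioned above.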
I looked around for possible standard names for this, but couldn't find anything. All I could find was either Randomized Meldable Heaps, which are a different thing, or people complaining about random heap corruption, which isn't even related.
1 Answer
This need comes up a lot in game programming when you want to guarantee a certain probability distribution over time. For example, if the orc should hit 30% of the time, you create a Bag with 30 hits and 70 misses, in random order, then pop a value from the end of the array each time you need one.
Usually it is implemented as an array, so that creation is O(n) (i.e. insertion is O(1)) and removal is O(1). (No reshuffle is necessary; you just recreate it when you need a new one. Your reshuffle can be implemented by pushing each popped item into a new heap as you pop it, so these are similar.) The array solution is going to be a superior performer in retrieval time unless you can't batch inserts.
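A minimal sketch of the array approach described here, using the orc example (the function name is mine):

```python
import random

def make_bag(hits, misses):
    """Build a shuffled outcome bag: O(n) to create, O(1) to draw
    by popping from the end. Recreate it when it empties."""
    bag = [True] * hits + [False] * misses
    random.shuffle(bag)  # Fisher-Yates shuffle under the hood
    return bag

bag = make_bag(30, 70)   # exactly 30 hits guaranteed per 100 draws
result = bag.pop()       # O(1) draw from the end
```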
Also, I am concerned about the actual randomness of your algorithm. Unless you randomly sink non-leaf nodes during insertion, earlier nodes are going to tend to be at the top of the tree. Am I missing something?
The difference with this is that you must know the size of the array before creating it. With my implementation, you can put any arbitrary number of elements in, either at the beginning or at any point later on. It is true that all inserted elements will end up as leaf nodes, but since popping the top causes random percolation, they might not stay down there. It means that a recently added element is less likely to reappear right away, which is not necessarily a bad thing. (Think of the shuffle feature on an MP3 player - you don't WANT the same song twice in a row.) – Darrel Hoffman, Jul 31, 2013 at 14:57
But you've explained that under the covers, you use an array that gets resized. The same method can be used to resize the array here. It means an amortized cost on insertion of...hmm. I suspect it's O(1). – Alex Feinman, Jul 31, 2013 at 16:41
I'm not sure your example about the orc's 30% hit rate is the best example - you could get close to that simply by calculating a random number at the time of attack. (e.g. if (rand(100) < 30)) It might not end up being exactly 30% due to randomness, but nobody would notice. If the Ace of Spades turns up twice in a deck of cards, however, it's a bigger problem. Of course you know the size of a standard deck in advance: 52 cards. But imagine a scenario where you don't know that, and it might change frequently over time. I'd say both methods have valid use-cases. – Darrel Hoffman, Jul 31, 2013 at 19:07
Darrel, the distinction is important. Players don't want randomness--they want fairness. A really random number may miss 100% of the time despite the probability being 30%. Players often want (when you press on it) a guaranteed 30% hit rate over some number of swings--which is decidedly NOT random. – Alex Feinman, Aug 1, 2013 at 14:07
It's a fine balance, though - over how many hits should the 30% hit rate be a guarantee? Over too small a range (e.g. 3 out of 10), and it's only slightly random, almost predictable. Make the range too large (300 out of 1000), and it's almost no different from just purely random. In either case, it'd be very difficult to tell whether the chances were calculated ahead of time or on the fly. Do you expect the average player to sit and count exactly how many hits per 100 swings occur? A few might, but most wouldn't care. – Darrel Hoffman, Aug 2, 2013 at 2:12
O(1) insert, O(1) extraction, and no need for reshuffle (I don't think the reshuffle as you defined it does anything useful), and the behaviour would be easier to understand. There would be equal probability for any element to be returned next, while the probabilities are probably skewed in some non-obvious way in the "grab-bag".
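One way to get O(1) insert and O(1) uniform extraction from a plain array is swap-with-last removal. This is my sketch of what I take the suggestion above to mean; the class name is hypothetical:

```python
import random

class RandomBag:
    """Array-backed bag: amortized O(1) insert, O(1) uniform random pop."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def insert(self, item):
        self._items.append(item)  # amortized O(1) on a growable array

    def pop(self):
        items = self._items
        i = random.randrange(len(items))            # uniform over all elements
        items[i], items[-1] = items[-1], items[i]   # swap the pick to the end
        return items.pop()                          # O(1); order not preserved
```

Since every element is chosen with probability 1/n on each pop, the distribution is uniform by construction, with no tree percolation involved.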