I've made a container which functions like a Binary Heap in terms of insertions and popping the root element. The key difference is that the comparison steps are all replaced with a random call for deciding when to go left/right. (The underlying data structure is a Vector, written as an expandable array, so you don't need to know how many elements the heap will contain in advance.) The idea is, you can insert items, and when you pop them, you'll get any element at random. You can also iterate through the whole collection, after which the heap is reshuffled (by simply popping and reinserting N times).
Insertion and removal are both O(log N); reshuffling is O(N log N). Its main use is for games — think a deck of cards or a Bingo tumbler. It isn't really a true Heap in that there's no guarantee that elements will be less than or greater than their children. But insertion and removal are both done Heap-style, just at random. (Also, unlike a regular Heap, the items do not need to be comparable to each other.)
Is there a standard name for this sort of thing? Or maybe a better way of achieving the same results? I'm calling it a "GrabBag" for now, since that seems to describe it pretty well, but I was just wondering if there was a more widely-recognized name for it.
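To make the idea concrete, here is a minimal sketch of the structure described above. The class and method names are my own, and the exact percolation scheme (coin-flip sift-up on insert, random sift-down on pop) is one plausible reading of "comparisons replaced with a random call":

```python
import random

class GrabBag:
    """Heap-shaped container where comparisons are replaced by coin flips.
    A sketch of the idea described above; details are assumptions."""

    def __init__(self):
        self._items = []  # implicit binary tree in an expandable array

    def __len__(self):
        return len(self._items)

    def insert(self, item):
        items = self._items
        items.append(item)  # new items start as leaves
        i = len(items) - 1
        # Randomly sift up: at each level, flip a coin to either swap
        # with the parent or stop (the heap comparison, randomized).
        while i > 0 and random.random() < 0.5:
            parent = (i - 1) // 2
            items[i], items[parent] = items[parent], items[i]
            i = parent

    def pop(self):
        # Remove the root, move the last leaf up, then sink it by
        # repeatedly swapping with a randomly chosen child.
        items = self._items
        root = items[0]
        last = items.pop()
        if items:
            items[0] = last
            i = 0
            while True:
                children = [c for c in (2 * i + 1, 2 * i + 2)
                            if c < len(items)]
                if not children:
                    break
                j = random.choice(children)
                items[i], items[j] = items[j], items[i]
                i = j
        return root
```

Both operations touch one root-to-leaf path, which gives the O(log N) bounds mentioned above.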
I looked around for possible standard names for this, but couldn't find anything. All I could find was either Randomized Meldable Heaps, which are a different thing, or people complaining about random heap corruption, which isn't even related.
1 Answer
This need comes up a lot in game programming when you want to guarantee a certain probability distribution over time. For example, if the orc should hit 30% of the time, you create a Bag with 30 hits and 70 misses, in random order, then pop a value from the end of the array each time you need one.
Usually it is implemented as an array, so that creation is O(n) (i.e. insertion is O(1)) and removal is O(1). (No reshuffle is necessary; you just recreate it when you need a new one. Your reshuffle can be implemented by pushing each popped item into a new heap as you pop it, so these are similar.) The array solution is going to be a superior performer in retrieval time unless you can't batch inserts.
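A minimal sketch of the array approach described here, using the orc example (the function name is mine):

```python
import random

def make_bag(hits, misses):
    """Build a shuffled outcome bag: O(n) to create, O(1) to draw
    by popping from the end. Recreate it when it empties."""
    bag = [True] * hits + [False] * misses
    random.shuffle(bag)  # Fisher-Yates shuffle under the hood
    return bag

bag = make_bag(30, 70)   # exactly 30 hits guaranteed per 100 draws
result = bag.pop()       # O(1) draw from the end
```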
Also, I am concerned about the actual randomness of your algorithm. Unless you randomly sink non-leaf nodes during insertion, earlier nodes are going to tend to be at the top of the tree. Am I missing something?
The difference with this is that you must know the size of the array before creating it. With my implementation, you can put any arbitrary number of elements in, either at the beginning or at any point later on. It is true that all inserted elements will end up as leaf nodes, but since popping the top causes random percolation, they might not stay down there. It means that a recently added element is less likely to reappear right away, which is not necessarily a bad thing. (Think of the shuffle feature on an MP3 player - you don't WANT the same song twice in a row.) – Darrel Hoffman, Jul 31, 2013 at 14:57
But you've explained that under the covers, you use an array that gets resized. The same method can be used to resize the array here. It means an amortized cost on insertion of...hmm. I suspect it's O(1). – Alex Feinman, Jul 31, 2013 at 16:41
I'm not sure your example about the orc's 30% hit rate is the best example - you could get close to that simply by calculating a random number at the time of attack. (e.g. if (rand(100) < 30)) It might not end up being exactly 30% due to randomness, but nobody would notice. If the Ace of Spades turns up twice in a deck of cards, however, it's a bigger problem. Of course you know the size of a standard deck in advance: 52 cards. But imagine a scenario where you don't know that, and it might change frequently over time. I'd say both methods have valid use-cases. – Darrel Hoffman, Jul 31, 2013 at 19:07
Darrel, the distinction is important. Players don't want randomness--they want fairness. A really random number may miss 100% of the time despite the probability being 30%. Players often want (when you press on it) a guaranteed 30% hit rate over some number of swings--which is decidedly NOT random. – Alex Feinman, Aug 1, 2013 at 14:07
It's a fine balance, though - over how many hits should the 30% hit rate be a guarantee? Over too small a range (e.g. 3 out of 10), and it's only slightly random, almost predictable. Make the range too large (300 out of 1000), and it's almost no different from just purely random. In either case, it'd be very difficult to tell whether the chances were calculated ahead of time or on the fly. Do you expect the average player to sit and count exactly how many hits per 100 swings occur? A few might, but most wouldn't care. – Darrel Hoffman, Aug 2, 2013 at 2:12
O(1) insert, O(1) extraction, and no need for reshuffle (I don't think the reshuffle as you defined it does anything useful), and the behaviour would be easier to understand. There would be equal probability for any element to be returned next, while the probabilities are probably skewed in some non-obvious way in the "grab-bag".
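One way to get O(1) insert and O(1) uniform extraction from a plain array is swap-with-last removal. This is my sketch of what I take the suggestion above to mean; the class name is hypothetical:

```python
import random

class RandomBag:
    """Array-backed bag: amortized O(1) insert, O(1) uniform random pop."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def insert(self, item):
        self._items.append(item)  # amortized O(1) on a growable array

    def pop(self):
        items = self._items
        i = random.randrange(len(items))            # uniform over all elements
        items[i], items[-1] = items[-1], items[i]   # swap the pick to the end
        return items.pop()                          # O(1); order not preserved
```

Since every element is chosen with probability 1/n on each pop, the distribution is uniform by construction, with no tree percolation involved.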