Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

In-place sampling #8

Open
Open
@make-github-pseudonymous-again

Description

If you want to sample k items out of n, Fisher Yates shuffles your item array.
Hereunder is an efficient no shuffle implementation. Something better can probably be achieved with prefix sums and rank/select.

Theorem

We can sample k items out of n with k random draws and O(k log k) time, without modifying the items array.

Proof

TODO fix proof

Given n items, at draw i=1,2,...,n with a binary search tree containing all previously drawn items indices between 0 and n-1, pick a random integer x between 0 and n - i. Binary search for its corresponding index value as follows: start with p = 0 the number of predecessors of the current node, let l be the size of the left subtree of the current node, let v be the value of the current node, if x + p + l + 1 < v
the current node becomes the left child, otherwise the current node becomes the right child and p = p + l + 1. If there is no such child, insert x + p in the case of a left child, x + p + l + 1 in the case of a right child.

If x + p + l + 1 < v, then x cannot be in the right subtree of v. The item index corresponding to x would be x + p + l + 1 if it was the right child of v which is smaller than v. Pick any node c in the right subtree of v. Let g be the number of predecessors of c in the right subtree of v. The value of c is at least v + 1 + g because all indices are distinct. Suppose c has no right child. The item index of x as a right child of c is x + p + l + 1 + g + 1 < v + g + 1 <= c. Suppose c has no left child. The item index of x as a left child of c is x + p + l + 1 + g < v + g < v + g + 1 <= c.

If x + p + l + 1 > v, then x cannot be in the left subtree of v. If v has no left child, then l = 0 and the value of x as the left child of v would be x + p which is larger or equal to v (indices must be distinct). Pick any node c in the left subtree of v.
Let s be the number of successors of s in the left subtree of v. The value of c is at most v - s - 1 because all indices are distinct. Suppose c has no left child. The item index of x as a left child of c is x + p + l - s - 1 > v - s - 2 so x + p + l - s - 1 >= v - s - 1 = c.

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      AltStyle によって変換されたページ (->オリジナル) /