You updated your question to say that you are trying to simulate a Chinese restaurant process. I have soto say that I can't relate the code in your question to this process, which has nothing to do with numbers that sum to a particular total. So I'm puzzled! Maybe you can say more about how the code you posted relates to this process?
Partition Count
---------- ------
1234 249231
134, 2 83453
124, 3 82664
123, 4 83569
1, 234 83161
14, 23 42126
1413, 2,24 3 41891 41688
1312, 2434 4168841726
1314, 2, 43 4156941891
1213, 34 2, 4 4172641569
12, 3, 4 42022
1, 24, 3 41522
1, 23, 4 41719
1, 2, 34 41867
1, 2, 3, 4 41792
You updated your question to say that you are trying to simulate a Chinese restaurant process. I have so say that I can't relate the code in your question to this process, which has nothing to do with numbers that sum to a particular total. So I'm puzzled! Maybe you can say more about how the code you posted relates to this process?
Partition Count
---------- ------
1234 249231
134, 2 83453
124, 3 82664
123, 4 83569
1, 234 83161
14, 23 42126
14, 2, 3 41891
13, 24 41688
13, 2, 4 41569
12, 34 41726
12, 3, 4 42022
1, 24, 3 41522
1, 23, 4 41719
1, 2, 34 41867
1, 2, 3, 4 41792
You updated your question to say that you are trying to simulate a Chinese restaurant process. I have to say that I can't relate the code in your question to this process, which has nothing to do with numbers that sum to a particular total. So I'm puzzled! Maybe you can say more about how the code you posted relates to this process?
Partition Count
---------- ------
1234 249231
134, 2 83453
124, 3 82664
123, 4 83569
1, 234 83161
14, 23 42126
13, 24 41688
12, 34 41726
14, 2, 3 41891
13, 2, 4 41569
12, 3, 4 42022
1, 24, 3 41522
1, 23, 4 41719
1, 2, 34 41867
1, 2, 3, 4 41792
5. Chinese restaurant process
You updated your question to say that you are trying to simulate a Chinese restaurant process . I have so say that I can't relate the code in your question to this process, which has nothing to do with numbers that sum to a particular total. So I'm puzzled! Maybe you can say more about how the code you posted relates to this process?
Anyway, if you want to generate a sample from this distribution, the easiest thing to do is to simulate the process directly:
def chinese_restaurant_partition(n):
"""
Return a random partition of the numbers from 1 to n generated by
the Chinese restaurant process.
"""
partition = [] # list of sets
assignment = [] # set each number is assigned to
for i in xrange(n):
r = random.randrange(i + 1)
if r < i:
# assign i + 1 to existing block
block = assignment[r]
else:
# create new block
block = set()
partition.append(block)
block.add(i + 1)
assignment.append(block)
return partition
For example:
>>> chinese_restaurant_partition(4)
[set([1]), set([2, 4]), set([3])]
This distribution looks like this (again, after generating a million samples):
Partition Count
---------- ------
1234 249231
134, 2 83453
124, 3 82664
123, 4 83569
1, 234 83161
14, 23 42126
14, 2, 3 41891
13, 24 41688
13, 2, 4 41569
12, 34 41726
12, 3, 4 42022
1, 24, 3 41522
1, 23, 4 41719
1, 2, 34 41867
1, 2, 3, 4 41792
5. Chinese restaurant process
You updated your question to say that you are trying to simulate a Chinese restaurant process . I have so say that I can't relate the code in your question to this process, which has nothing to do with numbers that sum to a particular total. So I'm puzzled! Maybe you can say more about how the code you posted relates to this process?
Anyway, if you want to generate a sample from this distribution, the easiest thing to do is to simulate the process directly:
def chinese_restaurant_partition(n):
"""
Return a random partition of the numbers from 1 to n generated by
the Chinese restaurant process.
"""
partition = [] # list of sets
assignment = [] # set each number is assigned to
for i in xrange(n):
r = random.randrange(i + 1)
if r < i:
# assign i + 1 to existing block
block = assignment[r]
else:
# create new block
block = set()
partition.append(block)
block.add(i + 1)
assignment.append(block)
return partition
For example:
>>> chinese_restaurant_partition(4)
[set([1]), set([2, 4]), set([3])]
This distribution looks like this (again, after generating a million samples):
Partition Count
---------- ------
1234 249231
134, 2 83453
124, 3 82664
123, 4 83569
1, 234 83161
14, 23 42126
14, 2, 3 41891
13, 24 41688
13, 2, 4 41569
12, 34 41726
12, 3, 4 42022
1, 24, 3 41522
1, 23, 4 41719
1, 2, 34 41867
1, 2, 3, 4 41792
- 50.1k
- 3
- 130
- 210
4. Sampling partitions uniformly
Supposing that you wanted to sample the space of ordered integer partitions uniformly, how could you implement it?
First we have to know how to count ordered integer partitions. And to do that, there's a neat combinatorial observation. Consider a group of objects waiting to be partitioned:
Four objects with three locations for partition
If there are n objects, there are n − 1 locations where you could insert a partition (indicated by the dotted lines). For example, if we insert partitions at the first and the third locations, we get the partition 1, 2, 1:
Four objects partitioned as 1, 2, 1
Each subset of locations corresponds to a different ordered partition, and each ordered partition corresponds to a different subset of locations. This means that we can choose an ordered partition uniformly by choosing a subset of locations uniformly. There are 2n − 1 subsets of the n − 1 locations (for example, 4 objects have 23 = 8 ordered partitions, as noted above), and we can choose one of these subsets uniformly by putting each location into the subset with probability 1⁄2.
Here's a revised function that implements the new algorithm:
def random_ints_with_sum(n):
"""
Generate positive random integers summing to `n`, sampled
uniformly from the ordered integer partitions of `n`.
"""
p = 0
for _ in xrange(n - 1):
p += 1
if random.randrange(2):
yield p
p = 0
yield p + 1
And here's an example distribution after a million calls to the revised function:
Partition Count
----------- ------
4, 125042
3, 1 124848
2, 2 125189
2, 1, 1 125126
1, 3 124861
1, 2, 1 125152
1, 1, 2 124248
1, 1, 1, 1 125534
You'll see that this is very close to uniform.
Again, I should emphasize that since I don't know what your function is going to be used for, I don't know whether changing it so that it samples uniformly from the ordered integer partitions is the right thing to do. But if it is, now you know how to do it.
(Who would have thought there would be so much to write about a six-line function?)
4. Sampling partitions uniformly
Supposing that you wanted to sample the space of ordered integer partitions uniformly, how could you implement it?
First we have to know how to count ordered integer partitions. And to do that, there's a neat combinatorial observation. Consider a group of objects waiting to be partitioned:
Four objects with three locations for partition
If there are n objects, there are n − 1 locations where you could insert a partition (indicated by the dotted lines). For example, if we insert partitions at the first and the third locations, we get the partition 1, 2, 1:
Four objects partitioned as 1, 2, 1
Each subset of locations corresponds to a different ordered partition, and each ordered partition corresponds to a different subset of locations. This means that we can choose an ordered partition uniformly by choosing a subset of locations uniformly. There are 2n − 1 subsets of the n − 1 locations (for example, 4 objects have 23 = 8 ordered partitions, as noted above), and we can choose one of these subsets uniformly by putting each location into the subset with probability 1⁄2.
Here's a revised function that implements the new algorithm:
def random_ints_with_sum(n):
"""
Generate positive random integers summing to `n`, sampled
uniformly from the ordered integer partitions of `n`.
"""
p = 0
for _ in xrange(n - 1):
p += 1
if random.randrange(2):
yield p
p = 0
yield p + 1
And here's an example distribution after a million calls to the revised function:
Partition Count
----------- ------
4, 125042
3, 1 124848
2, 2 125189
2, 1, 1 125126
1, 3 124861
1, 2, 1 125152
1, 1, 2 124248
1, 1, 1, 1 125534
You'll see that this is very close to uniform.
Again, I should emphasize that since I don't know what your function is going to be used for, I don't know whether changing it so that it samples uniformly from the ordered integer partitions is the right thing to do. But if it is, now you know how to do it.
(Who would have thought there would be so much to write about a six-line function?)