I want to calculate the Cartesian product of n copies of a small list, marker = [0, 1, 2], and use the resulting tuples as keys in a dictionary. The value for each key is to be a numpy array of n random floats between 0 and 1.
The only twist is that for each key:value pair in the dictionary, if the key has a non-zero number at some index, I want the corresponding value array to have np.nan at that same index.
Below is the function I wrote for that. My question is whether there is a quicker / more efficient way to get the same result.
import itertools
import numpy as np
def create_constrained_dict(n, markers):
    '''
    Create the Cartesian product of the same list repeated n times.

    Returns a dictionary whose keys are the Cartesian product tuples of the
    input list. The values of the dictionary are numpy arrays of length n.
    Wherever the key element at a given index is non-zero, the array value
    at that index is replaced with np.nan.

    Below is an example. For some key-value pairs, NaNs would be located
    as follows:

    d = {(0,0,1): np.array([0.1234, 0.7543, np.nan]),
         (1,2,1): np.array([np.nan, np.nan, np.nan]),
         (1,0,1): np.array([np.nan, 0.2634, np.nan]),
         }
    '''
    d = dict()
    for element in itertools.product(*[markers for i in xrange(n)]):
        d[element] = np.random.uniform(0, 1, n)
        for i in xrange(n):
            if element[i] != 0:
                d[element][i] = np.nan
    return d
rep_num = 3
marker = [0,1,2]
d = create_constrained_dict(rep_num, marker)
The output looks like this:
print d
{
(0, 1, 1): array([ 0.84049621, nan, nan]),
(0, 1, 2): array([ 0.17520962, nan, nan]),
(1, 0, 1): array([ nan, 0.96110224, nan]),
(0, 2, 1): array([ 0.10395044, nan, nan]),
(2, 2, 0): array([ nan, nan, 0.60131589]),
(0, 2, 0): array([ 0.64515576, nan, 0.05946614]),
(0, 2, 2): array([ 0.02054272, nan, nan]),
(1, 0, 0): array([ nan, 0.98472074, 0.93688277]),
(2, 0, 1): array([ nan, 0.64348266, nan]),
(1, 2, 0): array([ nan, nan, 0.71462777]),
(2, 0, 0): array([ nan, 0.98370414, 0.3517195 ]),
(1, 2, 1): array([ nan, nan, nan]),
(0, 0, 2): array([ 0.29771489, 0.83521032, nan]),
(2, 2, 2): array([ nan, nan, nan]),
(1, 2, 2): array([ nan, nan, nan]),
(2, 0, 2): array([ nan, 0.95682699, nan]),
(0, 0, 1): array([ 0.26649784, 0.38120757, nan]),
(0, 0, 0): array([ 0.98960411, 0.70080955, 0.25540202]),
(2, 1, 2): array([ nan, nan, nan]),
(1, 1, 1): array([ nan, nan, nan]),
(0, 1, 0): array([ 0.94015447, nan, 0.56849242]),
(1, 1, 0): array([ nan, nan, 0.30593067]),
(2, 1, 0): array([ nan, nan, 0.74205853]),
(2, 2, 1): array([ nan, nan, nan]),
(2, 1, 1): array([ nan, nan, nan]),
(1, 1, 2): array([ nan, nan, nan]),
(1, 0, 2): array([ nan, 0.27788722, nan])
}
2 Answers
Instead of

itertools.product(*[markers for i in xrange(n)])

use

itertools.product(markers, repeat=n)

Instead of creating the n random values and then replacing some of them with nan, build each value with a list comprehension. dict([(key, value) for key, value in ...]) creates a dict object. (bool and [a] or [b])[0] is a safer version of bool and a or b, and a one-line version of if bool: a else: b.
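To see the (bool and [a] or [b])[0] idiom in isolation (cond, a and b here are just placeholder names):

cond = False
a, b = 1.0, float('nan')

# (cond and [a] or [b])[0] picks a when cond is truthy and b otherwise.
# Wrapping a and b in one-element lists keeps the trick safe even when a
# itself is falsy (e.g. 0 or 0.0), which breaks the plain `cond and a or b`.
value = (cond and [a] or [b])[0]
print(value)  # nan

# On Python 2.5+ the conditional expression does the same thing directly.
value = a if cond else b
print(value)  # nan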
And final version:
import itertools
import numpy as np
def create_constrained_dict(n, markers):
    d = dict([(element, np.array([(i == 0 and [np.random.uniform(0, 1)] or [np.nan])[0]
                                   for i in element]))
              for element in itertools.product(markers, repeat=n)])
    return d
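A quick sanity check of the rewritten function (the exact numbers will differ on every run, since they are random):

d = create_constrained_dict(3, [0, 1, 2])
print(d[(0, 0, 0)])   # three random floats, no nan
print(d[(1, 0, 2)])   # nan at positions 0 and 2, a random float at position 1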
EDIT
A version without np.array is about 2 times faster (thanks @JoeWallis):
import itertools
import numpy as np
def create_constrained_dict(n, markers):
    d = dict([(element, [(i == 0 and [np.random.uniform(0, 1)] or [np.nan])[0]
                         for i in element])
              for element in itertools.product(markers, repeat=n)])
    return d
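For a rough comparison of the two variants with timeit (here create_constrained_dict_array and create_constrained_dict_list are stand-in names for the np.array and list versions above, renamed so both can exist side by side; timings vary by machine):

import timeit

setup = ("from __main__ import create_constrained_dict_array, "
         "create_constrained_dict_list")

# Build the full 27-key dictionary 1000 times with each variant.
print(timeit.timeit("create_constrained_dict_array(3, [0, 1, 2])",
                    setup=setup, number=1000))
print(timeit.timeit("create_constrained_dict_list(3, [0, 1, 2])",
                    setup=setup, number=1000))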
- Thank you, this looks neater. Is it faster as well? I will test. – Zhubarb, Oct 11, 2016 at 10:21
- Regarding i == 0 and [np.random.uniform(0, 1)] or [np.nan]: the ternary operator was added in Python 2.5. Regarding d = dict([(...)...]): dict comprehensions are present in 2.7 as well. Also, a line break or two would not hurt. – Daerdemandt, Oct 11, 2016 at 10:30
- I tried with timeit; your solution is marginally faster as well, thanks. – Zhubarb, Oct 11, 2016 at 14:03
- @Zhubarb I tried editing your question to be faster, but it's hard: np.array has a large overhead, and numpy doesn't implement itertools.product very efficiently either, so using a numpy solution was regularly 2 times slower. – JoeWallis, Oct 11, 2016 at 14:40
- @JoeWallis, so have you tried a version where dict values are lists (instead of numpy arrays) and it was 2 times faster? If so, it would be very good to have it as an answer. – Zhubarb, Oct 11, 2016 at 15:03
Your display looked a lot like an n-d array, so I set about trying to create the same pattern with numpy operations.
Here's what I've come up with so far:
Start with a 4d array of nan:
In [112]: z=np.ones((3,3,3,3))*np.nan
Fill selected subarrays with random numbers. For component m of a key, the value should be random only when that key component is 0, so pin axis m to 0 and write into position m of the last axis:
In [113]: z[0,:,:,0]=np.random.rand(3,3)
In [114]: z[:,0,:,1]=np.random.rand(3,3)
In [115]: z[:,:,0,2]=np.random.rand(3,3)
Verify that the resulting array is patterned like the desired dictionary:
In [116]: for i,j,k in np.ndindex(3,3,3):
...: print((i,j,k),z[i,j,k])
...:
(0, 0, 0) [ 0.03527323 0.72731859 0.02793814]
(0, 0, 1) [ 0.9925641 0.47560692 nan]
(0, 0, 2) [ 0.9312088 0.35077862 nan]
(0, 1, 0) [ 0.72458335 nan 0.04496767]
(0, 1, 1) [ 0.42424677 nan nan]
(0, 1, 2) [ 0.11619154 nan nan]
(0, 2, 0) [ 0.64655329 nan 0.24431279]
....
(2, 2, 0) [ nan nan 0.81627296]
(2, 2, 1) [ nan nan nan]
(2, 2, 2) [ nan nan nan]
In 4d display:
In [117]: z
Out[117]:
array([[[[ 0.03527323, 0.72731859, 0.02793814],
[ 0.9925641 , 0.47560692, nan],
[ 0.9312088 , 0.35077862, nan]],
[[ 0.72458335, nan, 0.04496767],
[ 0.42424677, nan, nan],
[ 0.11619154, nan, nan]],
....
[[ nan, nan, 0.81627296],
[ nan, nan, nan],
[ nan, nan, nan]]]])
The random fill could be written as an iteration (details missing here; the filled-in version is further down):
for i in range(3):
z[???,i] = np.random.rand(3,3)
It's probably not worth trying to avoid the loop.
itertools.product is faster than ndindex.
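A rough way to check that on this small case (timings vary by machine; this is just a sketch):

import timeit
import numpy as np
from itertools import product

# Materialise all 27 index tuples with each approach and compare the cost.
print(timeit.timeit(lambda: list(product(range(3), repeat=3)), number=10000))
print(timeit.timeit(lambda: list(np.ndindex(3, 3, 3)), number=10000))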
The iteration could also be used to map z onto a dictionary:
{(i,j,k):z[i,j,k,:] for i,j,k in np.ndindex(3,3,3)}
But I'm mostly interested in what kind of n-d
array structure this problem is creating.
==================
The iterative z
setting code:
In [127]: zr=np.random.rand(3,3,3)
In [128]: for i in range(3):
...: idx=[slice(None) for _ in range(4)]
...: idx[-1]=i
...: idx[i]=0
...: z[idx]=zr[i,...]
==================
While I'm at it, here's a direct-to-dictionary version:
from itertools import product
import numpy as np

def foo(*args):
    # random value where the key component is 0, nan where it is non-zero
    return np.where(np.array(args) > 0, np.nan, np.random.rand(3))

{ijk: foo(*ijk) for ijk in product(range(3), repeat=3)}
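A quick check that this reproduces the question's pattern (the values differ on every run, since they are random):

d = {ijk: foo(*ijk) for ijk in product(range(3), repeat=3)}
print(d[(1, 0, 2)])   # nan at positions 0 and 2, a random float at position 1
print(d[(0, 0, 0)])   # three random floats, no nan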