\$\begingroup\$

I want to calculate the Cartesian product of n copies of a small list, marker=[0,1,2], and use the resulting tuples as keys in a dictionary. The value for each key is to be a numpy array of n random floats between 0 and 1.

The only twist is that for each key:value pair in the dictionary, if the key has a non-zero number at index a, I want the corresponding value array to have np.nan at the same index.

Below is the function I wrote for that. My question is whether there is a quicker / more efficient way to get the same result.

import itertools
import numpy as np
def create_constrained_dict(n, markers):
    '''
    Create the Cartesian product of the same list repeated n times.
    It returns a dictionary whose keys are the tuples of that product.
    The values of the dictionary are numpy arrays of length 'n'.
    If the corresponding dictionary key element for a value is not zero, we replace the value
    with np.nan.
    Below is an example.
    For some key-value pairs, the NaNs would be located as follows:
    d={(0,0,1): np.array([0.1234, 0.7543, np.nan]),
       (1,2,1): np.array([np.nan, np.nan, np.nan]),
       (1,0,1): np.array([np.nan, 0.2634, np.nan]),
      }
    '''
    d = dict()
    for element in itertools.product(*[markers for i in xrange(n)]):
        d[element] = np.random.uniform(0, 1, n)
        for i in xrange(n):
            if element[i] != 0:
                d[element][i] = np.nan
    return d
rep_num = 3
marker = [0,1,2]
d = create_constrained_dict(rep_num, marker)

The output looks like this:

print d
{
 (0, 1, 1): array([ 0.84049621, nan, nan]),
 (0, 1, 2): array([ 0.17520962, nan, nan]),
 (1, 0, 1): array([ nan, 0.96110224, nan]),
 (0, 2, 1): array([ 0.10395044, nan, nan]),
 (2, 2, 0): array([ nan, nan, 0.60131589]),
 (0, 2, 0): array([ 0.64515576, nan, 0.05946614]),
 (0, 2, 2): array([ 0.02054272, nan, nan]),
 (1, 0, 0): array([ nan, 0.98472074, 0.93688277]),
 (2, 0, 1): array([ nan, 0.64348266, nan]),
 (1, 2, 0): array([ nan, nan, 0.71462777]),
 (2, 0, 0): array([ nan, 0.98370414, 0.3517195 ]),
 (1, 2, 1): array([ nan, nan, nan]),
 (0, 0, 2): array([ 0.29771489, 0.83521032, nan]),
 (2, 2, 2): array([ nan, nan, nan]),
 (1, 2, 2): array([ nan, nan, nan]),
 (2, 0, 2): array([ nan, 0.95682699, nan]),
 (0, 0, 1): array([ 0.26649784, 0.38120757, nan]),
 (0, 0, 0): array([ 0.98960411, 0.70080955, 0.25540202]),
 (2, 1, 2): array([ nan, nan, nan]),
 (1, 1, 1): array([ nan, nan, nan]),
 (0, 1, 0): array([ 0.94015447, nan, 0.56849242]),
 (1, 1, 0): array([ nan, nan, 0.30593067]),
 (2, 1, 0): array([ nan, nan, 0.74205853]),
 (2, 2, 1): array([ nan, nan, nan]),
 (2, 1, 1): array([ nan, nan, nan]),
 (1, 1, 2): array([ nan, nan, nan]),
 (1, 0, 2): array([ nan, 0.27788722, nan])
}
asked Oct 10, 2016 at 11:53
\$\endgroup\$

2 Answers

\$\begingroup\$
  • Instead of itertools.product(*[markers for i in xrange(n)]), use itertools.product(markers, repeat=n).

  • Instead of creating n random values and then overwriting some of them with nan, build each value with a list comprehension.

  • dict([(key, value) for key, value in ...]) creates a dict object.

  • (bool and [a] or [b])[0] is a safer version of bool and a or b (which returns b whenever a is falsy). It is a one-liner version of:

    if bool:
        a
    else:
        b
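
The points above can be checked in isolation. The following is a small sketch, assuming Python 3 and toy values for `markers`, `n`, `a` and `b` that are not part of the original answer:

```python
import itertools

markers = [0, 1, 2]
n = 3

# repeat=n yields the same tuples as unpacking n copies of the list
unpacked = list(itertools.product(*[markers for _ in range(n)]))
repeated = list(itertools.product(markers, repeat=n))
same_product = unpacked == repeated

# Plain `cond and a or b` falls through to b whenever a is falsy,
# even though the condition is True
a, b = 0, "fallback"
naive = True and a or b              # picks b, which is not what was intended
wrapped = (True and [a] or [b])[0]   # picks a, because [a] is a non-empty list
ternary = a if True else b           # conditional expression (Python 2.5+), picks a
```

Where a conditional expression is available it is usually the more readable choice; the list-wrapping idiom is only needed on very old interpreters.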
    

And final version:

import itertools
import numpy as np
def create_constrained_dict(n, markers):
    d = dict([(element, np.array([(i == 0 and [np.random.uniform(0, 1)] or [np.nan])[0]
                                  for i in element]))
              for element in itertools.product(markers, repeat=n)])
    return d

EDIT

Version without np.array - 2 times faster (thanks @JoeWallis):

import itertools
import numpy as np
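def create_constrained_dict(n, markers):
    # Values are plain lists here; the stray closing parenthesis left over
    # from removing np.array(...) has been dropped so the expression parses.
    d = dict([(element, [(i == 0 and [np.random.uniform(0, 1)] or [np.nan])[0]
                         for i in element])
              for element in itertools.product(markers, repeat=n)])
    return d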
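
The same logic can also be written with a dict comprehension and a conditional expression. This is only a sketch, assuming Python 3; the name `create_constrained_dict_comp` is hypothetical and the values are plain lists, as in the version just above:

```python
import itertools
import numpy as np

def create_constrained_dict_comp(n, markers):
    """Hypothetical variant: same rule as above, written as a dict comprehension."""
    return {
        # random float where the key element is 0, NaN everywhere else
        element: [np.random.uniform(0, 1) if i == 0 else np.nan for i in element]
        for element in itertools.product(markers, repeat=n)
    }

d = create_constrained_dict_comp(3, [0, 1, 2])
```

Wrapping the inner list in `np.array(...)` would give array values again, at the cost of the per-array overhead mentioned in the comments.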
answered Oct 11, 2016 at 10:13
\$\endgroup\$
  • \$\begingroup\$ Thank you, this looks neater. Is it faster as well? I will test. \$\endgroup\$ Commented Oct 11, 2016 at 10:21
  • \$\begingroup\$ i == 0 and [np.random.uniform(0, 1)] or [np.nan]: the ternary operator was added in 2.5. d = dict([(...)...]): dict comprehensions are present in 2.7 as well. Also, a line break or two would not hurt. \$\endgroup\$ Commented Oct 11, 2016 at 10:30
  • \$\begingroup\$ I tried with timeit, your solution is marginally faster as well - thanks.. \$\endgroup\$ Commented Oct 11, 2016 at 14:03
  • \$\begingroup\$ @Zhubarb I tried editing your question to be faster, but it's hard, np.array has a large overhead, and numpy doesn't implement itertools.product very efficiently either, so using a numpy solution was regularly 2 times slower. \$\endgroup\$ Commented Oct 11, 2016 at 14:40
  • \$\begingroup\$ @JoeWallis, So have you tried a version where dict values are lists (instead of numpy arrays) and it was 2 times faster? If so, it would be very good to have it as an answer. \$\endgroup\$ Commented Oct 11, 2016 at 15:03
\$\begingroup\$

Your display looked a lot like an n-d array, so I set about trying to create the same pattern with numpy operations.

Here's what I've come up with so far:

Start with a 4d array of nan:

In [112]: z=np.ones((3,3,3,3))*np.nan

Fill selected subarrays with random numbers

In [113]: z[0,:,:,0]=np.random.rand(3,3)
In [114]: z[:,0,:,1]=np.random.rand(3,3)
In [115]: z[:,:,0,2]=np.random.rand(3,3)

Verify that the resulting array is patterned like the desired dictionary:

In [116]: for i,j,k in np.ndindex(3,3,3):
     ...:     print((i,j,k),z[i,j,k])
     ...:
(0, 0, 0) [ 0.03527323 0.72731859 0.02793814]
(0, 0, 1) [ 0.9925641 0.47560692 nan]
(0, 0, 2) [ 0.9312088 0.35077862 nan]
(0, 1, 0) [ 0.72458335 nan 0.04496767]
(0, 1, 1) [ 0.42424677 nan nan]
(0, 1, 2) [ 0.11619154 nan nan]
(0, 2, 0) [ 0.64655329 nan 0.24431279]
....
(2, 2, 0) [ nan nan 0.81627296]
(2, 2, 1) [ nan nan nan]
(2, 2, 2) [ nan nan nan]
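
Rather than checking the printout by eye, the pattern can be tested programmatically. This is a sketch assuming Python 3; `pattern_ok` is a name introduced here for illustration:

```python
import numpy as np

# Rebuild the 4d array as in the session above
z = np.ones((3, 3, 3, 3)) * np.nan
z[0, :, :, 0] = np.random.rand(3, 3)
z[:, 0, :, 1] = np.random.rand(3, 3)
z[:, :, 0, 2] = np.random.rand(3, 3)

# For every key (i, j, k), position p must be NaN exactly when key[p] != 0
pattern_ok = all(
    np.array_equal(np.isnan(z[idx]), np.array(idx) != 0)
    for idx in np.ndindex(3, 3, 3)
)
```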

In 4d display:

In [117]: z
Out[117]: 
array([[[[ 0.03527323, 0.72731859, 0.02793814],
 [ 0.9925641 , 0.47560692, nan],
 [ 0.9312088 , 0.35077862, nan]],
 [[ 0.72458335, nan, 0.04496767],
 [ 0.42424677, nan, nan],
 [ 0.11619154, nan, nan]],
 ....
 [[ nan, nan, 0.81627296],
 [ nan, nan, nan],
 [ nan, nan, nan]]]])

The random fill could be written as an iteration (details missing):

 for i in range(3):
     z[???,i] = np.random.rand(3,3)

It's probably not worth trying to avoid the loop.

itertools.product is faster than ndindex.

The iteration could also be used to map z onto a dictionary.

{(i,j,k):z[i,j,k,:] for i,j,k in np.ndindex(3,3,3)}

But I'm mostly interested in what kind of n-d array structure this problem is creating.

==================

The iterative z setting code:

In [127]: zr=np.random.rand(3,3,3)
In [128]: for i in range(3):
     ...:     idx=[slice(None) for _ in range(4)]
     ...:     idx[-1]=i
     ...:     idx[i]=0
     ...:     z[tuple(idx)]=zr[i,...]   # tuple index; newer NumPy no longer accepts a list of slices here
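
The loop above generalises to any number of repetitions `n` and any marker count `m`. The following is a sketch rather than part of the original answer: `build_constrained_array` is a hypothetical name, Python 3 is assumed, `np.random.default_rng` assumes NumPy 1.17 or later, and the markers are taken to be `0..m-1` so that they double as array indices:

```python
import numpy as np

def build_constrained_array(n, m, rng=None):
    """Hypothetical helper: array of shape (m,)*n + (n,), NaN where key[axis] != 0."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.full((m,) * n + (n,), np.nan)
    for axis in range(n):
        idx = [slice(None)] * (n + 1)
        idx[axis] = 0     # only keys whose element on this axis is 0 ...
        idx[-1] = axis    # ... receive a random number at position `axis`
        z[tuple(idx)] = rng.random((m,) * (n - 1))
    return z

z = build_constrained_array(3, 3)
```

With `n=3, m=3` this produces the same NaN pattern as the 4d array built by hand earlier in this answer.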

==================

While I'm at it, here's a direct-to-dictionary version:

from itertools import product
import numpy as np
def foo(*args):
    # NaN wherever the key element is non-zero, a random number elsewhere
    return np.where(np.array(args)>0, np.nan, np.random.rand(3))
{ijk:foo(*ijk) for ijk in product(range(3),repeat=3)}
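
A variation on the dictionary version, not taken from the answer itself, draws all random numbers in one call and masks them with a single boolean index. It is a sketch assuming Python 3; `constrained_dict_vectorized` is a hypothetical name, and it has not been timed against the versions above:

```python
from itertools import product
import numpy as np

def constrained_dict_vectorized(n, markers):
    """Hypothetical helper: one random draw for all keys, then one masking step."""
    keys = list(product(markers, repeat=n))
    values = np.random.rand(len(keys), n)   # one row of n floats per key
    values[np.array(keys) != 0] = np.nan    # NaN wherever the key element is non-zero
    return dict(zip(keys, values))          # each value is a row (a view) of `values`

d = constrained_dict_vectorized(3, [0, 1, 2])
```

Because every value is a view into one shared 2d array, modifying a value in place also changes that underlying array; copy the rows if independent arrays are needed.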
answered Oct 12, 2016 at 5:33
\$\endgroup\$
