I want to calculate the Cartesian product of n copies of a small list, marker = [0, 1, 2], and use the resulting tuples as keys in a dictionary. The value for each key is to be a numpy array of n random floats between 0 and 1.
The only twist is that for each key:value pair in the dictionary, if the key has a non-zero number at some index, I want the corresponding value array to have np.nan at that same index.
Below is the function I wrote for that. My question is whether there is a quicker / more efficient way to get the same result.
import itertools
import numpy as np
def create_constrained_dict(n, markers):
    '''
    Create the Cartesian product of the same list repeated n times.

    Returns a dictionary whose keys are the Cartesian product tuples of the
    input list. The values of the dictionary are numpy arrays of length n.
    Wherever the key element at a given index is non-zero, the array value
    at that index is replaced with np.nan.

    Below is an example. For some key-value pairs, NaNs would be located
    as follows:

    d = {(0,0,1): np.array([0.1234, 0.7543, np.nan]),
         (1,2,1): np.array([np.nan, np.nan, np.nan]),
         (1,0,1): np.array([np.nan, 0.2634, np.nan]),
         }
    '''
    d = dict()
    for element in itertools.product(*[markers for i in xrange(n)]):
        d[element] = np.random.uniform(0, 1, n)
        for i in xrange(n):
            if element[i] != 0:
                d[element][i] = np.nan
    return d
rep_num = 3
marker = [0,1,2]
d = create_constrained_dict(rep_num, marker)
The output looks like this:
print d
{
(0, 1, 1): array([ 0.84049621, nan, nan]),
(0, 1, 2): array([ 0.17520962, nan, nan]),
(1, 0, 1): array([ nan, 0.96110224, nan]),
(0, 2, 1): array([ 0.10395044, nan, nan]),
(2, 2, 0): array([ nan, nan, 0.60131589]),
(0, 2, 0): array([ 0.64515576, nan, 0.05946614]),
(0, 2, 2): array([ 0.02054272, nan, nan]),
(1, 0, 0): array([ nan, 0.98472074, 0.93688277]),
(2, 0, 1): array([ nan, 0.64348266, nan]),
(1, 2, 0): array([ nan, nan, 0.71462777]),
(2, 0, 0): array([ nan, 0.98370414, 0.3517195 ]),
(1, 2, 1): array([ nan, nan, nan]),
(0, 0, 2): array([ 0.29771489, 0.83521032, nan]),
(2, 2, 2): array([ nan, nan, nan]),
(1, 2, 2): array([ nan, nan, nan]),
(2, 0, 2): array([ nan, 0.95682699, nan]),
(0, 0, 1): array([ 0.26649784, 0.38120757, nan]),
(0, 0, 0): array([ 0.98960411, 0.70080955, 0.25540202]),
(2, 1, 2): array([ nan, nan, nan]),
(1, 1, 1): array([ nan, nan, nan]),
(0, 1, 0): array([ 0.94015447, nan, 0.56849242]),
(1, 1, 0): array([ nan, nan, 0.30593067]),
(2, 1, 0): array([ nan, nan, 0.74205853]),
(2, 2, 1): array([ nan, nan, nan]),
(2, 1, 1): array([ nan, nan, nan]),
(1, 1, 2): array([ nan, nan, nan]),
(1, 0, 2): array([ nan, 0.27788722, nan])
}
2 Answers
Instead of

itertools.product(*[markers for i in xrange(n)])

use

itertools.product(markers, repeat=n)

Instead of creating the n random values and then replacing some of them with nan, build each value with a list comprehension. dict([(key, value) for key, value in ...]) creates a dict object. (bool and [a] or [b])[0] is a safer version of bool and a or b, and a one-line version of if bool: a else: b.
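To see the (bool and [a] or [b])[0] idiom in isolation (cond, a and b here are just placeholder names):

cond = False
a, b = 1.0, float('nan')

# (cond and [a] or [b])[0] picks a when cond is truthy and b otherwise.
# Wrapping a and b in one-element lists keeps the trick safe even when a
# itself is falsy (e.g. 0 or 0.0), which breaks the plain `cond and a or b`.
value = (cond and [a] or [b])[0]
print(value)  # nan

# On Python 2.5+ the conditional expression does the same thing directly.
value = a if cond else b
print(value)  # nan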
And final version:
import itertools
import numpy as np
def create_constrained_dict(n, markers):
    d = dict([(element, np.array([(i == 0 and [np.random.uniform(0, 1)] or [np.nan])[0]
                                   for i in element]))
              for element in itertools.product(markers, repeat=n)])
    return d
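A quick sanity check of the rewritten function (the exact numbers will differ on every run, since they are random):

d = create_constrained_dict(3, [0, 1, 2])
print(d[(0, 0, 0)])   # three random floats, no nan
print(d[(1, 0, 2)])   # nan at positions 0 and 2, a random float at position 1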
EDIT
A version without np.array is about 2 times faster (thanks @JoeWallis):
import itertools
import numpy as np
def create_constrained_dict(n, markers):
    d = dict([(element, [(i == 0 and [np.random.uniform(0, 1)] or [np.nan])[0]
                         for i in element])
              for element in itertools.product(markers, repeat=n)])
    return d
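For a rough comparison of the two variants with timeit (here create_constrained_dict_array and create_constrained_dict_list are stand-in names for the np.array and list versions above, renamed so both can exist side by side; timings vary by machine):

import timeit

setup = ("from __main__ import create_constrained_dict_array, "
         "create_constrained_dict_list")

# Build the full 27-key dictionary 1000 times with each variant.
print(timeit.timeit("create_constrained_dict_array(3, [0, 1, 2])",
                    setup=setup, number=1000))
print(timeit.timeit("create_constrained_dict_list(3, [0, 1, 2])",
                    setup=setup, number=1000))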
- Thank you, this looks neater. Is it faster as well? I will test. – Zhubarb, Oct 11, 2016 at 10:21
- Regarding i == 0 and [np.random.uniform(0, 1)] or [np.nan]: the ternary operator was added in Python 2.5. Regarding d = dict([(...)...]): dict comprehensions are present in 2.7 as well. Also, a line break or two would not hurt. – Daerdemandt, Oct 11, 2016 at 10:30
- I tried with timeit; your solution is marginally faster as well, thanks. – Zhubarb, Oct 11, 2016 at 14:03
- @Zhubarb I tried editing your question to be faster, but it's hard: np.array has a large overhead, and numpy doesn't implement itertools.product very efficiently either, so using a numpy solution was regularly 2 times slower. – JoeWallis, Oct 11, 2016 at 14:40
- @JoeWallis, so have you tried a version where dict values are lists (instead of numpy arrays) and it was 2 times faster? If so, it would be very good to have it as an answer. – Zhubarb, Oct 11, 2016 at 15:03
Your display looked a lot like an n-d array, so I set about trying to create the same pattern with numpy operations.
Here's what I've come up with so far:
Start with a 4d array of nan:
In [112]: z=np.ones((3,3,3,3))*np.nan
Fill selected subarrays with random numbers. For component m of a key, the value should be random only when that key component is 0, so pin axis m to 0 and write into position m of the last axis:
In [113]: z[0,:,:,0]=np.random.rand(3,3)
In [114]: z[:,0,:,1]=np.random.rand(3,3)
In [115]: z[:,:,0,2]=np.random.rand(3,3)
Verify that the resulting array is patterned like the desired dictionary:
In [116]: for i,j,k in np.ndindex(3,3,3):
...: print((i,j,k),z[i,j,k])
...:
(0, 0, 0) [ 0.03527323 0.72731859 0.02793814]
(0, 0, 1) [ 0.9925641 0.47560692 nan]
(0, 0, 2) [ 0.9312088 0.35077862 nan]
(0, 1, 0) [ 0.72458335 nan 0.04496767]
(0, 1, 1) [ 0.42424677 nan nan]
(0, 1, 2) [ 0.11619154 nan nan]
(0, 2, 0) [ 0.64655329 nan 0.24431279]
....
(2, 2, 0) [ nan nan 0.81627296]
(2, 2, 1) [ nan nan nan]
(2, 2, 2) [ nan nan nan]
In 4d display:
In [117]: z
Out[117]:
array([[[[ 0.03527323, 0.72731859, 0.02793814],
[ 0.9925641 , 0.47560692, nan],
[ 0.9312088 , 0.35077862, nan]],
[[ 0.72458335, nan, 0.04496767],
[ 0.42424677, nan, nan],
[ 0.11619154, nan, nan]],
....
[[ nan, nan, 0.81627296],
[ nan, nan, nan],
[ nan, nan, nan]]]])
The random fill could be written as an iteration (details missing here; the filled-in version is further down):
for i in range(3):
z[???,i] = np.random.rand(3,3)
It's probably not worth trying to avoid the loop.
itertools.product is faster than ndindex.
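A rough way to check that on this small case (timings vary by machine; this is just a sketch):

import timeit
import numpy as np
from itertools import product

# Materialise all 27 index tuples with each approach and compare the cost.
print(timeit.timeit(lambda: list(product(range(3), repeat=3)), number=10000))
print(timeit.timeit(lambda: list(np.ndindex(3, 3, 3)), number=10000))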
The iteration could also be used to map z onto a dictionary:
{(i,j,k):z[i,j,k,:] for i,j,k in np.ndindex(3,3,3)}
But I'm mostly interested in what kind of n-d
array structure this problem is creating.
==================
The iterative z
setting code:
In [127]: zr=np.random.rand(3,3,3)
In [128]: for i in range(3):
...: idx=[slice(None) for _ in range(4)]
...: idx[-1]=i
...: idx[i]=0
...: z[idx]=zr[i,...]
==================
While I'm at it, here's a direct-to-dictionary version:
from itertools import product
import numpy as np

def foo(*args):
    # random value where the key component is 0, nan where it is non-zero
    return np.where(np.array(args) > 0, np.nan, np.random.rand(3))

{ijk: foo(*ijk) for ijk in product(range(3), repeat=3)}
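A quick check that this reproduces the question's pattern (the values differ on every run, since they are random):

d = {ijk: foo(*ijk) for ijk in product(range(3), repeat=3)}
print(d[(1, 0, 2)])   # nan at positions 0 and 2, a random float at position 1
print(d[(0, 0, 0)])   # three random floats, no nan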