I have an array:
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6]])
and I want to create another array of shape=(1, 1) and dtype=object whose only element is x.
I've tried this code:
a = np.array([[x]], dtype=object)
but it produces an array of shape (1, 1, 2, 3).
Of course I can do:
a = np.zeros(shape=(1, 1), dtype=object)
a[0, 0] = x
but I want the solution to scale easily to larger shapes of a, like:
[[x, x], [x, x]]
without having to run for loops over all the indices.
Any suggestions how this could be achieved?
UPD1
The arrays may be different, as in:
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])
[[x, y], [u, v]]
They may also be of different shapes, but for that case a simple np.array([[x, y], [u, v]]) constructor works fine.
UPD2
I really want a solution that works with arbitrary x, y, u, v shapes, not necessarily all the same.
4 Answers
Found a solution myself:
a = np.zeros(shape=(2, 2), dtype=object)
a[:] = [[x, x], [x, x]]
a = np.empty(shape=(2, 2), dtype=object)
a.fill(x)
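One caveat worth spelling out: fill stores a reference to the same array object in every slot, so an in-place change made through one slot shows up in all of them and in x itself. A minimal sketch of the difference, assuming independent copies are what is wanted (the names shared and independent are illustrative only):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# fill() stores the same array object in every slot
shared = np.empty((2, 2), dtype=object)
shared.fill(x)

# Storing a copy per slot keeps the elements independent of each other
independent = np.empty((2, 2), dtype=object)
for idx in np.ndindex(independent.shape):
    independent[idx] = x.copy()

# Mutating one slot of `shared` in place also changes x and every other slot
shared[0, 0][0, 0] = 99
```

If sharing is intended, fill is fine; the copy loop is only needed when each slot will be modified separately.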
2 Comments
[[x, y], [u, v]]. The original problem for me was that the result depended on whether all the input arrays have the same shape or not.
fill puts the same pointer to x in all 4 slots. It has the same danger as the list [mutable_object]*4 replication.
Here is a pretty general method: It works with nested lists, lists of lists of arrays - regardless of whether the shapes of these arrays are different or equal. It also works when the data come clumped together in one single array, which is in fact the trickiest case. (Other methods posted so far will not work in this case.)
Let's start with the difficult case, one big array:
# create example
# pick outer shape and inner shape
>>> osh, ish = (2, 3), (2, 5)
# total shape
>>> tsh = (*osh, *ish)
# make data
>>> data = np.arange(np.prod(tsh)).reshape(tsh)
>>>
# recalculate inner shape to cater for different inner shapes
# this will return the consensus bit of all inner shapes
>>> ish = np.shape(data)[len(osh):]
>>>
# block them
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>>
# admire
>>> data_blocked
array([[array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]),
array([[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]]),
array([[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]])],
[array([[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]]),
array([[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49]]),
array([[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]])]], dtype=object)
Using OP's example which is a list of lists of arrays:
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> y = np.array([[7, 8, 9], [0, 1, 2]])
>>> u = np.array([[3, 4, 5], [6, 7, 8]])
>>> v = np.array([[9, 0, 1], [2, 3, 4]])
>>> data = [[x, y], [u, v]]
>>>
>>> osh = (2,2)
>>> ish = np.shape(data)[len(osh):]
>>>
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> data_blocked
array([[array([[1, 2, 3],
[4, 5, 6]]),
array([[7, 8, 9],
[0, 1, 2]])],
[array([[3, 4, 5],
[6, 7, 8]]),
array([[9, 0, 1],
[2, 3, 4]])]], dtype=object)
And an example with different shape subarrays (note the v.T):
>>> data = [[x, y], [u, v.T]]
>>>
>>> osh = (2,2)
>>> ish = np.shape(data)[len(osh):]
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> data_blocked
array([[array([[1, 2, 3],
[4, 5, 6]]),
array([[7, 8, 9],
[0, 1, 2]])],
[array([[3, 4, 5],
[6, 7, 8]]),
array([[9, 2],
[0, 3],
[1, 4]])]], dtype=object)
3 Comments
x, y, u, v shapes, not necessarily all the same. Apologies for not stating it clearly in the OP.
ndindex instead. I think it's a little easier to understand. But what really matters is whether one is more general than the other.
@PaulPanzer's use of np.frompyfunc is clever, but all that reshaping and use of __getitem__ makes it hard to understand:
Separating the function creation from application might help:
func = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)
newarr = func(range(np.prod(osh))).reshape(osh)
This highlights the separation between the ish dimensions and the osh ones.
I also suspect a lambda function could substitute for the __getitem__.
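For what it's worth, a lambda does seem to work in place of the bound __getitem__. A minimal sketch on the first example's data, where flat and pick are names introduced here purely for illustration:

```python
import numpy as np

osh, ish = (2, 3), (2, 5)
tsh = (*osh, *ish)
data = np.arange(np.prod(tsh)).reshape(tsh)

# Collapse the outer dimensions so a single integer selects one inner block
flat = np.reshape(data, (-1, *ish))

# frompyfunc calls the lambda once per index and collects the results
# in an object dtype array
pick = np.frompyfunc(lambda i: flat[i], 1, 1)
newarr = pick(np.arange(flat.shape[0])).reshape(osh)
```

The lambda makes the per-index lookup explicit, at the cost of one extra Python call per element.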
This works because frompyfunc returns an object dtype array. np.vectorize also uses frompyfunc but lets us specify different otypes. But both pass a scalar to the function, which is why Paul's approach uses a flattened range and getitem. np.vectorize with a signature lets us pass an array to the function, but it uses an ndindex iteration instead of frompyfunc.
Inspired by that, here's a np.empty plus fill method - but with ndindex as the iterator:
In [385]: osh, ish = (2, 3), (2, 5)
...: tsh = (*osh, *ish)
...: data = np.arange(np.prod(tsh)).reshape(tsh)
...: ish = np.shape(data)[len(osh):]
...:
In [386]: tsh
Out[386]: (2, 3, 2, 5)
In [387]: ish
Out[387]: (2, 5)
In [388]: osh
Out[388]: (2, 3)
In [389]: res = np.empty(osh, object)
In [390]: for idx in np.ndindex(osh):
...:     res[idx] = data[idx]
...:
In [391]: res
Out[391]:
array([[array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]),
....
[55, 56, 57, 58, 59]])]], dtype=object)
For the second example:
In [399]: arr = np.array(data)
In [400]: arr.shape
Out[400]: (2, 2, 2, 3)
In [401]: res = np.empty(osh, object)
In [402]: for idx in np.ndindex(osh):
...:     res[idx] = arr[idx]
In the third case, np.array(data) already creates the desired (2,2) object dtype array. Creating res and filling it still works, even though it produces the same thing.
Speed isn't very different (though this example is small):
In [415]: timeit data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
49.8 μs ± 172 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [416]: %%timeit
...: arr = np.array(data)
...: res = np.empty(osh, object)
...: for idx in np.ndindex(osh): res[idx] = arr[idx]
...:
54.7 μs ± 68.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note that when data is a (nested) list, np.reshape(data, (-1, *ish)) is, effectively, np.array(data).reshape(-1, *ish). That list has to be first turned into an array.
Besides speed, it would be interesting to see whether one approach is more general than the other. Are there cases that one handles, but the other can't?
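One way to probe the generality question is to skip np.array(data) altogether and index level by level, which behaves the same for one big array and for nested lists, whatever the inner shapes. What follows is a rough sketch, not a tuned implementation: the helper name to_object_array is made up for this example, and it assumes the outer shape is known in advance. Recent NumPy releases may refuse to build an array from a ragged nested list unless dtype=object is given, so this also sidesteps that conversion.

```python
import numpy as np

def to_object_array(data, osh):
    """Wrap the blocks of `data` in an object array of outer shape `osh`."""
    res = np.empty(osh, dtype=object)
    for idx in np.ndindex(*osh):
        item = data
        # Descend one outer level at a time; works for lists and ndarrays
        for i in idx:
            item = item[i]
        # A full index assigns a single element, so the block is stored as is
        res[idx] = np.asarray(item)
    return res

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])

# Nested list whose inner shapes differ (note the v.T)
ragged = to_object_array([[x, y], [u, v.T]], (2, 2))

# One big array, as in the first example
big = to_object_array(np.arange(60).reshape(2, 3, 2, 5), (2, 3))
```

The price is a Python-level loop over every outer index, so it is unlikely to beat either timing above on large outer shapes.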
3 Comments
tmp = list(np.reshape(data, (-1, *ish))); swap = tmp[0]; tmp[0] = None; result = np.array(tmp); result[0] = swap; result = result.reshape(osh) is more than twice as fast as frompyfunc on the first example.
ndarray. But with a simple list we don't need to use ndindex to iterate. enumerate is sufficient.
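A sketch of that enumerate variant, assuming the blocks arrive as a flat list in row-major order (the list name blocks is illustrative, not from the thread):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])

blocks = [x, y, u, v.T]  # flat list; inner shapes may differ

res = np.empty(len(blocks), dtype=object)
for i, block in enumerate(blocks):
    res[i] = block  # element assignment stores the array itself
res = res.reshape(2, 2)
```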