Force numpy to create array of objects

Question 1

I have an array:

x = np.array([[1, 2, 3], [4, 5, 6]])

and I want to create another array with shape (1, 1) and dtype=object whose only element is x.

I've tried this code:

a = np.array([[x]], dtype=object)

but it produces an array of shape (1, 1, 2, 3).

Of course I can do:

a = np.zeros(shape=(1, 1), dtype=object)
a[0, 0] = x

but I want the solution to scale easily to larger shapes of a, like:

[[x, x], [x, x]]

without having to run for loops over all the indices.

Any suggestions how this could be achieved?

UPD1

The arrays may be different, as in:

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])
[[x, y], [u, v]]

They may also be of different shapes, but for that case a simple np.array([[x, y], [u, v]]) constructor works fine.

UPD2

I really want a solution that works with arbitrary x, y, u, v shapes, not necessarily all the same.


Answer 1 · SiLiKhon · 2018-03-02 07:37:11Z

Found a solution myself:

a = np.zeros(shape=(2, 2), dtype=object)
a[:] = [[x, x], [x, x]]


Answer 2 · wim · 2018-03-02 07:38:38Z

a = np.empty(shape=(2, 2), dtype=object)
a.fill(x)


2 Comments

SiLiKhon (2018-03-02 07:50:08Z): Thanks for this one. Sorry, I used the same-x array example for the sake of brevity, but in fact the arrays can be different: [[x, y], [u, v]]. The original problem for me was that the result depended on whether all the input arrays have the same shape or not.

hpaulj (2018-03-05 22:07:14Z): This fill puts the same pointer to x in all 4 slots. It carries the same danger as replicating a list with [mutable_object]*4.
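The aliasing concern raised in the comment above can be checked with a short sketch. It stores references to the same x in one object array and independent copies in another, then mutates x in place. The names shared and copies are illustrative only, and the cells are assigned one at a time rather than through fill.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

shared = np.empty((2, 2), dtype=object)   # every cell will reference the same x
copies = np.empty((2, 2), dtype=object)   # every cell gets its own copy
for idx in np.ndindex(shared.shape):
    shared[idx] = x
    copies[idx] = x.copy()

x[0, 0] = 99  # mutate the original in place

print(shared[1, 1][0, 0])  # 99: the change is visible through every shared cell
print(copies[1, 1][0, 0])  # 1: the copies are unaffected
```

If the cells are meant to be modified independently later, storing copies is the safer choice.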

Answer 3 · Paul Panzer · 2018-03-02 08:12:27Z

Here is a pretty general method: It works with nested lists, lists of lists of arrays - regardless of whether the shapes of these arrays are different or equal. It also works when the data come clumped together in one single array, which is in fact the trickiest case. (Other methods posted so far will not work in this case.)

Let's start with the difficult case, one big array:

# create example
# pick outer shape and inner shape
>>> osh, ish = (2, 3), (2, 5)
# total shape
>>> tsh = (*osh, *ish)
# make data
>>> data = np.arange(np.prod(tsh)).reshape(tsh)
>>>
# recalculate inner shape to cater for different inner shapes
# this will return the consensus bit of all inner shapes
>>> ish = np.shape(data)[len(osh):]
>>> 
# block them
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> 
# admire
>>> data_blocked
array([[array([[0, 1, 2, 3, 4],
 [5, 6, 7, 8, 9]]),
 array([[10, 11, 12, 13, 14],
 [15, 16, 17, 18, 19]]),
 array([[20, 21, 22, 23, 24],
 [25, 26, 27, 28, 29]])],
 [array([[30, 31, 32, 33, 34],
 [35, 36, 37, 38, 39]]),
 array([[40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49]]),
 array([[50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59]])]], dtype=object)

Using OP's example which is a list of lists of arrays:

>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> y = np.array([[7, 8, 9], [0, 1, 2]])
>>> u = np.array([[3, 4, 5], [6, 7, 8]])
>>> v = np.array([[9, 0, 1], [2, 3, 4]])
>>> data = [[x, y], [u, v]]
>>> 
>>> osh = (2,2)
>>> ish = np.shape(data)[len(osh):]
>>> 
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> data_blocked
array([[array([[1, 2, 3],
 [4, 5, 6]]),
 array([[7, 8, 9],
 [0, 1, 2]])],
 [array([[3, 4, 5],
 [6, 7, 8]]),
 array([[9, 0, 1],
 [2, 3, 4]])]], dtype=object)

And an example with different shape subarrays (note the v.T):

>>> data = [[x, y], [u, v.T]]
>>> 
>>> osh = (2,2)
>>> ish = np.shape(data)[len(osh):]
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> data_blocked
array([[array([[1, 2, 3],
 [4, 5, 6]]),
 array([[7, 8, 9],
 [0, 1, 2]])],
 [array([[3, 4, 5],
 [6, 7, 8]]),
 array([[9, 2],
 [0, 3],
 [1, 4]])]], dtype=object)

Comments

Thanks for the answer, but it's rather important for me that the solution works for arbitrary x, y, u, v shapes, not necessarily all the same. Apologies for not stating it clearly in the OP.

I've written an alternative that uses ndindex instead. I think it's a little easier to understand. But what really matters is whether one is more general than the other.

Another object array case: stackoverflow.com/a/49226113/901925, complicated by the fact that the user wants a 2d array of tuples. Our methods produce an array of arrays (because they first turn the nested list into an array).

Answer 4 · hpaulj · 2018-03-05 05:53:59Z

@PaulPanzer's use of np.frompyfunc is clever, but all that reshaping and the use of __getitem__ make it hard to understand.

Separating the function creation from application might help:

func = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)
newarr = func(range(np.prod(osh))).reshape(osh)

This highlights the separation between the ish dimensions and the osh ones.

I also suspect a lambda function could substitute for the __getitem__.
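A minimal sketch of that lambda variant, assuming the single-big-array case from Paul's first example. The names flat and pick are illustrative and do not come from either answer.

```python
import numpy as np

osh, ish = (2, 3), (2, 5)                      # outer and inner shapes
data = np.arange(np.prod(osh + ish)).reshape(osh + ish)

# Collapse the outer dimensions so each block is addressed by one integer.
flat = data.reshape((-1, *ish))

# frompyfunc wraps the lambda as a ufunc that returns an object array.
pick = np.frompyfunc(lambda i: flat[i], 1, 1)
blocked = pick(np.arange(flat.shape[0])).reshape(osh)

print(blocked.shape, blocked.dtype)  # (2, 3) object
print(blocked[1, 2].shape)           # (2, 5)
```

Naming the flattened array separately makes it clearer that the function only ever receives a scalar block index.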

This works because frompyfunc returns an object dtype array. np.vectorize also uses frompyfunc but lets us specify different otypes. But both pass a scalar to the function, which is why Paul's approach uses a flattened range and __getitem__. np.vectorize with a signature lets us pass an array to the function, but it uses an ndindex iteration instead of frompyfunc.

Inspired by that, here's a np.empty plus fill method - but with ndindex as the iterator:

In [385]: >>> osh, ish = (2, 3), (2, 5)
 ...: >>> tsh = (*osh, *ish)
 ...: >>> data = np.arange(np.prod(tsh)).reshape(tsh)
 ...: >>> ish = np.shape(data)[len(osh):]
 ...: 
In [386]: tsh
Out[386]: (2, 3, 2, 5)
In [387]: ish
Out[387]: (2, 5)
In [388]: osh
Out[388]: (2, 3)
In [389]: res = np.empty(osh, object)
In [390]: for idx in np.ndindex(osh):
 ...:     res[idx] = data[idx]
 ...: 
In [391]: res
Out[391]: 
array([[array([[0, 1, 2, 3, 4],
 [5, 6, 7, 8, 9]]),
 ....
 [55, 56, 57, 58, 59]])]], dtype=object)

For the second example:

In [399]: arr = np.array(data)
In [400]: arr.shape
Out[400]: (2, 2, 2, 3)
In [401]: res = np.empty(osh, object)
In [402]: for idx in np.ndindex(osh):
 ...:     res[idx] = arr[idx]

In the third case, np.array(data) already creates the desired (2,2) object dtype array. This res create and fill still works, even though it produces the same thing.
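The create-and-fill loop can also be written so that it never calls np.array(data), which sidesteps the question of how NumPy interprets a nested list. The sketch below is one way to do that. block_nested is a hypothetical helper name, and it assumes every index in outer_shape is valid for the nested input.

```python
import numpy as np

def block_nested(nested, outer_shape):
    """Return an object array of shape outer_shape whose cells hold the
    items found at the corresponding positions of nested."""
    out = np.empty(outer_shape, dtype=object)
    for idx in np.ndindex(*outer_shape):
        item = nested
        for i in idx:          # walk down one nesting level per outer axis
            item = item[i]
        out[idx] = item        # a full integer index stores the item as-is
    return out

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])

same = block_nested([[x, y], [u, v]], (2, 2))       # equal inner shapes
mixed = block_nested([[x, y], [u, v.T]], (2, 2))    # one transposed block

print(same.shape, mixed.shape)   # (2, 2) (2, 2)
print(mixed[1, 1].shape)         # (3, 2)
```

Because the helper indexes one level at a time, the same function also accepts a single large ndarray in place of the nested list.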

Speed isn't very different (though this example is small):

In [415]: timeit data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
49.8 μs ± 172 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [416]: %%timeit
 ...: arr = np.array(data)
 ...: res = np.empty(osh, object)
 ...: for idx in np.ndindex(osh): res[idx] = arr[idx]
 ...: 
54.7 μs ± 68.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Note that when data is a (nested) list, np.reshape(data, (-1, *ish)) is, effectively, np.array(data).reshape(-1, *ish). The list first has to be turned into an array.

Besides speed, it would be interesting to see whether one approach is more general than the other. Are there cases that one handles but the other can't?

Comments

Performance-wise, the old stick-a-None-in-the-first-cell method looks rather good:

tmp = list(np.reshape(data, (-1, *ish))); swap = tmp[0]; tmp[0] = None; result = np.array(tmp); result[0] = swap; result = result.reshape(osh)

It is more than twice as fast as frompyfunc on the first example.

Here is one that probably works with yours but not with mine. (It works in principle but not with the stuff I did to make it general.)

@PaulPanzer, mine fails on that ((10,3),(10,8)) case because it can't make an ndarray. But with a simple list we don't need to use ndindex to iterate; enumerate is sufficient.
