
I wanted to create a NumPy array to hold mixed types: strings and ints.

The following code did not work as desired: all elements were converted to strings.

>>> a=numpy.array(["Str",1,2,3,4])
>>> print a
['Str' '1' '2' '3' '4']
>>> print type(a[0]),type(a[1])
<type 'numpy.string_'> <type 'numpy.string_'>

All elements of the array were typed as 'numpy.string_'.

Oddly enough, if I pass one of the elements as None, the types turn out as desired:

>>> a=numpy.array(["Str",None,2,3,4])
>>> print a
['Str' None 2 3 4]
>>> print type(a[0]),type(a[1]),type(a[2])
<type 'str'> <type 'NoneType'> <type 'int'>

So including a None element gives me a workaround, but I wonder why this happens. Even without a None element, shouldn't each element keep the type it was passed with?

Kasravnd
asked Jul 3, 2018 at 8:21
  • Both are not really duplicates; the second one is better, but a more explicit explanation of the None case would be better for the OP. Commented Jul 3, 2018 at 8:29
  • The proposed duplicate explains only the string dtype: stackoverflow.com/questions/49751000/…. Commented Jul 3, 2018 at 14:49

2 Answers


Mixing types in a NumPy array is strongly discouraged, because you lose the benefits of vectorised computation. In this instance:

  • For your first array, NumPy converts every element to a string, producing a uniform array of strings of at most 3 characters.
  • For your second array, NumPy does not treat None as a value to convert to a string when it infers the dtype, so it falls back to the generic object dtype. An object dtype array holds pointers to arbitrary Python objects, so each element keeps its original type.

You can see this when you print the dtype attributes of your arrays:

import numpy as np
print(np.array(["Str",1,2,3,4]).dtype) # <U3
print(np.array(["Str",None,2,3,4]).dtype) # object
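A slightly fuller sketch of the same check, which also looks at the element types the question asks about. It assumes Python 3 (the question's session was Python 2, where the type names print differently) and checks only the dtype kind ('U') rather than the exact string width:

```python
import numpy as np

# Without None: NumPy infers a fixed-width Unicode string dtype.
a = np.array(["Str", 1, 2, 3, 4])
print(a.dtype.kind)                    # U
print([type(x).__name__ for x in a])   # each element is a numpy.str_

# With None: NumPy falls back to the object dtype.
b = np.array(["Str", None, 2, 3, 4])
print(b.dtype)                         # object
print([type(x).__name__ for x in b])   # ['str', 'NoneType', 'int', 'int', 'int']
```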

This is expected behaviour. NumPy strongly prefers homogeneous types, and so should you for any meaningful computation. If you genuinely need mixed types, a Python list may be a more appropriate data structure.
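To make the vectorisation point concrete, a rough sketch (Python 3, illustrative values only): a homogeneous int array is multiplied by NumPy's compiled loop, whereas an object array hands each operation to the Python objects it holds, so the string element behaves like a Python string and a mixed addition fails.

```python
import numpy as np

# Homogeneous integer array: arithmetic runs in NumPy's compiled loops.
nums = np.array([1, 2, 3, 4])
print(nums.dtype.kind, nums * 2)   # i [2 4 6 8]

# Object array: each element's own Python operator is called in turn.
mixed = np.array(["Str", 1, 2, 3, 4], dtype=object)
doubled = mixed * 2
print(doubled[0])                  # StrStr (str * int repeats the string)

try:
    mixed + 1                      # "Str" + 1 is not defined for Python objects
except TypeError as exc:
    print("TypeError:", exc)
```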

For a more detailed description of how NumPy prioritises dtype choice, see:

answered Jul 3, 2018 at 8:30

An alternative to adding the None is to make the dtype explicit:

In [80]: np.array(["str",1,2,3,4])
Out[80]: array(['str', '1', '2', '3', '4'], dtype='<U3')
In [81]: np.array(["str",1,2,3,4], dtype=object)
Out[81]: array(['str', 1, 2, 3, 4], dtype=object)
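Building on the dtype=object version, a small sketch (Python 3; the slice [1:] assumes, as here, that only the first element is a string) showing that the elements keep their Python types, and that the numeric part can be cast back to a homogeneous array when you need vectorised work:

```python
import numpy as np

res = np.array(["str", 1, 2, 3, 4], dtype=object)
print(res.dtype)                         # object
print([type(x).__name__ for x in res])   # ['str', 'int', 'int', 'int', 'int']

# The numeric tail can be cast to a homogeneous integer array.
nums = res[1:].astype(int)
print(nums.dtype.kind, nums.sum())       # i 10
```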

Creating an object dtype array and filling it from a list is another option:

In [85]: res = np.empty(5, object)
In [86]: res
Out[86]: array([None, None, None, None, None], dtype=object)
In [87]: res[:] = ['str', 1, 2, 3, 4]
In [88]: res
Out[88]: array(['str', 1, 2, 3, 4], dtype=object)

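Here it isn't needed, but it matters when you want an array whose elements are lists, because np.array would otherwise turn equal-length nested lists into a multidimensional array.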
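The array-of-lists case can be sketched as follows (Python 3; rows is hypothetical example data). With equal-length inner lists, np.array builds a 2-D array even with dtype=object, while a pre-allocated 1-D object array keeps each list as a single element. This sketch fills the slots one at a time rather than with a slice assignment:

```python
import numpy as np

rows = [[1, 2], [3, 4]]  # hypothetical example data

# With equal-length inner lists, np.array builds a 2-D array,
# even when dtype=object is requested.
nested = np.array(rows, dtype=object)
print(nested.shape)                # (2, 2)

# Pre-allocating a 1-D object array and filling it slot by slot
# stores each inner list as a single element.
holder = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    holder[i] = row
print(holder.shape)                # (2,)
print(type(holder[0]).__name__)    # list
```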

answered Jul 3, 2018 at 14:50
