I need an array that holds either an array of records or a list of records. I need something like this:
people = [['David','Blue','Dog','Car'],['Sally','Yellow','Cat','Boat']]
or:
people = (['David','Blue','Dog','Car'],['Sally','Yellow','Cat','Boat'])
But I keep getting:
people = ['David','Blue','Dog','Car','Sally','Yellow','Cat','Boat']
I've tried append vs. concatenate, different axes, and different np initializations, but the result is always the same. Here is my latest version. What am I doing wrong?
import numpy as np
# Tried
# people = np.empty((0,0), dtype='S')
# people = np.array([[]])
people = np.array([])
records = GetRecordsFromDB()
for record in records:
    # Do some stuff
    # Tried
    # person = [name, color, animal, vehicle]
    person = np.array([name, color, animal, vehicle])
    # Tried this with different axis
    # people = np.append(people, person, axis=0)
    people = np.concatenate((people, person))
Thank you.
EDIT: This will be the input for a Pandas DataFrame if that helps.
-
What is the format/structure of record? Also, please provide a sample of the data in record. – itprorh66, Mar 13, 2022 at 20:52
-
Four strings: name, color, animal, vehicle. I can put them in any format, but I need to be able to add them, as a list, to an array, so that I end up with an array holding n records, each a list. – user390480, Mar 13, 2022 at 21:40
3 Answers
Use np.c_
people = np.c_[people, person]
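For reference, a small sketch of what np.c_ itself does, using the question's two sample records: it turns 1-D inputs into columns, so two 4-element records come out as a (4, 2) array, and a transpose puts one record per row.

```python
import numpy as np

# Sample records from the question
a = np.array(['David', 'Blue', 'Dog', 'Car'])
b = np.array(['Sally', 'Yellow', 'Cat', 'Boat'])

cols = np.c_[a, b]  # each 1-D input becomes a column: shape (4, 2)
rows = cols.T       # transpose so each record is a row: shape (2, 4)

print(cols.shape)  # (4, 2)
print(rows.shape)  # (2, 4)
```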
Here is how I solved this:
import numpy as np
people = np.array([])
records = GetRecordsFromDB()
for record in records:
    # Do some stuff
    person = np.array([name, color, animal, vehicle])
    if len(people) == 0:
        people = [person]
    else:
        people = np.append(people, [person], axis=0)
-
np.array([person]) is a (1,4) shaped array. You could just as well have initialized people=None and done an if people is None: ... test. None is a common "uninitialized" value. Keep in mind, though, that this array append will be slower than a list append.
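A minimal sketch of the None-initialization that comment describes. The records list is a hypothetical stand-in for GetRecordsFromDB(), assuming each record unpacks into the four string fields.

```python
import numpy as np

# Hypothetical stand-in for GetRecordsFromDB(); each record has four string fields
records = [('David', 'Blue', 'Dog', 'Car'), ('Sally', 'Yellow', 'Cat', 'Boat')]

people = None  # common "uninitialized" marker
for name, color, animal, vehicle in records:
    person = np.array([name, color, animal, vehicle])  # shape (4,)
    if people is None:
        people = np.array([person])  # first record: shape (1, 4)
    else:
        # [person] is treated as shape (1, 4), so axis=0 adds one row
        people = np.append(people, [person], axis=0)

print(people.shape)  # (2, 4)
```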
Edit:
Looking at your answer, I realized I took your subject line too literally. The steps from [360] onward make an array of lists. But [356] is a 2d array, where each "row" looks like a list but isn't one.
Earlier:
In [354]: alist = [['David','Blue','Dog','Car'],['Sally','Yellow','Cat','Boat']]
...:
In [355]: alist
Out[355]: [['David', 'Blue', 'Dog', 'Car'], ['Sally', 'Yellow', 'Cat', 'Boat']]
In [356]: np.array(alist)
Out[356]:
array([['David', 'Blue', 'Dog', 'Car'],
       ['Sally', 'Yellow', 'Cat', 'Boat']], dtype='<U6')
That makes a 2d array of strings. The construction is no different from the textbook example of making a 2d numeric array:
In [358]: np.array([[1, 2], [3, 4]])
Out[358]:
array([[1, 2],
       [3, 4]])
With hstack or concatenate:
In [359]: np.hstack(alist)
Out[359]:
array(['David', 'Blue', 'Dog', 'Car', 'Sally', 'Yellow', 'Cat', 'Boat'],
      dtype='<U6')
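For contrast, a short sketch placing hstack next to vstack on the same alist: hstack joins the 1-D pieces end to end (the flat result in the question), while vstack treats each inner list as a row.

```python
import numpy as np

alist = [['David', 'Blue', 'Dog', 'Car'], ['Sally', 'Yellow', 'Cat', 'Boat']]

flat = np.hstack(alist)  # 1-D pieces joined end to end: shape (8,)
rows = np.vstack(alist)  # each inner list becomes one row: shape (2, 4)

print(flat.shape)  # (8,)
print(rows.shape)  # (2, 4)
```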
To make an array that holds just 2 lists, you have to initialize one first:
In [360]: arr = np.empty(2, object)
In [361]: arr
Out[361]: array([None, None], dtype=object)
In [362]: arr[:] = alist
In [363]: arr
Out[363]:
array([list(['David', 'Blue', 'Dog', 'Car']),
       list(['Sally', 'Yellow', 'Cat', 'Boat'])], dtype=object)
If the lists differ in length, you get an object array of lists, along with a warning:
In [364]: np.array([["David", "Blue"], ["Sally", "Yellow", "Cat"]])
<ipython-input-364-be12d6dec312>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
np.array([["David", "Blue"], ["Sally", "Yellow", "Cat"]])
Out[364]:
array([list(['David', 'Blue']), list(['Sally', 'Yellow', 'Cat'])],
      dtype=object)
Read that warning in full.
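In NumPy 1.24 and later that deprecation has expired, so the same call raises a ValueError instead of warning. A sketch of the explicit form the message asks for, passing dtype=object:

```python
import numpy as np

ragged = [["David", "Blue"], ["Sally", "Yellow", "Cat"]]

# dtype=object gives a 1-D array whose elements are the original lists
arr = np.array(ragged, dtype=object)

print(arr.shape)  # (2,)
print(arr[1])     # ['Sally', 'Yellow', 'Cat']
```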
By default np.array tries to make a multidimensional array of a base dtype like int or string. Only when it can't does it fall back on making an object dtype array. That kind of array should be viewed as a second-class array and used only when it is really needed. Often the list of lists is just as good.
Your iterative creation is one such case:
people = []
records = GetRecordsFromDB()
for record in records:
    # Do some stuff
    # Tried
    # person = [name, color, animal, vehicle]
    person = np.array([name, color, animal, vehicle])
    people.append(person)  # list.append adds a reference in place
Items, whether lists or arrays (or anything else), can be added to a list in place just by adding a reference. Trying to use concatenate to add items to an array is not only harder to get right but also slower, since it makes a whole new array each time. That means a lot of copying!
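A sketch of the list-first pattern, converting to an array only once at the end. get_records_from_db() is a hypothetical stand-in for the question's GetRecordsFromDB(), assumed to yield four-field records.

```python
import numpy as np


def get_records_from_db():
    """Hypothetical stand-in for the question's GetRecordsFromDB()."""
    return [('David', 'Blue', 'Dog', 'Car'), ('Sally', 'Yellow', 'Cat', 'Boat')]


people = []
for name, color, animal, vehicle in get_records_from_db():
    # list.append only stores a reference; the existing items are not copied
    people.append([name, color, animal, vehicle])

# A single conversion at the end builds the 2-D array
arr = np.array(people)
print(arr.shape)  # (2, 4)
```

Since the question says this feeds a Pandas DataFrame, the list of lists can also be passed straight to the DataFrame constructor, skipping the NumPy step.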
np.append is a badly named way of calling concatenate. It is not a list.append clone.
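When no axis is given, np.append flattens both inputs before joining, which is exactly the 1-D result in the question. A short sketch with the question's sample records, contrasted with wrapping each record so it is 2-D and passing axis=0:

```python
import numpy as np

a = np.array(['David', 'Blue', 'Dog', 'Car'])
b = np.array(['Sally', 'Yellow', 'Cat', 'Boat'])

# axis=None (the default): both inputs are flattened, giving one long 1-D array
flat = np.append(a, b)

# [a] and [b] are treated as shape (1, 4), so axis=0 stacks them as rows
rows = np.append([a], [b], axis=0)

print(flat.shape)  # (8,)
print(rows.shape)  # (2, 4)
```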
Using np.concatenate requires careful handling of dimensions. Don't be sloppy and expect it to figure out what you want.
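A sketch of what careful dimension handling looks like if you do stay with concatenate: start from a 2-D array with zero rows and four columns, and give each record a leading axis so it has shape (1, 4). The records list and the 'U10' string width are assumptions for illustration only.

```python
import numpy as np

# Hypothetical stand-in for GetRecordsFromDB()
records = [('David', 'Blue', 'Dog', 'Car'), ('Sally', 'Yellow', 'Cat', 'Boat')]

# 0 rows, 4 columns; 'U10' is an assumed maximum string length
people = np.empty((0, 4), dtype='U10')
for name, color, animal, vehicle in records:
    person = np.array([name, color, animal, vehicle])  # shape (4,)
    # person[np.newaxis, :] has shape (1, 4), matching people's 2-D layout
    people = np.concatenate((people, person[np.newaxis, :]), axis=0)

print(people.shape)  # (2, 4)
```

This still copies the whole array on every iteration, so the list-first approach remains the better choice for a loop.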
Similarly, np.array([]) is not a clone of the list []:
In [365]: np.array([])
Out[365]: array([], dtype=float64)
In [366]: np.array([]).shape
Out[366]: (0,)
It is a 1d array with a specific shape, (0,). You can only concatenate it with another 1d array, on its only axis, 0.
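A quick sketch of both cases, using numeric data so that only the dimensions matter: joining the empty array with another 1-D array just extends it (the question's flat result), while joining it with a 2-D row raises a ValueError.

```python
import numpy as np

empty = np.array([])
print(empty.shape)  # (0,)

# 1-D with 1-D works and simply extends along axis 0
grown = np.concatenate((empty, np.array([1.0, 2.0])))
print(grown.shape)  # (2,)

# 1-D with 2-D fails: the inputs must have the same number of dimensions
mismatch_raised = False
try:
    np.concatenate((empty, np.array([[1.0, 2.0]])))
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```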