I have a list of Num_tuples tuples that all have the same length Dim_tuple
xlist = [tuple_1, tuple_2, ..., tuple_Num_tuples]
For definiteness, let's say Num_tuples=3 and Dim_tuple=2
xlist = [(1, 1.1), (2, 1.2), (3, 1.3)]
I want to convert xlist into a structured numpy array xarr using a user-provided list of column names user_names and a user-provided list of variable types user_types
user_names = [name_1, name_2, ..., name_Dim_tuple]
user_types = [type_1, type_2, ..., type_Dim_tuple]
So in the creation of the numpy array,
dtype = [(name_1,type_1), (name_2,type_2), ..., (name_Dim_tuple, type_Dim_tuple)]
For my toy example, the desired end product would look something like:
xarr['name1']=np.array([1,2,3])
xarr['name2']=np.array([1.1,1.2,1.3])
How can I slice xlist to create xarr without any loops?
2 Answers
A list of tuples is the correct way of providing data to a structured array:
In [273]: xlist = [(1, 1.1), (2, 1.2), (3, 1.3)]
In [274]: dt=np.dtype('int,float')
In [275]: np.array(xlist,dtype=dt)
Out[275]:
array([(1, 1.1), (2, 1.2), (3, 1.3)],
dtype=[('f0', '<i4'), ('f1', '<f8')])
In [276]: xarr = np.array(xlist,dtype=dt)
In [277]: xarr['f0']
Out[277]: array([1, 2, 3])
In [278]: xarr['f1']
Out[278]: array([ 1.1, 1.2, 1.3])
or if the names are important:
In [280]: xarr.dtype.names=['name1','name2']
In [281]: xarr
Out[281]:
array([(1, 1.1), (2, 1.2), (3, 1.3)],
dtype=[('name1', '<i4'), ('name2', '<f8')])
http://docs.scipy.org/doc/numpy/user/basics.rec.html#filling-structured-arrays
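To tie this back to the user_names and user_types lists in the question, here is a minimal sketch that builds the dtype by zipping the two lists together. The concrete names and types below are placeholders chosen for the toy example, not values from the original post.

```python
import numpy as np

xlist = [(1, 1.1), (2, 1.2), (3, 1.3)]

# Placeholder user input, assumed for illustration only
user_names = ['name1', 'name2']
user_types = [int, float]

# Pair each name with its type: [('name1', int), ('name2', float)]
dt = np.dtype(list(zip(user_names, user_types)))

# The rows must be tuples (not lists) for structured-array construction
xarr = np.array(xlist, dtype=dt)

print(xarr['name1'])
print(xarr['name2'])
```

No explicit loop over the rows is needed; zip only walks the (short) list of column names.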
hpaulj's answer is interesting but horrifying :)
The modern Pythonic way to have named columns is to use pandas, a highly popular package built on top of numpy:
import pandas as pd
xlist = [(1, 1.1), (2, 1.2), (3, 1.3)]
# Explicitly cast name1 to int (pandas normally infers int64 here already, so this is only a safeguard)
df = pd.DataFrame(xlist, columns=['name1', 'name2']).astype({'name1':int})
print(df)
This gives you a DataFrame, df, which is the structure you want:
name1 name2
0 1 1.1
1 2 1.2
2 3 1.3
You can do all kinds of wonderful things with this, like slicing and various operations.
For example, to create a dictionary of arrays that can be indexed like the xarr in the original question:
>>> import numpy as np
>>> xarr = {k:np.array(v) for k,v in df.to_dict(orient='list').items()}
>>> xarr
{'name1': array([1, 2, 3]), 'name2': array([1.1, 1.2, 1.3])}
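If an actual structured array is wanted rather than a dictionary, one possible route is DataFrame.to_records, sketched below. This is an alternative to the dictionary comprehension above, not part of the original answer; it returns a NumPy record array whose fields carry the column names.

```python
import pandas as pd

xlist = [(1, 1.1), (2, 1.2), (3, 1.3)]
df = pd.DataFrame(xlist, columns=['name1', 'name2'])

# index=False leaves the DataFrame index out of the resulting fields
xarr = df.to_records(index=False)

print(xarr.dtype.names)
print(xarr['name1'])
print(xarr['name2'])
```

The result supports field access by name, such as xarr['name1'], just like the structured array asked for in the question.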
3 Comments
import pandas takes more than 0.5 seconds on a dual AMD EPYC 7763 server... for a simple transformation, that's likely more than the time taken by the operation itself :-( I'd be much happier with pandas if it could be used in a more modular way.
"without any loops": is this possible without loops? What about a list comprehension? Also, have you tried anything?