Input:
A string list like this:
['a', 'a', 'a', 'b', 'b', 'a', 'b']
Output I want:
A numpy array like this:
array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])
What I tried:
Try 1 - My starting data is actually stored as a single column in a CSV file, so I tried the following:
data1 = genfromtxt('csvname.csv', delimiter=',')
I did this because I thought I could manipulate the CSV data into the form I want once it was loaded into NumPy. However, the problem is that every value comes back as nan (not a number). I'm not sure how else to go about this efficiently, because I need to do it for a large data set.
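One likely explanation for the nan values, offered as a sketch rather than a diagnosis of the actual file: genfromtxt parses fields as floats by default, so text such as 'a' cannot be converted and comes back as nan. Passing dtype=str keeps the labels as strings. The example below writes a small temporary file to stand in for csvname.csv; the temporary file, the variable names, and the assumption that the only labels are 'a' and 'b' are illustrative.

```python
import os
import tempfile

import numpy as np

# Stand-in for csvname.csv: one label per row (illustrative data).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("a\na\na\nb\nb\na\nb\n")
    path = f.name

# dtype=str keeps the labels as text instead of converting them to float (nan).
labels = np.genfromtxt(path, delimiter=",", dtype=str, encoding="utf-8")
os.remove(path)

# Compare every label against each class; broadcasting gives shape (n, 2).
classes = np.array(["a", "b"])  # assumed label set
one_hot = (labels[:, None] == classes).astype(int)
print(one_hot)
```

Because the comparison is done on the whole array at once, the same lines should carry over to a much longer column without an explicit Python loop.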
Try 2 - The inefficient method I was thinking of using:
For each element of the list, append [1, 0] if it is 'a' and [0, 1] if it is 'b'.
Is there a better method?
3 Answers
Using a list comprehension
Code:
import numpy
lst = ['a', 'a', 'a', 'b', 'b', 'a', 'b']
numpy.array([[1, 0] if val == "a" else [0, 1] for val in lst])
Output:
array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])
Note:
- Rather than appending to a list or NumPy array element by element, creating the list in one step is faster
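To illustrate the note, here is a small sketch comparing the two styles. np.append returns a new array on every call, so growing an array inside a loop copies the data repeatedly, while the list comprehension builds the nested list once and converts it in a single step. The names grown and built are illustrative, and no timings are claimed here.

```python
import numpy as np

lst = ['a', 'a', 'a', 'b', 'b', 'a', 'b']

# Growing an array row by row: each np.append call allocates a new array.
grown = np.empty((0, 2), dtype=int)
for val in lst:
    row = [1, 0] if val == 'a' else [0, 1]
    grown = np.append(grown, [row], axis=0)

# Building the nested list first, then converting once.
built = np.array([[1, 0] if val == 'a' else [0, 1] for val in lst])

# Both approaches give the same array; only the amount of copying differs.
print(np.array_equal(grown, built))
```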
Building a list
import numpy as np
lst = ['a', 'a', 'a', 'b', 'b', 'a', 'b']  # named lst to avoid shadowing the built-in list
np.array([[ch == 'a', ch == 'b'] for ch in lst]).astype(int)
Output
array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])
Does this solve it for you?
NumPythonic vectorized method using np.unique -
import numpy as np
(np.unique(A)[:, None] == A).T.astype(int)  # A is the input list of strings
Sample run -
In [9]: A
Out[9]: ['a', 'a', 'a', 'b', 'b', 'a', 'b']
In [10]: ((np.unique(A)[:,None] == A).T).astype(int)
Out[10]:
array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])
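If the list may hold more than two distinct labels, a related sketch uses np.unique with return_inverse=True and indexes an identity matrix with the resulting integer codes. The helper name one_hot is hypothetical and not part of the answer above; columns follow the sorted order of the unique labels.

```python
import numpy as np

def one_hot(labels):
    """Return (classes, matrix), one column per sorted unique label (hypothetical helper)."""
    classes, codes = np.unique(labels, return_inverse=True)
    # Row i of the identity matrix is the one-hot vector for class i.
    return classes, np.eye(len(classes), dtype=int)[codes]

classes, encoded = one_hot(['a', 'a', 'a', 'b', 'b', 'a', 'b'])
print(classes)
print(encoded)
```

Returning the classes alongside the matrix keeps track of which column belongs to which label.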
2 Comments
1. [...] a, b, why do you need to use np.unique at all? Isn't it overcomplicating things? 2. Is this efficient compared to thunder's answer?
1) [...] a and b in it. 2) On efficiency, being a vectorized approach, I would think this should be pretty fast, given enough unique letters to iterate with.