How to create 2 column binary numpy array from string list?

Asked 10 years ago

Viewed 266 times

Input:

A string list like this:

['a', 'a', 'a', 'b', 'b', 'a', 'b']

Output I want:

A numpy array like this:

array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])

What I tried:

Try 1 - My starting data is actually stored in a column as a csv file. So I tried the following:

from numpy import genfromtxt
data1 = genfromtxt('csvname.csv', delimiter=',')

I did this because I thought I could manipulate the CSV data into the form I want once it was in NumPy format. However, every value comes back as nan (not a number). I'm not sure how else to go about this efficiently, because I need to do this for a large data set.
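The nan values come from genfromtxt's default dtype=float: fields that cannot be parsed as numbers are filled with nan. A minimal sketch, assuming a single-column file with one label per line (simulated here with io.StringIO rather than the real csvname.csv), shows the difference when dtype=str is passed:

```python
import io
import numpy as np

# Hypothetical stand-in for csvname.csv: one label per line.
csv_text = "a\na\na\nb\nb\na\nb\n"

# Default dtype is float, so every non-numeric field becomes nan.
as_float = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# Requesting strings keeps the labels intact.
labels = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=str)

print(as_float)
print(labels)
```

Once the labels are loaded as strings, any of the approaches in the answers can be applied to them.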

Try 2 - The ineffective method which I was thinking of doing:

For each element of the list, append [1, 0] if it is 'a' and [0, 1] if it is 'b'.
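A minimal sketch of Try 2, assuming every value is either 'a' or 'b'. Appending to a Python list and converting to an array once at the end is workable; what tends to be slow is calling np.append inside the loop, because np.append returns a new copy of the whole array on every call.

```python
import numpy as np

labels = ['a', 'a', 'a', 'b', 'b', 'a', 'b']

rows = []
for label in labels:
    # Map each label to its two-column indicator row.
    if label == 'a':
        rows.append([1, 0])
    else:  # assumes any other value is 'b'
        rows.append([0, 1])

# Convert once, after the loop, instead of growing an array.
onehot = np.array(rows)
```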

Is there a better method?


edited Jan 8, 2016 at 7:38


Divakar


asked Jan 8, 2016 at 6:46


pr338


3 Answers


Using a list comprehension

Code:

import numpy
lst = ['a', 'a', 'a', 'b', 'b', 'a', 'b']
numpy.array([[1, 0] if val == "a" else [0, 1] for val in lst])

Output:

array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])

Note:

Rather than appending to a list or NumPy array element by element, building the list with a comprehension is faster.
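For a large data set, the same two-label mapping can also be done without a Python-level loop by comparing a whole NumPy array against each label. This is a swapped-in vectorized variant, not part of the answer above, and like it, it assumes only 'a' and 'b' occur:

```python
import numpy as np

lst = ['a', 'a', 'a', 'b', 'b', 'a', 'b']
arr = np.asarray(lst)

# Each comparison yields a boolean vector of length 7;
# column_stack places them side by side as the two columns.
onehot = np.column_stack((arr == 'a', arr == 'b')).astype(int)
```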


answered Jan 8, 2016 at 6:54


The6thSense


Building a list

import numpy as np
lst = ['a', 'a', 'a', 'b', 'b', 'a', 'b']  # avoid shadowing the built-in list
np.array([[ch == 'a', ch == 'b'] for ch in lst]).astype(int)

Output

array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])

Does this solve it for you?


answered Jan 8, 2016 at 7:08


thundergolfer


4 Comments

thundergolfer

I didn't refresh the page, so I didn't see that I was second. Is my answer different enough to keep, or should I delete my post when this happens?

2016-01-08T07:09:25.89Z

pr338

Yes, I think it is different enough to keep! Thank you for your input! Although both answers answer my question, who knows, your method may prove more useful for the next person who views this question.

2016-01-08T07:11:18Z

The6thSense

@thundergolfer I feel that your answer may be more efficient than mine :). So just keep it.

2016-01-08T07:46:46.783Z

The6thSense

And answering second or last does not matter; providing a better answer is what matters.

2016-01-08T07:55:25.847Z

A NumPythonic, vectorized method using np.unique:

import numpy as np
((np.unique(A)[:, None] == A).T).astype(int)

Sample run:

In [9]: A
Out[9]: ['a', 'a', 'a', 'b', 'b', 'a', 'b']
In [10]: ((np.unique(A)[:,None] == A).T).astype(int)
Out[10]: 
array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])
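The one-liner relies on broadcasting. Broken into named steps (the intermediate names are illustrative only), the shapes work out as follows:

```python
import numpy as np

A = ['a', 'a', 'a', 'b', 'b', 'a', 'b']

# Sorted unique labels, shape (2,).
labels = np.unique(A)

# Adding an axis gives shape (2, 1), which broadcasts against the 7 inputs.
column = labels[:, None]

# mask[i, j] is True when A[j] equals labels[i]; shape (2, 7).
mask = column == np.asarray(A)

# Transpose to one row per input element, then turn booleans into 0/1.
onehot = mask.T.astype(int)
```

Because np.unique sorts its output, the column order follows the sorted labels rather than their order of first appearance.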


answered Jan 8, 2016 at 7:36


Divakar


2 Comments

The6thSense

I have already upvoted it, but I have two doubts: 1) Since there are only two values, a and b, why do you need np.unique at all? Isn't it overcomplicating things? 2) Is this more efficient than thundergolfer's answer?

2016-01-08T07:53:38.013Z

Divakar

@The6thSense Well, thanks for the upvote! On the questions: 1) I am assuming the OP has posted a sample case in the question, so there could be more than just a and b in it. 2) On efficiency, being a vectorized approach, I would think this should be pretty fast, given enough unique letters to iterate with.

2016-01-08T08:13:22.25Z
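Along the lines of Divakar's first point, a common alternative for an arbitrary number of labels uses np.unique with return_inverse=True and indexes into an identity matrix. This is a different technique from the answers above, shown as a sketch with a hypothetical third label 'c'; as with np.unique elsewhere, the columns follow the sorted order of the unique labels:

```python
import numpy as np

A = ['a', 'c', 'b', 'a', 'c']  # hypothetical input with a third label

# labels is sorted; inverse[j] is the position of A[j] within labels.
labels, inverse = np.unique(A, return_inverse=True)

# Row i of the identity matrix is the indicator vector for labels[i],
# so fancy-indexing with inverse yields one indicator row per input.
onehot = np.eye(len(labels), dtype=int)[inverse]
```

This avoids building the full (number of labels) x (number of inputs) boolean comparison, at the cost of one small identity matrix.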
