I have a NumPy 2-D array in which one column holds Boolean values (True/False). I want to convert them to the integers 1 and 0 respectively. How can I do that?
For example, my data[0::,2] is Boolean. I tried
data[0::,2]=int(data[0::,2])
but it gives this error:
TypeError: only length-1 arrays can be converted to Python scalars
The first 5 rows of my array are:
[['0', '3', 'True', '22', '1', '0', '7.25', '0'],
['1', '1', 'False', '38', '1', '0', '71.2833', '1'],
['1', '3', 'False', '26', '0', '0', '7.925', '0'],
['1', '1', 'False', '35', '1', '0', '53.1', '0'],
['0', '3', 'True', '35', '0', '0', '8.05', '0']]
5 Answers
OK, the easiest way to convert an array of any type to float is:
data.astype(float)
The issue with your array is that float('True') raises a ValueError, because 'True' can't be parsed as a floating-point number. So the best thing to do is to fix your array-generation code so that it produces floats (or at least strings with valid float literals) instead of bools.
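To make the failure concrete, here is a minimal sketch that reproduces it on two rows copied from the question. The variable names `rows` and `failed` are only illustrative.

```python
import numpy as np

# Two rows from the question; every element is a string.
rows = np.array([['0', '3', 'True', '22', '1', '0', '7.25', '0'],
                 ['1', '1', 'False', '38', '1', '0', '71.2833', '1']])

# 'U' means a unicode string array, so the "boolean" column is really text.
print(rows.dtype.kind)

try:
    rows.astype(float)
    failed = False
except ValueError as exc:
    # 'True' is not a valid float literal, so the whole cast is rejected.
    failed = True
    print(exc)
```

Checking the dtype first is a quick way to tell whether a column holds real booleans or just their string representation.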
In the meantime you can use this function to fix your array:
def boolstr_to_floatstr(v):
    if v == 'True':
        return '1'
    elif v == 'False':
        return '0'
    else:
        return v
And finally you convert your array like this:
new_data = np.vectorize(boolstr_to_floatstr)(data).astype(float)
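As an alternative sketch (not part of the answer above), the same string replacement can be done without a Python-level function by applying np.where to the whole array. The two-row `data` array is copied from the question, and `as_numeric_str` is a hypothetical name.

```python
import numpy as np

data = np.array([['0', '3', 'True', '22', '1', '0', '7.25', '0'],
                 ['1', '1', 'False', '38', '1', '0', '71.2833', '1']])

# Swap the boolean strings for numeric strings and leave everything else untouched.
as_numeric_str = np.where(data == 'True', '1',
                          np.where(data == 'False', '0', data))

# Every element is now a valid float literal, so the cast succeeds.
new_data = as_numeric_str.astype(float)
print(new_data[:, 2])
```

This keeps the work inside NumPy, which may matter for large arrays, since np.vectorize is essentially a Python loop.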
4 Comments
[...] data[:5].
[...] numpy array, but a Python list. And it was missing commas before I edited it; Python could not output this. Second, my code works for Python lists too, so everything should be fine.
Add print(data[:5]) to your code and post the exact output.
[...] from pprint import pprint and then use pprint(data[:5]).
boolarrayvariable.astype(int) works:
import numpy as np

data = np.random.normal(0,1,(1,5))
threshold = 0
test1 = (data>threshold)
test2 = test1.astype(int)
Output:
data = array([[ 1.766, -1.765, 2.576, -1.469, 1.69]])
test1 = array([[ True, False, True, False, True]], dtype=bool)
test2 = array([[1, 0, 1, 0, 1]])
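In the question's data, column 2 holds the strings 'True' and 'False' rather than real booleans, so astype(int) cannot be applied to it directly. One possible sketch is to compare against 'True' first, which yields a genuine boolean array, and then cast. The names `is_true` and `flags` are illustrative.

```python
import numpy as np

# First three rows from the question; all elements are strings.
data = np.array([['0', '3', 'True', '22', '1', '0', '7.25', '0'],
                 ['1', '1', 'False', '38', '1', '0', '71.2833', '1'],
                 ['1', '3', 'False', '26', '0', '0', '7.925', '0']])

# Element-wise string comparison produces a real boolean array.
is_true = data[:, 2] == 'True'

# A boolean array casts cleanly to integers: True -> 1, False -> 0.
flags = is_true.astype(int)
print(flags)
```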
Comments
If I do this on your raw data source, which is strings:
data = [['0', '3', 'True', '22', '1', '0', '7.25', '0'],
['1', '1', 'False', '38', '1', '0', '71.2833', '1'],
['1', '3', 'False', '26', '0', '0', '7.925', '0'],
['1', '1', 'False', '35', '1', '0', '53.1', '0'],
['0', '3', 'True', '35', '0', '0', '8.05', '0']]
data = [[eval(x) for x in y] for y in data]
...and then follow that with:
data = [[float(x) for x in y] for y in data]
# or this if you prefer:
import numpy
arr = numpy.array(data)
...then the problem is solved. You can even do it as a one-liner on the original strings. NumPy promotes the mix of ints, bools and floats that eval returns to float64, so the result is already a float array: numpy.array([[eval(x) for x in y] for y in data])
I think the problem is that numpy keeps your numeric strings as strings, and since not all of your strings are numeric, you can't do a type conversion on the whole array. Also, if you try to convert only the parts of the array containing "True" and "False", you're not really working with booleans but with strings. The only way I know of to change that is eval. Well, you could also do this:
booltext_int = {'True': 1, 'False': 0}
clean = [[float(x) if x[-1].isdigit() else booltext_int[x]
          for x in y] for y in data]
This way you avoid eval, which is inherently insecure, although that may not matter if you are using a trusted data source.
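The lookup-table idea can be sketched end to end as follows, with 'True' mapped to 1 and 'False' to 0. The helper `to_number` is a hypothetical name, and the sketch checks the table first instead of testing the last character.

```python
import numpy as np

# Two rows from the question, still as raw strings.
data = [['0', '3', 'True', '22', '1', '0', '7.25', '0'],
        ['1', '1', 'False', '38', '1', '0', '71.2833', '1']]

booltext_int = {'True': 1, 'False': 0}

def to_number(text):
    # Boolean strings go through the lookup table; anything else must parse as a float.
    if text in booltext_int:
        return float(booltext_int[text])
    return float(text)

arr = np.array([[to_number(x) for x in row] for row in data])
print(arr.dtype, arr[:, 2])
```

Any string that is neither a boolean word nor a valid number raises a ValueError here, which makes bad input visible instead of silently passing it through.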
Comments
Using @kirelagin's idea with ast.literal_eval
>>> import ast
>>> import numpy as np
>>> arr = np.array(
...     [['0', '3', 'True', '22', '1', '0', '7.25', '0'],
...      ['1', '1', 'False', '38', '1', '0', '71.2833', '1'],
...      ['1', '3', 'False', '26', '0', '0', '7.925', '0'],
...      ['1', '1', 'False', '35', '1', '0', '53.1', '0'],
...      ['0', '3', 'True', '35', '0', '0', '8.05', '0']])
>>> np.vectorize(ast.literal_eval, otypes=[float])(arr)
array([[ 0. , 3. , 1. , 22. , 1. , 0. ,
7.25 , 0. ],
[ 1. , 1. , 0. , 38. , 1. , 0. ,
71.2833, 1. ],
[ 1. , 3. , 0. , 26. , 0. , 0. ,
7.925 , 0. ],
[ 1. , 1. , 0. , 35. , 1. , 0. ,
53.1 , 0. ],
[ 0. , 3. , 1. , 35. , 0. , 0. ,
8.05 , 0. ]])
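For context on why ast.literal_eval is preferred over eval here, the short sketch below shows that it parses plain literals but refuses anything that would execute code. The flag name `rejected` is illustrative.

```python
import ast

# Plain literals are parsed into the corresponding Python objects.
parsed_bool = ast.literal_eval('True')
parsed_float = ast.literal_eval('7.25')
print(parsed_bool, parsed_float)

try:
    # A function call is not a literal, so it is refused rather than executed.
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```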
Comments
Old question, but for reference: a bool can be converted to an int, and an int to a float. This assumes the column holds real booleans rather than the strings 'True' and 'False'.
data[0::,2]=data[0::,2].astype(int).astype(float)
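One caveat worth sketching: an in-place assignment keeps the dtype of the target array, so writing numbers into a string array (as in the question) stores them as strings again. The small three-column array below is a made-up illustration, not the question's full data, and `numeric` is a hypothetical name.

```python
import numpy as np

# Hypothetical small string array in the same spirit as the question's data.
data = np.array([['0', 'True', '7.25'],
                 ['1', 'False', '71.2833']])

# The right-hand side is an integer array, but the target array holds strings,
# so the values are cast back to text on assignment.
data[:, 1] = (data[:, 1] == 'True').astype(int)
print(data[:, 1])

# To get real numbers, build a new array with a numeric dtype.
numeric = data.astype(float)
print(numeric)
```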