I have a large array of more than 40000 elements:
a = ['15', '12', '', 18909, ...., '8989', '', '90789', '8']
I'm looking for a simple way to replace the empty '' values with '0' so that I can manipulate the data in the array using NumPy.
I would then convert the elements in my array into integers using
a = map(int, a)
so that I could find the mean of the array in numpy
a_mean = np.mean(a)
My issue is that I cannot convert the elements to integers to get a mean while the array still contains these missing values.
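The conversion itself is what fails on the empty strings, for example:
>>> int('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''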
4 Answers
You could make a small function that converts a single value exactly how you want it, e.g.:
def to_int(x):
    try:
        return int(x)
    except ValueError:
        return 0
which can be used with map:
In [22]: a = ['15', '12', '', 18909, '8989', '90789', '8']
In [23]: list(map(to_int, a))
Out[23]: [15, 12, 0, 18909, 8989, 90789, 8]
in a list comprehension:
In [25]: np.array([to_int(x) for x in a])
Out[25]: array([ 15, 12, 0, 18909, 8989, 90789, 8])
or in a generator expression to directly create a numpy array:
In [27]: np.fromiter((to_int(x) for x in a), dtype=int)
Out[27]: array([ 15, 12, 0, 18909, 8989, 90789, 8])
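Either way you end up with a numeric array, so the mean from the question follows directly. A small sketch, reusing a and to_int from above and assuming the empty entries should count as zeros in the average:
import numpy as np

a_int = np.fromiter((to_int(x) for x in a), dtype=int)
a_mean = np.mean(a_int)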
If I understood you right, it should look like this:
for i in range(len(a)):
    if a[i] == '':
        a[i] = '0'
You can also use:
a = list(map(lambda x: '0' if x == '' else x, a))
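After that replacement, the conversion and mean from the question work as intended. A short sketch (the list() around map is only needed on Python 3):
import numpy as np

a = list(map(int, a))   # every element now converts cleanly
a_mean = np.mean(a)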
From previous learning on SO, I see you can apply the solution below to convert the NaN values to zeros:
import numpy as np

a = np.array([[0, 1, 2], [3, 4, np.nan]])
where_are_NaNs = np.isnan(a)
a[where_are_NaNs] = 0
Secondly, nan_to_num(), as I said earlier in my comment:
>>> import numpy as np
>>> a = np.array([[0, 1, 2], [3, 4, np.nan]])
>>> a
array([[  0.,   1.,   2.],
       [  3.,   4.,  nan]])
>>> a = np.nan_to_num(a)
>>> a
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  0.]])
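To connect this to the string data in the question, one possible bridge (just a sketch, assuming the empty strings should be treated as NaN first; the sample list here is made up):
import numpy as np

raw = ['15', '12', '', '8989', '', '90789', '8']
a = np.array([float(x) if x else np.nan for x in raw])   # '' -> nan
a = np.nan_to_num(a)                                     # nan -> 0.0
a_mean = a.mean()
If the empty entries should be ignored rather than counted as zeros, np.nanmean(a) before the nan_to_num step is another option.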
A more verbose answer is:
acc = 0
for v in a:
    acc += int(v or 0)   # '' is falsy, so empty strings count as 0
a_mean = acc / len(a)
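One caveat: on Python 2 the division above is integer division, so the mean gets truncated. A minor tweak (not part of the original answer), if a fractional result is wanted:
a_mean = acc / float(len(a))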
new_a = [int(v or 0) for v in a] and then use new_a? numpy.nan_to_num?