replace blanks in numpy array

Question 1

The third column in my numpy array is Age. In this column about 75% of the entries are valid and 25% are blank. Column 2 is Gender and using some manipulation I have calculated the average age of the men in my dataset to be 30. The average age of women in my dataset is 28.

I want to replace all blank Age values for men to be 30 and all blank age values for women to be 28.

However I can't seem to do this. Anyone have a suggestion or know what I am doing wrong?

Here is my code:

# my entire data set is stored in a numpy array defined as x
ismale = x[::,1]=='male'
maleAgeBlank = x[ismale][::,2]==''
x[ismale][maleAgeBlank][::,2] = 30

For whatever reason when I'm done with the above code, I type x to display the data set and the blanks still exist even though I set them to 30. Note that I cannot do x[maleAgeBlank] because that list will include some female data points since the female data points are not yet excluded.

Is there any way to get what I want? For some reason, if I do x[ismale][::,1] = 1 (setting the column with 'male' equal to 1), that works, but x[ismale][maleAgeBlank][::,2] = 30 does not work.

sample of array:

#output from typing x
array([['3', '1', '22', ..., '0', '7.25', '2'],
 ['1', '0', '38', ..., '0', '71.2833', '0'],
 ['3', '0', '26', ..., '0', '7.925', '2'],
 ..., 
 ['3', '0', '', ..., '2', '23.45', '2'],
 ['1', '1', '26', ..., '0', '30', '0'],
 ['3', '1', '32', ..., '0', '7.75', '1']], 
 dtype='<U82')
#output from typing x[0]
array(['3', '1', '22', '1', '0', '7.25', '2'], 
 dtype='<U82')

Note that I have changed column 2 to be 0 for female and 1 for male already in the above output

Question 2

Can you post a sample of the array?

Question 3

How about this:

import numpy as np

my_data = np.array([['3', '1', '22', '0', '7.25', '2'],
                    ['1', '0', '38', '0', '71.2833', '0'],
                    ['3', '0', '26', '0', '7.925', '2'],
                    ['3', '0', '', '2', '23.45', '2'],
                    ['1', '1', '26', '0', '30', '0'],
                    ['3', '1', '32', '0', '7.75', '1']],
                   dtype='<U82')
# Per the question, column 1 holds '1' for male and '0' for female.
ismale = my_data[:, 1] == '1'
missing_age = my_data[:, 2] == ''
# Combine the masks and assign in a single indexing step so my_data itself changes.
my_data[missing_age & ismale, 2] = '30'
my_data[missing_age & ~ismale, 2] = '28'

Result:

>>> my_data
array([[u'3', u'1', u'22', u'0', u'7.25', u'2'],
       [u'1', u'0', u'38', u'0', u'71.2833', u'0'],
       [u'3', u'0', u'26', u'0', u'7.925', u'2'],
       [u'3', u'0', u'28', u'2', u'23.45', u'2'],
       [u'1', u'1', u'26', u'0', u'30', u'0'],
       [u'3', u'1', u'32', u'0', u'7.75', u'1']],
      dtype='<U82')
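As for why the chained form in the question has no effect: indexing with a boolean mask returns a copy, so x[ismale][...] = ... writes into a temporary array rather than into x. Below is a minimal sketch of the difference, using a made-up three-row array (the data and the names is_male, chained_changed_x and single_changed_x are illustrative, not from the original post):

```python
import numpy as np

# Toy data: columns are class, gender ('1' = male), age
x = np.array([['3', '1', ''],
              ['1', '0', '38'],
              ['3', '1', '22']], dtype='<U82')
is_male = x[:, 1] == '1'

# Chained indexing: x[is_male] is a copy, so the assignment lands on a temporary.
male_age_blank = x[is_male][:, 2] == ''
x[is_male][male_age_blank][:, 2] = '30'
chained_changed_x = bool(x[0, 2] == '30')

# Single indexing step: one assignment applied directly to x.
x[is_male & (x[:, 2] == ''), 2] = '30'
single_changed_x = bool(x[0, 2] == '30')

print(chained_changed_x, single_changed_x)
```

Only the second assignment should change x, which is why combining the masks with & and indexing once is the approach that works.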

Question 4

Perfect! Thank you, very clean and understandable. Didn't even think of the & operation.

Question 5

You can use the where function:

arr = np.array([['3', '1', '22', '1', '0', '7.25', '2'],
                ['3', '', '22', '1', '0', '7.25', '2']],
               dtype='<U82')
blank = np.where(arr == '')   # row and column indices of every blank cell
arr[blank] = '20'

Result:

>>> arr
array([[u'3', u'1', u'22', u'1', u'0', u'7.25', u'2'],
       [u'3', u'20', u'22', u'1', u'0', u'7.25', u'2']],
      dtype='<U82')

If you want to change a specific column, you can do the following:

male = np.where(arr[:, 1] == '')[0]    # rows where column 1 is blank
arr[male, 1] = '30'                    # index the column too, or the whole row is overwritten
female = np.where(arr[:, 2] == '')[0]  # rows where column 2 is blank
arr[female, 2] = '28'

Question 6

where is efficient, but the current solution doesn't check the row's gender value and changes all blanks, not just those in the age column.
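One possible way to keep using where while respecting gender is its three-argument form, np.where(condition, a, b), which picks element-wise between two alternatives. This is only a sketch on a made-up three-row array; the column layout (gender in column 1 with '1' = male, age in column 2) follows the question, and the variable names are illustrative:

```python
import numpy as np

arr = np.array([['3', '1', ''],
                ['1', '0', ''],
                ['3', '1', '22']], dtype='<U82')
gender = arr[:, 1]
age = arr[:, 2]

# Per-row fill value: '30' for males, '28' for everyone else.
fill = np.where(gender == '1', '30', '28')
# Keep existing ages and substitute the fill value only where the age is blank.
arr[:, 2] = np.where(age == '', fill, age)

print(arr[:, 2])
```

Only the age column is touched, and each blank gets the value matching that row's gender.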

Question 7

Doesn't he want to change the blank age values to the average? The age columns are only 1 and 2, for male and female, so he needs just two where calls, one per column.

Question 8

You could try iterating through the array in a simpler way. It's not the most efficient solution, but it should get the job done.

# Each row is a view into x, so assigning to row[2] updates x in place.
for row in x:
    if row[2] == '':
        if row[1] == '1':   # '1' = male, per the question
            row[2] = '30'
        else:
            row[2] = '28'

Question 9

Using a for loop over a numpy array is nonsense. You lose the advantages of numpy by iterating.
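As a rough way to check this claim, the sketch below runs a row-by-row loop and a masked assignment on the same synthetic data and times both. The array size, the random data and the function names fill_loop and fill_masked are assumptions made for illustration; actual timings depend on the machine and the array size.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # arbitrary size chosen for the illustration
gender = rng.choice(['0', '1'], size=n)
age = rng.choice(['', '22', '38'], size=n)
base = np.column_stack([np.full(n, '3'), gender, age]).astype('<U82')

def fill_loop(a):
    # Row-by-row Python loop; each row is a view, so writes reach a.
    for row in a:
        if row[2] == '':
            row[2] = '30' if row[1] == '1' else '28'
    return a

def fill_masked(a):
    # Vectorized version using boolean masks.
    blank = a[:, 2] == ''
    male = a[:, 1] == '1'
    a[blank & male, 2] = '30'
    a[blank & ~male, 2] = '28'
    return a

same = np.array_equal(fill_loop(base.copy()), fill_masked(base.copy()))
t_loop = timeit.timeit(lambda: fill_loop(base.copy()), number=3)
t_masked = timeit.timeit(lambda: fill_masked(base.copy()), number=3)
print(same, t_loop, t_masked)
```

Both functions should produce the same array, so the choice between them is about speed and readability rather than correctness.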

Question 10

@void That's fair. I'm not saying there aren't better solutions. But if all the OP cares about is getting this particular task solved quickly, hopefully this will help.

Question 11

Using where is more efficient. Check my answer.

Accepted answer (the "How about this:" answer above): Akavall, 2013-11-10 01:04:30Z

