I have a 2D array whose rows look like this:
[5643, 22, 0.67, [1.00, 0.05, -0.044....]]
[6733, 12, -0.44, [0.00, 1.00, -0.08...]]
So it has dimensions of roughly 13k x 4, but the 4th column of every row is itself an array.
What I'd like to do is subset this array so that I only keep the rows for which the yth element of the 4th column is greater than 0.
My current approach is this:
mask = [x[y] > 0 for x in array[:,3]]
new_array = array[mask]
Is there a faster way to do this?
3 Answers
Try this:
y = 1
new_array = list(filter(lambda x: x[3][y] > 0, array))
Use the if clause of a list comprehension (note that this returns a list of rows rather than an array):
new_array = [r for r in array if r[3][y] > 0]
The fastest way to do this is to not pack arrays in other arrays. This causes many issues, including not being able to use the shape attribute of numpy arrays effectively.
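To illustrate the shape problem, here is a minimal sketch. The two rows are the ones from the question with the elided tails dropped, and the names packed and nested are just placeholders, not anything from the original post. It assumes every nested array has the same length.

```python
import numpy as np

# Illustrative rows modelled on the question (truncated nested vectors)
rows = [
    [5643, 22, 0.67, np.array([1.00, 0.05, -0.044])],
    [6733, 12, -0.44, np.array([0.00, 1.00, -0.08])],
]

# Build the "array inside an array" layout explicitly as an object array
packed = np.empty((len(rows), 4), dtype=object)
for i, row in enumerate(rows):
    for j, value in enumerate(row):
        packed[i, j] = value

# The outer shape says nothing about the nested vectors
print(packed.shape)        # (2, 4)
print(packed[:, 3].shape)  # (2,)

# Stacking the nested vectors gives a real 2-D float array
nested = np.stack(packed[:, 3])
print(nested.shape)        # (2, 3)
```

Once the nested column is a proper 2-D array, ordinary column indexing such as nested[:, y] works on it.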
So, first split your data into two arrays: one with 13k rows and 3 columns, and another that also has 13k rows, with as many columns as the embedded array has elements. Call these X and Y.
You can then do the following:
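import numpy as np
# Split the arrays: scalar columns go to X, the nested arrays to Y
X, Y = array[:, :3], array[:, 3]
# np.stack builds a real 2-D array from the nested arrays;
# np.asarray would leave Y as a 1-D array of objects, so Y[:, y] would fail
Y = np.stack(Y)
mask = Y[:, y] > 0
X = X[mask]
Y = Y[mask]  # keep Y aligned with the filtered X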
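For anyone who wants to try this end to end, here is a small self-contained sketch of the same idea. The helper name filter_rows and the sample rows are made up for illustration, and it assumes all nested arrays have the same length (np.stack raises an error otherwise).

```python
import numpy as np

def filter_rows(packed, y):
    """Keep the rows whose nested vector (column 3) has a positive y-th element."""
    nested = np.stack(packed[:, 3])  # object column -> 2-D numeric array
    mask = nested[:, y] > 0          # vectorised comparison, no Python loop
    return packed[mask]

# Illustrative data modelled on the question (truncated nested vectors)
rows = [
    [5643, 22, 0.67, np.array([1.00, 0.05, -0.044])],
    [6733, 12, -0.44, np.array([0.00, 1.00, -0.08])],
]
packed = np.empty((len(rows), 4), dtype=object)
for i, row in enumerate(rows):
    for j, value in enumerate(row):
        packed[i, j] = value

kept = filter_rows(packed, 0)
print(len(kept))   # 1
print(kept[0, 0])  # 5643
```

The comparison itself is vectorised here, but np.stack still has to copy every nested array once, so if the filter runs repeatedly it is probably worth stacking once up front and keeping the 2-D array around.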