So I'm writing some code to perform a quite specific task given a large numpy array with N rows and 3 columns representing points in 3D. The 3D points are to be binned along one of the dimensions between specified bin edges. For each of these bins, there is a set fraction by which I must reduce the number of points in that bin, perfectly (pseudo)randomly.
Here is the code I have written to perform this task. I found myself spending a lot of time trying to figure out the most 'pythonic' way to achieve this. It still seems very clunky, so I'm sure there must be a more elegant way that capitalises on numpy's array performance. Any tips?
# Initialise the array of 3D points
points = np.zeros((N, 3))
# grab the point data from elsewhere
points[:, 0] = ...
points[:, 1] = ...
points[:, 2] = ...  # this is the dimension we bin in, call it z
# Create an array of M + 1 elements for the edges of M bins
binedges = ...  # (they do not span all of the space of points by the way)
# Find the counts per bin
H = np.histogram(points[:, 2], bins=binedges)
# The number to downsample to in each bin is already known
num_down = ("""some M-dimensional array of fractions""") * H[0]
# initialise a mask for the final array for my analysis with dimension N
finmask = np.zeros(points.shape[0], dtype=bool)
# loop over bins (do I really need to do this??)
for i, nd in enumerate(num_down):
    # First get the array ids of the points in each bin
    zbin_ids = np.where((binedges[i] < points[:, 2]) &
                        (points[:, 2] <= binedges[i + 1]))
    # Choose ids at random without replacement
    keep = np.random.choice(zbin_ids[0], size=int(nd), replace=False)
    # What's left is turned on in the mask for the final array
    finmask[keep] = True
points = points[finmask]
1 Answer
You could use numpy.digitize to determine which bin each point belongs in, allowing for a simpler expression inside where.
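A minimal sketch of that approach, using toy data in place of the real points and fractions (the array sizes, edge values, and the name fractions are all illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real data: 1000 points, z uniform in [0, 10),
# M = 6 bins spanning only part of that range, keep 50% per bin.
points = rng.uniform(0.0, 10.0, size=(1000, 3))
binedges = np.linspace(2.0, 8.0, 7)
fractions = np.full(len(binedges) - 1, 0.5)

# digitize labels each point with its bin: 0 means below the first edge,
# len(binedges) means above the last, so in-range points get 1..M.
bin_idx = np.digitize(points[:, 2], binedges)

# Per-bin counts fall straight out of the labels; the [1:-1] slice
# drops the two out-of-range buckets.
counts = np.bincount(bin_idx, minlength=len(binedges) + 1)[1:-1]
num_down = (fractions * counts).astype(int)

finmask = np.zeros(points.shape[0], dtype=bool)
for i, nd in enumerate(num_down):
    # Simpler expression inside where: a single equality test per bin
    in_bin = np.where(bin_idx == i + 1)[0]
    keep = rng.choice(in_bin, size=nd, replace=False)
    finmask[keep] = True

points = points[finmask]
```

The loop over bins remains, but each iteration reduces to one equality test on the precomputed labels rather than a pair of edge comparisons, and the counts come for free from the same labels via bincount.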