Draft of potential masked array implementation. #849

Draft

andrei-papou wants to merge 2 commits into rust-ndarray:master

from andrei-papou:masked-array

+263 −0

@andrei-papou

Contributor

@andrei-papou andrei-papou commented Nov 13, 2020

There are two files:
src/ma/mod.rs - masked array implementation, all the types and traits live there.
tests/ma.rs - a couple of tests that demonstrate the potential public API of masked array.

The main idea is to have a Mask trait that is generic enough to be implemented not just by ArrayBase but also by, for example, a set of whitelisted/blacklisted indices, a set of whitelisted/blacklisted values, etc.
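As a rough illustration of that idea (not the trait from src/ma/mod.rs), here is a minimal sketch using only the standard library. The names `Mask`, `is_masked`, `IndexBlacklist` and `masked_sum` are hypothetical, flat `usize` indices stand in for ndarray dimensions, and `true` is assumed to mean "masked out".

```rust
use std::collections::HashSet;

/// Hypothetical trait, not the one from the PR: reports whether the element
/// at a flat index should be skipped.
pub trait Mask {
    fn is_masked(&self, index: usize) -> bool;
}

/// A dense boolean mask: `true` means "masked out".
/// Indices past the end of the mask are treated as visible.
impl Mask for [bool] {
    fn is_masked(&self, index: usize) -> bool {
        self.get(index).copied().unwrap_or(false)
    }
}

/// A blacklist of indices: every listed index is masked out.
pub struct IndexBlacklist(pub HashSet<usize>);

impl Mask for IndexBlacklist {
    fn is_masked(&self, index: usize) -> bool {
        self.0.contains(&index)
    }
}

/// Sums the elements that the mask leaves visible.
pub fn masked_sum<M: Mask + ?Sized>(data: &[f64], mask: &M) -> f64 {
    data.iter()
        .enumerate()
        .filter(|(i, _)| !mask.is_masked(*i))
        .map(|(_, x)| *x)
        .sum()
}
```

Because `masked_sum` only depends on the trait, the same reduction works with either mask representation.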


 Draft of potential masked array implementation.

dd6124a

@andrei-papou andrei-papou mentioned this pull request

Nov 13, 2020

Masked arrays #127

Open


 Simplified Mask trait by getting rid of get_dim

2eeb7b5

@andrei-papou andrei-papou marked this pull request as draft

November 16, 2020 12:07

@nilgoyette

Collaborator

nilgoyette commented Nov 23, 2020

Is the end goal of this PR (and further work) to be as close as possible to the masked arrays in numpy? I ask because I'm a long-time user of numpy, as is everyone in the company I work for, and nobody has ever used numpy.ma. I mean, we do use masked arrays all the time in medical imaging, mostly for ignoring invalid voxels (outside of the brain, etc.), but the tools in numpy.ma never had any appeal, I guess.

In brief, I see 3 problems with this (not your PR specifically, but the concept):

1. AFAIK, it's mostly useless. This is my opinion and the result of a survey in my small company. I don't really believe it, but keep in mind that it's possible that we're simply ignorant of the right usage, or have simply never needed it.
2. It forces the library to duplicate code. Like, mean for Array and MaskedArray, std-dev, etc. Maybe this is less of a hassle than I think.
3. Almost all(?) usages can be replaced with a select() or a Zip::from(arr).and(mask). That's what we're currently doing. In fact, I think that's the problem with numpy.ma: it's super easy to avoid. You simply use arr[mask > 0] and some other indexing tools, et voilà.

What I would gladly use is a Zip::masked function, like

Zip::from(&brain).and(&whatever).and(&mut out).masked(mask).par_apply(|&b, &w, o| {
 // No mask == false here, yay! But it's simply avoiding an if...
})

but this is somewhat irrelevant to the current discussion :)
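To make the "it's simply avoiding an if" point concrete, here is a minimal sketch of what such a masked traversal boils down to. It uses plain slices and std iterators rather than ndarray's Zip, and `masked_apply` is a hypothetical helper name. Here `true` in the mask is assumed to mean "process this element", matching the comment in the snippet above.

```rust
/// Applies `f` to each input whose mask entry is `true` and stores the
/// result in the matching output slot; other outputs are left untouched.
pub fn masked_apply<F>(input: &[f32], mask: &[bool], out: &mut [f32], mut f: F)
where
    F: FnMut(f32) -> f32,
{
    assert_eq!(input.len(), mask.len());
    assert_eq!(input.len(), out.len());
    for ((&x, &keep), o) in input.iter().zip(mask).zip(out.iter_mut()) {
        // This is the `if` that a masked traversal would hide from the caller.
        if keep {
            *o = f(x);
        }
    }
}
```

Calling `masked_apply(&brain, &mask, &mut out, |x| x * 2.0)` with a mask of `[true, false, true]` would update the first and third outputs only.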

@bluss

Member

bluss commented Nov 28, 2020

I appreciate reading your sketch andrei, you're more productive than me, just having a go at a draft instead of trying to make something perfect.

I think it's been mentioned before yeah, the question whether to have masked arrays or masked operations on arrays. I dread the complexity of either. Thanks nilgoyette for the candid thoughts too.

I think we should start with masked operations. I think that's what a masked array type (if it were to exist) would need as a basis anyway. And it allows having a separate mask too, which should hopefully be more efficient (packed or sparse bitmap?)
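For reference, a packed bitmap of the kind hinted at here could look roughly like the following. This is only a sketch under the assumption of one bit per element stored in `u64` words; `BitMask` and its methods are hypothetical names, not part of ndarray or of this PR.

```rust
/// Hypothetical packed mask: one bit per element, 64 elements per word.
pub struct BitMask {
    words: Vec<u64>,
    len: usize,
}

impl BitMask {
    /// Creates a mask of `len` bits, all cleared.
    pub fn new(len: usize) -> Self {
        BitMask { words: vec![0; (len + 63) / 64], len }
    }

    /// Sets or clears the bit at `index`.
    pub fn set(&mut self, index: usize, value: bool) {
        assert!(index < self.len, "index out of bounds");
        let (word, bit) = (index / 64, index % 64);
        if value {
            self.words[word] |= 1u64 << bit;
        } else {
            self.words[word] &= !(1u64 << bit);
        }
    }

    /// Reads the bit at `index`.
    pub fn get(&self, index: usize) -> bool {
        assert!(index < self.len, "index out of bounds");
        (self.words[index / 64] >> (index % 64)) & 1 == 1
    }

    /// Number of set bits.
    pub fn count_ones(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}
```

Compared with a `Vec<bool>`, this layout uses an eighth of the memory and lets whole words be combined with bitwise AND/OR, at the cost of a shift and a mask per element lookup.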

@stuarta0

stuarta0 commented Nov 9, 2025

I realise this is an older PR but I would vote in favour of masks. I'm by no means experienced in numpy so the following may have a different solution.

My current use case is a 2D jagged array of ids. Each row represents the following:

[parent_id, child_id, facet1_id, facet2_id, ...facetN_id]

Since the numpy docs state that 2D arrays must be rectangular, not jagged, I use np.resize to set the length of every row to that of the row with the most facets, and np.ma.array to mask the padded entries. Another solution is padding the rows with sentinel values, but masking is more correct IMO, and it is required for the next step.

Following that, I mask the parent_id and child_id columns and perform masked operations to determine the rows that match AND/OR combinations of facet ids, e.g. I want to find all parent_id/child_id rows that contain facets (A OR B) AND (C). I do not want to discard the masked values, and I do not want them to be part of the comparison. I can do this via the following:

facet_groups = [[A, B], [C]]
filtered = arr[np.logical_and.reduce([np.isin(arr, facet_ids).any(axis=1) for facet_ids in facet_groups])]

Now I can remove the mask and extract all the parent_id/child_id values with a slice:

filtered.mask = np.ma.nomask
parent_ids = filtered[:,0]
child_ids = filtered[:,1]

In addition, I use np.unique on the masked array to calculate the total counts for each remaining facet.

All in all, it's a very concise bit of code that performs very well for the small dataset of a few hundred thousand values.
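For comparison, the same (A OR B) AND (C) selection can be sketched in plain Rust without any masked array type. This is an illustration under assumptions, not an ndarray API: rows are modelled as jagged `Vec<u32>` values (so no padding is needed), `filter_rows` is a hypothetical helper, and the first two columns are skipped by slicing rather than by masking.

```rust
/// Returns the rows whose facet columns (everything after the first two ids)
/// contain at least one id from every group, i.e. an AND of ORs.
pub fn filter_rows<'a>(rows: &'a [Vec<u32>], facet_groups: &[Vec<u32>]) -> Vec<&'a Vec<u32>> {
    rows.iter()
        .filter(|row| {
            // Skip parent_id and child_id so they never take part in matching.
            let facets = row.get(2..).unwrap_or(&[]);
            facet_groups
                .iter()
                .all(|group| group.iter().any(|id| facets.contains(id)))
        })
        .collect()
}
```

The returned references still point at the full rows, so parent_id and child_id remain available as `row[0]` and `row[1]` after filtering.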

@Lazarus-931

Lazarus-931 commented Nov 30, 2025

Yes, I strongly agree with @stuarta0: there needs to be a masked_fill feature in ndarray.

Currently, you would do this on an n-dimensional array using loops: wherever the mask is set for an element, you replace that element with a value z.

At least this is what I do; there is most definitely a simpler version out there somewhere.
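The loop described here can be sketched as follows. This is a minimal std-only version, assuming the array is viewed as a flat slice and that `true` in the mask means "replace"; `masked_fill` is a hypothetical function, not an existing ndarray method.

```rust
/// Replaces every element whose mask entry is `true` with `value`.
pub fn masked_fill<T: Clone>(data: &mut [T], mask: &[bool], value: T) {
    assert_eq!(data.len(), mask.len(), "data and mask must have the same length");
    for (x, &m) in data.iter_mut().zip(mask) {
        if m {
            *x = value.clone();
        }
    }
}
```

For an n-dimensional array, the same element-wise pairing could presumably be expressed with the Zip::from(arr).and(mask) pattern mentioned earlier in the thread.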

5 participants

@andrei-papou @nilgoyette @bluss @stuarta0 @Lazarus-931
