Return to Answer

added 14 characters in body

edited Feb 7, 2024 at 22:11

41.8k
3
38
157

For a square matrix of side N, you throwyour test code throws about 412 ×ばつ N IndexOutOfRangeExceptions, which seems like perhaps more than you'd prefer since doing so takes time.

For a square matrix of side N, you throw about 4 ×ばつ N IndexOutOfRangeExceptions, which seems like perhaps more than you'd prefer since doing so takes time.

For a square matrix of side N, your test code throws about 12 ×ばつ N IndexOutOfRangeExceptions, which seems like perhaps more than you'd prefer since doing so takes time.

added 125 characters in body

Source Link

edited Feb 7, 2024 at 21:12

J_H

edited Feb 7, 2024 at 21:12

J_H

41.8k
3
38
157

Area of neighborhood is quadratic with distThreshold. For this approach to work well, number of distinct colors C should be smallish relative to neighborhood area. Sometimes result[1] must be set to the majority color, and computing this will have O(C) cost. (It could have O(log N) cost, but that sounds like a lot of pqueues .)

Initially computing a color histogram, the population count of each distinct pixel color, is fairly cheap. We might find there's a handful of very popular colors plus small numbers of rare color values. It might be acceptable to perform a lossy input transform, which lumps all unpopular colors together as a single designated value, or uses a randomly chosen mapping to turn them into just a few distinct colors. We could even put our thumb on the scale, so the function will never return an unpopular color.

Consider relying on the scikit-learn implemenationimage implementation of this or related filters.

Consider relying on the scikit-learn implemenation of this or related filters.

Consider relying on the scikit-image implementation of this or related filters.

Source Link

answered Feb 7, 2024 at 21:07

J_H

answered Feb 7, 2024 at 21:07

J_H

41.8k
3
38
157

This submission is about performance. It does not include any profiler measurements.

robust testing

public static void testArr()
 ...
 int arrWidth = 5;
 int arrHeight = 5;

Prefer a 5 ×ばつ 6 matrix, or other non-square shape.

Why? Sometimes target code will accidentally invert (x, y) into (y, x), and we wish to provoke and fix such bugs.

Consider using a distance threshold greater than 1 in your testing, if you wish to expose performance hotspots.

variant spelling

 DateTime nao = DateTime.Now;

You knao perfectly well how to spell that word. Prefer its standard spelling.

names

In many identifiers you describe something as an "index" where my understanding is it's more like a pixel "color" or "value".

 if (colorCounts.TryGetValue(currentIndex, out int value))
 {
 value++;
 colorCounts[currentIndex] = value;

Here value is too vague; prefer count.

common exceptions

For a square matrix of side N, you throw about 4 ×ばつ N IndexOutOfRangeExceptions, which seems like perhaps more than you'd prefer since doing so takes time.

Consider expanding the array with borders, so we have (N + 2 ×ばつ distThreshold)2 cells in total. Fill the border with unique serial numbers, or with periodic pattern of period at least 1 + 2 * distThreshold. That way a border value won't be the most popular.

common operations

Each point requests the allocation and deallocation of a Dictionary, causing GC work. Maybe hang onto a single Dictionary, and .Clear() it when starting on a new cell? Or maybe turn the dictionary into a vector of size C, the number of colors?

When scanning a raster we re-compute the same old partial sums several times. For low values of distThreshold, such as what your test code uses, I'm skeptical that one can do much better.

For moderate or large thresholds, consider an alternate approach. Initialize by making a single scan to find number of distinct pixel colors, C, in the input matrix. Depending on size of threshold, allocate either a byte or a UInt16 count matrix, with shape arrWidth ×ばつ arrHeight ×ばつ C, or slightly larger if we have tacked on border cells.

When computing window sums for a moving average, we can efficiently add a figure from the leading edge and subtract another figure from the trailing edge. Similarly, in this situation, we can efficiently compute "count of certain pixel color in this cell's neighborhood" by copying the counts we had just finished computing, increment based on leading edge, and decrement based on trailing edge. It is perfectly OK to have some garbage counts in the border, as we won't actually consult them.

simplify the input

This would give us an attractively small C, at the cost of slightly changing the output. You wouldn't want to use this technique in an adversarial setting.

median

The majority filter you propose appears to be intended to cope with salt-and-pepper noise. It seems like it lies on this neighborhood filtering continuum:

mean
median
majority

In image regions that are somewhat distant from an edge, you may be able to get away with dynamically switching to one of those other filters, taking advantage of a rapid, mature implementation. Sampling every Nth pixel neighborhood to support such a decision may be sufficient to give acceptable results.

Consider relying on the scikit-learn implemenation of this or related filters.

lang-cs