Vectorized code to find the position and length of runs of 1s in a bit matrix

Question 1

I'm trying to write something like an adjusted run-length encoding for row-wise bit matrices. Specifically I want to "collapse" runs of 1s into the number of 1s in that run while maintaining the same number of 0s as original (right-padding as necessary). For example:

input_v = np.array([
 [0, 1, 0, 0, 1, 1, 0, 1],
 [0, 0, 1, 0, 1, 1, 1, 0]
])
expected_v = np.array([
 [0, 1, 0, 0, 2, 0, 1],
 [0, 0, 1, 0, 3, 0, 0]
])

My current attempt works after padding, but is slow:

def count_neighboring_ones(l):
 o = 0
 for i in l:
 if i == 0:
 return o
 o += 1
def f(l):
 out = []
 i = 0
 while i < len(l):
 c = count_neighboring_ones(l[i:])
 out.append(c)
 i += (c or 1)
 return out

Are there some vectorization techniques I can use to operate on the entire matrix to reduce row-wise operations and post-padding?

Question 2

Why are you doing this? What are the dimensions of typical data you hope to compress in this way? Can you not just use Numpy's built-in sparse matrix support?

Question 3

I'm doing this because I need to pass the result to library code I don't control. It's normally small dims but it happens many times.

Question 4

What is the library you're passing to?

Question 5

I don't think it matters.

Question 6

I think it does, if you want a review. But hey, it's your life

Question 7

algorithm

We asked you nicely.

What are the dimensions of typical data you hope to compress in this way?

You declined to respond.

Compiled C code and interpreted python bytecode run at different speeds. When we use Big-Oh notation, we're hiding a constant factor. Some problems manipulate a dozen integers, and some a million integers. The details matter.

 while i < len(l):
 c = count_neighboring_ones(l[i:])

Let \$n\$ be the length of the input list. The while loop has \$O(n)\$ linear time complexity, and then the l[i:] slice is also linear. So we have \$O(n^2)\$ quadratic complexity. We are told that counting ones is important because there is some sparsity to the input data, so if \$z\$ is number of zeros we have no reason to believe that \$z \ll n\$.

I don't know how big your input list tends to be, because you adamantly refused to tell us. But I can tell you this: "quadratic bad".

Rather than passing in a freshly copied vector, you want to pass in an index that points within same old vector.

lint

Pep-8 asked you nicely to avoid certain names. zomg you went for both ell and oh. If this source code was longer I might be able to fill in more of my bingo squares. In the count helper you apparently intended this sort of thing: ones += 1.

You don't have to give your functions fancier names that f. But if you go the single-letter route, then cite your reference so we can read about what some author was thinking when defining the function, or give us a """docstring""" explaining what its single responsibility is.

J_H J_H 41.6k3 gold badges38 silver badges157 bronze badges · Answer 1 · 2024-04-28 23:42:51Z

algorithm

We asked you nicely.

What are the dimensions of typical data you hope to compress in this way?

You declined to respond.

Compiled C code and interpreted python bytecode run at different speeds. When we use Big-Oh notation, we're hiding a constant factor. Some problems manipulate a dozen integers, and some a million integers. The details matter.

 while i < len(l):
 c = count_neighboring_ones(l[i:])

Let \$n\$ be the length of the input list. The while loop has \$O(n)\$ linear time complexity, and then the l[i:] slice is also linear. So we have \$O(n^2)\$ quadratic complexity. We are told that counting ones is important because there is some sparsity to the input data, so if \$z\$ is number of zeros we have no reason to believe that \$z \ll n\$.

I don't know how big your input list tends to be, because you adamantly refused to tell us. But I can tell you this: "quadratic bad".

Rather than passing in a freshly copied vector, you want to pass in an index that points within same old vector.

lint

Pep-8 asked you nicely to avoid certain names. zomg you went for both ell and oh. If this source code was longer I might be able to fill in more of my bingo squares. In the count helper you apparently intended this sort of thing: ones += 1.

You don't have to give your functions fancier names that f. But if you go the single-letter route, then cite your reference so we can read about what some author was thinking when defining the function, or give us a """docstring""" explaining what its single responsibility is.

Stack Exchange Network

Vectorized code to find the position and length of runs of 1s in a bit matrix

1 Answer 1

algorithm

lint

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Vectorized code to find the position and length of runs of 1s in a bit matrix

1 Answer 1

algorithm

lint

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions