I'm trying to write something like an adjusted run-length encoding for row-wise bit matrices. Specifically I want to "collapse" runs of 1s into the number of 1s in that run while maintaining the same number of 0s as original (right-padding as necessary). For example:
input_v = np.array([
[0, 1, 0, 0, 1, 1, 0, 1],
[0, 0, 1, 0, 1, 1, 1, 0]
])
expected_v = np.array([
[0, 1, 0, 0, 2, 0, 1],
[0, 0, 1, 0, 3, 0, 0]
])
My current attempt works after padding, but is slow:
def count_neighboring_ones(l):
o = 0
for i in l:
if i == 0:
return o
o += 1
def f(l):
out = []
i = 0
while i < len(l):
c = count_neighboring_ones(l[i:])
out.append(c)
i += (c or 1)
return out
Are there some vectorization techniques I can use to operate on the entire matrix to reduce row-wise operations and post-padding?
-
\$\begingroup\$ Why are you doing this? What are the dimensions of typical data you hope to compress in this way? Can you not just use Numpy's built-in sparse matrix support? \$\endgroup\$Reinderien– Reinderien2021年07月27日 18:54:55 +00:00Commented Jul 27, 2021 at 18:54
-
\$\begingroup\$ I'm doing this because I need to pass the result to library code I don't control. It's normally small dims but it happens many times. \$\endgroup\$erip– erip2021年07月27日 19:13:23 +00:00Commented Jul 27, 2021 at 19:13
-
\$\begingroup\$ What is the library you're passing to? \$\endgroup\$Reinderien– Reinderien2021年07月27日 19:14:32 +00:00Commented Jul 27, 2021 at 19:14
-
\$\begingroup\$ I don't think it matters. \$\endgroup\$erip– erip2021年07月27日 19:15:31 +00:00Commented Jul 27, 2021 at 19:15
-
\$\begingroup\$ I think it does, if you want a review. But hey, it's your life \$\endgroup\$Reinderien– Reinderien2021年07月27日 19:17:42 +00:00Commented Jul 27, 2021 at 19:17
1 Answer 1
algorithm
We asked you nicely.
What are the dimensions of typical data you hope to compress in this way?
You declined to respond.
Compiled C code and interpreted python bytecode run at different speeds. When we use Big-Oh notation, we're hiding a constant factor. Some problems manipulate a dozen integers, and some a million integers. The details matter.
while i < len(l):
c = count_neighboring_ones(l[i:])
Let \$n\$ be the length of the input list.
The while
loop has \$O(n)\$ linear time complexity,
and then the l[i:]
slice is also linear.
So we have \$O(n^2)\$ quadratic complexity.
We are told that counting ones is important
because there is some sparsity to the input data,
so if \$z\$ is number of zeros we have no reason
to believe that \$z \ll n\$.
I don't know how big your input list tends to be, because you adamantly refused to tell us. But I can tell you this: "quadratic bad".
Rather than passing in a freshly copied vector, you want to pass in an index that points within same old vector.
lint
Pep-8
asked you nicely to avoid certain names.
zomg you went for both ell and oh.
If this source code was longer I might
be able to fill in more of my bingo squares.
In the count helper you apparently intended this sort of thing:
ones += 1
.
You don't have to give your functions fancier names that f
.
But if you go the single-letter route, then cite your reference
so we can read about what some author was thinking
when defining the function, or give us a """docstring"""
explaining what its
single responsibility
is.