This function takes a list/array of booleans and converts it to an array that counts how many consecutive True or False values appear next to each other.
I'd like to see this optimized for performance. It's not too slow, but I do use multiple loops with embedded if-else statements, and I'm wondering whether they're absolutely necessary.
import numpy as np
x = np.random.uniform(1,100,100)
b = x > x.mean()
#function start, input is b
endarray = []
count = 0
instance = True
while True:
    subarray = 0
    while True:
        if count >= len(b):
            endarray.append(subarray)
            break
        if b[count] == instance:
            subarray += 1
            count += 1
        else:
            endarray.append(subarray)
            instance = not instance
            break
    if count >= len(b):
        break
if len(endarray) % 2 != 0:
    endarray = np.append(endarray, 0)
else:
    endarray = np.asarray(endarray)
endarray = endarray.reshape(-1,2)
The output is an Nx2 array, where the left-hand values are always a count of consecutive Trues and the right-hand values are always a count of consecutive Falses.
Once a run of False values is broken (a True value pops up), the next count of True values begins, and vice versa.
Example input
b
Out[31]:
array([ True, True, True, False, True, True, True, True, False,
False, True, False, False, True, False, False, False, False,
True, False, False, False, True, True, True, True, False,
False, True, False, False, False, False, False, False, True,
True, False, True, True, False, False, True, False, False,
True, False, False, True, False, True, False, True, False,
True, True, True, False, True, False, True, True, True,
True, False, False, True, False, True, True, True, True,
True, True, False, True, True, False, True, True, False,
False, True, False, True, False, False, True, True, True,
True, False, False, False, False, False, True, True, True,
True])
Example output
endarray
Out[32]:
array([[3, 1],
[4, 2],
[1, 2],
[1, 4],
[1, 3],
[4, 2],
[1, 6],
[2, 1],
[2, 2],
[1, 2],
[1, 2],
[1, 1],
[1, 1],
[1, 1],
[3, 1],
[1, 1],
[4, 2],
[1, 1],
[6, 1],
[2, 1],
[2, 2],
[1, 1],
[1, 2],
[4, 5],
[4, 0]])
Edit: I wanted to add an updated version of this code. The one in the answer below is not technically correct in all regards, but this was entirely derived from it:
m = np.append(b[0], np.diff(b))
_, c = np.unique(m.cumsum(), return_index=True)
out = np.diff(np.append(c, len(b)))
if b[0] == False:
    out = np.append(0, out)
if len(out) % 2:
    out = np.append(out, 0)
out = out.reshape(-1, 2)
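For reference, here is that same snippet wrapped into a function so it's easier to reuse and to compare against the loop version. This is just a sketch; the function name and the empty-input guard are my own additions.
def runs_true_false(b):
    """Return an Nx2 array of [True-run length, False-run length] pairs."""
    b = np.asarray(b, dtype=bool)
    if b.size == 0:  # my own guard: nothing to count
        return np.empty((0, 2), dtype=int)
    m = np.append(b[0], np.diff(b))      # True at the start of every run
    _, c = np.unique(m.cumsum(), return_index=True)
    out = np.diff(np.append(c, len(b)))  # run lengths
    if not b[0]:                         # leading False run: pad a 0 True-count
        out = np.append(0, out)
    if len(out) % 2:                     # trailing True run: pad a 0 False-count
        out = np.append(out, 0)
    return out.reshape(-1, 2)
print(runs_true_false(b))  # should match the endarray produced by the loop version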
1 Answer
Using itertools.groupby
What you are looking for is itertools.groupby. When there is an odd number of groups, we handle it with a try-except block.
from itertools import groupby

get_grp_len = lambda grp: len([*grp])

def transform(b):
    if len(b) == 0:  # `if not b` wouldn't work since your `b` is an ndarray
        return []
    it = groupby(b)
    out = []
    for _, grp in it:
        try:
            t_size = get_grp_len(grp)
            f_size = get_grp_len(next(it)[1])
            out.append([t_size, f_size])
        except StopIteration:
            out.append([t_size, 0])
    return out
print(transform(b)) # `b` taken from the question itself.
# same output as expected output posted in question.
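If groupby is new to you, here is a tiny illustration of how it collapses consecutive equal values into (key, group) pairs. The toy input below is my own, not the question's b:
from itertools import groupby
demo = [True, True, False, True]
print([(key, len(list(grp))) for key, grp in groupby(demo)])
# [(True, 2), (False, 1), (True, 1)]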
NumPy has a lot of vectorized operations
NumPy has many vectorized operations; you just have to find the right ones. I'm not an expert, but the approach below should do well.
The idea here is to find the index of the first value of each group and then take the differences.
- Check whether each element differs from the previous one; this marks the start of every group.
- Use np.ndarray.cumsum to give each group a unique sequential number.
- Use np.unique to get the index of the first value of each group.
- Take the difference between consecutive first indices to get the size of each group; np.diff does this.
m = b != (np.r_[np.nan, b[:-1]])
_, c = np.unique(m.cumsum(), return_index=True)
# print(c)
# array([ 0,  3,  4,  8, 10, 11, 13, 14, 18, 19, 22, 26, 28, 29, 35, 37, 38,
#        40, 42, 43, 45, 46, 48, 49, 50, 51, 52, 53, 54, 57, 58, 59, 60, 64,
#        66, 67, 68, 74, 75, 77, 78, 80, 82, 83, 84, 85, 87, 91, 96])
# np.unique gives the index of the first occurrence; our `b` has length 100
# and `c` stops at 96, so we need to append the last index + 1 to it.
out = np.diff(np.r_[c, len(b)])
# Now reshape the array; if it has odd length, add a 0 at the end.
if len(out) % 2:
    out = np.r_[out, 0].reshape(-1, 2)
out = out.reshape(-1, 2)
print(out)
# same output as mentioned in the question.
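To make the intermediate arrays concrete, here is the same pipeline run on a toy input (my own, not the question's b):
demo = np.array([True, True, False, True])
m = demo != np.r_[np.nan, demo[:-1]]      # [ True, False,  True,  True] -> run starts
ids = m.cumsum()                          # [1, 1, 2, 3] -> one id per run
_, c = np.unique(ids, return_index=True)  # c = [0, 2, 3] -> first index of each run
print(np.diff(np.r_[c, len(demo)]))       # [2 1 1] -> run lengths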
Using Pandas's GroupBy
A lot of the time pandas and NumPy are used together, so I might as well post pandas code too.
import pandas as pd

s = pd.Series(b)
g = s.ne(s.shift()).cumsum()
out = s.groupby(g).size().to_numpy()
# Repeat the same step as in the solution above: add a 0 if the length is odd.
if len(out) % 2:
    out = np.r_[out, 0].reshape(-1, 2)
out = out.reshape(-1, 2)
print(out)
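For intuition, s.ne(s.shift()).cumsum() labels every run of equal values with its own integer. A toy example (my own, not the question's b):
demo = pd.Series([True, True, False, True])
labels = demo.ne(demo.shift()).cumsum()
print(labels.tolist())                       # [1, 1, 2, 3] -> one label per run
print(demo.groupby(labels).size().tolist())  # [2, 1, 1] -> run lengths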
Code Review
Everything's good as far as I can see: well-named variables, proper indentation, but too many ifs :p
- Nicely done! The numpy code is ~3x faster than what I made. The if statement in the numpy code also needs to be reversed. – Estif, Nov 18, 2020 at 20:57
- @Estif np.r_ is written in pure Python; to make it a little faster, replace np.r_ with np.hstack. – Ch3steR, Nov 19, 2020 at 4:28
- I believe np.append is actually a bit faster in this case. – Estif, Nov 19, 2020 at 21:02
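If you want to check micro-differences like np.r_ vs np.hstack vs np.append yourself, here is a quick timing sketch; results depend on your machine and array size, so treat it as an illustration rather than a benchmark:
import numpy as np
from timeit import timeit

out = np.arange(49)  # roughly the size of the run-length array from the question
print(timeit(lambda: np.r_[out, 0], number=10_000))
print(timeit(lambda: np.hstack([out, 0]), number=10_000))
print(timeit(lambda: np.append(out, 0), number=10_000))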