Make excl_zone a parameter of stumpy.motif() · stumpy-dev/stumpy · Discussion #1063

JulienLeprince
Dec 12, 2024

Hi,

stumpy.motifs() identifies a lot of overlapping motifs in my time series. Would it be possible to add excl_zone as an input parameter to the function so that users can adjust this unwanted behavior?

Kind regards,
Julien

Replies: 8 comments

seanlaw
Dec 12, 2024
Maintainer

@JulienLeprince Thank you for your question and welcome to the STUMPY community.

would it be possible to add excl_zone as an input parameter to the function such that the unwanted behavior can be adjusted by users

IIRC @NimaSarajpoor looked at this a long time ago (I don't recall the detail and I can't seem to find the original conversation) but I vaguely recall that excluding regions next to an already-discovered motif (candidate) subsequence would lead to unintended consequences. @NimaSarajpoor do you remember anything about this or was it related to something else?

Note that an exclusion zone is already applied to neighboring areas surrounding subsequences that match a candidate motif subsequence.

0 replies

JulienLeprince
Dec 12, 2024
Author

Thanks for the welcome and prompt reply Sean. To be clear:

the exclusion zone works well for spotting an isolated motif A, i.e. no overlaps with identified motifs within that same group.
however, following identified motif groups, say B and C, are typically shifted versions of the initial A motif. Increasing the sequence length eventually removes the unwanted behavior but at the cost of not identifying other desirable motifs.

0 replies

seanlaw
Dec 12, 2024
Maintainer

however, following identified motif groups, say B and C, are typically shifted versions of the initial A motif. Increasing the sequence length eventually removes the unwanted behavior but at the cost of not identifying other desirable motifs.

Yes, I understand. However, since the exclusion zone is also applied, say, upstream (and downstream) of a motif, it is also possible to miss desirable motifs. In other words, it's a double-edged sword, and we'd rather err on the side of NOT ignoring potential motifs since we (the developers) can't tell whether a subsequence is "important" or not. It also certainly depends on the size of your window, m, and the length of your time series.

@JulienLeprince Oftentimes, I like to ask whether you could do a bit of post-processing: set max_motifs to some large number, then go through each motif and check whether it is "too close" to one that was previously found. Frankly, I think this is the "safer" thing to do. As I think about it a bit more, you might also be able to set max_motifs=10 and, if, say, the first motif is "correct" but the other 9 are "too close" to the first motif, take your P and doctor it by setting the distances at those 9 subsequence locations to np.inf, and then run stumpy.motifs again. Purely just thinking out loud here (without having consumed any coffee today)!
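The post-processing idea above could be sketched roughly as follows. This is only an illustration and not part of STUMPY: the helper names `drop_close_motifs` and `mask_profile`, the `min_gap` threshold, and the sample indices are all hypothetical. The first helper keeps a motif only if its start index is more than `min_gap` away from every motif already kept; the second returns a copy of the matrix profile with the rejected locations set to `np.inf`.

```python
import numpy as np


def drop_close_motifs(motif_starts, min_gap):
    """Keep motif start indices that are more than `min_gap` away from every
    previously kept start. Assumes `motif_starts` is ordered best-first,
    e.g. the first column of the `motif_indices` array."""
    kept, rejected = [], []
    for idx in motif_starts:
        if all(abs(int(idx) - k) > min_gap for k in kept):
            kept.append(int(idx))
        else:
            rejected.append(int(idx))
    return kept, rejected


def mask_profile(P, rejected, radius=0):
    """Return a float copy of the matrix profile `P` with the rejected
    locations (plus/minus `radius`) set to np.inf."""
    P = np.array(P, dtype=np.float64)  # copy, so the original is untouched
    for idx in rejected:
        start = max(0, idx - radius)
        stop = min(len(P), idx + radius + 1)
        P[start:stop] = np.inf
    return P


# Hypothetical motif start indices, best motif first
motif_starts = np.array([100, 103, 97, 250, 252])
kept, rejected = drop_close_motifs(motif_starts, min_gap=10)

P = np.random.rand(300)  # placeholder for a real matrix profile
P_doctored = mask_profile(P, rejected)
```

With a real matrix profile from stumpy.stump, P would be the first column cast to float, and P_doctored would take its place in a second stumpy.motifs(T, P_doctored, ...) call. Whether `min_gap` should equal the exclusion zone or something larger depends on your data.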

I truly don't mean to be difficult here, but stumpy.motifs was meant to be a reasonable starting point for analyzing your matrix profile (a simple helper function), and it was never meant to satisfy more specific conditions. Anything beyond its super basic functionality probably means that you should/could start rolling your own motif_finder function by copying the stumpy.motifs function. I would encourage that, and possibly sharing your code in our Discussions section.

0 replies

NimaSarajpoor
Dec 13, 2024
Collaborator

@seanlaw

do you remember anything about this or was it related to something else?

I think we had a relevant discussion, but I couldn't find it. I checked the code again and noticed a few things, which I've shared below. Please let me know if you notice any mistakes.

@JulienLeprince
First, let's have a quick review of stumpy.motifs. Let's say we are looking for motifs of length m in the time series T. We can use the function stumpy.motifs, and get the output motif_indices:

# motif_indices
array([
[A0, A1, A2, A3, A4],
[B0, B1, B2, B3, B4],
...
])

A0 is the index of the first motif, and the indices of its closest matches are A1, A2, A3, and A4.
B0 is the index of the second motif, and the indices of its closest matches are B1, B2, B3, and B4.

Note 1:
excl_zone is considered between the two motifs A0 and B0 , i.e. $|A0 - B0| > excl\textunderscore{zone} $

Note 2:
excl_zone is also considered between the matches of a motif. In other words:

$\forall{x, y}, {x}\neq{y} \in set(A0, A1, A2, A3, A4): |x - y| > excl\textunderscore{zone} $
AND
$\forall{x, y}, {x}\neq{y} \in set(B0, B1, B2, B3, B4): |x - y| > excl\textunderscore{zone} $

Note 3:
The matches of motif B0 can even be the same as the matches of motif A0. In other words, the following logic is NOT implemented:

$\forall{x} \in set(A1, A2, A3, A4), \forall{y} \in set(B1, B2, B3, B4): |x - y| > excl\textunderscore{zone} $
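To see which of these notes applies to a given result, a small checker along the following lines might help. It is a sketch under assumptions: the function names are hypothetical, the sample `motif_indices` array is made up, and `-1` is treated as padding for unused match slots.

```python
import numpy as np
from itertools import combinations


def check_within_groups(motif_indices, excl_zone):
    """Note 2: within each row, every pair of indices should be more than
    `excl_zone` apart. Returns one boolean per row."""
    results = []
    for row in motif_indices:
        valid = [int(i) for i in row if i >= 0]  # assumption: -1 marks padding
        results.append(
            all(abs(x - y) > excl_zone for x, y in combinations(valid, 2))
        )
    return results


def close_matches_across_groups(row_a, row_b, excl_zone):
    """Note 3: list the pairs (x, y), with x from the matches of group A and
    y from the matches of group B, that lie within `excl_zone` of each other."""
    matches_a = [int(i) for i in row_a[1:] if i >= 0]
    matches_b = [int(i) for i in row_b[1:] if i >= 0]
    return [
        (x, y) for x in matches_a for y in matches_b if abs(x - y) <= excl_zone
    ]


# Hypothetical output for m = 20, so excl_zone = ceil(20 / 4) = 5
motif_indices = np.array([
    [10, 200, 400, 600, -1],
    [50, 202, 400, 800, 900],
])
excl_zone = 5

within_ok = check_within_groups(motif_indices, excl_zone)
shared = close_matches_across_groups(motif_indices[0], motif_indices[1], excl_zone)
```

Here both rows satisfy Note 2, while `shared` lists the match pairs that the two groups have in common or nearly in common, which is the situation Note 3 describes.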

If your challenge is related to Note 1 or Note 2 above, it can be fixed by changing the excl_zone in stumpy.config.
If your challenge is related to Note 3, then I think you can try to modify the code. However, IMO, this can/may result in missing (not capturing) some motifs. We can discuss it further if needed.
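For reference, here is a sketch of how the exclusion zone width follows from the window size. To the best of my knowledge, STUMPY computes it as ceil(m / denominator), with the denominator stored in stumpy.config.STUMPY_EXCL_ZONE_DENOM (default 4); the helper name `excl_zone_width` is hypothetical and only mirrors that formula.

```python
import math


def excl_zone_width(m, denom=4):
    """Exclusion zone half-width for window size `m`, i.e. ceil(m / denom).
    Assumption: this mirrors how STUMPY derives the zone from
    stumpy.config.STUMPY_EXCL_ZONE_DENOM (default 4)."""
    return int(math.ceil(m / denom))


m = 50
default_zone = excl_zone_width(m)         # denominator 4
wider_zone = excl_zone_width(m, denom=1)  # denominator 1: a full window length

# To apply a wider zone in STUMPY itself, the config value would be set
# before calling stumpy.motifs, e.g.:
#   import stumpy
#   stumpy.config.STUMPY_EXCL_ZONE_DENOM = 1
```

A smaller denominator gives a wider zone. The setting is global, so it likely also affects other STUMPY functions that apply an exclusion zone.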

0 replies

seanlaw
Dec 13, 2024
Maintainer

Note 1:
excl_zone is considered between the two motifs A0 and B0 , i.e. $|A0 - B0| > excl\textunderscore{zone} $

@NimaSarajpoor Please correct me if I'm wrong, but I don't think this is correct. From what I can see in the stumpy.motifs code, excl_zone appears to only be applied to the matches such that each match (i.e., [A1, A2, A3, A4]) is prevented from becoming a motif. However, I don't think there is anything preventing A0 + 1 (i.e., one index location next to A0) from becoming the next candidate motif (i.e., B0 can be A0 + 1 or even A0 + m / 4 - 1). Maybe I'm overlooking this in the code?

0 replies

NimaSarajpoor
Dec 13, 2024
Collaborator

From what I can see in the stumpy.motifs code, excl_zone appears to only be applied to the matches such that each match (i.e., [A1, A2, A3, A4]) is prevented from becoming a motif. However, I don't think there is anything preventing A0 + 1 (i.e., one index location next to A0)

https://github.com/TDAmeritrade/stumpy/blob/3165d1ccbe505b0c5bd324db5e12979c392967d8/stumpy/motifs.py#L140-L143

A0 (the start index of the motif itself) is included in the query_matches[:, 1], and hence its close neighbours are excluded before it chooses the next best motif candidate.

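A quick check:

import numpy as np
import stumpy

# Build a short random series whose first and last three points are identical
seed = 0
np.random.seed(seed)
T = np.random.rand(20)
T[:3] = 0.0
T[-3:] = 0.0
m = 3
mp = stumpy.stump(T, m=m)

# Use the first subsequence (the motif at index 0) as the query
query_matches = stumpy.match(T[:3], T)
print(query_matches[:, 1])
# [0 17]  <- this contains the index `0`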

As an aside, thanks for mentioning that matches (and their "close" neighbours) are prevented from becoming the next motif. In my previous comment, Note 1 did not reflect that. I'm providing a revised version below:

Note 1 (revised)
$\forall{x} \in set(A0, A1, A2, A3, A4): |B0 - x| > excl\textunderscore{zone} $
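The revised Note 1 can be expressed as a small predicate. This is a sketch, not STUMPY code: `next_motif_allowed` is a hypothetical helper, and `-1` is assumed to mark unused match slots. The example reuses the indices from the quick check above (A0 = 0 with a match at 17, m = 3).

```python
import numpy as np


def next_motif_allowed(candidate, previous_group, excl_zone):
    """Revised Note 1: a candidate start index B0 is allowed only if it is
    more than `excl_zone` away from every index in the previous group
    (the motif A0 and its matches)."""
    prev = np.asarray(previous_group)
    prev = prev[prev >= 0]  # assumption: -1 marks unused (padded) slots
    return bool(np.all(np.abs(prev - candidate) > excl_zone))


group_a = [0, 17]  # A0 and its match
excl_zone = 1      # ceil(3 / 4) for m = 3

allowed_next_to_a0 = next_motif_allowed(1, group_a, excl_zone)  # A0 + 1
allowed_far_away = next_motif_allowed(8, group_a, excl_zone)
```

Under this rule, A0 + 1 is rejected as the next motif because it falls inside the exclusion zone around A0, whereas an index far from both 0 and 17 is accepted.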

0 replies

seanlaw
Dec 13, 2024
Maintainer

A0 (the start index of the motif itself) is included in the query_matches[:, 1], and hence its close neighbours are excluded before it chooses the next best motif candidate.

Ohhhhhhhh. Very sneaky. I totally forgot about that!! Nice catch. I almost feel like we SHOULD add a comment to remind future-self.

In light of this, @JulienLeprince are you able to provide a concrete example (with data) of where the "other" motifs are "inside" of the exclusion zone of a previously identified motif? Note that the exclusion zone is within +/- m / 4 of some index and m is your window size.

0 replies

seanlaw
Jan 23, 2025
Maintainer

I've added a comment in the code (commit 3676545):

https://github.com/TDAmeritrade/stumpy/blob/3676545d9d7622e7cbf79141b05f040d28a5954f/stumpy/motifs.py#L137-L145

0 replies
