Make excl_zone a parameter of stumpy.motif() #1063

JulienLeprince started this conversation in General

Hi,

stumpy.motifs() identifies a lot of overlapping motifs in my time series. Would it be possible to add excl_zone as an input parameter to the function so that users can adjust this unwanted behavior?

Kind regards,
Julien


Replies: 8 comments


@JulienLeprince Thank you for your question and welcome to the STUMPY community.

would it be possible to add excl_zone as an input parameter to the function such that the unwanted behavior can be adjusted by users

IIRC @NimaSarajpoor looked at this a long time ago (I don't recall the details and I can't seem to find the original conversation) but I vaguely recall that excluding regions next to an already-discovered motif (candidate) subsequence would lead to unintended consequences. @NimaSarajpoor do you remember anything about this or was it related to something else?

Note that an exclusion zone is already applied to neighboring areas surrounding subsequences that match a candidate motif subsequence.


Thanks for the welcome and prompt reply Sean. To be clear:

  • the exclusion zone works well for spotting an isolated motif A, i.e. no overlaps with identified motifs within that same group.
  • however, following identified motif groups, say B and C, are typically shifted versions of the initial A motif. Increasing the sequence length eventually removes the unwanted behavior but at the cost of not identifying other desirable motifs.

however, following identified motif groups, say B and C, are typically shifted versions of the initial A motif. Increasing the sequence length eventually removes the unwanted behavior but at the cost of not identifying other desirable motifs.

Yes, I understand. However, since the exclusion zone is also applied, say, upstream (and downstream) of a motif, it is possible to miss desirable motifs too. In other words, it's a double-edged sword and we'd rather err on the side of NOT ignoring potential motifs since we (the developers) can't tell whether a subsequence is "important" or not. It also certainly depends on the size of your window, m, and the length of your time series.

@JulienLeprince Oftentimes, I like to ask whether you could do a bit of post-processing: set max_motifs to some large number, then go through each motif and check whether it is "too close" to one that was previously found. Frankly, I think this is the "safer" thing to do. As I think about it a bit more, you might also be able to set max_motifs=10 and if, say, the first motif is "correct" but the other 9 are "too close" to the first motif, then you can take your P and doctor it by setting those distances (for the 9 latter subsequence locations) to np.inf and then run stumpy.motifs again. Purely just thinking out loud here (without having consumed any coffee today)!
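As a rough illustration of the two post-processing ideas above (a sketch only, not part of STUMPY): the helper names `drop_near_duplicates` and `mask_profile`, the `min_sep` threshold, and the example indices are all hypothetical. The first helper keeps a motif only if it is far enough from every motif kept so far; the second sets the profile values at rejected locations to infinity so that a second pass would skip them.

```python
import math


def drop_near_duplicates(motif_starts, min_sep):
    """Greedily keep motif start indices that are at least `min_sep` apart.

    `motif_starts` is assumed to be ordered best-first, so earlier entries win.
    """
    kept = []
    for idx in motif_starts:
        if all(abs(idx - k) >= min_sep for k in kept):
            kept.append(idx)
    return kept


def mask_profile(P, rejected, radius=0):
    """Return a copy of P with +/- `radius` around each rejected index set to inf."""
    masked = list(P)
    for idx in rejected:
        lo = max(0, idx - radius)
        hi = min(len(masked), idx + radius + 1)
        for i in range(lo, hi):
            masked[i] = math.inf
    return masked


starts = [100, 103, 98, 250, 252, 400]  # hypothetical motif start indices
kept = drop_near_duplicates(starts, min_sep=20)
rejected = [s for s in starts if s not in kept]
print(kept)      # [100, 250, 400]
print(rejected)  # [103, 98, 252]
```

What counts as "too close" is up to the user; `min_sep = m` would forbid any overlap between kept motifs. With a NumPy matrix profile, the masking step amounts to assigning `np.inf` at the rejected indices on a copy of P before calling stumpy.motifs again.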

I truly don't mean to be difficult here but stumpy.motifs was meant to be a reasonable starting point for analyzing your matrix profile (a simple helper function) and it was never meant to satisfy more specific conditions. Anything beyond its super basic functionality probably means that you should/could start rolling your own motif_finder function by copying the stumpy.motifs function. I would encourage that, and also sharing your code in our Discussions section.


@seanlaw

do you remember anything about this or was it related to something else?

I think we had a relevant discussion but I couldn't find it. I checked the code again and noticed a few things, which I share below. Please let me know if you notice any mistakes.

@JulienLeprince
First, let's have a quick review of stumpy.motifs. Let's say we are looking for motifs of length m in the time series T. We can use the function stumpy.motifs, and get the output motif_indices:

# motif_indices
array([
[A0, A1, A2, A3, A4],
[B0, B1, B2, B3, B4],
...
])

A0 is the index of the first motif, and the indices of its closest matches are A1, A2, A3, and A4.
B0 is the index of the second motif, and the indices of its closest matches are B1, B2, B3, and B4.

Note 1:
excl_zone is considered between the two motifs A0 and B0, i.e. $|A0 - B0| > \text{excl\_zone}$

Note 2:
excl_zone is also considered between the matches of a motif. In other words:

$\forall x, y \in \{A0, A1, A2, A3, A4\},\ x \neq y: |x - y| > \text{excl\_zone}$
AND
$\forall x, y \in \{B0, B1, B2, B3, B4\},\ x \neq y: |x - y| > \text{excl\_zone}$

Note 3:
The matches of motif B0 can even be the same as the matches of motif A0. In other words, the following logic is NOT implemented:

$\forall x \in \{A1, A2, A3, A4\},\ \forall y \in \{B1, B2, B3, B4\}: |x - y| > \text{excl\_zone}$


  • If your challenge is related to Note 1 or Note 2 above, it can be addressed by changing the exclusion zone in stumpy.config (the STUMPY_EXCL_ZONE_DENOM setting).
  • If your challenge is related to Note 3, then I think you can try to modify the code. However, IMO, this may result in missing (not capturing) some motifs. We can discuss it further if needed.
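To see which of the notes applies to a given result, one could check a motif_indices array directly. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the example array is made up, and padding entries are assumed to be negative (e.g. -1) when a motif has fewer matches than the others.

```python
from itertools import combinations


def valid(row):
    """Drop padding entries (assumed here to be negative indices such as -1)."""
    return [i for i in row if i >= 0]


def within_group_ok(motif_indices, excl_zone):
    """Note 2: every pair of indices in the same row is more than excl_zone apart."""
    return all(
        abs(x - y) > excl_zone
        for row in motif_indices
        for x, y in combinations(valid(row), 2)
    )


def cross_group_conflicts(motif_indices, excl_zone):
    """Note 3: matches (columns 1 onward) of different rows within excl_zone."""
    conflicts = []
    for (a, row_a), (b, row_b) in combinations(enumerate(motif_indices), 2):
        for x in valid(row_a[1:]):
            for y in valid(row_b[1:]):
                if abs(x - y) <= excl_zone:
                    conflicts.append((a, x, b, y))
    return conflicts


# Hypothetical output for m = 40, i.e. excl_zone = 10
motif_indices = [
    [100, 300, 520, -1],
    [160, 305, 700, 900],
]
print(within_group_ok(motif_indices, excl_zone=10))        # True
print(cross_group_conflicts(motif_indices, excl_zone=10))  # [(0, 300, 1, 305)]
```

An empty conflict list would suggest that the overlap seen in practice is not the Note 3 case.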

Note 1:
excl_zone is considered between the two motifs A0 and B0, i.e. $|A0 - B0| > \text{excl\_zone}$

@NimaSarajpoor Please correct me if I'm wrong but I don't think this is correct. From what I can see in the stumpy.motifs code, excl_zone appears to only be applied to the matches such that each match (i.e., [A1, A2, A3, A4]) is prevented from becoming a motif. However, I don't think there is anything preventing A0 + 1 (i.e., one index location next to A0) from becoming the next candidate motif (i.e., B0 can be A0 + 1 or even A0 + m / 4 - 1). Maybe I'm overlooking this in the code?


From what I can see in the stumpy.motifs code, excl_zone appears to only be applied to the matches such that each match (i.e., [A1, A2, A3, A4]) is prevented from becoming a motif. However, I don't think there is anything preventing A0 + 1 (i.e., one index location next to A0)

https://github.com/TDAmeritrade/stumpy/blob/3165d1ccbe505b0c5bd324db5e12979c392967d8/stumpy/motifs.py#L140-L143

A0 (the start index of the motif itself) is included in the query_matches[:, 1], and hence its close neighbours are excluded before it chooses the next best motif candidate.

A quick check

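import numpy as np
import stumpy

seed = 0
np.random.seed(seed)
T = np.random.rand(20)
T[:3] = 0.0   # plant the same constant pattern at the start ...
T[-3:] = 0.0  # ... and at the end of the series
m = 3
mp = stumpy.stump(T, m=m)

# Query with the first subsequence, i.e. the motif that starts at index 0
query_matches = stumpy.match(T[:m], T)
print(query_matches[:, 1])
# [0 17]  <- this contains the index `0`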

As an aside, thanks for mentioning that matches (and their "close" neighbours) are prevented from becoming the next motif. In my previous comment, Note 1 did not reflect that. Here is its revised version:

Note 1 (revised)
$\forall x \in \{A0, A1, A2, A3, A4\}: |B0 - x| > \text{excl\_zone}$
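The revised condition can be written as a small predicate. This is only a sketch to make the inequality concrete; the function name and the example indices are hypothetical, and negative entries are treated as padding.

```python
def next_motif_allowed(candidate, previous_group, excl_zone):
    """Revised Note 1: candidate is more than excl_zone from A0 and all its matches."""
    return all(abs(candidate - x) > excl_zone for x in previous_group if x >= 0)


group_a = [100, 300, 520]  # hypothetical [A0, A1, A2]
print(next_motif_allowed(101, group_a, excl_zone=10))  # False: inside the zone of A0
print(next_motif_allowed(310, group_a, excl_zone=10))  # False: the inequality is strict
print(next_motif_allowed(160, group_a, excl_zone=10))  # True
```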


A0 (the start index of the motif itself) is included in the query_matches[:, 1], and hence its close neighbours are excluded before it chooses the next best motif candidate.

Ohhhhhhhh. Very sneaky. I totally forgot about that!! Nice catch. I almost feel like we SHOULD add a comment to remind future-self.

In light of this, @JulienLeprince are you able to provide a concrete example (with data) of where the "other" motifs are "inside" of the exclusion zone of a previously identified motif? Note that the exclusion zone is within +/- m / 4 of some index and m is your window size.
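For a sense of scale, here is a small sketch of the arithmetic behind the +/- m / 4 zone. It assumes the zone is m divided by a denominator (default 4, which I believe corresponds to the stumpy.config.STUMPY_EXCL_ZONE_DENOM setting) and rounded up; the helper names are hypothetical.

```python
import math


def exclusion_zone(m, denom=4):
    """Half-width of the exclusion zone for window size m (assumed rounded up)."""
    return math.ceil(m / denom)


def excluded_range(idx, m, n, denom=4):
    """Inclusive index range blanked out around idx in a profile of length n."""
    ez = exclusion_zone(m, denom)
    return max(0, idx - ez), min(n - 1, idx + ez)


print(exclusion_zone(40))             # 10
print(exclusion_zone(50))             # 13
print(exclusion_zone(40, denom=1))    # 40
print(excluded_range(100, 40, 1000))  # (90, 110)
```

Under these assumptions, a denominator of 1 widens the zone to a full window length, so two motifs could no longer start within m samples of each other.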

Labels: question (Further information is requested), wontfix (This will not be worked on)
This discussion was converted from issue #1050 on January 23, 2025 11:52.
