Trying to identify most common distinct motifs in time series data and then search for those motifs #438

Answered by seanlaw

chrisruk asked this question in Help: Coding & Implementations

chrisruk
Jul 17, 2021

Hi,

I'm wondering if anyone can possibly point me in the right direction.

I'm reading the paper - https://www.researchgate.net/publication/343280455_Human_Presence_Detection_by_monitoring_the_indoor_CO2_concentration

They mention:
"We identified events of presence or absence using motif detection on the CO2 concentration time series. Therefore, we used Motif Clustering within computing the full distance matrix using the STOMP algorithm".

I believe this is one of the algorithms your API uses.

The following is an example of one of their figures:
fig4

I have used their open data and tried to use stumpy, to find the same first 3 motifs as them, by doing:

#!/usr/bin/python3
import stumpy
import csv
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from matplotlib.patches import Rectangle

if __name__ == "__main__":
    co2 = []
    with open('MUC2020/csv/RoomA_CO2.csv') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=';')
        for row in reader:
            if len(row['CO2-Concentration (in ppm)'].strip()) > 0:
                co2.append(float(row['CO2-Concentration (in ppm)']))
    fig, axs = plt.subplots(1, sharex=True, gridspec_kw={'hspace': 0})
    plt.suptitle('Motif (Pattern) Discovery', fontsize='30')
    axs.plot(co2)
    axs.set_ylabel('CO2 ppm', fontsize='20')
    m = 10
    c = np.array(co2)
    p = stumpy.stump(c, m)
    dists, inde = stumpy.motifs(c, p[:,1], max_matches=3)
    for z in range(0, inde.shape[1]):
        rect = Rectangle((inde[0][z], 0), m, 10, facecolor=(1.0,0.0,0.0))
        axs.add_patch(rect)
    plt.show()

The following image shows my results using their data:

my_fig

The 3 motifs that the motifs function finds for me all seem to be very similar, in that they all sit below the downward trends of the CO2 ppm line.

I'm just wondering if anyone could possibly point me in the right direction to find the same 3 unique motifs they have and to then search for those motifs across the time series data.

I'm thinking the motifs function might not be the function I should be using to find the most common distinct motifs?

Many thanks!

Answered by seanlaw

Jul 17, 2021

@chrisruk Thank you for your question and welcome to the STUMPY community. Please be forewarned that the stumpy.motifs function is still in the experimental stage and should be officially released in the upcoming v1.9.0 release. In the meantime, the API may still change.

Having said that, it looks like what you've done seems fine. However, I could be wrong but your window size of m = 10 appears to be smaller than the window size used in the paper. Also, can you please plot the raw CO2 time series along with the matrix profile computed using stumpy.stump (like what is shown here)? That will at least give you an idea of where the potential motifs are located within your time series (by look...
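To illustrate why plotting the matrix profile helps locate motifs, here is a minimal brute-force sketch in plain NumPy. It is not how stumpy.stump is implemented (STUMPY uses much faster algorithms); the helper names, the synthetic data, and the exclusion zone of ceil(m / 4) are assumptions made for this example. Each profile value is the z-normalized Euclidean distance from a subsequence to its nearest non-trivial neighbor, so dips toward zero mark patterns that repeat elsewhere in the series.

```python
import numpy as np


def znorm(x):
    """Z-normalize a 1-D array (zero mean, unit standard deviation)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)


def naive_matrix_profile(ts, m):
    """Brute-force matrix profile, for illustration only (slow on long series)."""
    n_sub = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n_sub)])
    excl = int(np.ceil(m / 4))  # assumed exclusion zone to skip trivial self-matches
    profile = np.full(n_sub, np.inf)
    for i in range(n_sub):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf  # ignore the window itself and its neighbors
        profile[i] = d.min()
    return profile


# Synthetic stand-in for the CO2 series: noise with the same pattern planted twice
rng = np.random.default_rng(0)
ts = rng.normal(size=300)
pattern = np.sin(np.linspace(0, 2 * np.pi, 20))
ts[50:70] = pattern
ts[200:220] = pattern

profile = naive_matrix_profile(ts, m=20)
best = int(np.argmin(profile))  # start index of one half of the best motif pair
```

If the window size m is much shorter than the pattern of interest, the lowest profile values tend to land on short, generic shapes instead, which is one reason the choice of m matters.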

seanlaw
Jul 17, 2021
Maintainer

seanlaw Jul 19, 2021
Maintainer

> Thanks again for all your help!

You bet!

> What's the soon-to-be-released function going to be called out of interest, so I can look out for it.

It is called stumpy.stimp and actually computes what is referred to as a "pan matrix profile" (see this paper for more details). A work-in-progress tutorial can be found here.

Here is the code that I used for your reference (please be warned that this is still in development and subject to change):

min_m, max_m = 10, 100
co2_pan = stumpy.stimp(c, min_m=min_m, max_m=max_m, percentage=1.0) # This percentage controls the extent of `stumpy.scrump` completion
percent_m = 1.0 # The percentage of windows to compute
n = np.ceil((max_m - min_m) * percent_m).astype(int)
for _ in range(n):
    co2_pan.update()

And you can plot the results with:

import matplotlib.pyplot as plt
fig = plt.figure()
fig.canvas.toolbar_visible = False
fig.canvas.header_visible = False
fig.canvas.footer_visible = False
color_map = plt.get_cmap("Greys_r", 256)  # matplotlib.cm.get_cmap was removed in Matplotlib 3.9
threshold = 0.2 # 0.2 is usually an excellent default but this is something that you'll need to play around with
im = plt.imshow(co2_pan.pan(threshold=threshold), cmap=color_map, origin="lower", interpolation="none", aspect="auto")
plt.xlabel("Time", fontsize="20")
plt.ylabel("m", fontsize="20")
plt.clim(0.0, 1.0)
plt.colorbar()
plt.tight_layout()
plt.show()

download-5

As you adjust the threshold, what you are looking for is the location of the peak of the right-angled triangles that are formed. Of course, this is definitely more art than science, but at least it is a "better" way to explore the data.
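For intuition about where those triangles come from, here is a rough NumPy-only sketch of a pan-style matrix: one brute-force matrix profile per window size, scaled into [0, 1] and then binarized at a threshold. This is an assumption-laden illustration, not the stumpy.stimp implementation. The function names, the synthetic data, the ceil(m / 4) exclusion zone, and the exact binarization rule are all made up for the example. The scaling relies on the fact that a z-normalized Euclidean distance between two length-m windows cannot exceed 2 * sqrt(m).

```python
import numpy as np


def matrix_profile(ts, m):
    """Brute-force z-normalized matrix profile (illustration only)."""
    n_sub = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n_sub)], dtype=float)
    subs -= subs.mean(axis=1, keepdims=True)
    std = subs.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid dividing by zero for constant windows
    subs /= std
    excl = int(np.ceil(m / 4))  # assumed exclusion zone
    profile = np.empty(n_sub)
    for i in range(n_sub):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf  # skip trivial self-matches
        profile[i] = d.min()
    return profile


def naive_pan(ts, window_sizes, threshold):
    """Stack scaled profiles, one row per window size, then binarize."""
    pan = np.ones((len(window_sizes), len(ts)))  # tail positions stay at 1.0 (no match)
    for row, m in enumerate(window_sizes):
        p = matrix_profile(ts, m)
        pan[row, :len(p)] = p / (2.0 * np.sqrt(m))  # scale distances into [0, 1]
    return (pan > threshold).astype(float)  # 0.0 marks motif-like regions


# Synthetic series: noise with a length-40 pattern planted at two locations
rng = np.random.default_rng(1)
ts = rng.normal(size=300)
pattern = np.sin(np.linspace(0, 4 * np.pi, 40))
ts[50:90] = pattern
ts[200:240] = pattern

pan = naive_pan(ts, window_sizes=[10, 20, 30], threshold=0.2)
```

In this toy setup, a planted pattern of length L has exact matches for every window of size m <= L that starts inside it, and the number of such start positions shrinks as m grows. That narrowing is what produces a triangle whose peak sits near the true pattern length.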

chrisruk Jul 21, 2021
Author

Sorry for the delay in replying. I'll have a look at the tutorial and paper you referenced - they look very handy!

seanlaw Jul 21, 2021
Maintainer

No apologies necessary. Just an FYI that the new version was released today!

amurthy-sunysb Feb 14, 2023

Thanks a lot for your help, it's much appreciated!

I adapted your changes, to use the 'stumpy.match' function to find all matches for each motif.

I decided to use just 2 motifs as I'm mainly curious about the entering / leaving of a room.

Figure_1

I also added the 'max_distance' parameter to stumpy.match to get a couple more matches.

import stumpy
import csv
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [20, 6] # width, height
plt.rcParams['xtick.direction'] = 'out'

co2 = []
with open('MUC2020/csv/RoomA_CO2.csv') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=';')
    for row in reader:
        if len(row['CO2-Concentration (in ppm)'].strip()) > 0:
            co2.append(float(row['CO2-Concentration (in ppm)']))

m = 24
c = np.array(co2)
p = stumpy.stump(c, m)
dists, inde = stumpy.motifs(c, p[:, 0], max_motifs=2)

fig, axs = plt.subplots(2, sharex=True, gridspec_kw={'hspace': 0})
plt.suptitle('Motif (Pattern) Discovery', fontsize='30')
axs[0].plot(co2)
axs[0].set_ylabel('CO2 ppm', fontsize='20')
cols = ['red', 'green', 'blue']
for z in range(0, inde.shape[0]):
    col = cols[z]
    start = inde[z, 0]
    stop = inde[z, 0] + m
    matches = stumpy.match(c[start:stop], c, max_distance=2.0)
    for mt in range(matches.shape[0]):
        s = matches[mt, 1]
        st = s + m
        axs[0].plot(np.arange(s, st), c[s : st], c=col)
axs[1].plot(p[:, 0])
axs[1].set_ylabel('Matrix profile', fontsize='20')
plt.show()
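As a rough illustration of what a distance cutoff such as max_distance does, the following NumPy-only sketch scores every window of a series against a query and keeps those within the cutoff. It is a simplified stand-in, not stumpy.match itself: the function names and synthetic data are hypothetical, and unlike a real matching routine it makes no attempt to suppress overlapping neighbors of a match.

```python
import numpy as np


def znorm(x):
    """Z-normalize a 1-D array (zero mean, unit standard deviation)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)


def naive_match(query, ts, max_distance):
    """Return rows of (distance, start_index), nearest first, within max_distance."""
    m = len(query)
    q = znorm(query)
    dists = np.array([np.linalg.norm(znorm(ts[i:i + m]) - q)
                      for i in range(len(ts) - m + 1)])
    hits = np.flatnonzero(dists <= max_distance)
    hits = hits[np.argsort(dists[hits])]  # sort the surviving windows by distance
    return np.column_stack((dists[hits], hits))  # note: indices are stored as floats


# Synthetic series with the query pattern planted at two locations
rng = np.random.default_rng(2)
ts = rng.normal(size=300)
pattern = np.sin(np.linspace(0, 2 * np.pi, 20))
ts[50:70] = pattern
ts[200:220] = pattern

matches = naive_match(ts[50:70], ts, max_distance=0.5)
starts = matches[:, 1].astype(int)
looser = naive_match(ts[50:70], ts, max_distance=2.0)  # a larger cutoff can only add matches
```

Raising the cutoff admits more distant (less similar) windows, which matches the observation above that a larger max_distance yields a few more matches.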

This example doesn't seem to be working for Stumpy 1.11.1. The inde returned from the call to stumpy.motifs is of shape (1, 10) @chrisruk.

@seanlaw

seanlaw Feb 16, 2023
Maintainer

> This example doesn't seem to be working for Stumpy 1.11.1. The inde returned from the call to stumpy.motifs is of shape (1, 10) @chrisruk.

@amurthy-sunysb Please see this discussion
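For anyone hitting the same shape surprise, one defensive way to consume the index array is sketched here. It assumes that each row is one motif, each column is one match slot, and that unused slots are padded with -1. Please verify these assumptions against the documentation for your installed STUMPY version. The helper name and the example array are hypothetical, not real output from the CO2 data.

```python
import numpy as np


def iter_motif_matches(motif_indices, pad_value=-1):
    """Yield (motif_number, start_index) pairs, skipping padded entries."""
    motif_indices = np.atleast_2d(motif_indices)
    for motif_number, row in enumerate(motif_indices):
        for start in row:
            if start != pad_value:  # assumed padding marker for unused match slots
                yield motif_number, int(start)


# Hypothetical array with the reported (1, 10) shape: one motif, three real matches
inde = np.array([[120, 480, 910, -1, -1, -1, -1, -1, -1, -1]])
pairs = list(iter_motif_matches(inde))
```

Looping this way avoids hard-coding how many motifs or matches come back, so the plotting code keeps working when fewer motifs are found than were requested.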

Answer selected by chrisruk
