Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

New scalable Time Series classification developed in Python.

Notifications You must be signed in to change notification settings

moradabaz/ProximityForests-python

Repository files navigation

Proximity Forest (Using only DTW measure)

An effective and scalable distance-based classifier for time series classification. This repostitory contains the source code for the time series classification algorithm Proximity Forest, published in the paper [https://arxiv.org/abs/1808.10594]

Check out the original project developed in Java: https://github.com/fpetitjean/ProximityForest

Proximity Forests in Python using dtai DTW distance measure

This is the Proximity Forest algorithm implemented to classify Time Series.

Requirements:

  • You can only run this project if you have a UNIX-based OS (Linux or OSX)
  • You must have python3.7
  • Give full permission to your project directory and files
  • We recommend you locate your project in your $HOME directory.

Packages to Install Previously:

  • pip install dtaidistance or pip3 install dtaidistance
  • pip install -v --force-reinstall --no-deps --no-binary dtaidistance dtaidistance
  • pip install scipy or pip3 install scipy

How to run a simple execution:

  • Download the project

  • Give full permissions to your project

  • Go to the directory launchers

  • Execute the following command:

    • sh TermLauncherParam.sh [DATASET_NAME], where [DATASET_NAME] is the name of the dataset you have stored in the project. The Names available are:
      • ItalyPowerDemand
      • MoteStrain
      • Plane
      • GesturePebbleZ2
  • Try this example: - sh TermLauncherParam.sh ItalyPowerDemand

Parameters of python execution:

  • Run the sh file application/TermLauncher.sh setting the following parameters:
  • name: The name of the experiment
  • -train: The training dataset path. It must have a .ts or .arff extension.
  • -test: The testing dataset path. It must have a .ts or .arff extension.
  • The training and testing datasets can be found in the folder datasets/
  • -repeat: Number of repeats of the experiment
  • -trees: Number of trees of the PForest
  • -candidates: Number of candidates per tree
  • -targetlast: It indicates if the last column of the dataset is the class of the serie.
    • P.e. if the line is 0.32, 0.45, ..., -0.12, X_AXIS, setting -targetlast=True means the X_AXIS is the series class and it's located in the last column

Comparison with Sktime.ProximityForest

sktime is a Python machine learning toolbox for time series with a unified interface for multiple learning tasks. You can find the package through this URL: https://sktime.org

If you want to compare this Proximity Forest Project to the sktime project:

  • Run the sh file launchers/launchersktime.sh, which can be found in the launchers directory, setting the following arguments:
    • name: The name of the dataset. The Names available are:
      • ItalyPowerDemand
      • MoteStrain
      • Plane
      • GesturePebbleZ2
      • Chinatown
      • ERing
      • FacesUCR
      • FreezerRegularTrain
      • SmoothSubspace
      • ElectricDevices
    • trees: The number of trees for the algorithm
    • candidates: Number of candidates per split
  • Example: sh launchersktime.sh Plane 20 3

Installing Package

You can install the package using the command:

pip install Pforests-dtw

Nevertheless, you must have installed the following packages:

  • numpy
  • dtaidistance
  • pytest
  • scipy

Example

from trees import ProximityForest
from core import FileReader
import random
train_dataset = FileReader.FileReader.load_arff_data("/Users/moradisten/Projects/PForests/datasets/Plane/Plane_TRAIN.arff")
test_dataset = FileReader.FileReader.load_arff_data("/Users/moradisten/Projects/PForests/datasets/Plane/Plane_TEST.arff")
Pforest = ProximityForest.ProximityForest(1, n_trees=100, n_candidates=5)
Pforest.train(train_dataset)
results = Pforest.test(test_dataset)
print(results.accuracy)

Data Structure: ListDataset

Normally, in Data Science, we are used to handle datasets with Pandas. But for these case, as researchers did, a data structure has been developed, which is the ListDataset. This data structure contains:

  • series_data: Contains a list of the time series (X).
  • classes: Contains a list of the labels (y).
  • class_counter: Dictionary which contains the count of each label -> <label, label count>
  • series_map: Map which indicates the number of series per label -> <label, no series>

File Format

This project reads csv and arff files.

My Bachelor Thesis

If you speak Spanish, I invite you to read my bachelor's thesis about this matter

https://figshare.com/articles/thesis/Distance-based_Time_Series_Classifiers/13269005

About

New scalable Time Series classification developed in Python.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

AltStyle によって変換されたページ (->オリジナル) /