alezanga/dataMole

DataMole

A Qt-based graphical tool for processing and exploring tabular datasets (CSV) in machine learning and data analysis projects.

Note that DataMole is still experimental and under development.

Features

  • Import and export CSV datasets
  • Apply transformations using the graphical interface:
    • Fill missing values
    • Replace values
    • Discretize numeric columns or datetime attributes
    • Scale columns
    • One-hot encode
    • Add/remove columns
    • Join two tables
    • Convert types (e.g. numeric -> string)
    • Extract time series information from longitudinal datasets
  • Draw scatterplots and line charts for time series
  • Get data statistics (histogram, mean, std, etc.)
  • Create pipelines of transformations and execute them
  • Import and export pipelines (as pickle files)
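To make the pipeline and pickle features above more concrete, here is a minimal sketch of the general idea: a pipeline described as an ordered list of steps, saved and restored with Python's standard pickle module, then executed. Everything in it (the step names, the STEPS registry, run_pipeline, the row format) is hypothetical and chosen for illustration only; it is not DataMole's internal pipeline format or API.

```python
import pickle


# Hypothetical transformations operating on rows stored as a list of dicts.
def fill_missing(rows, column, value):
    """Replace None in `column` with `value`."""
    return [{**r, column: value if r.get(column) is None else r[column]} for r in rows]


def scale_min_max(rows, column):
    """Rescale `column` to the [0, 1] range."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, column: (r[column] - lo) / span} for r in rows]


# Hypothetical registry mapping step names to functions.
STEPS = {"fill_missing": fill_missing, "scale_min_max": scale_min_max}

# A pipeline is plain data here: an ordered list of step descriptions.
pipeline = [
    {"step": "fill_missing", "params": {"column": "x", "value": 0.0}},
    {"step": "scale_min_max", "params": {"column": "x"}},
]


def run_pipeline(steps, rows):
    """Apply each step in order and return the transformed rows."""
    for spec in steps:
        rows = STEPS[spec["step"]](rows, **spec["params"])
    return rows


# Export and re-import the pipeline description with pickle.
blob = pickle.dumps(pipeline)
restored = pickle.loads(blob)

rows = [{"x": 2.0}, {"x": None}, {"x": 4.0}]
result = run_pipeline(restored, rows)
```

Because unpickling can execute arbitrary code, pickle files should only be loaded when they come from a trusted source.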

Screenshots

  • DataMole main window
  • Scatterplot matrix on the iris dataset
  • Preprocessing pipeline

Installation

  1. Install Python >= 3.8.0
  2. Open a terminal. On Windows 10, use Windows PowerShell
  3. Create a virtualenv (*):
    • Install virtualenv: python -m pip install virtualenv
    • Move into the main dataMole folder (the one containing main.py)
    • Create a virtualenv: python -m virtualenv venv
    • Activate it: source ./venv/bin/activate (.\venv\Scripts\Activate.ps1 on Windows)
  4. With the active virtualenv, install dependencies: python -m pip install -r requirements.txt
  5. Generate Qt resources: make resources (**)
  6. Start the software with python main.py

(*) You can also use the global Python installation instead of a virtualenv, but this is not recommended.

(**) On Windows the make command does not work, so at step 5 run this command instead:

  • pyside2-rcc dataMole/resources.qrc -o dataMole/qt_resources.py

This will generate a new file qt_resources.py.
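If a single step that works on every platform is preferred, the resource generation can also be wrapped in a small Python helper. The sketch below is not part of the repository: the function names rcc_command and generate_resources are hypothetical, and it assumes that running the pyside2-rcc command shown above is all that step 5 needs to do.

```python
import shutil
import subprocess
from pathlib import Path


def rcc_command(repo_root="."):
    """Build the pyside2-rcc invocation from the installation notes."""
    root = Path(repo_root)
    return [
        "pyside2-rcc",
        str(root / "dataMole" / "resources.qrc"),
        "-o",
        str(root / "dataMole" / "qt_resources.py"),
    ]


def generate_resources(repo_root="."):
    """Run pyside2-rcc, failing early if the tool is not on PATH."""
    cmd = rcc_command(repo_root)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("pyside2-rcc not found; is the virtualenv active?")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

With the virtualenv active, calling generate_resources() from the main dataMole folder should have the same effect as typing the command by hand.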

Software usage

Refer to the user manual in docs/manuals.

Developer info

In addition to the packages listed in requirements.txt, you may want to install the ones listed in requirements.dev.txt.

Refer to the developer manual in docs/manuals for information about the software architecture and how to extend it.

The .tex source files of the manuals are in the docs/manuals/source folder.

Building documentation with Sphinx

  1. Move into the docs folder
  2. Generate stubs (rst) if required: make stubs
  3. Generate documentation:
    • make html for HTML documentation
    • make pdf to build the documentation as a single PDF file

The output will be found in the auto/build directory.

Use make clean to remove the build directory.

Credits

Here is a list of third-party software used within this project.
