GitHub - manujosephv/pytorch_tabular: A standard framework for modelling Deep Learning Models for tabular data


PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:

Low Resistance Usability
Easy Customization
Scalable and Easier to Deploy

It has been built on the shoulders of giants like PyTorch (obviously) and PyTorch Lightning.

Installation
Documentation
Available Models
Usage
Blogs
Citation

Installation

Although the installation includes PyTorch, the recommended approach is to first install PyTorch by following the official PyTorch installation instructions, picking the right CUDA version for your machine.

Once you have PyTorch installed, use:

pip install -U "pytorch_tabular[extra]"

to install the complete library with extra dependencies (Weights&Biases & Plotly).

And:

pip install -U "pytorch_tabular"

for the bare essentials.

The sources for pytorch_tabular can be downloaded from the GitHub repo.

You can clone the public repository:

git clone https://github.com/manujosephv/pytorch_tabular

Once you have a copy of the source, you can install it with:

cd pytorch_tabular && pip install ".[extra]"
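
After either install route, a quick sanity check is to ask Python's packaging metadata which version is installed. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution names it tries are assumptions based on the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(candidates=("pytorch_tabular", "pytorch-tabular")) -> Optional[str]:
    """Return the installed version of the first matching distribution, or None."""
    for name in candidates:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # try the next spelling of the distribution name
    return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "pytorch_tabular is not installed in this environment")
```

If the function returns `None`, the package is not visible to the interpreter you are running, which often means pip installed it into a different environment.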

Documentation

For complete documentation with tutorials, visit ReadTheDocs.

Available Models

FeedForward Network with Category Embedding is a simple feed-forward network, but with embedding layers for the categorical columns.
Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data is a model presented at ICLR 2020 which, according to the authors, has beaten well-tuned Gradient Boosting models on many datasets.
TabNet: Attentive Interpretable Tabular Learning is another model coming out of Google Research which uses Sparse Attention in multiple steps of decision making to model the output.
Mixture Density Networks is a regression model which uses Gaussian components to approximate the target function and provide a probabilistic prediction out of the box.
AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks is a model which tries to learn interactions between the features in an automated way, create a better representation, and then use this representation in a downstream task.
TabTransformer is an adaptation of the Transformer model for Tabular Data which creates contextual representations for categorical features.
FT Transformer from Revisiting Deep Learning Models for Tabular Data.
Gated Additive Tree Ensemble is a novel high-performance, parameter and computationally efficient deep learning architecture for tabular data. GATE uses a gating mechanism, inspired by GRU, as a feature representation learning unit with an in-built feature selection mechanism. We combine it with an ensemble of differentiable, non-linear decision trees, re-weighted with simple self-attention to predict our desired output.
Gated Adaptive Network for Deep Automated Learning of Features (GANDALF) is a pared-down version of GATE which is more efficient and performs better than GATE. GANDALF makes GFLUs the main learning unit, also introducing some speed-ups in the process. With very few hyperparameters to tune, this becomes an easy model to use and tune.
DANETs: Deep Abstract Networks for Tabular Data Classification and Regression is a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks.
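
To make the Mixture Density Networks entry above more concrete: such a model typically outputs mixture weights, means, and standard deviations for each input, and is trained by minimizing the negative log-likelihood of the target under that Gaussian mixture. The sketch below computes this loss for a single scalar target in plain Python. It illustrates the general technique only; the function name and argument layout are hypothetical and are not taken from the PyTorch Tabular implementation.

```python
import math


def gaussian_mixture_nll(y, weights, means, sigmas):
    """Negative log-likelihood of scalar y under a 1-D Gaussian mixture.

    weights must be non-negative and sum to 1; sigmas must be positive.
    """
    if not math.isclose(sum(weights), 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("mixture weights must sum to 1")
    # Per-component log(weight * N(y | mean, sigma)), computed in log space.
    log_terms = []
    for w, mu, sigma in zip(weights, means, sigmas):
        if w == 0:
            continue  # a zero-weight component contributes nothing
        log_pdf = (
            -0.5 * math.log(2 * math.pi)
            - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2
        )
        log_terms.append(math.log(w) + log_pdf)
    # Log-sum-exp keeps the sum stable when individual terms are very small.
    m = max(log_terms)
    log_likelihood = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return -log_likelihood
```

Because the loss is a full likelihood rather than a squared error, the fitted mixture can represent multi-modal or heteroscedastic targets, which is what makes the prediction probabilistic.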

Semi-Supervised Learning

Denoising AutoEncoder is an autoencoder which learns robust feature representations to compensate for any noise in the dataset.
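
A denoising autoencoder is trained to reconstruct clean rows from deliberately corrupted copies of them. One common corruption scheme for tabular data is swap noise, where a cell is replaced by the value from the same column of another row. The sketch below shows only that corruption step in plain Python. It is an illustration of the general idea under stated assumptions, not a description of what PyTorch Tabular does internally, and the name `swap_noise` and its parameters are hypothetical.

```python
import random


def swap_noise(rows, swap_prob=0.15, seed=None):
    """Return a corrupted copy of rows (a list of equal-length lists).

    Each cell is replaced, with probability swap_prob, by the value found in
    the same column of a randomly chosen row (possibly the same row), so every
    corrupted value is still a value that occurs in that column.
    """
    rng = random.Random(seed)
    corrupted = [list(row) for row in rows]  # copy so the input stays clean
    n_rows = len(rows)
    for row in corrupted:
        for j in range(len(row)):
            if rng.random() < swap_prob:
                donor = rng.randrange(n_rows)
                row[j] = rows[donor][j]
    return corrupted
```

During training, the corrupted rows would be fed to the encoder while the reconstruction loss is computed against the original, uncorrupted rows.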

Implement Custom Models

To implement new models, see the How to implement new models tutorial. It covers basic as well as advanced architectures.

Usage

from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
    ExperimentConfig,
)

# num_col_names and cat_col_names are lists of column names, and
# train, val, and test are pandas DataFrames; define them before running this.
data_config = DataConfig(
    target=["target"],  # target should always be a list.
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
)
optimizer_config = OptimizerConfig()
model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # Number of nodes in each layer
    activation="LeakyReLU",  # Activation between each layer
    learning_rate=1e-3,
)
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)

# Train, evaluate on held-out data, and generate predictions.
tabular_model.fit(train=train, validation=val)
result = tabular_model.evaluate(test)
pred_df = tabular_model.predict(test)

# Save the trained model to disk and load it back.
tabular_model.save_model("examples/basic")
loaded_model = TabularModel.load_model("examples/basic")
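
The snippet above assumes that `train`, `val`, and `test` are pandas DataFrames and that `num_col_names` and `cat_col_names` list the feature columns. A minimal sketch of one way to prepare them follows. The toy data, the dtype-based column detection, the 60/20/20 split, and the helper name `split_frame` are illustrative assumptions rather than part of the library.

```python
import numpy as np
import pandas as pd


def split_frame(df, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle df and split it into train, validation, and test DataFrames."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled.iloc[:n_test]
    val = shuffled.iloc[n_test:n_test + n_val]
    train = shuffled.iloc[n_test + n_val:]
    return train, val, test


# Toy dataset standing in for real data (values are made up for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "age": rng.integers(18, 80, size=100),
        "income": rng.normal(50_000, 10_000, size=100),
        "city": rng.choice(["A", "B", "C"], size=100),
        "target": rng.integers(0, 2, size=100),
    }
)

features = df.drop(columns=["target"])
# Treat numeric dtypes as continuous and everything else as categorical.
num_col_names = features.select_dtypes(include="number").columns.tolist()
cat_col_names = features.select_dtypes(exclude="number").columns.tolist()

train, val, test = split_frame(df)
```

Dtype-based detection is only a starting point: integer-coded categories (for example, a numeric ZIP code column) would need to be moved to `cat_col_names` by hand.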

Blogs

Future Roadmap (Contributions are Welcome)

Integrate Optuna Hyperparameter Tuning
Migrate Datamodule to Polars or NVTabular for faster data loading and to handle larger than RAM datasets.
Add GaussRank as Feature Transformation
Have a scikit-learn compatible API
Enable support for multi-label classification
Keep adding more architectures

Contributors

manujosephv (Manu Joseph)
Borda (Jirka Borovec)
wsad1 (Jinu Sunil)
ProgramadorArtificial (Programador Artificial)
sorenmacbeth (Soren Macbeth)
ArozHada (Aroj Hada)
fonnesbeck (Chris Fonnesbeck)
snehilchatterjee (Snehil Chatterjee)
jxtrbtk
abhisharsinha (Abhishar Sinha)
ndrsfel (Andreas)
charitarthchugh (Charitarth Chugh)
EeyoreLee (Earlee)
JulianRein
krshrimali (Kushashwa Ravi Shrimali)
Actis92 (Luca Actis Grosso)
sgbaird (Sterling G. Baird)
furyhawk (Teck Meng)
yinyunie (Yinyu Nie)
YonyBresler (YonyBresler)
HernandoR (Liu Zhen)
enifeder (enifeder)
taimo3810 (taimo)

Citation

If you use PyTorch Tabular for a scientific publication, we would appreciate citations to the published software and the following paper:

arXiv Paper

@misc{joseph2021pytorch,
  title={PyTorch Tabular: A Framework for Deep Learning with Tabular Data},
  author={Manu Joseph},
  year={2021},
  eprint={2104.13638},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

Zenodo Software Citation

@software{manu_joseph_2023_7554473,
  author    = {Manu Joseph and
               Jinu Sunil and
               Jiri Borovec and
               Chris Fonnesbeck and
               jxtrbtk and
               Andreas and
               JulianRein and
               Kushashwa Ravi Shrimali and
               Luca Actis Grosso and
               Sterling G. Baird and
               Yinyu Nie},
  title     = {manujosephv/pytorch\_tabular: v1.0.1},
  month     = jan,
  year      = 2023,
  publisher = {Zenodo},
  version   = {v1.0.1},
  doi       = {10.5281/zenodo.7554473},
  url       = {https://doi.org/10.5281/zenodo.7554473}
}
