Andrew Ng stated, "applied ML is basically just feature engineering." In data science and ML, the most important, but oftentimes most overlooked, piece of the puzzle is feature engineering.
At Rasgo, we are data scientists on a mission to enable the global data science community to generate valuable and trusted insights from data in under 5 minutes. As we have marched forward on this mission, we've grown incredibly frustrated by the lack of helpful content and Python functions that target feature engineering. We wrestle with these problems every day, and we wanted to provide a repository of recipes that showcase how to use the best tools available in this space. Additionally, we've built our own SDK (PyRasgo) for feature engineering that enables users to automatically track, visualize, and evaluate their feature engineering experiments to make more accurate and explainable feature engineering decisions.
In that vein, this repository contains tutorials and code to enable data scientists to easily create new ML features and evaluate their importance for supervised machine learning. We sincerely hope this is helpful. Please leave comments with any questions or suggestions on what we can do to improve!
Please join us on the
- Rasgo Forum for questions about these recipes and PyRasgo.
- Rasgo User Group Slack to join our community.
- Video Tutorials on YouTube (Coming Soon)
- Feature Profiling
- Data Cleaning
- Missing Data
- Duplicate Data
- Data Type Mismatch
- Date Gaps in Time Series
- Feature Transformation
- Time-series
- Lag: Colab, nbviewer, Binder
- Moving Average: Colab, nbviewer, Binder
- Weekly Resampled Aggregation: Colab, nbviewer, Binder
- Weekly Rolling Aggregation: Colab, nbviewer, Binder
- Velocity and Acceleration: Colab, nbviewer, Binder
- Energy: Colab, nbviewer, Binder
- Mean Difference: Colab, nbviewer, Binder
- Mean Absolute Difference: Colab, nbviewer, Binder
- tsfresh: Colab, nbviewer, Binder
- Categorical
- Numerical
- Time-series
- Model Selection
- Train-Test Split
- Time Series Split
- Train-Test Split: Colab, nbviewer, Binder
- K-Fold or Cross-Validation
- Model Comparison
- Model Training
- Model Metrics
- Binary Classification
- Regression
- Train-Test Split
- Feature Importance
- Feature Selection
- Model Agnostic
- Low Variance: Colab, nbviewer, Binder
- Univariate Feature Selection
- Model Based
- Lasso-based Selection (Coming soon)
- Feature Importance
- Sequential Feature Selection
- Forward Stepwise Selection (Coming soon)
- Backwards Stepwise Selection
- Model Agnostic
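
To give a flavor of what the time-series recipes listed above cover, here is a minimal sketch of lag, moving average, and weekly resampled aggregation features using plain pandas. It is not taken from the notebooks themselves: the `sales` column, the date range, and the window sizes are all hypothetical choices made for illustration.

```python
import pandas as pd

# Hypothetical daily series; the "sales" column and dates are illustrative only.
idx = pd.date_range("2021-01-01", periods=14, freq="D")
df = pd.DataFrame({"sales": range(1, 15)}, index=idx)

# Lag: the value observed one day earlier (first row has no predecessor, so NaN).
df["sales_lag_1"] = df["sales"].shift(1)

# Moving average: mean over a trailing 3-day window that includes the current day.
df["sales_ma_3"] = df["sales"].rolling(window=3).mean()

# Weekly resampled aggregation: one total per calendar week (weeks end on Sunday).
weekly = df["sales"].resample("W").sum()

print(df.head())
print(weekly)
```

The lag and moving average columns stay aligned with the original daily index, so they can be used directly as model features, while the weekly aggregation produces a shorter series at a coarser frequency.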