Name	Name	Last commit message	Last commit date
Latest commit History 23 Commits
stacker	stacker
test	test
.gitignore	.gitignore
.travis.yml	.travis.yml
LICENSE	LICENSE
README.md	README.md
requirements.txt	requirements.txt
setup.py	setup.py

stacker

Build Status Coverage Status

Stacker is a library for automating repetitive tasks in a Machine Learning process. It is specially thought for competitive Machine Learning.

Getting Started

Installing package

To install the package (as well as the requirements)

pip install git+https://github.com/bamine/stacker.git

Optimization

Let's start with a toy classification task

In [1]: from sklearn import datasets
In [2]: X, y = datasets.make_classification()

Next we need to define outTask instance, that describes the problem at hand and the evaluation method

In [3]: from stacker.optimization.task import Task
In [4]: task = Task(name="my_test_problem", X=X, y=y, task="classification", test_size=0.1, random_state=42)

We also need to define a scorer to evaluate the performance of our models, it will represent the function to minimize in our optimization

In [5]: from stacker.optimization.scorer import Scorer
In [6]: from sklearn.metrics import roc_auc_score
In [7]: scorer = Scorer(name="auc_error", scoring_function=lambda y_pred, y_true: 1 - roc_auc_score(y_pred, y_true))

That's it, that's all we need to start our optimization

In [8]: from stacker.optimization.optimizers import XGBoostOptimizer
In [9]: optimizer = XGBoostOptimizer(task=task, scorer=scorer)
In [10]: best = optimizer.start_optimization(max_evals=10)

We see optimization logs printing to the screen:

2016年10月30日 20:06:58,908 optimization INFO Started optimization for task Task=my_test_problem - type=classification - validation_method=train_test_split - len(X)=100
2016年10月30日 20:06:58,940 hyperopt.tpe INFO tpe_transform took 0.010630 seconds
2016年10月30日 20:06:58,940 hyperopt.tpe INFO TPE using 0 trials
2016年10月30日 20:06:58,944 optimization INFO Evaluating with test size 0.1 with parameters {'base_score': 0.15131750274218725, 'n_estimators': 1549, 'max_delta_step': 5.456151637154244, 'max_depth': 32, 'scale_pos_weight': 17.01641396909963, 'subsample': 0.5602243542512042, 'min_child_weight': 7.357276709287346, 'reg_lambda': 0.04892385490528424, 'colsample_bylevel': 0.9270255573832653, 'reg_alpha': 9.241575982835831, 'learning_rate': 0.334, 'gamma': 9.204287327296093, 'colsample_bytree': 0.7401493524610046}
2016年10月30日 20:06:58,944 optimization INFO Training model ...
2016年10月30日 20:06:59,445 optimization INFO Training model done !
2016年10月30日 20:06:59,449 optimization INFO Score = 0.5

After max_evals round of optimization, the variable best holds the best parameters for our model so far:

In [11]: best
Out[11]:
{'base_score': 0.11896730519023568,
 'colsample_bylevel': 0.7784020826729505,
 'colsample_bytree': 0.5406725444519914,
 'gamma': 2.8852788849467914,
 'learning_rate': 0.0458,
 'max_delta_step': 7.94440873152742,
 'max_depth': 40.54677208316948,
 'min_child_weight': 1.4917791224018282,
 'n_estimators': 4098.139372543198,
 'reg_alpha': 5.145060179230331,
 'reg_lambda': 4.275665709200242,
 'scale_pos_weight': 92.51593088136909,
 'subsample': 0.30804272389187076}

To see the performance of our best parameters:

In [16]: optimizer.score(best)
2016-10-30 20:09:58,918 optimization INFO Evaluating with test size 0.1 with parameters {'base_score': 0.11896730519023568, 'n_estimators': 4098, 'max_delta_step': 7.94440873152742, 'max_depth': 40, 'scale_pos_weight': 92.51593088136909, 'subsample': 0.30804272389187076, 'min_child_weight': 1.4917791224018282, 'reg_lambda': 4.275665709200242, 'colsample_bylevel': 0.7784020826729505, 'reg_alpha': 5.145060179230331, 'learning_rate': 0.0458, 'gamma': 2.8852788849467914, 'colsample_bytree': 0.5406725444519914}
2016-10-30 20:09:58,918 optimization INFO Training model ...
2016-10-30 20:10:00,227 optimization INFO Training model done !
2016-10-30 20:10:00,228 optimization INFO Score = 0.0
Out[16]: {'loss': 0.0, 'status': 'ok'}

Saving progress

File Logger

You can save the progress of the optimization process to flat files in json format, by creating a FileLogger object:

In [17]: from stacker.optimization.loggers import FileLogger
In [18]: file_logger = FileLogger(task=task)
In [19]: optimizer = XGBoostOptimizer(task=task, scorer=scorer, logger=file_logger)
In [20]: best = optimizer.start_optimization(max_evals=10)

The optimization progress logs are in a file called "tast.name".log which in out case is my_test_problem.log. Each line contains a json object which stores useful information.

{"scorer_name": "auc_error", 
"score": 0.0, 
"validation_method": "train_test_split", 
"predictions": [0.9586731195449829, 0.9586731195449829, ...], 
"model": "XGBModel(base_score=0.3547331680585909, ...)", 
"parameters": {"base_score": 0.3547331680585909, "colsample_bylevel": ...}, 
"task": "my_test_problem", 
"random_state": 42}

Predictions (on the holdout set for train-test splits and on each folds for cv validation) are also stored.

DB Logger

You can also store the optimization progress into a database. First we create an SQLAlchemy engine object.

In [21]: from sqlalchemy import create_engine
In [22]: engine = create_engine('sqlite:///test.db')

Then we create our DBLogger instance:

In [23]: from stacker.optimization.loggers import DBLogger
In [24]: db_logger = DBLogger(task=task, engine=engine)
In [25]: optimizer = XGBoostOptimizer(task=task, scorer=scorer, opt_logger=db_logger)
In [26]: best = optimizer.start_optimization(max_evals=10)

We inspect the contents of our database:

sqlite3 test.db
sqlite> .tables
my_test_problem_opt_table
sqlite> select * from my_test_problem_opt_table limit 1;
task|date_time|model|parameters|score|scorer_name|validation_method|predictions|random_state
my_test_problem|2016年10月30日 21:15:21.338861|XGBModel(base_score=0.4987515735616427, colsample_bylevel=0.9537358268329221,
 colsample_bytree=0.5540367970123756, gamma=3.9642149376691425,
 learning_rate=0.060700000000000004, max_delta_step=6.416649245434087,
 max_depth=39, min_child_weight=8.653862671384827, missing=None,
 n_estimators=2370, nthread=-1, objective='reg:linear',
 reg_alpha=17.94451568455059, reg_lambda=4.289513035748907,
 scale_pos_weight=48.14895822783563, seed=0, silent=True,
 subsample=0.9218555541314193)|{"base_score": 0.4987515735616427, ...

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Uh oh!

bamine/stacker

Folders and files

Latest commit

History

Repository files navigation

stacker

Getting Started

Installing package

Optimization

Saving progress

File Logger

DB Logger

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages

Languages

License

bamine/stacker

Folders and files

Latest commit

History

Repository files navigation

stacker

Getting Started

Installing package

Optimization

Saving progress

File Logger

DB Logger

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages