 class TabularClassificationTask(BaseTask):
     """
     Tabular Classification API to the pipelines.
+
     Args:
-        seed (int), (default=1): seed to be used for reproducibility.
-        n_jobs (int), (default=1): number of consecutive processes to spawn.
-        n_threads (int), (default=1):
+        seed (int: default=1):
+            seed to be used for reproducibility.
+        n_jobs (int: default=1):
+            number of consecutive processes to spawn.
+        n_threads (int: default=1):
             number of threads to use for each process.
         logging_config (Optional[Dict]):
-            specifies configuration for logging, if None, it is loaded from the logging.yaml
-        ensemble_size (int), (default=50):
+            Specifies configuration for logging, if None, it is loaded from the logging.yaml
+        ensemble_size (int: default=50):
             Number of models added to the ensemble built by
             Ensemble selection from libraries of models.
             Models are drawn with replacement.
-        ensemble_nbest (int), (default=50):
-            only consider the ensemble_nbest
+        ensemble_nbest (int: default=50):
+            Only consider the ensemble_nbest
             models to build the ensemble
-        max_models_on_disc (int), (default=50):
-            maximum number of models saved to disc.
-            Also, controls the size of the ensemble as any additional models will be deleted.
+        max_models_on_disc (int: default=50):
+            Maximum number of models saved to disc.
+            Also, controls the size of the ensemble
+            as any additional models will be deleted.
             Must be greater than or equal to 1.
         temporary_directory (str):
-            folder to store configuration output and log file
+            Folder to store configuration output and log file
         output_directory (str):
-            folder to store predictions for optional test set
+            Folder to store predictions for optional test set
         delete_tmp_folder_after_terminate (bool):
-            determines whether to delete the temporary directory, when finished
+            Determines whether to delete the temporary directory,
+            when finished
         include_components (Optional[Dict]):
-            If None, all possible components are used. Otherwise
-            specifies set of components to use.
+            If None, all possible components are used.
+            Otherwise specifies set of components to use.
         exclude_components (Optional[Dict]):
-            If None, all possible components are used. Otherwise
-            specifies set of components not to use. Incompatible
-            with include components
+            If None, all possible components are used.
+            Otherwise specifies set of components not to use.
+            Incompatible with include components.
         search_space_updates (Optional[HyperparameterSearchSpaceUpdates]):
             search space updates that can be used to modify the search
             space of particular components or choice modules of the pipeline
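The "Ensemble selection from libraries of models" strategy behind `ensemble_size` (greedy selection with replacement, Caruana et al.) can be sketched in a few lines. This is a minimal stand-alone illustration, not Auto-PyTorch's actual implementation:

```python
# Sketch of greedy ensemble selection with replacement, as referenced
# by the ensemble_size argument. Illustration only, not the library's code.

def ensemble_selection(model_preds, y_true, ensemble_size=5):
    """Greedily pick models (repeats allowed) that minimize ensemble MSE."""
    ensemble = []                           # indices of chosen models
    running_sum = [0.0] * len(y_true)       # sum of chosen models' predictions
    for _ in range(ensemble_size):
        best_idx, best_err = None, float("inf")
        for idx, preds in enumerate(model_preds):
            # error if this model were added to the current ensemble
            n = len(ensemble) + 1
            err = sum(
                ((s + p) / n - t) ** 2
                for s, p, t in zip(running_sum, preds, y_true)
            ) / len(y_true)
            if err < best_err:
                best_idx, best_err = idx, err
        ensemble.append(best_idx)
        running_sum = [s + p for s, p in zip(running_sum, model_preds[best_idx])]
    return ensemble

# Three toy "model" prediction vectors against a binary target
y = [1.0, 0.0, 1.0, 1.0]
preds = [
    [0.9, 0.1, 0.8, 0.7],   # good model
    [0.2, 0.9, 0.3, 0.4],   # poor model
    [1.0, 0.0, 0.6, 0.9],   # good model
]
chosen = ensemble_selection(preds, y, ensemble_size=4)
```

Because models are drawn with replacement, a strong model can appear several times in `chosen`, which is why `ensemble_size` counts additions rather than distinct models.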
@@ -102,6 +107,16 @@ def __init__(
         )

     def build_pipeline(self, dataset_properties: Dict[str, Any]) -> TabularClassificationPipeline:
+        """
+        Build pipeline according to current task and for the passed dataset properties
+
+        Args:
+            dataset_properties (Dict[str, Any])
+
+        Returns:
+            TabularClassificationPipeline:
+                Pipeline compatible with the given dataset properties.
+        """
         return TabularClassificationPipeline(dataset_properties=dataset_properties)

     def search(
@@ -143,38 +158,38 @@ def search(
         budget_type (str):
             Type of budget to be used when fitting the pipeline.
             It can be one of:
-            + 'epochs': The training of each pipeline will be terminated after
-              a number of epochs have passed. This number of epochs is determined by the
-              budget argument of this method.
-            + 'runtime': The training of each pipeline will be terminated after
-              a number of seconds have passed. This number of seconds is determined by the
-              budget argument of this method. The overall fitting time of a pipeline is
-              controlled by func_eval_time_limit_secs. 'runtime' only controls the allocated
-              time to train a pipeline, but it does not consider the overall time it takes
-              to create a pipeline (data loading and preprocessing, other i/o operations, etc.).
-            budget_type will determine the units of min_budget/max_budget. If budget_type=='epochs'
-            is used, min_budget will refer to epochs whereas if budget_type=='runtime' then
-            min_budget will refer to seconds.
+            + `epochs`: The training of each pipeline will be terminated after
+              a number of epochs have passed. This number of epochs is determined by the
+              budget argument of this method.
+            + `runtime`: The training of each pipeline will be terminated after
+              a number of seconds have passed. This number of seconds is determined by the
+              budget argument of this method. The overall fitting time of a pipeline is
+              controlled by func_eval_time_limit_secs. 'runtime' only controls the allocated
+              time to train a pipeline, but it does not consider the overall time it takes
+              to create a pipeline (data loading and preprocessing, other i/o operations, etc.).
+            budget_type will determine the units of min_budget/max_budget. If budget_type=='epochs'
+            is used, min_budget will refer to epochs whereas if budget_type=='runtime' then
+            min_budget will refer to seconds.
         min_budget (int):
-            Auto-PyTorch uses `Hyperband <https://arxiv.org/abs/1603.06560>_` to
+            Auto-PyTorch uses `Hyperband <https://arxiv.org/abs/1603.06560>`_ to
             trade-off resources between running many pipelines at min_budget and
             running the top performing pipelines on max_budget.
             min_budget states the minimum resource allocation a pipeline should have
             so that we can compare and quickly discard bad performing models.
             For example, if the budget_type is epochs, and min_budget=5, then we will
             run every pipeline to a minimum of 5 epochs before performance comparison.
         max_budget (int):
-            Auto-PyTorch uses `Hyperband <https://arxiv.org/abs/1603.06560>_` to
+            Auto-PyTorch uses `Hyperband <https://arxiv.org/abs/1603.06560>`_ to
             trade-off resources between running many pipelines at min_budget and
             running the top performing pipelines on max_budget.
             max_budget states the maximum resource allocation a pipeline is going to
             be ran. For example, if the budget_type is epochs, and max_budget=50,
             then the pipeline training will be terminated after 50 epochs.
-        total_walltime_limit (int), (default=100): Time limit
-            in seconds for the search of appropriate models.
+        total_walltime_limit (int: default=100):
+            Time limit in seconds for the search of appropriate models.
             By increasing this value, autopytorch has a higher
             chance of finding better models.
-        func_eval_time_limit_secs (int), (default=None):
+        func_eval_time_limit_secs (Optional[int]):
             Time limit for a single call to the machine learning model.
             Model fitting will be terminated if the machine
             learning algorithm runs over the time limit. Set
@@ -185,47 +200,54 @@ def search(
             total_walltime_limit // 2 to allow enough time to fit
             at least 2 individual machine learning algorithms.
             Set to np.inf in case no time limit is desired.
-        enable_traditional_pipeline (bool), (default=True):
+        enable_traditional_pipeline (bool: default=True):
             We fit traditional machine learning algorithms
             (LightGBM, CatBoost, RandomForest, ExtraTrees, KNN, SVM)
-            before building PyTorch Neural Networks. You can disable this
+            prior to building PyTorch Neural Networks. You can disable this
             feature by turning this flag to False. All machine learning
             algorithms that are fitted during search() are considered for
             ensemble building.
-        memory_limit (Optional[int]), (default=4096):
-            Memory limit in MB for the machine learning algorithm. autopytorch
-            will stop fitting the machine learning algorithm if it tries
-            to allocate more than memory_limit MB. If None is provided,
-            no memory limit is set. In case of multi-processing, memory_limit
-            will be per job. This memory limit also applies to the ensemble
-            creation process.
+        memory_limit (Optional[int]: default=4096):
+            Memory limit in MB for the machine learning algorithm.
+            Autopytorch will stop fitting the machine learning algorithm
+            if it tries to allocate more than memory_limit MB. If None
+            is provided, no memory limit is set. In case of multi-processing,
+            memory_limit will be per job. This memory limit also applies to
+            the ensemble creation process.
         smac_scenario_args (Optional[Dict]):
             Additional arguments inserted into the scenario of SMAC. See the
-            [SMAC documentation](https://automl.github.io/SMAC3/master/options.html?highlight=scenario#scenario)
+            `SMAC documentation <https://automl.github.io/SMAC3/master/options.html?highlight=scenario#scenario>`_
+            for a list of available arguments.
         get_smac_object_callback (Optional[Callable]):
             Callback function to create an object of class
-            [smac.optimizer.smbo.SMBO](https://automl.github.io/SMAC3/master/apidoc/smac.optimizer.smbo.html).
+            `smac.optimizer.smbo.SMBO <https://automl.github.io/SMAC3/master/apidoc/smac.optimizer.smbo.html>`_.
             The function must accept the arguments scenario_dict,
             instances, num_params, runhistory, seed and ta. This is
             an advanced feature. Use only if you are familiar with
-            [SMAC](https://automl.github.io/SMAC3/master/index.html).
-        all_supported_metrics (bool), (default=True):
-            if True, all metrics supporting current task will be calculated
+            `SMAC <https://automl.github.io/SMAC3/master/index.html>`_.
+        tae_func (Optional[Callable]):
+            TargetAlgorithm to be optimised. If None, `eval_function`
+            available in autoPyTorch/evaluation/train_evaluator is used.
+            Must be child class of AbstractEvaluator.
+        all_supported_metrics (bool: default=True):
+            If True, all metrics supporting current task will be calculated
             for each pipeline and results will be available via cv_results
-        precision (int), (default=32): Numeric precision used when loading
-            ensemble data. Can be either '16', '32' or '64'.
+        precision (int: default=32):
+            Numeric precision used when loading ensemble data.
+            Can be either '16', '32' or '64'.
         disable_file_output (Union[bool, List]):
-        load_models (bool), (default=True):
+        load_models (bool: default=True):
             Whether to load the models after fitting AutoPyTorch.
-        portfolio_selection (str), (default=None):
+        portfolio_selection (Optional[str]):
             This argument controls the initial configurations that
             AutoPyTorch uses to warm start SMAC for hyperparameter
             optimization. By default, no warm-starting happens.
             The user can provide a path to a json file containing
             configurations, similar to (...herepathtogreedy...).
             Additionally, the keyword 'greedy' is supported,
             which would use the default portfolio from
-            `AutoPyTorch Tabular <https://arxiv.org/abs/2006.13799>`
+            `AutoPyTorch Tabular <https://arxiv.org/abs/2006.13799>`_.
+
         Returns:
             self
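The `min_budget`/`max_budget` trade-off described above comes from Hyperband's successive-halving core: every pipeline starts at `min_budget`, and only survivors advance through geometrically growing budgets up to `max_budget`. A rough sketch of that budget ladder, assuming an illustrative halving rate `eta=3` (the actual schedule is handled by SMAC's Hyperband intensifier, not this code):

```python
# Sketch of the successive-halving budget ladder implied by
# min_budget/max_budget. eta=3 is an illustrative assumption;
# Auto-PyTorch delegates the real schedule to SMAC's intensifier.

def budget_rungs(min_budget, max_budget, eta=3):
    """Geometric budget ladder from min_budget up to max_budget."""
    rungs = []
    budget = float(min_budget)
    while budget < max_budget:
        rungs.append(round(budget, 2))
        budget *= eta          # survivors get eta times more resources
    rungs.append(float(max_budget))
    return rungs

# With budget_type='epochs', min_budget=5, max_budget=50 (the docstring's
# examples), every pipeline first runs 5 epochs; survivors get more.
rungs = budget_rungs(5, 50)
```

Raising `min_budget` makes the first comparison fairer but slower; lowering it discards weak pipelines earlier at the risk of dropping slow starters.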
@@ -281,6 +303,16 @@ def predict(
         batch_size: Optional[int] = None,
         n_jobs: int = 1
     ) -> np.ndarray:
+        """Generate the estimator predictions.
+        Generate the predictions based on the given examples from the test set.
+
+        Args:
+            X_test (np.ndarray):
+                The test set examples.
+
+        Returns:
+            Array with estimator predictions.
+        """
         if self.InputValidator is None or not self.InputValidator._is_fitted:
             raise ValueError("predict() is only supported after calling search. Kindly call first "
                              "the estimator fit() method.")
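The guard at the top of `predict()` is a common scikit-learn-style "fitted before predict" check. A minimal stand-alone sketch of the pattern, using hypothetical stand-in classes rather than Auto-PyTorch's own:

```python
# Sketch of the "fitted before predict" guard shown in the diff above.
# _Validator and _Estimator are hypothetical stand-ins for illustration.

class _Validator:
    def __init__(self):
        self._is_fitted = False

class _Estimator:
    def __init__(self):
        self.InputValidator = None

    def search(self, X, y):
        # fitting marks the validator as ready for predict()
        self.InputValidator = _Validator()
        self.InputValidator._is_fitted = True
        self._majority = int(sum(y) * 2 >= len(y))  # trivial "model"
        return self

    def predict(self, X_test):
        if self.InputValidator is None or not self.InputValidator._is_fitted:
            raise ValueError("predict() is only supported after calling search. "
                             "Kindly call first the estimator fit() method.")
        return [self._majority] * len(X_test)

est = _Estimator()
try:
    est.predict([[0, 0], [1, 1]])   # not fitted yet: raises ValueError
    raised = False
except ValueError:
    raised = True
est.search([[0], [1], [2], [3]], [1, 0, 1, 1])
preds = est.predict([[0, 0], [1, 1]])
```

Checking a fitted-state flag (rather than catching `AttributeError` later) gives the caller an immediate, actionable error message.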