tfm.optimization.CosineDecayWithOffset


A LearningRateSchedule that uses a cosine decay with optional warmup.

Inherits From: base_lr_class

Main aliases: tfm.optimization.lr_schedule.CosineDecayWithOffset

tfm.optimization.CosineDecayWithOffset(
 offset=0, **kwargs
)

See Loshchilov & Hutter, SGDR: Stochastic Gradient Descent with Warm Restarts (ICLR 2017).

For the idea of a linear warmup of our learning rate, see Goyal et al.

When we begin training a model, we often want an initial increase in our learning rate followed by a decay. If warmup_target is set, this schedule applies a linear increase per optimizer step to our learning rate from initial_learning_rate to warmup_target over a duration of warmup_steps. Afterwards, it applies a cosine decay function taking our learning rate from warmup_target to alpha over a duration of decay_steps. If warmup_target is None, the warmup is skipped and the decay takes our learning rate from initial_learning_rate to alpha.

The schedule requires a step value to compute the learning rate; you can simply pass a TensorFlow variable that you increment at each training step.

The schedule is a 1-arg callable that produces a warmup followed by a decayed learning rate when passed the current optimizer step. This can be useful for changing the learning rate value across different invocations of optimizer functions.

Our warmup is computed as:

def warmup_learning_rate(step):
    # Linearly interpolate from initial_learning_rate to warmup_target.
    completed_fraction = step / warmup_steps
    total_delta = warmup_target - initial_learning_rate
    return initial_learning_rate + completed_fraction * total_delta

And our decay is computed as:

if warmup_target is None:
    initial_decay_lr = initial_learning_rate
else:
    initial_decay_lr = warmup_target

def decayed_learning_rate(step):
    step = min(step, decay_steps)
    cosine_decay = 0.5 * (1 + cos(pi * step / decay_steps))
    decayed = (1 - alpha) * cosine_decay + alpha
    return initial_decay_lr * decayed
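
Putting the two snippets together, the following is a minimal, self-contained Python sketch of the full warmup-then-decay curve. The function name combined_learning_rate and the constants are illustrative only, and the assumption that the decay phase counts steps from the end of warmup follows from the description above rather than from the implementation.

import math

# Illustrative constants matching the pseudocode above.
initial_learning_rate = 0.0
warmup_target = 0.1
warmup_steps = 1000
decay_steps = 1000
alpha = 0.0  # the final learning rate is alpha * warmup_target

def combined_learning_rate(step):
    if step < warmup_steps:
        # Warmup phase: linear ramp from initial_learning_rate to warmup_target.
        completed_fraction = step / warmup_steps
        total_delta = warmup_target - initial_learning_rate
        return initial_learning_rate + completed_fraction * total_delta
    # Decay phase: cosine decay from warmup_target towards alpha * warmup_target.
    decay_step = min(step - warmup_steps, decay_steps)
    cosine_decay = 0.5 * (1 + math.cos(math.pi * decay_step / decay_steps))
    decayed = (1 - alpha) * cosine_decay + alpha
    return warmup_target * decayed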

Example usage without warmup:

decay_steps = 1000
initial_learning_rate = 0.1
lr_decayed_fn = tf.keras.optimizers.schedules.CosineDecay(
 initial_learning_rate, decay_steps)

Example usage with warmup:

decay_steps = 1000
initial_learning_rate = 0
warmup_steps = 1000
target_learning_rate = 0.1
lr_warmup_decayed_fn = tf.keras.optimizers.schedules.CosineDecay(
 initial_learning_rate, decay_steps, warmup_target=target_learning_rate,
 warmup_steps=warmup_steps
)
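
The examples above use the base Keras schedule. For the class documented on this page, a sketch of the same setup with a non-zero offset might look like the following; it assumes the usual import tensorflow_models as tfm alias, that keyword arguments other than offset are forwarded to the underlying CosineDecay schedule, and that offset delays the start of the schedule by that many steps.

import tensorflow_models as tfm

# Sketch only: kwargs other than `offset` are assumed to be forwarded to CosineDecay.
lr_offset_fn = tfm.optimization.CosineDecayWithOffset(
    offset=10000,                  # schedule behaves as if training starts at step 10000
    initial_learning_rate=0.1,
    decay_steps=1000)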

You can pass this schedule directly into a tf.keras.optimizers.Optimizer as the learning rate. The learning rate schedule is also serializable and deserializable using tf.keras.optimizers.schedules.serialize and tf.keras.optimizers.schedules.deserialize.
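
As a concrete sketch, building on lr_decayed_fn from the example above:

# Pass the schedule to an optimizer as its learning rate.
opt = tf.keras.optimizers.SGD(learning_rate=lr_decayed_fn)

# Round-trip the schedule through the Keras serialization helpers.
config = tf.keras.optimizers.schedules.serialize(lr_decayed_fn)
restored_fn = tf.keras.optimizers.schedules.deserialize(config)

Deserializing a custom schedule such as CosineDecayWithOffset may additionally require passing the class via the custom_objects argument of deserialize.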

Returns

A 1-arg callable learning rate schedule that takes the current optimizer step and outputs the decayed learning rate, a scalar Tensor of the same type as initial_learning_rate.

Child Classes

class base_lr_class

Methods

from_config

@classmethod
from_config(
 config
)

Instantiates a LearningRateSchedule from its config.

Args

config: Output of get_config().

Returns
A LearningRateSchedule instance.

get_config

get_config()
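
A minimal sketch of the config round trip these two methods provide, shown here with the base Keras CosineDecay schedule used in the examples above:

schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.1, decay_steps=1000)

config = schedule.get_config()                 # plain dict of constructor arguments
restored = type(schedule).from_config(config)  # equivalent schedule instance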

__call__


__call__(
 step
)
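
Because the schedule is a 1-arg callable, it can also be evaluated directly at a given step. The snippet below builds on the illustrative lr_offset_fn from the earlier sketch; the expectation that offset simply shifts the step is an assumption based on the signature, not a documented guarantee.

# Sketch: with a non-zero offset, the decay is expected to see (step - offset).
print(float(lr_offset_fn(10000)))  # expected to match the base schedule at step 0
print(float(lr_offset_fn(10500)))  # expected to match the base schedule at step 500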
