We have just released PyTorch v1.2.0.
It has over 1,900 commits and contains a significant amount of effort in areas spanning JIT, ONNX, Distributed, as well as Performance and Eager Frontend Improvements.
Version 1.2 includes a new, easier-to-use API for converting nn.Modules into ScriptModules. A sample usage is:
class MyModule(torch.nn.Module):
...
# Construct an nn.Module instance
module = MyModule(args)
# Pass it to `torch.jit.script` to compile it into a ScriptModule.
my_torchscript_module = torch.jit.script(module)
torch.jit.script() will attempt to recursively compile the given nn.Module, including any submodules or methods called from forward(). See the migration guide for more info on what's changed and how to migrate.
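A minimal sketch of recursive scripting (the module and method names below are made up for illustration):
import torch

class Gate(torch.nn.Module):
    # a hypothetical submodule; it is compiled automatically when the parent is scripted
    def forward(self, x):
        return torch.sigmoid(x)

class MyNet(torch.nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.gate = Gate()

    def helper(self, x):
        # methods called from forward() are compiled as well
        return x * 2

    def forward(self, x):
        return self.helper(self.gate(x))

scripted = torch.jit.script(MyNet())
print(scripted(torch.randn(3)))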
In 1.2, TorchScript has significantly improved its support for Python language constructs and Python's standard library. Highlights include:
- for..in loops, zip(), and enumerate()
- NamedTuples
- math and string library support

See the detailed notes below for more information.
In PyTorch 1.2, working with Microsoft, we've added full support to export ONNX Opset versions 7 (v1.2), 8 (v1.3), 9 (v1.4) and 10 (v1.5). We've also enhanced the constant folding pass to support Opset 10, the latest available version of ONNX. Additionally, users are now able to register their own symbolic functions to export custom ops, and to specify the dynamic dimensions of inputs during export. Here is a summary of all of the major improvements:
Updated docs can be found here, and a refreshed tutorial using ONNX Runtime can be found here.
Read the documentation or simply type from torch.utils.tensorboard import SummaryWriter to get started!
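A minimal logging sketch, assuming the tensorboard package is installed (the tag name and values below are arbitrary):
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()                # writes event files to ./runs/ by default
for step in range(100):
    loss = torch.rand(1).item()         # placeholder value; log your real loss here
    writer.add_scalar("train/loss", loss, step)
writer.close()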
We include a standard nn.Transformer module, based on the paper "Attention is All You Need". The nn.Transformer module relies entirely on an attention mechanism to draw global dependencies between input and output. The individual components of the nn.Transformer module are designed so they can be adopted independently. For example, the nn.TransformerEncoder can be used by itself, without the larger nn.Transformer. New APIs include:
- nn.Transformer
- nn.TransformerEncoder and nn.TransformerEncoderLayer
- nn.TransformerDecoder and nn.TransformerDecoderLayer

See the Transformer Layers documentation for more info.
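For instance, a standalone encoder stack can be built and applied on its own; the dimensions below are arbitrary, just a sketch:
import torch
import torch.nn as nn

# two encoder layers with model dimension 32 and 4 attention heads
encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

src = torch.randn(10, 8, 32)            # (sequence length, batch, d_model)
out = encoder(src)                      # output has the same shape as src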
Comparison operations (lt (<), le (<=), gt (>), ge (>=), eq (==), ne (!=)): the return dtype has changed from torch.uint8 to torch.bool (21113)
Version 1.1:
>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([1, 0, 0], dtype=torch.uint8)
Version 1.2:
>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([True, False, False])
For most programs, we don't expect that any changes will need to be made as a result of this change. There are a couple of possible exceptions listed below.
Mask Inversion
In prior versions of PyTorch, the idiomatic way to invert a mask was to call 1 - mask. This behavior is no longer supported; use the ~ or bitwise_not() operator instead.
Version 1.1:
>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([0, 1, 1], dtype=torch.uint8)
Version 1.2:
>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported.
If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.
>>> ~(torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([False, True, True])
sum(Tensor) (Python built-in) does not upcast dtype like torch.sum
Python's built-in sum returns results in the same dtype as the tensor itself, so it will not return the expected result if the value of the sum cannot be represented in the dtype of the tensor.
Version 1.1:
# value can be represented in result dtype
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(3, dtype=torch.uint8)
# value can NOT be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(44, dtype=torch.uint8)
# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)
Version 1.2:
# value cannot be represented in result dtype (now torch.bool)
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(True)
# value cannot be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(True)
# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)
TL;DR: use torch.sum instead of the built-in sum. Note that the built-in sum() behavior will more closely resemble torch.sum in the next release.
Note also that masking via torch.uint8 Tensors is now deprecated; see the Deprecations section for more information.
__invert__ / ~: now calls torch.bitwise_not instead of 1 - tensor and is supported for all integral and Boolean dtypes instead of only torch.uint8. (22326)
Version 1.1:
>>> ~torch.arange(8, dtype=torch.uint8)
tensor([1, 0, 255, 254, 253, 252, 251, 250], dtype=torch.uint8)
Version 1.2:
>>> ~torch.arange(8, dtype=torch.uint8)
tensor([255, 254, 253, 252, 251, 250, 249, 248], dtype=torch.uint8)
torch.tensor(bool) and torch.as_tensor(bool) now infer torch.bool dtype instead of torch.uint8. (19097)
Version 1.1:
>>> torch.tensor([True, False])
tensor([1, 0], dtype=torch.uint8)
Version 1.2:
>>> torch.tensor([True, False])
tensor([True, False])
nn.BatchNorm{1,2,3}D: gamma (weight) is now initialized to all 1s rather than randomly initialized from U(0, 1). (13774)
Version 1.1:
>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([0.1635, 0.7512, 0.4130, 0.6875, 0.5496],
requires_grad=True)
Version 1.2:
>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([1., 1., 1., 1., 1.], requires_grad=True)
The following deprecated linear algebra operators have been removed; use their replacements instead:
| Removed | Use Instead |
| --- | --- |
| btrifact | lu |
| btrifact_with_info | lu with get_infos=True |
| btrisolve | lu_solve |
| btriunpack | lu_unpack |
| gesv | solve |
| pstrf | cholesky |
| potrf | cholesky |
| potri | cholesky_inverse |
| potrs | cholesky_solve |
| trtrs | triangular_solve |
Sparse Tensors: Changing the sparsity of a Tensor through .data is no longer supported. (17072)
>>> x = torch.randn(2,3)
>>> x.data = torch.sparse_coo_tensor((2, 3))
RuntimeError: Attempted to call `variable.set_data(tensor)`,
but `variable` and `tensor` have incompatible tensor type.
Sparse Tensors: in-place modification of the dense Tensors used to construct a sparse Tensor (its indices and values) no longer changes the sparse Tensor itself.
Version 1.1:
>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)
>>> s.coalesce().indices().shape
torch.Size([1, 1])
>>> s.coalesce().values().shape
torch.Size([1])
Notice indices() and values() reflect the resized tensor shapes.
Version 1.2:
>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)
>>> s.coalesce().indices().shape
torch.Size([1, 2])
>>> s.coalesce().values().shape
torch.Size([2])
Notice indices() and values() reflect the original tensor shapes.
Accumulating dense gradients into a sparse .grad will no longer retain Python object identity. (17072)
Version 1.1:
>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad
# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided
# m_weight_grad_saved still refers to the .grad of m's weight
# even though the sparsity has changed
>>> assert id(m_weight_grad_saved) == id(m.weight.grad)
Version 1.2:
>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad
# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided
# m_weight_grad_saved NO LONGER refers to the .grad of m's weight
>>> assert id(m_weight_grad_saved) == id(m.weight.grad)
AssertionError
nn.utils.convert_sync_batchnorm has been replaced with nn.SyncBatchNorm.convert_sync_batchnorm. (18787) Example of new usage:
>>> # Network with nn.BatchNorm layer
>>> module = torch.nn.Sequential(
>>> torch.nn.Linear(20, 100),
>>> torch.nn.BatchNorm1d(100)
>>> ).cuda()
>>> # creating process group (optional)
>>> process_group = torch.distributed.new_group(process_ids)
>>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group)
torch.addcmul and torch.lerp operators enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow the output tensor to be resized if it is also used as one of the inputs.
Version 1.1:
>>> x=torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
tensor([[0., 0., 0.],
[0., 0., 0.]])
Version 1.2:
>>> x=torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
RuntimeError: output with shape [1] doesn't match the broadcast shape [2, 3]
If you run into this error, please ensure the out parameter is of the correct output shape (post-broadcasting).
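For example, the snippet above can be adapted by allocating out with the broadcast shape up front (a sketch; the name out is arbitrary):
>>> x = torch.zeros(1)
>>> out = torch.empty(2, 3)
>>> torch.addcmul(x, x, torch.zeros(2, 3), out=out)
tensor([[0., 0., 0.],
        [0., 0., 0.]])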
PyTorch's autograd system uses a version tracking mechanism to ensure that Tensors that are saved for backwards computations retain their correct values when the backward pass is computed (i.e. that they haven't been updated in-place since they were saved). See In-Place Correctness Checks in the docs for more information.
In PyTorch 1.2 we have enhanced the version tracking in a number of cases, which may flag issues that were not caught previously. There is now additional tracking through the Variable() constructor, the nn.Parameter() constructor, after setting .data, and via nn.Module._apply (internal API).
Track changes through Variable constructor:
>>> x = torch.ones(1, requires_grad=True)+1
>>> y = x*x
# do an in-place update through Variable constructor
>>> torch.autograd.Variable(x).add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0
instead.
Track changes on an nn.Parameter:
>>> x = torch.ones(1)
>>> p = torch.nn.Parameter(x)
>>> y = p * p
# do an in-place update on a saved Parameter
>>> x.add_(1)
>>> y.sum().backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0
instead.
Track changes after setting .data:
>>> x = torch.zeros(1, requires_grad=True)+1
>>> y = x * x
>>> x.data = torch.zeros(1, requires_grad=True)+1
>>> x.add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]], which is output 0 of AddBackward0,
is at version 1; expected version 0 instead.
Python functions called from scripted code must now be explicitly @ignored
torch.jit.script now recursively compiles everything it finds in the original function, so if you had Python functions called from your scripted function or module, you must now explicitly @ignore them. See the new API guide for more details.
Version 1.1:
def my_unscriptable_python_fn():
    # weird stuff
    ...

@torch.jit.script
def fn():
    # This gets inserted as a Python call, and only errors on `save()`.
    my_unscriptable_python_fn()
Version 1.2:
@torch.jit.ignore  # this needs to be added ...
def my_unscriptable_python_fn():
    ...

@torch.jit.script
def fn():
    # ... or else recursive compilation will attempt to compile this call
    my_unscriptable_python_fn()
NOTE: This is also a change to the behavior of the @torch.jit.ignore decorator. In version 1.1, @ignore told the compiler to omit compiling a function entirely, to mark Python functions that you know will not be called after export. In version 1.2, @ignore tells the compiler to insert a call back to the Python interpreter instead of trying to compile the function.
To get the old behavior, use @torch.jit.ignore(drop_on_export=True) (@torch.jit.ignore with no arguments is equivalent to @torch.jit.ignore(drop_on_export=False)).
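For example, a debugging helper that should be dropped entirely when the model is exported could be marked like this (a minimal sketch; the function names are made up):
@torch.jit.ignore(drop_on_export=True)  # old 1.1-style behavior: dropped on save()
def debug_log(x):
    print("debug:", x)

@torch.jit.script
def fn(x):
    debug_log(x)
    return x + 1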
optimize for ScriptModules is now a context manager
Whether optimization passes are run is now a thread-local flag. This better reflects how optimization actually happens in the JIT (i.e. it is decided at runtime, not at compilation time).
Version 1.1:
@torch.jit.script(optimize=False)
def fn(inputs):
    ...

fn(inputs)
Version 1.2:
@torch.jit.script
def fn(inputs):
    ...

with torch.jit.optimized_execution(False):
    fn(inputs)
script::Module is now a reference type
To better align with the PyTorch C++ API philosophy, script::Module and script::Method are now reference types. Our APIs have been updated to use script::Module instead of std::shared_ptr<script::Module>.
Version 1.1:
using torch::jit::script::Module;
std::shared_ptr<Module> m = torch::jit::load("my_model.py");
m->forward(...);
Version 1.2:
using torch::jit::script::Module;
Module m = torch::jit::load("my_model.py");
m.forward(...);
The C++ Tensor API for reductions such as sum no longer has a separate dtype overload; dtype is now an optional trailing argument.
Version 1.1 API:
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
Version 1.2 API:
Tensor sum(IntArrayRef dim, bool keepdim=false,
           c10::optional<ScalarType> dtype=c10::nullopt) const;
That is, to override dtype, keepdim must now be provided explicitly.
We have streamlined our conda and wheel binary distributions, so that it is easier than ever to install the version of PyTorch appropriate for your needs. The install instructions on https://pytorch.org/ have been updated, but if you have tooling to download and install PyTorch, here is a detailed description of the changes we made:
Wheels now have local version identifiers. Wheels for non-default CUDA configurations (the default CUDA version for this release is 10.0) now have local version identifiers like +cpu and +cu92. This means that, when installing, it is no longer necessary to specify a full wheel URL; just specify an appropriate version constraint like torch==1.2.0+cu92.
Version 1.1 (for Python 3.7 on Linux only):
pip install numpy
pip install https://download.pytorch.org/whl/cpu/torch-1.1.0-cp37-cp37m-linux_x86_64.whl
Version 1.2 (works for all versions of Python, and both Linux and Mac):
pip install torch==1.2.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
CPU-only binaries on conda can be selected with the cpuonly feature. We've eliminated the pytorch-cpu conda package; instead, the CPU-only conda package can be enabled by installing the cpuonly metapackage. Similarly, there is no longer both a torchvision and a torchvision-cpu package; the cpuonly feature will ensure that the CPU version of torchvision is selected.
Version 1.1:
conda install -c pytorch pytorch-cpu
Version 1.2:
conda install -c pytorch pytorch cpuonly
Conda nightlies now live in the pytorch-nightly channel and no longer have "-nightly" in their name. We have added a new dedicated channel for nightlies called pytorch-nightly; all nightlies (pytorch, torchvision, torchaudio, etc.) will now be uploaded to this channel, but with the same names as their corresponding stable versions (unlike before, when we had separate pytorch-nightly, torchvision-nightly, etc. packages). This makes it more difficult to accidentally install a copy of the nightly and stable at the same time.
Version 1.1:
conda install -c pytorch pytorch-nightly
Version 1.2:
conda install -c pytorch-nightly pytorch
Wheel nightlies no longer have -nightly in their name. Similar to the changes we made in Conda, we no longer suffix wheel nightlies with "-nightly", to make it harder to accidentally install a copy of nightly and stable at the same time.
Version 1.1:
pip install --pre torch_nightly -f https://download.pytorch.org/whl/nightly/torch_nightly.html
Version 1.2:
pip install --pre torch -f https://download.pytorch.org/whl/nightly/torch_nightly.html
torch.bool: added support for many operators (masking, comparison, arithmetic operators) to achieve feature parity with torch.uint8. See the Breaking Changes section for details about how this could affect existing programs. (21032, etc.)torch.sparse.HalfTensor: Added support for torch.float16 sparse Tensors on both CPU and CUDA. (19695)torch.bfloat16: Added basic creation and serialization support for Brain Floating Point Tensors. (21522, 21523, 21860, 22852)nn.Transformer: added implementation of Transformer from Attention is All You Need. (20170, 22588)nn.Embedding: support float16 embeddings on CUDA. (19695)nn.Flatten: added a Module that performs torch.flatten. (22245)nn.functional.gelu: Added support for Gaussian Error Linear Units. (20665, 21237)nn.Module hooks: add ability to replace input/output via forward_pre_hook and forward_hook. (22285)nn.Module: add requires_grad_()method for turning on/off requires_grad for Module parameters. (22576)Tensor.to_sparse: now supports autograd. (20458)Tensor.fill_diagonal_: operator to fill the main diagonal of a Tensor. (21892)torch.qr: supports autograd. (21274)torch.bitwise_not: add operator for boolean/integer types. Also have python ~ operator use this. (22283, 22320)torch.trapz: integrate using the trapezoid rule; equivalent to numpy.trapz. (21610)torch.var_mean / torch.std_mean: compute variance and mean at the same time.(18731)torch.utils.ThroughputBenchmark: benchmark utility for measuring the throughput of PyTorch operators. (20766).Logging: lightweight at-most-once logging to record operators that are used (c10::Logging). (20745)optim.AdamW: introduce AdamW optimizer from Decoupled Weight Decay Regularization. (21250)optim.LBFGS: added support for strong Wolfe line search. (8824)DistributedDataParallel: support CPU modules. (20236)DistributedDataParallel: support sparse tensors. (19146)DistributedDataParallel: support local gradient accumulation. (21736)IterableDataset: introduces a new type of Dataset designed for data read from a stream. (19228)SummaryWriter.flush: now supported. (20607)SummaryWriter.add_mesh: add support for 3D point clouds. (20413)List, Tuple, Dict, Tensor, String and you can also use zip(), enumerate(), and for...in. (21801, 22006, 21990, 21985)in membership checks. (21527)math support. (20979, 19707, 21151, 21131, 21129, 21130, 21512, 21126, 21127, 21128)NamedTuple. (21428)dict methods. (21979)sorted() keyword for lists and dicts. (23274)torch::List, torch::Dict and torch::Optional, supports dispatch (i.e. registering a different function for CPU and CUDA for the same operator).nn.GRU in script. (23266)pack_padded_sequence and pad_packed_sequence. (23249)torch._C._get_tracing_state in TorchScript. (23248)torch.as_tensor in TorchScript. (23247)Modules. (20708)all builtin. (20521)Final[T] annotated members to __constants__. (21603)save() to scripted Functions. (20386)Constant node. (22007)torch.jit.annotate(). (21390)ModuleList / Sequential. (21306)Module. (19905)Tensor.pin_memory(): only ask for context on current device. (22229)Tensor.view(): suggest using reshape() instead of contiguous() when the input is non-contiguous. (20968)Tensor.numpy(): throw TypeError instead of ValueError if the type isnβt supported. (21608)torch.norm: add support for p="nuc" with dim specified. (21022)torch.qr: support batching of input matrices. (20689)torch.qr: support some parameter akin to NumPy's mode option. (20689)torch.det / torch.logdet / torch.slogdet: added batching support. (22909)torch.cdist: support batching. 
(20934)torch.symeig: support batching. (21858)torch._dirichlet_grad: support CUDA. (21191)torch.randperm: support torch.float16. (22102)torch.Size is now pickle-able in Python2. (20952)torch.tensor / torch.as_tensor: infer device if input supports Numbaβs __cuda_array_interface__. (20584)torch.isinf / torch.isfinite: throw TypeError instead of ValueError when a non-tensor is passed in. (20817)nn.MultiheadedAttention: add functional support. (20415)nn.MultiheadedAttention: added support for key/value to have different number of features. (21288)nn.MultiheadAttention: allow static key/values. (21288)nn.Conv{1,2,3}D: support torch.int64 dtype in forward. (20730, 22594)nn.AvgPool{1,2,3}D: support torch.int64 dtype in forward. (22433)nn.Module: make _save_to_state_dict overrideable. (21933)autograd: Checkpointing of modules inside large fanout networks no longer hits a recursion error. (22397)autograd: Track in-pace changes of Tensors through Module._apply (internal API). (21865)autograd.profiler: Add shape aggregation support. 20035)autograd.profiler: Profile custom c10 ops. (20175)DataLoader: support setting batch_size=0 to disable automatic batching (collation) in DataLoader for easier bulk loading. (19228)DataLoader: add multiprocessing_context parameter. (22990)DataLoader: added error detection for worker_init_fn. (20150)DataLoader: Retry on EINTR. (21723)torch.cuda.set_rng_state / torch.cuda.get_rng_state: accept string as device parameter. (23448)CUDA: add warning when using Turing GPUs and CUDA <= 9000. (21468)CUDA: warn on conditions that can trigger a cuBLAS 9.0 bug. (22034)CPU: Improve CPUAllocator OOM message. (20618)[memory_format]: added support for torch.empty, torch.empty_like, Tensor.contiguous(), Tensor.is_contiguous() to specify / check the order in which dimensions are laid out in memory. (20455, 20558)distributions.MultivariateNormal: fix precision matrix instability. (21366)distributions.transforms.SigmoidTransform: fix numerical instability. (19802)DistributedDataParallel: Support DDP forward/backward calls even if no module parameter is used. (19821)DistributedDataParallel: Only call into reducer if grad is enabled. (19897)DistributedDataParallel: Require finalize DDP backward only when there are indeed gradients computed, this allows application to completely discard DDP outputs and move on to the next iteration. (19901)DistributedDataParallel: Improve DDP backward reduction error messages. (20586)DistributedDataParallel: make DDP failure recoverable. (21591)DistributedDataParallel: Delay reduction of unused parameters until first autograd hook is called. (22219)c10d: support tensors shared across processes. (21449)c10d: ProcessGroupMPI Add device guard around MPI operations. (22446)utils.data.distributed.DistributedSampler: Make shuffling optional. (22479)Tensor.T: added numpy-like support for reversing dimensions. (20598)Tensor.ndim: NumPy equivalent property for the number of dimensions. (20565)Tensor.nonzero: added as_tuple argument (default False) that when True, will return a tuple of Tensors, which matches the behavior of numpy.nonzero. (20293)torch.dtype: support passing in NumPy dtypes as arguments. (21215)torch.normal: add size parameter when called with two floats. (20545)torch.where: add one-argument overload that is an alias for Numpy-like nonzero. (21986)axis instead of dim. (20451)start and step parameters for range in TorchScript. (20795)max_pool2d to symbolic derivatives. (19661)matmul memory usage for certain cases. (23433)__init__ function. 
(21880)ScriptModule buffer attributes can also cast device/type. (19700)ScriptModule.training an attribute instead of a parameter. (21078)strtod_c compatible with different gcc abi. (21293)nn::PoissonNLLLoss: Added support. (19316)nn::Module: added replace_module API to overwrite submodules in C++ Frontend. (22546)nn:Module::register_module / register_parameter / register_buffer: make public (23196)data::datasets::ChunkDataReader: fix include headers and a vector issue. (19485)data::datasets::ChunkDataset: add new get_batch method. (21797)data::datasets::ChunkDataset: add checkpoint support. (21889)data::datasets::ChunkDataset: add support for cross-chunk shuffling. (22347)data::datasets::ChunkDataset: add sorting policy. (23053)β Add support for a number of operators on MKLDNN Tensors including:
Tensor.is_mkldnn: (22386)Tensor.transpose(): (21943)Tensor.zero_(): (20573)torch.empty: (21184)torch.mul: (20575)nn.AdaptiveAvgPool{1,2,3}D: (19818)nn.Sigmoid: (20820)nn.Softmax: (21516)nn.Module: support saving/loading MKLDNN modules. (20799)nn.MaxPool{1,2,3}D: support ceil_mode. (21310)Tensor.index_copy_: fix segfault by properly checking dimension is in range. (21617)Tensor.copy_: Fix a bug where non-blocking was not being respected. (20305)Tensor.clone: Fix an issue with MKLDNN tensors. (20943)torch.cat: Fix segfault with tensors that can't be indexed with 32-bit ints. (21530)torch.range / torch.linspace / torch.logspace: properly respect the current Stream. (21619)torch.lu: return the identity permutation instead of zeros when not using pivoting. (22242)torch.einsum: Fix an issue where the backward pass would potentially be skipped. (22111)torch.cosh: Fix an issue where torch.cos was instead calculated with torch.double dtype and vectorized instructions. (20797)torch.triu / torch.tril: handle strides correctly for in-place versions. (22730).torch.triu / torch.tril: Fix handling of batches > 65535 on CUDA. (21067)torch.inverse / torch.solve / torch.cholesky_solve / torch.triangular_solve: Fix batch sizes > 65535 on CUDA. (21689)torch.histc: return dtype is now the same as the input tensor on CUDA, matching CPU behavior. (20369)torch.histc: properly return 1-dim tensor on CPU with 0-dim input and 1 bin. (21497)torch.randperm: handle non-contiguous out parameter. (23043)torch.unique: Fix empty tensor handling when dim is passed as an argument. (19000)torch.min / torch.max: properly error on empty tensor inputs, as with CPU tensors. (19612).CUDA: fix launch parameters for reductions. (22827).torch.hub: fix an issue with find_module. (20782)autograd: Fix a number of custom autograd Function corner cases by inverting the relationship between PyFunction and THPFunction. (22983)autograd: give "Trying to backward through the graph a second time" error instead of internal assert when the buffers are a list of Tensors (with indexing). (21533)optim.lr_scheduler.CosineAnnealingLR: rename from CosineAnnealingLr. (23242)distributions.Binomial: Fix overflow of log_prob when logits is large. (20679)distributions.SigmoidTransform: Fix numerical issues that could result in inf / -inf return values. (20288)distributions.Categorical.sample: fix a view bug. (23328)CUDA: Give proper error message for bad cuda forks. (23322)pickle: Fix Unpickling error when loading multiple objects from a file. (20270)NCCL: Fix race condition. (23040)nn.Conv{1,2,3}D: fix memory leak on MKLDNN code path. (22392)nn.Conv{1,2,3}D: properly unpickle older pickled versions. (21687)nn.CTCLoss: fix backward on CUDA when 2d target tensor is larger than max_target_length. (20971)nn.CTCLoss: fix some numerical stability issues. (21392)nn.CTCLoss: disable buggy non-deterministic CudNN algorithm. (22977)nn.CTCLoss: fixed empty target handling. (21910, 23298)nn.SyncBatchNorm: fix syncing of running statistics when count size differs between GPUs. (22248)nn.SyncBatchNorm: retain requires_grad value when converting from nn.BatchNorm. (22569)nn.SyncBatchNorm: correctly handle process_group in convert_sync_batchnorm. (19240)nn.MultiheadedAttention: fix for torch.float16 dtype. (21658).nn.EmbeddingBag: fix NaN output when input is empty. (21400)nn.Dropout: fix python crash (with SIGFPE) when called on an empty cuda tensor. (20541)nn.MaxPool: fix output size calculation in some corner cases. 
(22304)nn.MaxPool: return valid indices if all entries are -inf. (23161)nn.Softmax: respect the current Stream. (22470)nn.LogSoftmax: fix numerical stability issues. (21672)nn.Module.load_state_dict: break ref cycle. (20397)nn.Module: fix loading in 32-bit environments. (20900)nn.utils.rnn.pack_padded_sequence: Fix segfault on empty tensors. (21461)nn.utils.spectral_norm: fix loading state_dict when strict=False. (22545)CudNN: Fix uninitialized PoolWindow on Windows. (22405)nn.parallel.DataParallel: fix error in no_grad mode. (21262)torch.distributed.all_gather: fix errors for views and aliases. (21490)c10d: fix collective communication errors on empty tensors. (20658)deepCopy also copies type information of lists, (23271)dictKeys and dictItems ops on typed dicts return typed lists. (23270)dict key type. (22231)builtin_function_or_method. (22935)__get_state__ to let a user know that ScriptModules can't be deep-copied at the moment.(20885)dropout derivative should respect the train flag. (20760)__constants__ for some nn modules. (21071)ScriptModule. __dir__ (). (22426)CompilationUnit::define. (21886)Graph::toString. (21370)NameError with PYTORCH_JIT=0. (20120)pow() bug on overloads. (20824)1 - x in C++ would cause the size of 1 to get hardcoded. (20932)None constants. (23029)_flat_weights bug. (21107)WeakIValueEq. (21891)list() not making a copy. (22093)Module::forward method. (21398)a += b for lists do an in place add. (21896)floor/ceil return ints. (21124)__file__ for torch.ops. (21888)nn::RNN: Fix assertions in bidirectional RNN. (22850).nn::MaxPool / nn::AvgPool: expand incomplete kernel size, as in Python. (22073, 22075)Optim: Fix memory leak when weight_decay is applied to Adam, Adagrad, RMSProp. (23125)Optim::SGD: fix memory leak with weight_decay. (23007)torch::autograd::Scatter / torch::autograd::Gather: Fix nullptr bug. (20286)torch::nn::parallel::data_parallel: fix gradient computation error. (20910)torch.uint8 Tensors is now deprecated in favor of masking via torch.bool Tensors.π₯ See the Breaking Changes section for more details about torch.bool Tensors and comparison operators.
torch.masked_select, torch.masked_fill, torch.masked_scatter now expect torch.bool masks rather than torch.uint8.
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])
>>> a.masked_select(torch.tensor([0, 1, 1], dtype=torch.uint8))
UserWarning: masked_select received a mask with dtype torch.uint8,
this behavior is now deprecated, please use a mask with dtype torch.bool instead.
tensor([2, 3])
# instead use torch.bool
>>> a.masked_select(torch.tensor([False, True, True]))
tensor([2, 3])
Comparison operators with out= parameters now expect torch.bool dtype rather than torch.uint8.
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])
>>> res = torch.empty_like(a, dtype=torch.uint8)
>>> torch.gt(a, b, out=res)
UserWarning: torch.gt received 'out' parameter with dtype torch.uint8, this behavior
is now deprecated, please use 'out' parameter with dtype torch.bool instead.
tensor([0, 1, 1], dtype=torch.uint8)
# instead use torch.bool
>>> res = torch.empty_like(a, dtype=torch.bool)
>>> torch.gt(a, b, out=res)
tensor([False, True, True])
autograd.Function (a Function without a static forward method) is now deprecated
>>> class MyLegacyFunction(Function):
>>>     def forward(self, x):
>>>         return x
>>>
>>>     def backward(self, grad_output):
>>>         return grad_output
>>>
>>> MyLegacyFunction()(torch.randn((3,), requires_grad=True))
UserWarning: Legacy autograd function with non-static forward method is deprecated
and will be removed in 1.3. Please use new-style autograd function
with static forward method.
# instead use a new-style autograd Function
>>> class MyFunction(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         return x
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         return grad_output
>>>
>>> MyFunction.apply(torch.randn((3,), requires_grad=True))
See the torch.autograd.Function documentation for more details.
torch.gels: has been renamed to torch.lstsq; torch.gels will work for this release but is now deprecated. (23460)Tensor.copy_: increase broadcasting CUDA copy performance by 25%. (20685)torch.matmul: Optimize the case A.ndim <= 2 && B.ndim >= 3, shows up to 15x speed up. (20448)torch.bmm: Improve performance by up to 3x for small cases on CPU by applying TensorAccessor. (20266)torch.inverse: Move workspace query and allocation outside loop to improve performance by up to 5x. (20904)torch.topk: Optimize CPU perf using parallel and partial sort, up to 6x improvement. (22865)torch.cdist: Improve CPU perf by up to 10x for some cases. (20605)torch.normal: Move normal, normal_means, normal_stddevs, and normal_means_stddevs to ATen, increasing performance by up to 3x. (21287)torch.bernoulli: Speedup bernoulli_scalar_cuda_kernel with grid-stride loop, increasing performance by up to 2x. (21300)torch.coalesce: Use _sparse_coo_tensor_unsafe in coalesce for up to 10x speedup. (21214)torch.sinh / torch.cosh: Parallelize and vectorize on CPU. (21115)torch.lerp: Vectorize on CPU. (22038)torch.eye: Parallelize on CPU. (21077)torch.randperm: Parallelize initialization in randperm on CPU. (21529)nn.Softmax: Add persistent CUDA kernels that increase performance 2-10x on small inputs. (20827)nn.Embedding / nn.EmbeddingBag: Optimize CUDA kernel, increasing performance up to 2.7x. (22016)nn.Linear: optimize BERT model perf by using mkldnn inner product. (21851)nn.Conv{1,2,3}D: improve perf for depthwise convolutions in torch.float16 on Volta and Turing GPUs. (22302)nn.RNN: optimize on CPU by fusing matmul ops. (22512)nn.Upsample: a number of significant perf improvements on CUDA. (21879, 21694).nn.functional.layer_norm: optimize a fast path for layer_norm, increasing perf by up to 4x on CPU. (20345, 20883)mkldnn inner product for nn.Linear() to improve BERT perf. (21851).torch.bool: doc the Boolean tensor type. (21601)torch.as_strided: add docs. (22842)torch.empty_strided: add docs. (23740)torch.lerp: clarify broadcasting requirements. (23268)torch.enable_grad / torch.no_grad / torch.set_grad_enable: clarify interaction between these features. (23310)torch.autograd.grad_mode: Document that no_grad is thread local. (21755)torch.multiprocessing: Explain refcounting of CUDA tensors. (19904)torch.Tensor: Add a warning about memory usage. (20801)torch.utils.data.Dataloader: Document RNG state consumption. (22540)torch.optim.lr_scheduler.CyclicLR: Clarify base_momentum and max_momentum. (20880).tensor.to. (20977)nn.functional / nn.init: Break up NN in docs so they load faster. (21291)nn.functional.conv{1,2,3}d: Remove padding_mode. (20891)nn.functional.upsample / nn.functional.interpolate: add note about overshooting with mode=βbicubicβ. (23321)nn.init.zeros_ / nn.init.ones_: add documentation. (23145)nn.MultiheadAttention: Add documentation for add_bias_kv, add_zero_attn, and attn_mask. (20071)nn.MultiheadAttention: Fix documentation for attention mask shape. (20850)nn.Softmax: Fixed to specify dimension to prevent warning in 1.1.0. (20310 )ninja to build instructions. (20079)π In PyTorch 1.2, we have added the full support for ONNX Opset 7, 8, 9 and 10 in ONNX exporter, and we have also enhanced the constant folding pass to support Opset 10. The export of ScriptModule has better support. Additionally, users now are able to register their own symbolic to export custom ops, and specify the dynamic dimensions of inputs during export.
Dropout for Opset 10. (20710)Slice and Flip for Opset 10. (20533)Interpolate (Resize) for Opset 10. (21434)torch.arange. (22601)torch.masked_fill. (22521)torch.floor, torch.ceil, torch.log2 and prim::shape. (17895)torch._dim_arange. (20078)torch.randn_like. (20093)torch._standard_gamma. (20126)torch.topk. (21104)__and__, __or__. (17894)torch.sign. (20470)torch.scatter. (18543)torch.rand. (20559)torch.gather. (21235)torch.cosine_similarity. (21884)torch.sum. (22240)torch.logsumexp. (22306)torch.layer_norm. (22265)torch.min and torch.max with dim. (19689)maxpool with dilations. (18721)RNN with batch_first=True. (19766)Upsample with dynamic input. (20116)torch.full with scalar parameters. (21931)Slice in constant folding optimization. (21811)π Note: CUDA 8.0 is no longer supported
First-class and native support for visualization and model debugging with TensorBoard, a web application suite for inspecting and understanding training runs, tensors, and graphs. PyTorch now supports TensorBoard logging with a simple from torch.utils.tensorboard import SummaryWriter command. Histograms, embeddings, scalars, images, text, graphs, and more can be visualized across training runs. TensorBoard support is currently experimental. You can browse the docs here.
Attributes can be assigned on a ScriptModule by wrapping them with torch.jit.Attribute and specifying the type. Attributes are similar to parameters or buffers, but can be of any type. They will be serialized along with any parameters/buffers when you call torch.jit.save(), so they are a great way to store arbitrary state in your model. See the docs for more info.
Example:
class Foo(torch.jit.ScriptModule):
    def __init__(self, a_dict):
        super(Foo, self).__init__(False)
        self.words = torch.jit.Attribute([], List[str])
        self.some_dict = torch.jit.Attribute(a_dict, Dict[str, int])

    @torch.jit.script_method
    def forward(self, input: str) -> int:
        self.words.append(input)
        return self.some_dict[input]
TorchScript now has robust support for list and dictionary types. They behave much like Python lists and dictionaries, supporting most built-in methods, as well as simple comprehensions and for...in constructs.
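For instance, a scripted function like the following now compiles (a small sketch; the function and variable names are arbitrary):
import torch
from typing import Dict, List

@torch.jit.script
def count_tokens(tokens: List[str]) -> Dict[str, int]:
    counts = torch.jit.annotate(Dict[str, int], {})
    for token in tokens:            # for...in over a list
        if token in counts:         # `in` membership checks are supported
            counts[token] = counts[token] + 1
        else:
            counts[token] = 1
    return counts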
For more complex stateful operations, TorchScript now supports annotating a class with @torch.jit.script. Classes used this way can be JIT-compiled and loaded in C++ like other TorchScript modules. See the docs for more info.
@torch.jit.script
class Pair:
    def __init__(self, first, second):
        self.first = first
        self.second = second

    def sum(self):
        return self.first + self.second
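Such a class can then be used from other TorchScript code, for example (a sketch, assuming Pair is defined as above and holds Tensors):
@torch.jit.script
def pair_sum(x, y):
    p = Pair(x, y)
    return p.sum()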
nn.parallel.DistributedDataParallel: can now wrap multi-GPU modules, which enables use cases such as model parallel (tutorial) on one server and data parallel (tutorial) across servers. (19271)
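A rough sketch of what this enables, assuming the process group has already been initialized and two GPUs are visible (the module layout below is made up for illustration):
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    # a model-parallel module that places its two halves on different GPUs
    def __init__(self):
        super(TwoGPUModel, self).__init__()
        self.part1 = nn.Linear(10, 10).to('cuda:0')
        self.part2 = nn.Linear(10, 10).to('cuda:1')

    def forward(self, x):
        x = self.part1(x.to('cuda:0'))
        return self.part2(x.to('cuda:1'))

model = TwoGPUModel()
# device_ids is left unset because the wrapped module itself spans multiple GPUs
ddp_model = nn.parallel.DistributedDataParallel(model)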
Tensor.set_: the device of a Tensor can no longer be changed via Tensor.set_. This would most commonly happen when setting up a Tensor with the default CUDA device and later swapping in a Storage on a different CUDA device. Instead, set up the Tensor on the correct device from the beginning. (18832).lr_scheduler.step(). (7889).torch.unique: changed the default value of sorted to True. (15379).Type no longer exist; use the functional or Tensor method equivalent. (17991).Backend constructor of TensorOptions no longer exists. (18137).ProcessGroup::getGroupRank has been removed. (19147).torch.tril_indices, torch.triu_indices: added operator with same behavior as NumPy. (14904, 15203).torch.combinations, torch.cartesian_prod: added new itertools-like operators. (9393).torch.repeat_interleave: new operator similar to numpy.repeat. (18395).torch.from_file: new operator similar to Storage.from_file, but returning a tensor. (18688).torch.unique_consecutive: new operator with semantics similar to std::unique in C++. (19060).torch.tril, torch.triu, torch.trtrs: now support batching. (15257, 18025).torch.gather: add support for sparse_grad option. (17182).torch.std, torch.max_values, torch.min_values, torch.logsumexp can now operate over multiple dimensions at once. (14535, 15892, 16475).torch.cdist: added operator equivalent to scipy.spatial.distance.cdist. (16168, 17173).torch. __config__.show(): reports detailed version of all libraries. (18579).nn.MultiheadedAttention: new module implementing MultiheadedAttention from Attention Is All You Need. (18334).nn.functional.interpolate: added support for bicubic. (9849).nn.SyncBatchNorm: support synchronous Batch Normalization. (14267).nn.Conv: added support for Circular Padding via mode='circular'. (17240).nn.EmbeddingBag: now supports trainable `per_sample_weights. (18799).nn.EmbeddingBag: add support for from_pretrained method, as in nn.Embedding. (15273).RNNs: automatically handle unsorted variable-length sequences via enforce_sorted. (15225).nn.Identity: new module for easier model surgery. (19249).torch.bool: added support for torch.bool dtype and Tensors with that dtype (1-byte storage). NumPy conversion is supported, but operations are currently limited. (16810).optim.lr_scheduler.CyclicLR: Support for Cyclical Learning Rate and Momentum. (18001).optim.lr_scheduler.CosineAnnealingWarmRestarts: new scheduler: Stochastic Gradient Descent with Warm Restarts). (17226).torch.distributions: now support multiple inheritance. (16772).quasirandom.SobolEngine: new sampler. (10505).nn.parallel.DistributedDataParallel: now supports modules with unused parameters (e.g. control flow, like adaptive softmax, etc). (18251, 18953).@ignore annotation, which statically tells the TorchScript compiler to ignore the Python function. (#16055)for...in loops on lists. (#16726)...) in Tensor indexing. (#17763)None in Tensor indexing. (#18615)if foo is not None. (#15587)to(), cpu(), and cuda() on ScriptModules. (#15340 , #15904)clear(), pop(), reverse(), copy() , extend(),index(), count(), insert(), remove() ).sort() on lists of specialized type (Tensors, int, float, bool). (#19572)index(), slice(), len())Tensor.to() in TorchScript. ( #15976 )Torch.tensor() in TorchScript. (#14913, #19445)torch.manual_seed() in TorchScript. (#19510)nn.LSTM in TorchScript. (#15744)nn.init in TorchScript. (#19640)hash() builtin. (#18258)min() and max() builtins for numerical types. (#15680)isinstance() builtin, which performs a static type check. 
(#15076)train() / eval() / is_training() to C++ ScriptModule API. (#16044)std::vector and std::unordered_map as arguments to custom operators. (#17587)nn.Sequential in ModuleList. (#16882)torch.qint8 dtype, torch.quantize_linear conversion function. (18230).MKLDNN tensors via Tensor.to_mkldnn(); operators are currently limited to ResNext101 operators. (17748).torch.min, torch.max, torch.median, torch.mode, torch.kthvalue, torch.symeig, torch.eig, torch.pstrf, torch.qr, torch.geqrf, torch.solve, torch.slogdet, torch.sort, torch.topk, torch.gels, torch.triangular_solve, torch.svd now return namedtuples describing their outputs. (16186, 16950, 17093, 17195, 15429).torch.empty (and other factory functions): now take a pin_memory kwarg; can now pin without going through torch.Storage interface.. (18455).torch.histc: Now supported on CUDA. (15842)torch.unique: Add return_counts. (18391, 18651).torch.logspace: add the ability to specify a base. (19542).torch.set_printoptions: added scientific notation support. (16876).torch.btrifact now handles tensors with greater than 3 dimensions. (14964).torch.kthvalue: now supported on CUDA. (17544).torch.abs: now supported on uint8 and int8 dtypes. (16893).torch.stack, torch.cat: now supported for CPU half tensors. (16389).torch.cross: added support for negative dimensions. (17582).torch.lerp: add support for weight as a Tensor. (17348).torch.transpose: Made consistent with NumPy: 1-d and 0-d arrays are accepted and returned as-is. (17462, 17535).torch.linspace, torch.logspace can now be used with steps=1 and start != end. (14748).torch.cholesky: changed the derivative from a triangular matrix to symmetric matrix. (19116).torch.lerp: Improved numerical stability. (18871).torch.logdet, torch.slogdet: improve numerical precision. (18449).Tensor. __contains__ is now supported. (17733).Tensor.fill_ and torch.zeros now support half on CPU. (17536).Tensor.resize_as_, Tensor.view: now supported on half CPU tensors. (18821).Tensor indexing: allow indexing via NumPy booleans. (14932).nn.EmbeddingBag: enable half precision dense backward. (19293).nn.Embedding: fix dense Embedding to work with double backwards. (9078).nn.MaxPool1d: Allow list and tuples to be passed as output_size. (16489).nn.CTCLoss: support zeroing infinite losses via zero_infinity argument. (16199).nn.Dropout: add support for enabling during eval. (17549).nn.MSELoss: add warning about unexpected broadcasting. (18349).nn.Module.load_state_dict: also return missing_keys and unexpected_keys. (18668).nn.parallel.data_parallel: Enforce devices match device_ids. (17129).torch.device: handle in more places that used to accept only device ordinals. (14929)dtype.int8 tensors can now be converted to NumPy arrays. (14710).nn.functional.gumbel_softmax: allow multidimensional input with dim argument. (13339).nn.functional.cosine_similarity: improved precision. (18250).torch.autograd: Don't keep unnecessary saved_inputs alive, increasing memory efficiency. (16583).torch.autograd.profiler: add Self (non-nested) CPU Time Total, CPU time total (19378).DataLoader: support accepting a custom memory pinning function. (16743).DataLoader: retry libshm on EINTR. (15964).DataLoader: fixed an issue with pin_memory and PackedSequence. (18079)data.utils.collate, data.utils.pin_memory: now preserve namedtuples. (16440)IndexError instead of RuntimeError on many indexing error cases. (17049, 17114).torch.float16 tensor on CPU. (17645).utils.checkpoint.checkpoint: support None as an argument to checkpoint function. 
(17969).torch.autograd: added more information for one of the variables needed for gradient computation has been modified by an inplace operation exception. (18523).cuda.synchronize: add a device argument. (19573).cuda.reset_max_memory_*: now supported. (15985).distributions.Independent: can now calculate KL Divergence. (17681).torch.distributed.new_group: now supports overriding default backend. (18595).torch.distributed.init_process_group: will now propagate timeout to underlying Store. (16571).nn.Module attributes to __constants__ when they are using in TorchScript. (#18164)torch.save(): Improve error message when you try to save a ScriptModule. (#15321)torch.jit.save(): Improve error message when trying to save a model with Python code. (#16850)__constants__. (#16724)__constants__. (#17167)nn::Module: added Python interop. (13481).autograd::profiler: is now supported. (16580)torch.argsort is now supported in C++. (17099).Tensor.isnan: now supported in C++. (15722).nn::Sequential. (17552).torch::data::transforms::Normalize: now supported in C++. (15891).std::vector<torch::Tensor>. (19677).torch.prod: correct erroneous calculation on large tensors. (15653).torch.mean (and other reductions): fix incorrect calculation on CUDA on large inputs. (16023).nn.Conv: correctly handle non-contiguous inputs on MKLDNN convolution codepath. (16300).Tensor.eq_: Fix erroneous calculation. (15475).torch.mean: Fix fp16 output calculation. (14878).nn.PoissonNLLLoss: Properly handle reduction=None. (17358).Tensor.round is now consistently half to even. (17443).Tensor.resize_: Fix some 0-element cases. (14874).Tensor.numpy: Fix conversion of torch.int8 dtype. (15194).Tensor.grad: correctly handle del. (16525).Tensor.clamp: correctly handle NaN on CUDA. (15479).Tensor.topk: properly set launch bounds on CUDA. (17296).Tensor.kthvalue: treat NaN as bigger than any number. (17824).Tensor.copy_: Properly synchronize on src and dst sreams. (16966).Tensor indexing: Fix incorrect dimension error message. (16495).Tensor.coalesce, Tensor.clone, Tensor.to_dense: fixed for sparse 0-dimensional tensors. (17379).torch.isinf: Don't error out on integral tensors. (15489).torch.argsort, torch.sort: Match NumPy by considering NaNs to be larger than any number. (15886).torch.geqrf, torch.ormqr: when an out parameter is specified, dispatch to the correct function. (16964).torch.cuda.get_device_name / torch.cuda.get_device_capability: Fix handling of optional. (17222).Tensor.tril_ / Tensor.triu_: properly reuse input memory. (17031).torch.arange: fix shape inconsistency between CPU and CUDA. (18462).torch.empty (and other size-based factory functions): properly enforce non-negative sizes. (17077).torch.load: support serializing / deserializing pathlib.Path object. (18562).nn.BatchNorm: correctly handle very large batches. (17047).nn.Softmax / nn.LogSoftmax: fix double backward for torch.half. (17330).nn.Softmax: handle empty inputs in backward. (17259).nn.NLLLoss: Fix crash when ignore_index is out-of-bounds on CPU. (17328).nn.Softmax, nn.LogSoftmax: handle 0-element inputs. (17651).nn.CTCLoss: correct error checking. (16269).nn.Conv: better report convolution size mismatch. (17436).torch.nn.functional.cosine_similarity: fix output sometimes returning result > 1.0. (18168).nn.parallel.data_parallel: Fix handling of buffers that require_grad. (13352).nn.parallel.data_parallel: would previously sometimes frees tensors before all pending operations finish. (18465).torch.distributed.broadcast: fixed repeated calls leading to OOM. 
(19219).torch.multiprocessing: fix serialization of integer nn.Parameters. (18639).torch.multiprocessing: Fix handling of distributions on CUDA. (16854).torch.nonzero: Fix for 0-dimensional tensors on CUDA. (17406).torch.slogdet: Fix sign requiring grad when input required grad. (16337).torch.cuda.Stream: Properly restore stream on destination device when switching devices. (17439).torch.cuda.Stream: Fixed synchronization issue when used with non-current device. (15689).torch.cuda.Stream: properly change device in stream context manager. (16128).DataLoader: fixed a hang when no data was read and the buffer size is smaller than the chunk size. (17409).DataLoader: _utils.collate.default_collate now converts bool lists to byte Tensors, not integer tensors.DataLoader: ensure dataset is indexed by integers. (17649).torch.sparse.mm: Handle transposed dense tensors in backwards. (18737).torch.sparse.sum: Fix parsing of dim. (16517).torch.sparse.mm / torch.sparse.addmm: fix broadcasting and using uninitialized data. (16572).Tensor.to_sparse: Fix for 0-dimensional tensors. (17406).SparseTensor: fix add with non-contiguous values tensors. (18179).compare_exchange_weak in weak_intrusive_ptr. (16302).utils.model_zoo.load_url: Fix race condition. (16578).utils.data.RandomSampler: have len properly take into account num_samples. (15991).torch.distributions: Fix precision issue with expansion that prefers probs over logits. (18614).distributions.dirichlet.Dirichlet: fixed an underflow issue. (17488).distributions.binomial.Binomial.log_prob: fixed numerical stability issue. (15962).Caching Allocator: Free all blocks with outstanding events on OOM-retry. (19222).torch.dtype: fix pickling issue with Python 2. (18045).utils.data.DataLoader: Fix SIGCHLD checking. (19421).optim.Optimizer: Properly copy defaults. (19308).optim.lr_scheduler.CosineAnnealingLR: Fix division-by-zero error. (19180).optim.lr_scheduler.ReduceLROnPlateau: fix bug when the argument to step is reused outside the function.cudNN: fix race condition with multiple threads calling into the same device. (15080).cudNN: Properly specify accumulation types. (16825).cuDNN: Fix incorrectly selecting slower algorithms in certain cases. (15881).cuFFT: Properly handle CUDA contexts. (19300)MKLDNN: fix thread safety. (17022).floordiv: Fix integer division and divide-by-zero semantics. (#15813).ord(): Fix handling of utf8 chars. (#19423).requires_grad analysis pass. (#18361).rnn.py. (#18198)._unique_state_dict could contain duplicate Tensors. (#18139).Stream and Event APIs. (15937).extra_cuda_cflags to C++ extensions on Windows. (18638).torch::nn::init::orthogonal_: match Python API. (18915).torch.btrifact: the deprecated info argument has been removed. (14935).torch.potrs has been deprecated, use torch.cholesky_solve instead. Note that upper defaults to False for torch.cholesky_solve, and True for torch.potrs. (15334).torch.pstrf is deprecated; use torch.cholesky instead. Note that upper defaults to False for torch.cholesky, and True for torch.pstrf. (17866).torch.potri is deprecated; use torch.cholesky_inverse instead. Note that upper defaults to False for torch.cholesky_inverse, and True for torch.potri. (19498).torch.btrifact_with_info has been deprecated; use torch.lu with get_infos=True instead.(18435).torch.btrifact has been deprecated; use the new name torch.lu instead. (18435).torch.gesv is deprecated; use the new name `torch.solve instead. (18060).torch.trtrs has been deprecated; use the new name torch.triangular_solve instead. (18213).torch. 
btriunpack has been deprecated; use the new name torch.lu_unpack instead. (18529).torch.btrisolve has been deprecated; use the new name torch.lu_solve instead. (18726).IntList has been deprecated, use IntArrayRef instead, as it better describes the type and ownership semantics in C++. (16751).Type parameters, e.g. AT_DISPATCH_ALL_TYPES(tensor.type(), ..., are now deprecated; use ScalarType instead, e.g. AT_DISPATCH_ALL_TYPES(tensor.scalar_type(), .... (17527, 17996).variable_tensor_functions have been removed. (15003).nn.BatchNorm CPU inference speed increased up to ~19x.(19152).nn.AdaptiveAvgPool: speed up common-case of size=1 output by ~30x. (17011).nn.EmbeddingBag CPU performance increased by ~4x. (19329).Tensor.copy_: sped up larger tensor copy ~2-3x, small regression in small tensor copy. (18618).torch.nonzero: is now ~2x faster than numpy on CPU. (15190)reduction functions: Speed up some large Tensor cases by 50-80%. (17428).batch_norm fusion for inference. (#15146)layer_norm fusion for inference. (#18266)torch.abs, torch.frac, torch.repiprocal, torch.neg have been vectorized and parallelized (19041).torch.bmm: CPU performance increased by 2x. (19338).torch.sort: CUDA performance increased by ~2x. (19379).torch.cat on CPU is now ~4x faster in the case where inputs are contiguous and dim != 0. (17032).torch.multinomial fixed a 2x performance regression. (17121).torch.empty (and another factory functions): reduce overhead by 20-40%. (17565).torch.linspace has been parallelized on CPU. (15320).torch.logspace has been parallelized on CPU. (15438).torch.range has been parallelized on CPU. (15484).torch.arange has been parallelized on CPU. (15667).torch.load: avoid unnecessary CPU-to-CUDA copy. (17297).reduction functions: improve efficiency on CUDA. (16224, 17040).sparse/dense matrix multiply: improve speed by ~5x. (16905).distributions.MultivariateNormal: sped up. (17294).aten::_convolution now participates in shape analysis. (#16837]randlike. (#14740)adaptive_avg_pool2d. (#15459)erf and erfc. (#15139)layernorm. (#17702)tanh. (#17816)matmul/dropout. (#17523)Tensor.scatter_: add documentation about value parameter. (17467).Tensor.unfold: correctly document dimension parameter, not dim. (19020).Tensor.is_floating_point() is now documented. (15704).torch.cholesky: Fix broken upper example in documentation. (15215).torch.gesv: document out parameter. (15649).torch.mul: better explain elementwise multiplication. (15664).torch.eig, torch.symeig: better explain backwards limitations. (15929).torch.ormqr: fixed output specification. (15694).torch.from_numpy: replaced usage with torch.as_tensor in documentation. (16587).torch.mvlgamma: Fix the constant in the docs. (17045).torch.mode: more precisely describe what is returned. (17069).torch.upsample: documentation now matches torch.interpolate. (17134)torch.arange: correct dtype documentation. (18604)torch.cumprod: document out parameter. (19340).torch.nonzero: document indices being returned lexicographically. (19539).torch.nn.functional.interpolate: better explain aligned_corners parameter. (14806).torch.nn.functional.pad: documentation has been made consistent with other functional ops. (15984).nn.functional.grid_sample: clarify behavior of padding. (19754).nn.TripletMarginLoss: correct type of swap parameter. (18115).nn.CrossEntropyLoss: clarify ignore_index documentation. (18117).nn.CrossEntropyLoss: the input format is more clearly explained. (15990).nn.CTCLoss: Clarify a number of ambiguities. 
(18415).nn.BCEWithLogitsLoss: add better explanation. (19212).nn.BCEWithLogitsLoss: better explain positive samples. (17258).nn.ModuleList / nn.ParameterList: update documentation. (17731).nn.Module.load_state_dict: correct semantics of strict. (17618)nn.parallel.DataParallel: more accurately specify how different argument types are handled. (15993).nn.parallel.DistributedDataParallel: Clarified batch size requirements. (16010).torch.distributed: Document mixed-precision training. (15440).torch.multiprocessing: Include example multiprocessing code. (16345).torch.autograd: Better explain computing Jacobian-vector product. (15197).torch.cuda.get_rng_state, torch.cuda.set_rng_state: document taking a device object. (14324).torch.device: Fix example of passing device to tensor factory. (16839).DataLoader: update documentation to describe how workers are managed. (18091).reduction arguments to use non-deprecated format. (17300).mark_non_differentiable: document correct semantics. (17891).activations attribute in nn.RNN ONNX export. (19368).Note: our conda install commands have slightly changed. Version specifiers such as cuda100 in conda install pytorch cuda100 -c pytorch have changed to conda install pytorch cudatoolkit=10.0 -c pytorch
There are no breaking changes in this release.
- torch.save and torch.load would initialize the CUDA context on GPU 0 if it hadn't been initialized already, even if the serialized tensors are only on GPU 1.
- 1 ^^ x, where x is a PyTorch scalar (#16687)

The JIT is a set of compiler tools for bridging the gap between research in PyTorch and production. It allows for the creation of models that can run without a dependency on the Python interpreter and which can be optimized more aggressively. Using program annotations, existing models can be transformed into Torch Script, a subset of Python that PyTorch can run directly. Model code is still valid Python code and can be debugged with the standard Python toolchain. PyTorch 1.0 provides two ways in which you can make your existing code compatible with the JIT, using torch.jit.trace or torch.jit.script. Once annotated, Torch Script code can be aggressively optimized and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.
```python
# Write in Python, run anywhere!
@torch.jit.script
def RNN(x, h, W_h, U_h, b_h):
    y = []
    for t in range(x.size(0)):
        h = torch.tanh(x[t] @ W_h + h @ U_h + b_h)
        y += [h]
    return torch.stack(y), h
```
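The other entry point mentioned above, torch.jit.trace, records the operations run on example inputs. A minimal sketch (the module and shapes here are illustrative, not taken from the release notes):

```python
import torch

class Cell(torch.nn.Module):
    def forward(self, x, h):
        # No data-dependent control flow, so a trace captures the whole computation.
        return torch.tanh(x + h)

x, h = torch.randn(3, 4), torch.randn(3, 4)
traced = torch.jit.trace(Cell(), (x, h))   # returns a ScriptModule

# The traced module can be serialized and later loaded without the Python source.
traced.save("cell.pt")
loaded = torch.jit.load("cell.pt")
print(torch.allclose(loaded(x, h), torch.tanh(x + h)))  # True
```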
As an example, see a tutorial on deploying a seq2seq model, loading an exported model from C++, or browse the docs.
The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by a brand-new, redesigned distributed library. The main highlights of the new library are:

- torch.distributed is performance driven and operates entirely asynchronously for all backends: Gloo, NCCL, and MPI.

The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high-performance, low-latency, and bare-metal C++ applications. It provides equivalents to torch.nn, torch.optim, torch.data and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:
| Python | C++ |
|---|---|
| `import torch`<br>`model = torch.nn.Linear(5, 1)`<br>`optimizer = torch.optim.SGD(model.parameters(), lr=0.1)`<br>`prediction = model.forward(torch.randn(3, 5))`<br>`loss = torch.nn.functional.mse_loss(prediction, torch.ones(3, 1))`<br>`loss.backward()`<br>`optimizer.step()` | `#include <torch/torch.h>`<br>`torch::nn::Linear model(5, 1);`<br>`torch::optim::SGD optimizer(model->parameters(), /*lr=*/0.1);`<br>`torch::Tensor prediction = model->forward(torch::randn({3, 5}));`<br>`auto loss = torch::mse_loss(prediction, torch::ones({3, 1}));`<br>`loss.backward();`<br>`optimizer.step();` |
We are releasing the C++ frontend marked as "API Unstable" as part of PyTorch 1.0. This means it is ready to be used for your research application, but still has some open construction sites that will stabilize over the next couple of releases. Some parts of the API may undergo breaking changes during this time.
See https://pytorch.org/cppdocs for detailed documentation on the greater PyTorch C++ API as well as the C++ frontend.
Torch Hub is a pre-trained model repository designed to facilitate research reproducibility.
Torch Hub supports publishing pre-trained models (model definitions and pre-trained weights) to a GitHub repository using a simple hubconf.py file; see hubconf for resnet models in pytorch/vision as an example. Once published, users can load the pre-trained models using the torch.hub.load API.

For more details, see the torch.hub documentation. Expect a more-detailed blog post introducing Torch Hub in the near future!
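As a rough sketch of the loading side (the pytorch/vision repository and resnet18 entrypoint follow the hubconf example mentioned above; the entrypoints actually available depend on the published hubconf.py):

```python
import torch

# Fetches hubconf.py from the GitHub repo and calls the requested entrypoint.
model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
model.eval()

# A dummy forward pass to confirm the pretrained weights loaded.
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])
```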
Operations that result in 0 element tensors may return changed shapes.
Previously:

>>> torch.nonzero(torch.zeros(2, 3))
tensor([], dtype=torch.int64)

Now:

>>> torch.nonzero(torch.zeros(2, 3))
tensor([], size=(0, 2), dtype=torch.int64)
Sparse tensor indices and values shape invariants are changed to be more consistent in the case of 0-element tensors. See link for more details. (#9279).
torch.distributed: the TCP backend has been removed; we recommend using the Gloo and MPI backends for CPU collectives and the NCCL backend for GPU collectives.
Some inter-type operations (e.g. *) between torch.Tensors and NumPy arrays will now favor dispatching to the torch variant. This may result in different return types. (#9651).
Implicit NumPy conversion no longer moves a tensor to the CPU. Therefore, you may have to explicitly move a CUDA tensor to the CPU (tensor.to('cpu')) before an implicit conversion. (#10553).
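For example, converting a CUDA tensor to a NumPy array now needs an explicit copy to the CPU first (a small sketch, assuming a CUDA device is available):

```python
import numpy as np
import torch

t = torch.randn(3, device='cuda')

# np.asarray(t) no longer moves the tensor to the CPU implicitly;
# copy it over explicitly before converting.
arr = np.asarray(t.to('cpu'))
# equivalently: arr = t.cpu().numpy()
```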
torch.randint now defaults to using dtype torch.int64 rather than the default floating-point dtype. (#11040).
The torch.tensor function with a Tensor argument now returns a detached Tensor (i.e. a Tensor where grad_fn is None). This more closely aligns with the intent of the function, which is to return a Tensor with copied data and no history. (#11061, #11815).
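A small illustration of the new behavior described above:

```python
import torch

a = torch.randn(3, requires_grad=True)
b = a * 2                 # part of the autograd graph: b.grad_fn is set

c = torch.tensor(b)       # copies the data with no history
print(c.grad_fn)          # None
print(c.requires_grad)    # False
```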
torch.nn.functional.multilabel_soft_margin_loss now returns Tensors of shape (N,) instead of (N, C) to match the behavior of torch.nn.MultiMarginLoss. In addition, it is more numerically stable. (#9965).
The result type of a torch.float16 0-dimensional tensor and an integer is now torch.float16 (was torch.float32 or torch.float64 depending on the dtype of the integer). (#11941).
Dirichlet and Categorical distributions no longer accept scalar parameters. (#11589).
C++ extensions: deprecated factory functions that accept a type as the first argument and a size as the second argument have been removed. Instead, use the new-style factory functions that accept the size as the first argument and TensorOptions as the last argument. For example, replace your call to at::ones(torch::CPU(at::kFloat), {2, 3}) with torch::ones({2, 3}, at::kCPU). This applies to the following functions: arange, empty, eye, full, linspace, logspace, ones, rand, randint, randn, randperm, range, zeros.

torch.potrf has been renamed to torch.cholesky. It has a new default (upper=False). (#12699)
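A brief sketch of the renamed function with its new default (the matrix here is illustrative):

```python
import torch

a = torch.randn(3, 3)
spd = a @ a.t() + 3 * torch.eye(3)      # a symmetric positive-definite matrix

l = torch.cholesky(spd)                 # upper=False is now the default: lower-triangular factor
print(torch.allclose(l @ l.t(), spd))   # True

u = torch.cholesky(spd, upper=True)     # the previous default, still available explicitly
```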
Renamed elementwise_mean to mean for loss reduction functions. (#13419)
Tensors with 0 elements can now have an arbitrary number of dimensions and support indexing and other torch operations; previously, 0 element tensors were limited to shape (0,). (#9947). Example:
>>> torch.empty((0, 2, 4, 0), dtype=torch.float64)
tensor([], size=(0, 2, 4, 0), dtype=torch.float64)
- torch.roll operator to match numpy.roll. (#13261, #13588, #13874)
- torch.finfo and torch.iinfo, giving the numeric limits of a dtype, similar to numpy.finfo and numpy.iinfo. (#12472)
- Tensor.__cuda_array_interface__ to provide compatibility with numba and other CUDA projects. (#11984)
- Tensor.to_sparse() allows conversion from a dense tensor to a sparse tensor. (#12171)
- Autograd now flows through values() and torch.sparse_coo_tensor (with indices and values tensors). E.g., torch.sparse_coo_tensor(i, v).values().sum() is differentiable w.r.t. v. See the updated torch.sparse documentation for details. (#13001)
- dim argument. (#10423)
- Distributions gained an expand method similar to torch.Tensor.expand. For example: torch.distributions.bernoulli.Bernoulli.expand. (#11341)
- copy keyword argument. (#12571)
- dtype accumulation argument. (#11719)
- compute_uv argument for optionally computing singular vectors. (#12517)
- view) was incorrect with overlapping data locations. (#9538)
- reduction method. (#10018)
- __rsub__ now works properly when the CUDA device is not 0. (#12956)
- replacement=False will not properly throw an error message when there are no more categories to select. (#12490)
- Tensor.__delitem__: fixed a segmentation fault. (#12726)
- load_from_state_dict now correctly handles 1-dimensional vs 0-dimensional tensors saved from 0.3 versions. (#9781)
- RuntimeError: storages don't support slicing when loading models saved with PyTorch 0.3. (#11314)
- reduce parameter. (#12689)
- eval mode. (#10621)
- out= parameters correctly, handles expanded tensors correctly, and has corrected argument validity checks on CPU. (#10273)
- Tensor gave incorrect results on CPU. (#10269)
- out parameter if it is given. (#9755)
- replacement=True could select 0 probability events on CUDA. (#9960)
- NaN.
- inf / -inf. (#11091)
- torch.nn.Conv modules with stride and dilation. (#9640)
- output_size calculation. (#12952)
- Tensor constructors (e.g. torch.FloatTensor(...)) now correctly check their device argument.
- out parameter is a CPU Tensor for CPU unary ops. (#10358)
- None. (#12028)
- dir(torch) has been fixed with Python 3.7. (#10271)
- replacement=False and the input has fewer nonzero elements than num_samples. (#11933)
- torch.float16 dtype tensor to .grad. (#11781)
- can only join a started process error with torch.utils.data.DataLoader. (#11432)
- unexpected exit in torch.utils.data.DataLoader on KeyboardInterrupt. (#11718)
- nn.Parameter. (#12886)
- torch.device inputs. (#10189)
- grad_fns. (#10181)
- np.int64 to PyTorch scalar. (#9225)
- eigenvectors=False is passed on CUDA rather than uninitialized data. (#10645)
- torch.utils.trainer. (#12487)
- The torch/torch.h header is deprecated in favor of torch/extension.h, which should be used in all C++ extensions going forward. Including torch/torch.h from a C++ extension will produce a warning. It is safe to batch replace torch/torch.h with torch/extension.h.
- torch::set_requires_grad. Replacement: at::Tensor now has a set_requires_grad method.
- torch::requires_grad. Replacement: at::Tensor now has a requires_grad method.
- torch::getVariableType. Replacement: None.
- torch.nn.parallel.deprecated.DistributedDataParallel.
- device, is_cuda, requires_grad, is_leaf and grad.
- tensors from seq. (#12741)

This is a pre-release preview; do not rely on the tag to have a fixed set of commits, or on the tag for anything practical / important.
The JIT is a set of compiler tools for bridging the gap between research in PyTorch and production. It includes a language called Torch Script (don't worry, it is a subset of Python, so you'll still be writing Python), and two ways in which you can make your existing code compatible with the JIT. Torch Script code can be aggressively optimized and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.
```python
# Write in Python, run anywhere!
@torch.jit.script
def RNN(x, h, W_h, U_h, b_h):
    y = []
    for t in range(x.size(0)):
        h = torch.tanh(x[t] @ W_h + h @ U_h + b_h)
        y += [h]
    return torch.stack(y), h
```
As an example, see a tutorial on deploying a seq2seq model, loading an exported model from C++, or browse the docs.
The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by the new "C10D" library. The main highlights of the new library are:

- Gloo, NCCL, and MPI backends.

The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high-performance, low-latency, and bare-metal C++ applications. It provides equivalents to torch.nn, torch.optim, torch.data and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:
| Python | C++ |
|---|---|
| `import torch`<br>`model = torch.nn.Linear(5, 1)`<br>`optimizer = torch.optim.SGD(model.parameters(), lr=0.1)`<br>`prediction = model.forward(torch.randn(3, 5))`<br>`loss = torch.nn.functional.mse_loss(prediction, torch.ones(3, 1))`<br>`loss.backward()`<br>`optimizer.step()` | `#include <torch/torch.h>`<br>`torch::nn::Linear model(5, 1);`<br>`torch::optim::SGD optimizer(model->parameters(), /*lr=*/0.1);`<br>`torch::Tensor prediction = model->forward(torch::randn({3, 5}));`<br>`auto loss = torch::mse_loss(prediction, torch::ones({3, 1}));`<br>`loss.backward();`<br>`optimizer.step();` |
We are releasing the C++ frontend marked as "API Unstable" as part of PyTorch 1.0. This means it is ready to be used for your research application, but still has some open construction sites that will stabilize over the next month or two. Some parts of the API may undergo breaking changes during this time.
See https://pytorch.org/cppdocs for detailed documentation on the greater PyTorch C++ API as well as the C++ frontend.
Operations that result in 0 element tensors may return changed shapes.
Previously:

>>> torch.nonzero(torch.zeros(2, 3))
tensor([], dtype=torch.int64)

Now:

>>> torch.nonzero(torch.zeros(2, 3))
tensor([], size=(0, 2), dtype=torch.int64)
Sparse tensor indices and values shape invariants are changed to be more consistent in the case of 0-element tensors. See link for more details. (#9279).
torch.distributed: the TCP backend is removed; we recommend using the Gloo and MPI backends for CPU collectives and the NCCL backend for GPU collectives.
Some inter-type operations (e.g. *) between torch.Tensors and NumPy arrays will now favor dispatching to the torch variant. This may result in different return types. (#9651).
Implicit NumPy conversion no longer moves a tensor to the CPU. Therefore, you may have to explicitly move a CUDA tensor to the CPU (tensor.to('cpu')) before an implicit conversion. (#10553).
torch.randint now defaults to using dtype torch.int64 rather than the default floating-point dtype. (#11040).
The torch.tensor function with a Tensor argument now returns a detached Tensor (i.e. a Tensor where grad_fn is None). This more closely aligns with the intent of the function, which is to return a Tensor with copied data and no history. (#11061, #11815).
torch.nn.functional.multilabel_soft_margin_loss now returns Tensors of shape (N,) instead of (N, C) to match the behavior of torch.nn.MultiMarginLoss. In addition, it is more numerically stable. (#9965).
The result type of a torch.float16 0-dimensional tensor and an integer is now torch.float16 (was torch.float32 or torch.float64 depending on the dtype of the integer). (#11941).
Dirichlet and Categorical distributions no longer accept scalar parameters. (#11589).
C++ extensions: deprecated factory functions that accept a type as the first argument and a size as the second argument have been removed. Instead, use the new-style factory functions that accept the size as the first argument and TensorOptions as the last argument. For example, replace your call to at::ones(torch::CPU(at::kFloat), {2, 3}) with torch::ones({2, 3}, at::kCPU). This applies to the following functions: arange, empty, eye, full, linspace, logspace, ones, rand, randint, randn, randperm, range, zeros.

Tensors with 0 elements can now have an arbitrary number of dimensions and support indexing and other torch operations; previously, 0 element tensors were limited to shape (0,). (#9947). Example:
>>> torch.empty((0, 2, 4, 0), dtype=torch.float64)
tensor([], size=(0, 2, 4, 0), dtype=torch.float64)
- dim argument. (#10423)
- Distributions gained an expand method similar to torch.Tensor.expand. For example: torch.distributions.bernoulli.Bernoulli.expand. (#11341)
- view) was incorrect with overlapping data locations. (#9538)
- reduction method. (#10018)
- load_from_state_dict now correctly handles 1-dimensional vs 0-dimensional tensors saved from 0.3 versions. (#9781)
- RuntimeError: storages don't support slicing when loading models saved with PyTorch 0.3. (#11314)
- eval mode. (#10621)
- out= parameters correctly, handles expanded tensors correctly, and has corrected argument validity checks on CPU. (#10273)
- Tensor gave incorrect results on CPU. (#10269)
- out parameter if it is given. (#9755)
- replacement=True could select 0 probability events on CUDA. (#9960)
- NaN.
- inf / -inf. (#11091)
- torch.nn.Conv modules with stride and dilation. (#9640)
- Tensor constructors (e.g. torch.FloatTensor(...)) now correctly check their device argument.
- out parameter is a CPU Tensor for CPU unary ops. (#10358)
- None. (#12028)
- dir(torch) has been fixed with Python 3.7. (#10271)
- replacement=False and the input has fewer nonzero elements than num_samples. (#11933)
- torch.float16 dtype tensor to .grad. (#11781)
- can only join a started process error with torch.utils.data.DataLoader. (#11432)
- unexpected exit in torch.utils.data.DataLoader on KeyboardInterrupt. (#11718)
- torch.device inputs. (#10189)
- grad_fns. (#10181)
- np.int64 to PyTorch scalar. (#9225)
- eigenvectors=False is passed on CUDA rather than uninitialized data. (#10645)
- The torch/torch.h header is deprecated in favor of torch/extension.h, which should be used in all C++ extensions going forward. Including torch/torch.h from a C++ extension will produce a warning. It is safe to batch replace torch/torch.h with torch/extension.h.
- torch::set_requires_grad. Replacement: at::Tensor now has a set_requires_grad method.
- torch::requires_grad. Replacement: at::Tensor now has a requires_grad method.
- torch::getVariableType. Replacement: None.
- torch.nn.parallel.deprecated.DistributedDataParallel.

torch.stft has changed its signature to be consistent with librosa. #9497
- Old signature: stft(signal, frame_length, hop, fft_size=None, normalized=False, onesided=True, window=None, pad_end=0)
- New signature: stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)
- torch.stft now uses FFT internally and is much faster.
- torch.slice is removed in favor of the tensor slicing notation. #7924
- torch.arange now does dtype inference: any floating-point argument is inferred to be the default dtype; all integer arguments are inferred to be int64. #7016
- torch.nn.functional.embedding_bag's old signature embedding_bag(weight, input, ...) is deprecated; embedding_bag(input, weight, ...) (consistent with torch.nn.functional.embedding) should be used instead.
- torch.nn.functional.sigmoid and torch.nn.functional.tanh are deprecated in favor of torch.sigmoid and torch.tanh. #8748
- [1] x [0] now broadcasts to [0] (used to be [1]). #9209

Adaptive Softmax: nn.AdaptiveLogSoftmaxWithLoss #5287
```python
>>> in_features = 1000
>>> n_classes = 200
>>> adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs=[20, 100, 150])
>>> adaptive_softmax
AdaptiveLogSoftmaxWithLoss(
  (head): Linear(in_features=1000, out_features=23, bias=False)
  (tail): ModuleList(
    (0): Sequential(
      (0): Linear(in_features=1000, out_features=250, bias=False)
      (1): Linear(in_features=250, out_features=80, bias=False)
    )
    (1): Sequential(
      (0): Linear(in_features=1000, out_features=62, bias=False)
      (1): Linear(in_features=62, out_features=50, bias=False)
    )
    (2): Sequential(
      (0): Linear(in_features=1000, out_features=15, bias=False)
      (1): Linear(in_features=15, out_features=50, bias=False)
    )
  )
)
>>> batch = 15
>>> input = torch.randn(batch, in_features)
>>> target = torch.randint(n_classes, (batch,), dtype=torch.long)
>>> # get the log probabilities of target given input, and mean negative log probability loss
>>> adaptive_softmax(input, target)
ASMoutput(output=tensor([-6.8270, -7.9465, -7.3479, -6.8511, -7.5613, -7.1154, -2.9478, -6.9885,
        -7.7484, -7.9102, -7.1660, -8.2843, -7.7903, -8.4459, -7.2371],
       grad_fn=<ThAddBackward>), loss=tensor(7.2112, grad_fn=<MeanBackward1>))
>>> # get the log probabilities of all targets given input as a (batch x n_classes) tensor
>>> adaptive_softmax.log_prob(input)
tensor([[-2.6533, -3.3957, -2.7069,  ..., -6.4749, -5.8867, -6.0611],
        [-3.4209, -3.2695, -2.9728,  ..., -7.6664, -7.5946, -7.9606],
        [-3.6789, -3.6317, -3.2098,  ..., -7.3722, -6.9006, -7.4314],
        ...,
        [-3.3150, -4.0957, -3.4335,  ..., -7.9572, -8.4603, -8.2080],
        [-3.8726, -3.7905, -4.3262,  ..., -8.0031, -7.8754, -8.7971],
        [-3.6082, -3.1969, -3.2719,  ..., -6.9769, -6.3158, -7.0805]],
       grad_fn=<CopySlices>)
>>> # predict: get the class that maximizes the log probability for each input
>>> adaptive_softmax.predict(input)
tensor([ 8,  6,  6, 16, 14, 16, 16,  9,  4,  7,  5,  7,  8, 14,  3])
```
Add spectral normalization nn.utils.spectral_norm #6929

```python
>>> # Usage is similar to weight_norm
>>> convT = nn.ConvTranspose2d(3, 64, kernel_size=3, padding=1)
>>> # Can specify number of power iterations applied each time, or use default (1)
>>> convT = nn.utils.spectral_norm(convT, n_power_iterations=2)
>>>
>>> # apply to every conv and conv transpose module in a model
>>> def add_sn(m):
...     for name, c in m.named_children():
...         m.add_module(name, add_sn(c))
...     if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
...         return nn.utils.spectral_norm(m)
...     else:
...         return m
>>> my_model = add_sn(my_model)
```
nn.ModuleDict and nn.ParameterDict containers #8463

Add nn.init.zeros_ and nn.init.ones_ #7488

Add sparse gradient option to pretrained embedding #7492

Add max pooling support to nn.EmbeddingBag #5725
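A minimal sketch of the new mode (the sizes and indices here are illustrative):

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode='max')

# Two bags: indices [1, 2, 4] and [4, 3], marked by offsets into the flat index list.
input = torch.tensor([1, 2, 4, 4, 3], dtype=torch.long)
offsets = torch.tensor([0, 3], dtype=torch.long)

out = bag(input, offsets)   # element-wise max over each bag's embeddings
print(out.shape)            # torch.Size([2, 4])
```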
Depthwise convolution support for MKLDNN #8782

Add nn.FeatureAlphaDropout (featurewise Alpha Dropout layer) #9073

torch.bincount (count frequency of each value in an integral tensor) #6688
```python
>>> input = torch.randint(0, 8, (5,), dtype=torch.int64)
>>> weights = torch.linspace(0, 1, steps=5)
>>> input, weights
(tensor([4, 3, 6, 3, 4]), tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000]))
>>> torch.bincount(input)
tensor([0, 0, 0, 2, 2, 0, 1])
>>> input.bincount(weights)
tensor([0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.5000])
```
torch.as_tensor (similar to torch.tensor but never copies unless necessary) #7109

```python
>>> tensor = torch.randn(3, device='cpu', dtype=torch.float32)
>>> torch.as_tensor(tensor)                       # doesn't copy
>>> torch.as_tensor(tensor, dtype=torch.float64)  # copies due to incompatible dtype
>>> torch.as_tensor(tensor, device='cuda')        # copies due to incompatible device
>>> array = np.array([3, 4.5])
>>> torch.as_tensor(array)                 # doesn't copy, sharing memory with the numpy array
>>> torch.as_tensor(array, device='cuda')  # copies due to incompatible device
```
torch.randperm for CUDA tensors #7606

nn.HardShrink for CUDA tensors #8117

torch.flip (flips a tensor along specified dims) #7873

torch.flatten (flattens a contiguous range of dims) #8578
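For example (the shapes are illustrative):

```python
import torch

t = torch.randn(2, 3, 4, 5)

print(torch.flatten(t).shape)               # torch.Size([120])    - all dims
print(torch.flatten(t, start_dim=1).shape)  # torch.Size([2, 60])  - keep the batch dim
print(torch.flatten(t, 1, 2).shape)         # torch.Size([2, 12, 5])
```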
torch.pinverse (computes SVD-based pseudo-inverse) #9052

torch.meshgrid #8581
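A quick sketch of torch.meshgrid (output shown for the default matrix-style indexing):

```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5])

grid_x, grid_y = torch.meshgrid(x, y)
print(grid_x)   # tensor([[1, 1],
                #         [2, 2],
                #         [3, 3]])
print(grid_y)   # tensor([[4, 5],
                #         [4, 5],
                #         [4, 5]])
```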
torch.unique for CUDA tensors #8899

torch.erfc (complementary error function) https://github.com/pytorch/pytorch/pull/9366/files

torch.isinf and torch.isfinite #9169 #9487

torch.reshape_as #9452

Support backward for target tensor in torch.nn.functional.kl_div #7839

torch.logsumexp #7254

Add batched linear solver to torch.gesv #6100
torch.sum now supports summing over multiple dimensions https://github.com/pytorch/pytorch/pull/6152/files
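For example:

```python
import torch

t = torch.ones(2, 3, 4)

print(torch.sum(t, dim=(0, 2)))                       # tensor([8., 8., 8.])
print(torch.sum(t, dim=(0, 2), keepdim=True).shape)   # torch.Size([1, 3, 1])
```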
torch.diagonal and torch.diagflat to take arbitrary diagonals with numpy semantics #6718

tensor.any and tensor.all on ByteTensor can now accept dim and keepdim arguments #4627
- Anomaly detection in autograd: detect NaN and errors occurring in backward. Two functions, detect_anomaly and set_detect_anomaly, are provided for this (a brief sketch follows this list). #7677
- reversed(torch.Tensor) #9216
- hash(torch.device) #9246
- gzip in torch.load #6490
- bernoulli_ #8682
- broadcast #8222
- sum (2x~6x speed-up in certain cases) #8992
- sigmoid (>3x speed-up in most cases) #8612
- nn.LeakyReLU and nn.PReLU (2x speed-up) #9206
- softmax and logsoftmax (4.5x speed-up on single core and 1.8x on 10 threads) #7375
- nn.init.sparse (10-20x speed-up) #6899
- requires_grad and grad_fn information #8211
- NaN is now propagated through many activation functions #8033
- non_blocking option to nn.Module.to #7312
- pos_weight argument to nn.BCEWithLogitsLoss #6856
- grad_clip for parameters on different devices #9302
- pad_sequence have to be sorted #7928
- stride argument for max_unpool1d, max_unpool2d, max_unpool3d now defaults to kernel_size #7388
- torch.no_grad, torch.enable_grad as decorators #7737
- torch.optim.lr_scheduler._LRSchedulers __getstate__ include optimizer info #7757
- Tensor as input in clip_grad_* functions #7769
- NaN in max_pool/adaptive_max_pool for NaN inputs #7670
- nn.EmbeddingBag can now handle empty bags in all modes #7389
- torch.optim.lr_scheduler.ReduceLROnPlateau is now serializable #7201
- LP-Pooling to zero if the sum of all input elements to the power of p is zero #6766
- torch.einsum #7173
- to method for PackedSequence #7319
- __floordiv__ and __rdiv__ for integral tensors #7245
- torch.clamp now has subgradient 1 at min and max #7049
- torch.arange now uses NumPy-style type inference #7016
- torch.norm and torch.renorm #6969
- out= keyword argument in torch.dot and torch.matmul #6961
- lazy_property #7708
- nn.DataParallel #7973
- nn.parallel.parallel_apply to take in a list/tuple of tensors #8047
- torch.Size can now accept PyTorch scalars #5676
- torch.utils.data.dataset.random_split to torch.utils.data.random_split, and torch.utils.data.dataset.Subset to torch.utils.data.Subset #7816
- torch.device #7713
- torch.(int/float/...)* dtype objects #7699
- torch.load can now take a torch.device as map location #7339
- nn.BCELoss sometimes returning negative results #8147
- tensor._indices on scalar sparse tensor giving wrong result #8197
- tensor.as_strided not working properly when input has overlapping memory #8721
- x.pow(0) gradient when x contains 0 #8945
- torch.svd and torch.eig returning wrong results in certain cases #9082
- nn.MSELoss having low precision #9287
- torch.Tensor.grad_fn #9292
- torch.topk returning wrong results when input isn't contiguous #9441
- inputs / dilation #9274
- avg_pool2/3d count_include_pad having default value False (should be True) #8645
- nn.EmbeddingBag's max_norm option #7959
- SpatialDepthwiseConvolution assuming contiguity #7952
- DataLoader #7886
- torch.einsum #7765
- uniform.cdf() is now clamped to [0..1] #7538
- CUDAGenerator will not initialize on the current device anymore, which will avoid unnecessary memory allocation on GPU:0 #7392
- tensor.type(dtype) not preserving device #7474
- num_workers > 0 #7265
- torch.max and torch.min on CUDA in presence of NaN #7052
- torch.tensor device-type calculation when used with CUDA #6995
- '=' in nn.LPPoolNd repr function #9629
- torch.autograd.gradcheck and torch.autograd.gradgradcheck #8166
- tensor.scatter_add_ #9630
- torch.add and tensor.add_, e.g. tensor.add(value=1, other) -> Tensor #9027
- torch.logsumexp #8428
- torch.sparse_coo_tensor #8152
- torch.utils.data.dataset.random_split #7676
- torch.nn.GroupNorm #7086
- ConvTransposeNd, Fold/Unfold, Embedding/EmbeddingBag, Loss functions, etc.
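A hedged sketch of the anomaly-detection helpers mentioned at the top of the list above (the failing function is contrived for illustration):

```python
import torch

# Enable checks globally ...
torch.autograd.set_detect_anomaly(True)

x = torch.tensor([1.0, -1.0], requires_grad=True)

# ... or only around a suspicious region, via the context manager.
with torch.autograd.detect_anomaly():
    y = torch.sqrt(x).sum()   # sqrt(-1) produces NaN in the forward pass
    try:
        y.backward()          # the NaN gradient is reported together with the forward traceback
    except RuntimeError as err:
        print(err)
```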