NOTE: Conda installs for Python 3.9 require the conda-forge channel, for example:
conda install -y -c pytorch -c conda-forge pytorch
This upgrade fixes regressions on Ampere cards introduced in cuDNN 8.0.4.
It improves performance for RTX 3090 cards and may improve performance on other RTX 30-series cards.
- torch.sqrt: fix wrong output values for very large complex input (#48216)
- max_pool1d: fix for discontiguous inputs (#48219)
- collect_env: fix detection of DEBUG flag (#48319)
- collect_env: fix to work when PyTorch is not installed (#48311)
- amp: fix memory usage when running in no_grad() mode (#48936)
- nn.ParameterList and nn.ParameterDict: remove spurious warnings (#48215)

The PyTorch 1.7 release includes a number of new APIs including support for NumPy-compatible FFT operations, profiling tools and major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training. In addition, several features moved to stable including custom C++ classes, the memory profiler, the creation of custom tensor-like objects, user async functions in RPC and a number of other features in torch.distributed such as per-RPC timeout, DDP dynamic bucketing and RRef helper.
A few of the highlights include:
To reiterate, starting with PyTorch 1.6, features are now classified as stable, beta and prototype. You can see the detailed announcement here. Note that the prototype features listed in this blog are available as part of this release.
FFT-related functionality is commonly used in a variety of scientific fields like signal processing. While PyTorch has historically supported a few FFT-related functions, the 1.7 release adds a new torch.fft module that implements FFT-related functions with the same API as NumPy.
This new module must be imported to be used in the 1.7 release, since its name conflicts with the historic (and now deprecated) torch.fft function.
Example usage:
```
>>> import torch.fft
>>> t = torch.arange(4)
>>> t
tensor([0, 1, 2, 3])
>>> torch.fft.fft(t)
tensor([6.+0.j, -2.+2.j, -2.+0.j, -2.-2.j])
>>> t = torch.tensor([0.+1.j, 2.+3.j, 4.+5.j, 6.+7.j])
>>> torch.fft.fft(t)
tensor([12.+16.j, -8.+0.j, -4.-4.j, 0.-8.j])
```
Since PyTorch 1.5, we've continued to maintain parity between the Python and C++ frontend APIs. This update allows developers to use the nn.transformer module abstraction from the C++ frontend. Moreover, developers no longer need to save a module from Python/JIT and load it into C++, as it can now be used in C++ directly.
Reproducibility (bit-for-bit determinism) may help identify errors when debugging or testing a program. To facilitate reproducibility, PyTorch 1.7 adds the torch.set_deterministic(bool) function that can direct PyTorch operators to select deterministic algorithms when available, and to throw a runtime error if an operation may result in nondeterministic behavior. By default, the flag this function controls is false and there is no change in behavior, meaning PyTorch may implement its operations nondeterministically by default.
More precisely, when this flag is true:
- torch.backends.cudnn.deterministic = True is set.

Note that this is necessary, but not sufficient, for determinism within a single run of a PyTorch program. Other sources of randomness like random number generators, unknown operations, or asynchronous or distributed computation may still cause nondeterministic behavior.
See the documentation for torch.set_deterministic(bool) for the list of affected operations.
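As a minimal sketch of toggling the flag (the tensor and operation below are just illustrative):

```python
import torch

# Opt in: operators pick deterministic implementations when available,
# and operations that may be nondeterministic raise a RuntimeError.
torch.set_deterministic(True)

x = torch.randn(8, 8, requires_grad=True)
x.mean().backward()  # deterministic path, runs fine

# Restore the default (nondeterministic implementations allowed).
torch.set_deterministic(False)
```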
Users can now see not only operator name/inputs in the profiler output table but also where the operator is in the code. The workflow requires very little change to take advantage of this capability. The user uses the autograd profiler as before but with optional new parameters: with_stack and group_by_stack_n. Caution: regular profiling runs should not use this feature as it adds significant overhead.
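A minimal sketch of the workflow described above (the model and input are placeholders):

```python
import torch
from torch.autograd import profiler

model = torch.nn.Linear(64, 32)
inp = torch.randn(8, 64)

# with_stack records the source location of each op (adds overhead, so
# keep it off for regular profiling runs); group_by_stack_n groups the
# summary table by the top-n stack frames.
with profiler.profile(with_stack=True) as prof:
    model(inp)

print(prof.key_averages(group_by_stack_n=5).table(sort_by="self_cpu_time_total"))
```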
Torchelastic offers a strict superset of the current torch.distributed.launch CLI with added features for fault-tolerance and elasticity. If the user is not interested in fault-tolerance, they can get exact functionality/behavior parity by setting max_restarts=0, with the added convenience of auto-assigned RANK and MASTER_ADDR|PORT (versus manually specified in torch.distributed.launch).
By bundling torchelastic in the same docker image as PyTorch, users can start experimenting with torchelastic right away without having to separately install torchelastic. In addition to convenience, this work is a nice-to-have when adding support for elastic parameters in Kubeflow's existing distributed PyTorch operators.
PyTorch 1.7 introduces a new context manager to be used in conjunction with models trained using torch.nn.parallel.DistributedDataParallel to enable training with uneven dataset sizes across different processes. This feature enables greater flexibility when using DDP and prevents the user from having to manually ensure dataset sizes are the same across different processes. With this context manager, DDP will handle uneven dataset sizes automatically, which can prevent errors or hangs at the end of training.
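A hedged sketch of how the context manager is used; the model, loader, optimizer, and process-group setup are placeholders assumed to exist on each rank:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def train_one_epoch(model, loader, optimizer):
    # Assumes torch.distributed.init_process_group(...) has already run
    # and that `loader` may yield a different number of batches per rank.
    ddp_model = DDP(model)
    # join() lets ranks that run out of data early shadow the collectives
    # of the ranks still training, so the epoch no longer hangs at the end.
    with ddp_model.join():
        for batch in loader:
            loss = ddp_model(batch).sum()
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```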
In the past, NCCL training runs would hang indefinitely due to stuck collectives, leading to a very unpleasant experience for users. This feature will abort stuck collectives and throw an exception/crash the process if a potential hang is detected. When used with something like torchelastic (which can recover the training process from the last checkpoint), users can have much greater reliability for distributed training. This feature is completely opt-in and sits behind an environment variable that needs to be explicitly set in order to enable this functionality (otherwise users will see the same behavior as before).
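As a sketch of how this might be opted into, assuming the environment variable used by the 1.7 prototype is NCCL_ASYNC_ERROR_HANDLING (treat the variable name as an assumption and check the release documentation):

```python
import os
from datetime import timedelta
import torch.distributed as dist

# Must be set before the NCCL process group is created; without it the
# previous (hang-prone) behavior is unchanged.
os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"

dist.init_process_group(
    backend="nccl",
    init_method="env://",
    # Collectives that do not finish within this timeout are aborted and
    # the process crashes instead of hanging indefinitely.
    timeout=timedelta(minutes=5),
)
```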
torch.distributed.rpc.rpc_async has been available in TorchScript in prior releases. For PyTorch 1.7, this functionality is extended to the remaining two core RPC APIs, torch.distributed.rpc.rpc_sync and torch.distributed.rpc.remote. This completes the major RPC APIs targeted for support in TorchScript; it allows users to use the existing Python RPC APIs within TorchScript (in a script function or script method, which releases the Python Global Interpreter Lock) and could improve application performance in multithreaded environments.
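A hedged sketch of calling rpc_sync from TorchScript (the worker name and remote function are placeholders, and rpc.init_rpc must have been called on every worker):

```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def remote_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y

@torch.jit.script
def call_remote_add(dst: str, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # rpc_sync (like rpc_async and remote) can now be invoked inside a
    # script function, which releases the GIL while running in TorchScript.
    return rpc.rpc_sync(dst, remote_add, (x, y))

# Usage on the caller, after rpc.init_rpc(...) on each worker:
# result = call_remote_add("worker1", torch.ones(2), torch.ones(2))
```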
PyTorch provides a broad set of optimizers for training algorithms, and these have been used repeatedly as part of the Python API. However, users often want to use multithreaded training instead of multiprocess training, as it provides better resource utilization and efficiency in the context of large scale distributed training (e.g. Distributed Model Parallel) or any RPC-based training application. Users couldn't do this with the distributed optimizer before because of the Python Global Interpreter Lock (GIL) limitation.
In PyTorch 1.7, we are enabling TorchScript support in the distributed optimizer to remove the GIL and make it possible to run the optimizer in multithreaded applications. The new distributed optimizer has the exact same interface as before, but it automatically converts the optimizers within each worker into TorchScript to make each of them GIL-free. This is done by leveraging a functional optimizer concept and allowing the distributed optimizer to convert the computational portion of the optimizer into TorchScript. This will help use cases like distributed model parallel training and improve performance using multithreading.
Currently, the only optimizer that supports automatic conversion with TorchScript is Adagrad; all other optimizers will still work as before, without TorchScript support. We are working on expanding coverage to all PyTorch optimizers and expect more to come in future releases. Enabling TorchScript support is automatic and works exactly the same as the existing Python APIs; here is an example of how to use this:
```python
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch import optim
from torch.distributed.optim import DistributedOptimizer

with dist_autograd.context() as context_id:
    # Forward pass.
    rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3))
    rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
    loss = rref1.to_here() + rref2.to_here()

    # Backward pass.
    dist_autograd.backward(context_id, [loss.sum()])

    # Optimizer, pass in optim.Adagrad, DistributedOptimizer will
    # automatically convert/compile it to TorchScript (GIL-free)
    dist_optim = DistributedOptimizer(
        optim.Adagrad,
        [rref1, rref2],
        lr=0.05,
    )
    dist_optim.step(context_id)
```
Support for using the PyTorch profiler in conjunction with the RPC framework was first introduced in PyTorch 1.6. In PyTorch 1.7, the following enhancements have been made:
- Profiling of RPC-based workloads now covers asynchronous user functions (rpc.functions.async_execution).
- Users are now able to use familiar profiling tools such as with torch.autograd.profiler.profile() and with torch.autograd.profiler.record_function; this works transparently with the RPC framework with full feature support and profiles asynchronous functions and TorchScript functions.
PyTorch 1.7 brings prototype support for DistributedDataParallel and collective communications on the Windows platform. In this release, the support only covers Gloo-based ProcessGroup and FileStore.
To use this feature across multiple machines, please provide a file from a shared file system in init_process_group.
```python
# initialize the process group
dist.init_process_group(
    "gloo",
    # multi-machine example:
    # Shared files need six "/"
    # init_method = "file://////{machine}/{share_folder}/file"
    # Local file need three "/"
    init_method="file:///{your local file path}",
    rank=rank,
    world_size=world_size
)

model = DistributedDataParallel(local_model, device_ids=[rank])
```
PyTorch Mobile supports both iOS and Android, with binary packages available in CocoaPods and JCenter respectively. You can learn more about PyTorch Mobile here.
On some mobile platforms, such as Pixel, we observed that memory is returned to the system more aggressively. This results in frequent page faults, because PyTorch, being a functional framework, does not maintain state for operators: for most ops, outputs are allocated dynamically on each execution. To ameliorate the resulting performance penalties, PyTorch 1.7 provides a simple caching allocator for CPU. The allocator caches allocations by tensor size and is currently available only via the PyTorch C++ API. The caching allocator itself is owned by the client, so its lifetime is also managed by client code. Such a client-owned caching allocator can then be used with a scoped guard, c10::WithCPUCachingAllocatorGuard, to enable the use of cached allocations within that scope.
Example usage:
```cpp
#include <c10/mobile/CPUCachingAllocator.h>
.....
c10::CPUCachingAllocator caching_allocator;
// Owned by client code. Can be a member of some client class so as to tie
// the lifetime of the caching allocator to that of the class.
.....
{
  c10::optional<c10::WithCPUCachingAllocatorGuard> caching_allocator_guard;
  if (FLAGS_use_caching_allocator) {
    caching_allocator_guard.emplace(&caching_allocator);
  }
  ....
  model.forward(..);
}
.....
```
NOTE: The caching allocator is only available in mobile builds, so using it outside of mobile builds won't have any effect.
torch.conj now returns the input as-is for real Tensors (#43270)

Previously, torch.conj and Tensor.conj made a clone for Tensors of real dtype. They now return the Tensor as-is to improve performance.
You can recover the original behavior by adding a .clone() for real Tensors.
Note that this behavior is different from NumPy, for which np.conj returns a new ndarray and ndarray.conj returns the ndarray as-is.
| 1.6.0 | 1.7.0 |
|---|---|
| >>> t.is_complex()<br>False<br>>>> t.conj() is t<br>False | >>> t.is_complex()<br>False<br>>>> t.conj() is t<br>True<br>>>> t.conj().clone() is t<br>False |
torch.tensor, torch.as_tensor, and torch.sparse_coo_tensor now use the input Tensor's device when it is not specified (#41984)

This changes the device on which the Tensor is created, so users may start seeing device mismatch errors.
It also means that, for sparse Tensors, both provided Tensors must be on the same device if the device is not specified.
You can recover the original behavior by passing the device argument.
| 1.6.0 | 1.7.0 |
|---|---|
| >>> t.device<br>device(type='cuda:0')<br>>>> # tensor constructor<br>>>> torch.tensor(t, dtype=torch.float32).device<br>device(type='cpu')<br>>>> # sparse constructor<br>>>> torch.sparse_coo_tensor(<br>torch.tensor(([0], [2]), device="cpu"),<br>torch.tensor(([1.],), device="cuda"),<br>size=(3, 3, 1)).device<br>device(type='cuda', index=0) | >>> t.device<br>device(type='cuda:0')<br>>>> # tensor constructor<br>>>> torch.tensor(t, dtype=torch.float32).device<br>device(type='cuda:0')<br>>>> # Specify the device to get the same behavior as 1.6<br>>>> torch.tensor(t, dtype=torch.float32, device='cpu').device<br>device(type='cpu')<br>>>> # sparse constructor<br>>>> torch.sparse_coo_tensor(<br>torch.tensor(([0], [2]), device="cpu"),<br>torch.tensor(([1.],), device="cuda"),<br>size=(3, 3, 1)).device<br>RuntimeError: backend of indices (CPU) must match backend of values (CUDA)<br>>>> # Specify the device to get the same behavior as 1.6<br>>>> torch.sparse_coo_tensor(<br>torch.tensor(([0], [2]), device="cpu"),<br>torch.tensor(([1.],), device="cuda"),<br>size=(3, 3, 1),<br>device="cuda:0").device<br>device(type='cuda', index=0) |
torch.nn.utils.pack_padded_sequence: remove hidden cross-device copy for lengths (#41984)

In previous versions, when the lengths argument was a CUDA tensor, it would silently be moved to the CPU. This could lead to surprising performance hits and CPU/GPU synchronizations when using CUDA, so the implicit move has been removed.
You need to make sure that lengths is a CPU Tensor when it is provided as a Tensor.
| 1.6.0 | 1.7.0 |
|---|---|
| >>> inp = torch.rand(10, 2, 3, device="cuda")<br>>>> lengths = torch.tensor([10, 7], device="cuda")<br>>>> torch.nn.utils.rnn.pack_padded_sequence(inp, lengths)<br>>>> # Implicitly moves lengths to the CPU and runs fine | >>> inp = torch.rand(10, 2, 3, device="cuda")<br>>>> lengths = torch.tensor([10, 7], device="cuda")<br>>>> torch.nn.utils.rnn.pack_padded_sequence(inp, lengths)<br>RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor<br>>>> # Ensure lengths is already on the right device<br>>>> lengths = lengths.cpu()<br>>>> torch.nn.utils.rnn.pack_padded_sequence(inp, lengths)<br>>>> # Runs fine with no implicit move across devices |
torch.norm handling of keepdim=True (#41956)

Before this change, when calling torch.norm with keepdim=True and p='fro' or p=number, leaving all other optional arguments at their default values, the keepdim argument would be ignored. It is now properly respected.
Also, any time torch.norm was called with p='nuc' and keepdim=True, the result would have one fewer dimension than the input, and the dimensions could be out of order depending on which dimensions were being reduced. All dimensions are now properly kept.
You can recover the original behavior by setting keepdim=False.
NOTE: this function is now deprecated (see below) and we recommend you use torch.linalg.norm, which follows NumPy's conventions.
| 1.6.0 | 1.7.0 |
|---|---|
| >>> t.size()<br>torch.Size([4, 4])<br>>>> t.norm(p='fro', keepdim=True).size()<br>torch.Size([])<br>>>> t.norm(p=3, keepdim=True).size()<br>torch.Size([])<br>>>> t.norm(p='nuc', keepdim=True).size()<br>torch.Size([1]) | >>> t.size()<br>torch.Size([4, 4])<br>>>> t.norm(p='fro', keepdim=True).size()<br>torch.Size([1, 1])<br>>>> t.norm(p=3, keepdim=True).size()<br>torch.Size([1, 1])<br>>>> t.norm(p='nuc', keepdim=True).size()<br>torch.Size([1, 1]) |
torch.split and torch.chunk: fix view tracking for autograd (#41567)

The autograd system is able to correctly handle modifications through views of Tensors by explicitly tracking known view operations. In prior releases, torch.split and torch.chunk were not marked as known view operations, which could lead to silently wrong gradients.
Note that, since v1.5, inplace modification of views created by functions that return multiple views is deprecated. Such cases are not properly handled by autograd and can lead to internal errors or wrong gradients. As a side effect of this view fix, inplace modifications of the outputs of torch.split and torch.chunk now raise a warning and can lead to internal errors or wrong gradients, whereas they previously computed wrong gradients silently.
If you see such a warning, you should replace the inplace operation with an out-of-place one.
You can recover the original behavior by using the new torch.unsafe_split and torch.unsafe_chunk, as sketched below. Note that these functions exist only to ease the transition and will be removed in a future version.
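A minimal sketch contrasting the new warning with the transitional escape hatch (the input values are arbitrary):

```python
import torch

a = torch.arange(6.0, requires_grad=True).clone()

# Outputs of split/chunk are now tracked as views, so modifying them
# inplace raises a warning and may produce wrong gradients:
parts = torch.split(a, 2)
# parts[0].add_(1)  # warns in 1.7

# Transitional escape hatch matching the pre-1.7 behavior; slated for
# removal in a future release.
unsafe_parts = torch.unsafe_split(a, 2)
unsafe_parts[0].add_(1)
```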
torch.{argmin,argmax} now always return the first min/max index (#42004)

torch.argmin (torch.argmax) now always returns the index of the first minimum (maximum) element. This choice is consistent with NumPy. Previously, if there were multiple minima (maxima), the index returned could be that of any of them.
You cannot recover the original behavior, as it was platform dependent and not guaranteed. If your code relied on a specific index for your specific platform, you should update it to work with the first index; the updated code will then work on all platforms.
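For example:

```python
import torch

t = torch.tensor([1, 3, 2, 3])

# Two maxima (indices 1 and 3): 1.7 always returns the first one,
# matching NumPy. Earlier releases could return either, depending on
# the platform.
print(torch.argmax(t))                        # tensor(1)
print(torch.argmin(torch.tensor([2, 1, 1])))  # tensor(1)
```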
torch.{min,max,median}: update backward formula when doing a full reduction (dim argument not provided) (#43519)

When no dimension is specified, a full reduction is performed and the gradient now flows back evenly towards all the inputs that realized the output value. The old behavior was to propagate the gradient only to one such input, selected arbitrarily.
This should improve stability of training by gradient descent.
To recover the previous behavior, you can perform the reduction with the dim= argument. It will ensure that the gradient only flows back for the input whose index was returned.
| 1.6.0 | 1.7.0 |
|---|---|
| >>> a<br>tensor([3, 2, 3])<br>>>> a.max().backward()<br>>>> a.grad<br>tensor([0, 0, 1]) | >>> a<br>tensor([3, 2, 3])<br>>>> a.max().backward()<br>>>> a.grad<br>tensor([0.5, 0, 0.5])<br>>>> a.max(dim=0).max(dim=0).max(dim=0).backward()<br>>>> a.grad<br>tensor([0, 0, 1]) |
nn.BCELoss size mismatch warning is now an error (#41426)

This is the end of the deprecation cycle for this op, to make sure it does not have broadcasting semantics different from NumPy's broadcasting semantics used everywhere else in PyTorch's codebase.
You need to make sure all inputs are the same size to avoid the error.
| 1.6.0 | 1.7.0 |
|---|---|
| >>> bceloss = nn.BCELoss()<br>>>> a = torch.rand(25)<br>>>> b = torch.rand(25, 1)<br>>>> bceloss(a, b)<br>UserWarning: Using a target size (torch.Size([25, 1])) that is different to the input size (torch.Size([25])) is deprecated. Please ensure they have the same size.<br>tensor(1.0604) | >>> bceloss = nn.BCELoss()<br>>>> a = torch.rand(25)<br>>>> b = torch.rand(25, 1)<br>>>> bceloss(a, b)<br>ValueError: Using a target size (torch.Size([25, 1])) that is different to the input size (torch.Size([25])) is deprecated. Please ensure they have the same size.<br>>>> b = b.reshape(25)<br>>>> bceloss(a, b)<br>tensor(1.0604) |
autograd.Function stops materializing None output Tensors (#41490)

To improve performance, a custom autograd.Function will no longer create a Tensor full of zeros when an input is differentiable but the user's backward function returns None for it. This means that the final result of .backward() or autograd.grad() may now be None where it used to be a Tensor full of zeros.
You can recover the previous behavior by having your custom autograd.Function materialize the zero Tensor with torch.zeros_like(input) to replace the None output for the backward method.
```python
import torch

# Custom Function that returns None for the gradient
class GetTwos(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        return inp.clone().fill_(2)

    @staticmethod
    def backward(ctx, grad_out):
        # To recover the 1.6 behavior, replace the line below with
        # `return torch.zeros_like(grad_out)`
        return None

a = torch.rand(10, requires_grad=True)
b = GetTwos.apply(a)
b.sum().backward()

print(a.grad)
# In PyTorch 1.6 this will print
# tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
# In PyTorch 1.7 this will print
# None
```
We fixed a bug in the inplace detection code that was preventing the detection of some inplace operations for outputs that are not differentiable (like integer-type Tensors).
This can cause code that used to run fine to throw the error "a Tensor that was needed for backward was modified in an inplace operation".
Such a failure is legitimate, and the user code must be fixed to compute proper gradients. In general, this involves cloning the Tensor before modifying it inplace to make sure the backward pass can happen safely.
```python
import torch

a = torch.rand(10, requires_grad=True)
with torch.no_grad():
    a[2] = 10
b, ind = a.max(dim=0)
# ind is 2 here

with torch.no_grad():
    t = torch.rand(10)
    t[4] = 10
    res = torch.max(t, dim=0, out=(torch.Tensor(), ind))
    # ind becomes 4 here

# This backward runs in 1.6 but will fail in 1.7
b.sum().backward()
print(a.grad)
# tensor([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])
# The value is wrong: it is at index 4 while it should be at index 2.
# The issue is avoided by not modifying ind inplace, by replacing the
# line above with:
# res = torch.max(t, dim=0, out=(torch.Tensor(), ind.clone()))
```
__torch_function__ for methods (#37091)

Functions, slicing and Tensor methods will now properly preserve the subclass type when possible.
```
>>> class SubTensor(torch.Tensor):
...     pass
>>> type(torch.add(SubTensor([0]), SubTensor([1]))).__name__
'SubTensor'
>>> type(torch.add(SubTensor([0]), torch.Tensor([1]))).__name__
'SubTensor'
```
The old behavior of "any operation on your subclass produces a torch.Tensor instead of the subclass" can be recovered by doing:
```python
from torch._C import _disabled_torch_function_impl

class SubTensor(torch.Tensor):
    __torch_function__ = _disabled_torch_function_impl
```
For all details on how to use this feature, please refer to the doc page for it.
Tensor.__iter__: use torch.unbind instead of a for loop (#40884)

This improves performance significantly, but it changes the behavior of in-place operations on the values returned by the iterator. This happens only if either the input Tensor or any argument of the in-place operation is a Tensor that requires gradients; in that case the operation fails with "Output X of UnbindBackward is a view and is being modified inplace".
You can recover the previous behavior by manually slicing the Tensor: [t[i] for i in range(t.size(0))] as shown in the example below.
| 1.6.0 | 1.7.0 |
|---|---|
| >>> x = torch.randn(5, 10, requires_grad=True)<br>>>> for i, v in enumerate(x):<br>>>>     v.fill_(i) | >>> x = torch.randn(5, 10, requires_grad=True)<br>>>> for i, v in enumerate([x[j] for j in range(x.size(0))]):<br>>>>     v.fill_(i) |
This fixes silent correctness errors: something that used to be silently incorrect now errors out. Code that raises this error must be updated to avoid the offending op, which was returning wrong results, as shown in the example below:
```
>>> x = torch.randn(1, 3)
>>> # Create a tensor that has internal memory overlap
>>> y = x.expand(2, 3)
# In 1.6, this would not error out, but in 1.7, this errors out
>>> torch.nn.functional.elu(y, inplace=True)
RuntimeError: unsupported operation: more than one element of the written-to tensor refers to a single memory location. Please clone() the tensor before performing the operation.
# Here is the fix in 1.7
>>> torch.nn.functional.elu(y, inplace=False)
```
C++ API: any external user of TensorIterator now always gets the memory overlap check. The previous behavior can be recovered by setting set_check_mem_overlap(false) when creating the iterator.
- @property is now supported on TorchScript classes and ScriptModules. Custom setters and getters are also supported; custom deleters are not supported.
- If these properties use Python or PyTorch features that are not supported in TorchScript, scripting will fail.
- One workaround is to use @torch.jit.unused to annotate problematic properties; the other is to update the implementation of the property so that the getter and setter are scriptable.
- torch.absolute_ has been removed; the Tensor method (Tensor.absolute_) should be used instead, just like all other inplace ops.
- torch.ExtraFilesMap is an internal JIT construct and should not be used.

In 1.7, we are enabling a Profiling Executor and a new Tensor-Expressions-based (TE) Fuser. All compilations will now go through one (an adjustable setting) profiling run and one optimization run. For the profiling run, complete tensor shapes are recorded and used by the new fuser. For the optimization run, the focus is on finding (in torch.jit.ScriptModules) and fusing element-wise operations over CUDA tensors into a single CUDA kernel.
The TE fuser is expected to deliver performance similar to the old fuser used in 1.6; however, it unlocks more opportunities for performance improvements in future releases. In rare cases, performance of some models may degrade 5-10%. If you experience any regressions, please report them on GitHub so we can address them as soon as possible. For 1.7, we are providing an option for users to revert back to the old fuser by calling torch._C._jit_set_profiling_executor(False) in Python and torch::jit::getExecutorMode() = false; in C++. For more information, please see the "Graph Executor" section in our documentation.
torch.norm and torch.functional.norm are deprecated in favor of torch.linalg.norm (#44321)

The new torch.linalg.norm has the same behavior as numpy.linalg.norm. Both deprecated functions had odd behaviors for matrix and vector norms. You should refer to the documentation to find the exact behavior they had and how to replicate it with the new API.
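A small sketch of the NumPy-style replacement (the input values are arbitrary):

```python
import torch

x = torch.arange(9.0).reshape(3, 3)

print(torch.linalg.norm(x))             # Frobenius norm by default for a 2-D input, as in NumPy
print(torch.linalg.norm(x, ord="fro"))  # explicit Frobenius matrix norm
print(torch.linalg.norm(x[0], ord=2))   # vector 2-norm of one row
```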
FFT functions in the torch. namespace are deprecated in favor of the torch.fft. namespace (#44876)

Please use torch.fft.foo as a drop-in replacement for torch.foo for the following functions: fft, ifft, rfft and irfft.
out= functions needing to resize an output which is not 0-size: deprecated (#42079)

This behavior is dangerous and leads to an API that is hard to use. It is being deprecated so that the API can be fixed in future versions.
You should resize the output beforehand to avoid any issue in the future:
```python
a = torch.rand(5)
b = torch.rand(25)

# This is deprecated
torch.add(a, a, out=b)

# This has the same behavior but will work in future versions
torch.add(a, a, out=b.resize_(0))
```
torch.optim: warn for duplicate params in param group (#41597)

Providing the same Parameter multiple times in a single param group is most likely a user error and is being deprecated. Please open an issue if you have a valid use case that requires this feature.
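A minimal sketch of what now triggers the warning (the parameter below is just illustrative):

```python
import torch
from torch import optim

w = torch.nn.Parameter(torch.randn(4))

# The same Parameter appears twice in one param group: this now warns,
# since the pattern is being deprecated.
opt = optim.SGD([w, w], lr=0.1)
```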
torch.linspace and torch.logspace: not giving the steps argument is deprecated (#43860)

The default steps argument that has been used historically in PyTorch is not consistent with other libraries and so is being removed to avoid confusion.
For both functions, passing the steps=100 keyword argument recovers the original behavior.
| 1.6.0 | 1.7.0 |
|---|---|
| >>> torch.linspace(0, 10).size()<br>torch.Size([100]) | >>> torch.linspace(0, 10).size()<br>UserWarning: Not providing a value for linspace's steps is deprecated and will throw a runtime error in a future release.<br>torch.Size([100])<br>>>> torch.linspace(0, 10, steps=100).size()<br>torch.Size([100]) |
ProcessGroup and ProcessGroup::Work APIs, which will be retired soon (#46366).

New namespaces:
New operators:
- torch.count_nonzero added (#39992)
- nn.SiLU activation added (#41034)
- torch.logit added (#41062)
- torch.gcd, torch.lcm added (#40651, #41552, #42254)
- torch.functional.atleast_{1d/2d/3d} added (#41317)
- torch.isreal added (#41298)
- nn.Unflatten added (#41564)
- torch.movedim added (#41480)
- torch.isposinf, torch.isneginf added (#41588)
- torch.signbit added (#41589)
- torch.absolute added (#42586)
- torch.clip alias added (#42770)
- torch.quantile added (#42755)
- torch.linalg.det and torch.outer alias added (#42802)
- torch.nansum added (#38628)
- torch.hypot added (#42291)
- torch.nextafter added (#42580)
- torch.hstack, torch.vstack, torch.dstack added (#42799)
- torch.arccosh alias added (#43107)
- Tensor.movedim as a method added (#43122)
- torch.matrix_exp added (#40161)
- torch.fix alias added (#43326)
- torch.arccos, torch.arcsin, torch.arctan aliases added (#43319)
- torch.negative alias added (#43400)
- torch.maximum, torch.minimum added (#42579)
- torch.arctanh, torch.arcsinh aliases added (#43762)
- torch.linalg.norm added (#42749, #43907)
- torch.amax, torch.amin added (#43819)
- torch.heaviside added (#42523)
- torch.i0 added (#43132)
- torch.not_equal, torch.greater, torch.greater_equal, torch.less, torch.less_equal aliases added (#43870)
- torch.exp2 added (#44184)
- torch.kaiser_window added (#44271)
- torch.nanquantile added (#44393)
- torch.multiply, torch.divide aliases added (#44463)
- nn.TripletMarginWithDistanceLoss added (#43680)
- torch.fft.fft, torch.fft.ifft, torch.fft.rfft, torch.fft.irfft, torch.fft.hfft, torch.fft.ihfft added (#43011)
- torch.fft.fftn, torch.fft.ifftn, torch.fft.rfftn, torch.fft.irfftn added (#44550)
- optim.functional.adagrad added (#44715)
- optim.functional.adam added (#44791)
- torch.complex, torch.polar added (#39617)
- Tensor.__complex__ added (#43844)
- torch.vdot added (#43004)

API extension:
torch.full added support for bool and integer dtypes (#41912)torch.lt and torch.masked_select added support for half dtype (#43704)torch.div, torch.true_divide, torch.atan2 added support for integer to float type promotion in (#42359)unflatten added support for non-named dimensions (#42563)torch.polygamma added support for n >= 2 (#42499)torch.qr added backward support for wide input matrices (#42216)nn.Linear for MKLDNN added support for no-bias (#43703)torch.lerp added support for half dtype (#43541)torch.div to perform true division (end of deprecation cycle) (#42907)torch.scatter added support for reductions on CUDA (#41977)torch.pow (#44760), unary ops and activations (#44813, #44824, #44834), torch.i0 (#44750), softmax (#44837), div, addcdiv, addcmul, mean, var (#44758), layernorm (#45002),all pooling layers (#44836, #45151)), torch.logspace (CPU and CUDA) (#44675), random kernels on Windows (#44918), torch.addmm, torch.addmv (#44986), loss functions (#45011), batched gemm (#45167), nccl path (#38515), binary logical operators (#42485), torch.neg (#45240), Conv (non-cuDNN) (#45007), torch.abs (#44804), torch.erfinv (#43399), comparison ops (#44748)torch.asin, torch.neg added support for sparse Tensors (#44028)torch.softmax added support for CUDA (#42307)Tensor.{real,imag} added setter for these attributes (#39860)torch.{addmm,addmv} added support for complex on CUDA (#40431, #43827)torch.bmm added support for complex on CPU #42383,torch.{dot, vdot} added support for complex (#42745)torch.stft, torch.istft added support for complex (#43886)torch.cholesky added support for complex (#44895, #45267)torch.sgn added (to support complex) (#39955)autograd.Function (#41821)autograd.functional API (#43428)reset_grad API to remove gradient instead of setting them to zero (#44423)torch.autograd.gradcheck (#43877)@torch.no_grad() decorator (#44633)torch.lobpcg backward (#43002)torch.cuda.amp.GradScaler now supports sparse gradients (#36786)backends.cudnn.allow_tf32 flag to control it (#40737)torch.cuda.memory.list_gpu_processes to list running processes on a give GPU (#44616)atomicAdd() (#41538)nn::TransformerEncoderLayer added (#42633)nn::TransformerDecoderLayer added (#42717)nn::TransformerEncoder added (#43187)nn::TransformerDecoder added (#42886)nn::Transformer added (#44333)nn::Unflatten added (#42613)nn.ParameterList added (#41259)torch::cuda::manual_seed and torch::cuda::manual_seed_all added (#42638)mobile_optimized boolean flag to optimized model. (#45479)adaptive_avg_pool2d (#41220), mm (#41221), reshape (#41223), max_pool2d (#41379), add_ and relu_ (#41380), cat (#41434), add and mul (#42674) and avg_pool2d (#42675).torch.utils.optimize_for_vulkan (#44903)aten::repeat (#40644)aten::apend (#40743)stack (#42187)fill_ (#43303)clone for per channel affine quantized tensor (#44573)append (graphmode) (#44641)torch.set_deterministic and torch.is_deterministic: Raise error when the flag is set and a non-deterministic operation is used (#15359, #41377)CONTRIBUTING.md (#42635, #43294)prefetch_factor argument to control the number of batch loaded ahead of time(#41130)np.memmap objects (#39847)utils.cpp_extension (#41257, #43528)nn.ReflectionPad: Add support for 0-dim batch sizes. 
(#39231)torch.scatter: Add reductions for CPU (#36447)torch.iinfo, torch.finfo: Improve printing (#40488)torch.where: Add support for scalar input (#40336)torch.nonzero: Remove deprecation warning for as_tuple argument (#45413)torch.distributions.Categorical: Clamp logit to avoid -inf when calculating entropy (#41002)torch.futures.Future: Add done function to query the status of the future (#42013)nn.EmbeddingBag: Add support for incude_last_offset=True when reduction is mean or max (#42215)nn.AvgPooling{1,2,3}d: Ensure all cells are valid in ceil mode to avoid division by 0 (#41368)nn,[Adaptive]MaxPool{1,2,3}d: Handle edge case when input is filled with -inf (#40665)nn.Hardsigmoid, nn.Hardswish: Add inplace option (#42346)nn.MSELoss, nn.L1Loss, nn.SmoothL1Loss: Add support for target that requires gradients. (#44437, #44471, #44486)nn.Parameter{List,Dict}: Add warning when improperly used (with DataParallel or weight_norm) (#44405)nn.functional.smooth_l1: Add beta parameter (#44433)torch.cuda.nccl APIs. (#43247)@torch.no_grad (#41371)del to TorchScript classes (#44352)@torch.jit.unused syntax for ignoring properties (#45261)@torch.jit.unused on a @torch.no_grad decorated function (#41496)ScriptModule`s vs. freshly created ones (#43298)to_backend API now accepts wrapped modules (#43612)whitelist to allowlist (#41771, #41802)dequantize now supports list and tuple of tensors (#41079)register_activation_post_process_hook function (#42342)add/mul now support different variants (#42769)OP_LIST_TO_FUSER_METHOD is exposed to the user (#43286)quantize_jit can handle new upsample overloads (#43407)convert_jit can now take preserved_attrs argument (#44490)SyncBN: preserve qconfig if it exists (#45317)state_dict (#44846)conv parameters (#43524, #43086, #43651, #44671)In PyTorch 1.7, we have continued to add and improve PyTorch operator export to ONNX. We have enabled export of 10 new operators, and further enhanced and optimized export of 10+ torch operators to ONNX. We have also focused on improving export of TorchScript modules, in particular laying some groundwork required for better support in near future. We have also created an API (torch.onnx.utils._find_missing_ops_onnx_export) as a diagnostic tool (preview only) to get a list of operators in a model that are not supported or implemented by ONNX exporter. Support for export of torch.quantization.FakeQuantize has also been added to help enable some QAT workflows.
Add support to export more torch ops torch.view_as (#40496), fake quantize functions (#39738), embedding_bag (#41234, #44693), torch.eye (#41357), Tensor.as_strided (#41569), torch.tensor (#41872), addition between list of tensors (#41888), Tensor. __floordiv__ (#43022), torch.nn.KLDivLoss (#41858), Tensor.new_empty and Tensor.new_zeros (#43506)
Improve existing export logic and optimize the exported ONNX graph
- torch.full_like (#40063)
- torch.where export, add support for ByteTensor (#42264)
- torch.scatter export, add support for src being scalar or different dtype (#42765, #43440)
- torch.where (#41544)
- torch.slice (#42935), torch.split (#43670), torch.repeat (#43430), torch.arange (#43777), len (#43824), torch.narrow (#44039), flatten (#40418), adaptive_pool (#46100)

Update export to follow pytorch changes
torch.utils.collect_env: Collect more informations (python 32/64bit, clang version, CPU architecture, ROCm version) (#42887, #42961, #44106)torch.hub.load_local: Allow to load models from any local directory (#44204)import torch is called from the source root (#39995)--continue-through-error option to run_test.sh script (#41136)run_name and `hparam\_domain\_discreteinadd\_hparams (#40660, #40720)with_source parameter to enable tracking source code (#43898)zero_grad, avoid using inpalce detach when it is not required (#41283)torch.div backward formula to improve numerical stability (#43627)torch.repeat as only input.dim() is needed in backward (#40766)device_count and cuda init error detection and messages (#42249)torch/*.py (#40235, #40873)Tensor attributes and methods: T and grad_fn (#40879), Tensor._version (#41125), ndim (#42909), nonzero (#43053), #40499)torch.serialization (#40862)torch.tensor (#45077)torch.Size (#40879)torch.futures (#41675)torch.random (#42234)torch.hub (#42252)collect_env.py (#43062)torch.utils (#39392, #42647, #42711, #42960, #43806, #44136, #44216)torch.nn (#43044, #44093, #43080, #42231, #40669)torch.sparse (#43108)torch.cuda.nvtx (#43443)torch.cuda.memory (#43444)torch.functional (#43446)torch.autograd (#44451, #46206)torch.quantization.fuse_modules (#43786)torch.nn.quantized (#43186, #44154, #43110)torch.testing._internal submodules (#44575, #44805, #44832, #44911, #44927, #44985, #44971, #45107, #45368, #45375)torch.backends.quantized (#44794)torch.backends.cuda (#44916)torch.cuda.{comm,nccl,amp} (#45350, #45344, #45480)torch.quasirandom (#45434)jit.trace and onnx.export (#41093)torch/optim/lr_scheduler.pyi (#41775, #41866)torch.linspace: Fix step computation for large integral types (#40132)torch.pca_lowrank: Fix un-expected memory consumption (#40853)torch.linspace: Fix behavior for non-contiguous inputs on CPU (#41286)torch.div: Fix division by low precision scalar (#41446)torch.expm1: disable mkl as it produces wrong values in some cases (#41654)torch.utils.data.RandomSampler: Stop generating samples one at a time when replacement=True (#41682)torch.nn.functional.grid_sample: Fix 64-bit indexing (#41923)torch.nn.functional.grid_sample: Fix crash when grid has NaNs (#42703)torch.det: Fix on CPU (#35136)torch.interpolate: Avoid zero division in cubic mode (#42093)torch.fmod: Fix to work with zero divisors consistently (#41948)torch.masked_select: Fix for discontiguous outputs (#41841)torch.cummin, torch.cummax: Fix for discontiguous inputs/outputs (#42507)torch.einsum: Fix for discontiguous inputs (#42425)torch.orgqr: Fix input size conditions (#42825)torch.manual_seed: Fix argument unpacking (#42206)torch.searchsorted: Properly mark output as non differentiable (#42933)torch.bucketize: Properly mark output as non differentiable (#44102)torch.addmm: Properly raise error on device mismatch (#43505)torch.chain_matmul: Properly handle empty args (#43553)torch.multinomial: Properly handle 0 size dim (#43775)torch.cholesky_solve: Fix broadcast and error checking (#43137)torch.movedim: Fix uniqueness check (#44307)torch.min, torch.max, torch.mean: Properly throw error if dim is repeated (#44281)torch.lerp: Fix for discontiguous outputs on CUDA (#44559)torch.addmv, torch.mv: Fix beta=0 case in slow path (#44681)torch.triangular_solve: Fix error check on CPU (#44720)torch.empty_like, torch.zeros_like: Properly raise error if any memory format is provided with sparse input (#44058)torch.atan2: Fix type promotion (#43466)torch.repeat: Fix backward for 0 size repeats 
(#45212)torch.min, torch.max, torch.median: Fix handling of nan in backward (#45280)torch.rdiv: Properly make it consistent with div (#45407)torch.std: Fix hanling of nan in backward (#45468)torch.distributions.Binomial: Fix CUDA sampling at extreme points (#42702)torch.dot, torch.vdot: Add complex support (#45074)torch.pow: Fix when scalar base is complex (#45259)torch.round, torch.abs_: Disable complex inputs (#45330)torch.svd: Fix memory corruption for complex inputs (#45486)torch.view_as_complex: Fix zero dimensional input (#44175)torch.kthvalue: Fix for non-contiguous input (#46177)torch.save: Fix python binding that could lead to out of bound read (#46207)nn.ModuleDict: Fix input dict key ordering (#40905)nn.LayerNorm: Fix handling of gamma in the backward when create_graph=True (#41595)nn.functional.{max,avg}_pool{1,2,3}d: Raise RuntimeError for zero stride (#41819)nn.Module: Fix missing attribute when loading model from older version (#42290)nn.Embedding: Raise proper error for 0-D weight (#42550)nn.SyncBatchNorm: Fix forward pass for non-default process group (#43861)nn.functional.embedding_bag: Fix for non-contiguous weight (#44032)nn.functional.upsample: Add nondeterministic checks (df6ea62)nn.GroupNorm: Fix bug when input does not require_grad on CUDA (#44863)functional.{l1_loss,smoothl1_loss,mse_loss}: Properly check that reduction strings are valid (#43527)functional.smoothl1_loss: Properly raise error for negative beta values (#45759)functional.pad: Fix extra memory allocation and invalid result for negative or zero pad when using circular padding (#39273)nn::MultiheadAttention: Ensure all parameters are properly registered (#42037)Tensor::grad: Fix the thread safety issues (#40887)Tensor::var: Ensure that var(0) does not call the var(bool keepdim) overload but var(int dim) (#40451)torch.jit.freeze import (#42319)List[str].index (#40348)torch.jit.is_tracing() so that it is correctly called rather than returning the method itself (#42486)NaN propagation in fuser's min/max implementation (#43590)NaN propagation in TensorExpression fuser's min/max implementation (#43609)ScriptModules (#43284)unsigned char, and abs(int) (#44157)len, contains, getitem inherited from interface class derived from nn container (#40789)torch.tensor for empty multidimensional-typed lists (#44652)insert_observers (#40624)SplitWithMask when splitting multiple times (#45141)inferred=True (#45360)set_grad_enabled scripted version (#46060)dict.update() scripted version (#46105)qlinear_dynamic: Fix ASAN error in QNNPACK's integration. (#41967)nn.Sequential (#19227)ignore_index for nll loss (#44816)onnx::SsaRewrite (#42148)torch.hub for new zipfile format. 
(#42333)optim.SparseAdam: Fix check that params are dense on init (#43668)nn::MultiheadAttention: Fix parameter registration (#42037)BUILD.bazel (#40536)autograd.gradcheck: Add support for complex (#43208)torch.{view_as_complex,view_as_real}: Remove unnecessary temporary Tensor (#44908)tensorboard.SummaryWriter.add_audio: Remove unnecessary for loops (#44201)Conv2d and Conv3d: bypass the im2col for 1x1 conv (#40324)max_pool2d perf regression (#41174)conv2d in some special cases (#40610)addmm: Reduce constant time overhead (#41374)cumsum, cumprod: Enable non-synchronizing cub scan for cum* operations (#42036)max_pool2d: CUDA NCHW performance improvement (#42182)arenge: Vectorize CPU implementation (#38697)istft: optimize by using col2im (#42826)LayerNorm: improved performance on CPU both forward and backward (#35750)silu: improved performance (#42976)addmv: improved performance for zero sized input cases (#41824)MaxPool1d: improved performance for cases without indices (#43745)adaptive_avg_pool2d: optimized code path for cases when output size is (1, 1) (#44211)cat: optimized cuda kernel (#44833)bitwise_not (#45103)local_used_maps_dev_ when find_unused_param=False in DDP to improve performance (#40407)KernelSumMultipleAxes (#43905)torch.cat (#46323)nn.Module.training to docs (#40923)nn.CrossEntropyLoss: Clarify that the mean argument is weighted (#40991)torch.scatter_: Update doc with support for reduction methods. (#40962)torch.optim.swa_utils (#41228)torch.scatter to include streams parameter. (#42814)Tensor.clone doc (#42931, #43098)torch.Tensor.is_set_to documentation (#43052)torch.qr documentation to include a warning about when the QR.backward is well-defined. (#43547)torch.sub properly, adds torch.subtract alias (#43850)nn.Batchnorm track_running_stats docs (#44445)torch.heaviside docs (#44481)torch.median doc to explain returned value for even-sized input (#44562)nn.ELU formula in the docs (#43764)torch.min, torch.max: remove incorrect warning from docs (#44615)torch.cuda.amp tutorial from core amp docs (#44725)torch.floor_divide documentation to clarify it's actually torch.trunc_divide (#45411)torch.fft doc and make warning clearer (#45409)nn.Flatten docs (#42084)all_gather_object and gather_object documentation (#43772)torch.jit.trace_module documentation (#40248)torch.jit.trace_module (#41586)PYTORCH_JIT_TYPE_VERBOSITY (#42241)torch.no_grad etc. (#45232)pixel_shuffle (#45661)torch.poisson in documentation (#45656)