Description
For a while now my main focus has been moving mixed precision functionality into Pytorch core. It was merged about a month ago:
https://pytorch.org/docs/master/amp.html
https://pytorch.org/docs/master/notes/amp_examples.html
and is now usable via master or nightly pip/conda packages. (The full feature set did not make the 1.5 release, unfortunately.)
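For a quick sense of what the native API looks like, here is a minimal training-loop sketch along the lines of the examples doc linked above. The model, optimizer, loss, and data below are just placeholders, not part of the original post:

```python
import torch

# Placeholder model, optimizer, loss, and synthetic data; any standard setup works.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(8)]

scaler = torch.cuda.amp.GradScaler()

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()

    # Run the forward pass under autocast so eligible ops execute in float16.
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)

    # Scale the loss, backprop on the scaled loss, then step/update via the scaler.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```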
`torch.cuda.amp` is more flexible and intuitive, and the native integration brings more future optimizations into scope. Also, `torch.cuda.amp` fixes many of `apex.amp`'s known pain points. Some things native amp can handle that apex amp can't:
- Guaranteed Pytorch version compatibility, because it's part of Pytorch
- No need to build extensions
- Windows support
- Bitwise accurate saving/restoring
- DataParallel and intra-process model parallelism (although we still recommend torch.nn.DistributedDataParallel with one GPU per process as the most performant approach)
- Gradient penalty (double backward)
- `torch.cuda.amp.autocast()` has no effect outside regions where it's enabled, so it should serve cases that formerly struggled with multiple calls to `apex.amp.initialize()` (including cross-validation) without difficulty. Multiple convergence runs in the same script should each use a fresh GradScaler instance, but GradScalers are lightweight and self-contained so that's not a problem (see the sketch after this list).
- Sparse gradient support
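As a rough illustration of the autocast scoping and per-run GradScaler points above, here is a cross-validation-style sketch. The toy models, loaders, and the `run_fold` helper are placeholders for illustration only:

```python
import torch

def run_fold(model, loader, loss_fn):
    """One independent convergence run: note the fresh GradScaler per run."""
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()  # lightweight; construct one per run
    for inputs, targets in loader:
        optimizer.zero_grad()
        # autocast only affects ops inside this block; the rest of the script
        # runs in default precision, untouched by amp.
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(inputs), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

loss_fn = torch.nn.CrossEntropyLoss()
# Toy "folds": two independent models/loaders standing in for real CV splits.
for _ in range(2):
    model = torch.nn.Linear(64, 10).cuda()
    loader = [(torch.randn(16, 64).cuda(), torch.randint(0, 10, (16,)).cuda())
              for _ in range(4)]
    run_fold(model, loader, loss_fn)
```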
If all you want is to try mixed precision, and you're comfortable using a recent Pytorch, you don't need Apex.