Description
For a while now my main focus has been moving mixed precision functionality into Pytorch core. It was merged about a month ago:
https://pytorch.org/docs/master/amp.html
https://pytorch.org/docs/master/notes/amp_examples.html
and is now usable via master or nightly pip/conda packages. (The full feature set did not make the 1.5 release, unfortunately.)
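For a quick sense of what the native API looks like, here is a minimal training-loop sketch along the lines of the examples doc linked above. The model, optimizer, loss, and data below are just placeholders, not part of the original post:

```python
import torch

# Placeholder model, optimizer, loss, and synthetic data; any standard setup works.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(8)]

scaler = torch.cuda.amp.GradScaler()

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()

    # Run the forward pass under autocast so eligible ops execute in float16.
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)

    # Scale the loss, backprop on the scaled loss, then step/update via the scaler.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```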
`torch.cuda.amp` is more flexible and intuitive, and the native integration brings more future optimizations into scope. Also, `torch.cuda.amp` fixes many of `apex.amp`'s known pain points. Some things native amp can handle that apex amp can't:
- Guaranteed Pytorch version compatibility, because it's part of Pytorch
- No need to build extensions
- Windows support
- Bitwise accurate saving/restoring
- DataParallel and intra-process model parallelism (although we still recommend torch.nn.DistributedDataParallel with one GPU per process as the most performant approach)
- Gradient penalty (double backward)
- `torch.cuda.amp.autocast()` has no effect outside regions where it's enabled, so it should serve cases that formerly struggled with multiple calls to `apex.amp.initialize()` (including cross-validation) without difficulty. Multiple convergence runs in the same script should each use a fresh GradScaler instance, but GradScalers are lightweight and self-contained so that's not a problem (see the sketch after this list).
- Sparse gradient support
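As a rough illustration of the autocast scoping and per-run GradScaler points above, here is a cross-validation-style sketch. The toy models, loaders, and the `run_fold` helper are placeholders for illustration only:

```python
import torch

def run_fold(model, loader, loss_fn):
    """One independent convergence run: note the fresh GradScaler per run."""
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()  # lightweight; construct one per run
    for inputs, targets in loader:
        optimizer.zero_grad()
        # autocast only affects ops inside this block; the rest of the script
        # runs in default precision, untouched by amp.
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(inputs), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

loss_fn = torch.nn.CrossEntropyLoss()
# Toy "folds": two independent models/loaders standing in for real CV splits.
for _ in range(2):
    model = torch.nn.Linear(64, 10).cuda()
    loader = [(torch.randn(16, 64).cuda(), torch.randint(0, 10, (16,)).cuda())
              for _ in range(4)]
    run_fold(model, loader, loss_fn)
```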
If all you want is to try mixed precision, and you're comfortable using a recent Pytorch, you don't need Apex.