Here is the source code (note this sentence in it: "The norm is computed over all gradients together" — the norm is taken over all of the gradients as a whole):
import torch
from math import inf  # these two imports are needed for the snippet to run standalone


def clip_grad_norm_(parameters, max_norm, norm_type=2):
    r"""Clips gradient norm of an iterable of parameters.

    The norm is computed over all gradients together, as if they were
    concatenated into a single vector. Gradients are modified in-place.

    Arguments:
        parameters (Iterable[Tensor] or Tensor): an iterable of Tensors or a
            single Tensor that will have gradients normalized
        max_norm (float or int): max norm of the gradients
        norm_type (float or int): type of the used p-norm. Can be ``'inf'`` for
            infinity norm.

    Returns:
        Total norm of the parameters (viewed as a single vector).
    """
    if isinstance(parameters, torch.Tensor):
        parameters = [parameters]  # step 1: also accept a single Tensor
    parameters = list(filter(lambda p: p.grad is not None, parameters))
    max_norm = float(max_norm)
    norm_type = float(norm_type)
    if norm_type == inf:
        total_norm = max(p.grad.data.abs().max() for p in parameters)
    else:
        total_norm = 0
        for p in parameters:  # step 2: accumulate every parameter's gradient norm
            param_norm = p.grad.data.norm(norm_type)  # step 3: p-norm of this parameter's gradient
            total_norm += param_norm.item() ** norm_type
        total_norm = total_norm ** (1. / norm_type)
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for p in parameters:
            p.grad.data.mul_(clip_coef)
    return total_norm