Copy weights from PyTorch GRU layer to TensorFlow

Hello,

I am trying to copy weights from a PyTorch GRU layer to TensorFlow.
My layer definitions are:

layers.GRU(units=100, activation=None, dropout=0.0, reset_after=True)
nn.GRU(input_size=400, hidden_size=100, bidirectional=False, dropout=0.0, batch_first=True)

I copy the weights using:

  import numpy as np

  w_ih = pt_layer.weight_ih_l0.detach().numpy().T  # (input_size, 3*hidden_size)
  w_hh = pt_layer.weight_hh_l0.detach().numpy().T  # (hidden_size, 3*hidden_size)
  b_ih = pt_layer.bias_ih_l0.detach().numpy()
  b_hh = pt_layer.bias_hh_l0.detach().numpy()

  # Helper funcs to reorder gates: PyTorch [r, z, n] -> TensorFlow [z, r, n]
  def reorder_gru_weights(w):
      r, z, n = np.split(w, 3, axis=1)
      return np.concatenate([z, r, n], axis=1)

  def reorder_gru_bias(b):
      b_r, b_z, b_n = np.split(b, 3)
      return np.concatenate([b_z, b_r, b_n])

  w_ih = reorder_gru_weights(w_ih)
  w_hh = reorder_gru_weights(w_hh)
  b_ih = reorder_gru_bias(b_ih)
  b_hh = reorder_gru_bias(b_hh)


  reset_after = getattr(tf_layer, 'reset_after', False)

  if reset_after:
      bias = np.stack([b_ih, b_hh], axis=0)
  else:
      bias = b_ih + b_hh

  tf_shapes = [w.shape for w in tf_layer.get_weights()]
  new_weights = [w_ih, w_hh, bias]

  for i, (expected, actual) in enumerate(zip(tf_shapes, [w.shape for w in new_weights])):
      if expected != actual:
          raise ValueError(f"Mismatch at weight {i}: expected {expected}, got {actual}")

  tf_layer.set_weights(new_weights)

Running the two layers on the same input, I get:
TensorFlow output: [[0.4295744]]
PyTorch output: [[0.44837096]]
Max absolute difference: 0.018796563

PS: The networks also contain Linear and Conv layers. When I remove the GRU layers, the difference is much smaller (around 0.00002), so the discrepancy seems to come from the GRU copy.

Am I missing something in how I copy the weights? Thanks

The issue was the activation function of the TensorFlow GRU layer: it should not be set to None (which makes it linear) but left at the default 'tanh', since PyTorch's nn.GRU always uses tanh for the candidate state. With activation='tanh', the two layers are equivalent.
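For anyone who wants to double-check the gate-order mapping without installing either framework: below is a NumPy-only sketch that implements one GRU step in each convention (PyTorch gate order [r, z, n]; Keras reset_after=True gate order [z, r, n]) and verifies that the transpose + reorder + stacked-bias conversion gives identical outputs. The shapes and random weights here are hypothetical, just for the check; only the gate equations follow the documented conventions.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden = 4, 3  # small hypothetical sizes for the check

# Random weights in the PyTorch layout: gates stacked along rows as [r, z, n],
# mimicking weight_ih_l0 (3*hidden, input_size) and weight_hh_l0 (3*hidden, hidden).
w_ih_pt = rng.standard_normal((3 * hidden, input_size))
w_hh_pt = rng.standard_normal((3 * hidden, hidden))
b_ih_pt = rng.standard_normal(3 * hidden)
b_hh_pt = rng.standard_normal(3 * hidden)

x = rng.standard_normal(input_size)  # one input vector
h = rng.standard_normal(hidden)      # previous hidden state

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step_pt(x, h):
    """One GRU step in the PyTorch convention (gate order r, z, n)."""
    Wr, Wz, Wn = np.split(w_ih_pt, 3)
    Ur, Uz, Un = np.split(w_hh_pt, 3)
    b_ir, b_iz, b_in = np.split(b_ih_pt, 3)
    b_hr, b_hz, b_hn = np.split(b_hh_pt, 3)
    r = sigmoid(Wr @ x + b_ir + Ur @ h + b_hr)
    z = sigmoid(Wz @ x + b_iz + Uz @ h + b_hz)
    n = np.tanh(Wn @ x + b_in + r * (Un @ h + b_hn))
    return (1 - z) * n + z * h

def reorder(w, axis=-1):
    """Swap the first two gate chunks: [r, z, n] -> [z, r, n]."""
    r, z, n = np.split(w, 3, axis=axis)
    return np.concatenate([z, r, n], axis=axis)

# Convert to the Keras layout: transpose, reorder gates, keep both biases stacked.
kernel = reorder(w_ih_pt.T)                            # (input_size, 3*hidden)
recurrent_kernel = reorder(w_hh_pt.T)                  # (hidden, 3*hidden)
bias = np.stack([reorder(b_ih_pt), reorder(b_hh_pt)])  # (2, 3*hidden)

def gru_step_tf(x, h):
    """One GRU step in the Keras reset_after=True convention (gate order z, r, n)."""
    Wz, Wr, Wn = np.split(kernel, 3, axis=1)
    Uz, Ur, Un = np.split(recurrent_kernel, 3, axis=1)
    b_iz, b_ir, b_in = np.split(bias[0], 3)
    b_hz, b_hr, b_hn = np.split(bias[1], 3)
    z = sigmoid(x @ Wz + b_iz + h @ Uz + b_hz)
    r = sigmoid(x @ Wr + b_ir + h @ Ur + b_hr)
    n = np.tanh(x @ Wn + b_in + r * (h @ Un + b_hn))
    return z * h + (1 - z) * n

out_pt = gru_step_pt(x, h)
out_tf = gru_step_tf(x, h)
print(np.max(np.abs(out_pt - out_tf)))  # should agree to float precision
```

Note that reset_after=True matters here: the reset gate multiplies the recurrent term after the matrix product, r * (Un @ h + b_hn), exactly as PyTorch does, which is why the two bias vectors must be kept separate (stacked) rather than summed.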