Hello,
I am trying to copy the weights of a PyTorch GRU layer into a TensorFlow (Keras) GRU layer.
My layer definitions are:
# TensorFlow
layers.GRU(units=100, activation=None, dropout=0.0, reset_after=True)
# PyTorch
nn.GRU(input_size=400, hidden_size=100, bidirectional=False, dropout=0.0, batch_first=True)
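For context, this is roughly how I set the two layers up (setup sketch; the only step beyond the definitions above is building the Keras layer, so that get_weights()/set_weights() have weights to work with):

import numpy as np
import torch
from torch import nn
from tensorflow.keras import layers

pt_layer = nn.GRU(input_size=400, hidden_size=100, bidirectional=False,
                  dropout=0.0, batch_first=True)
tf_layer = layers.GRU(units=100, activation=None, dropout=0.0, reset_after=True)
tf_layer.build(input_shape=(None, None, 400))  # create the Keras weights before copying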
I copy the weights using:
w_ih = pt_layer.weight_ih_l0.detach().numpy().T  # (input_size, 3*hidden_size)
w_hh = pt_layer.weight_hh_l0.detach().numpy().T  # (hidden_size, 3*hidden_size)
b_ih = pt_layer.bias_ih_l0.detach().numpy()
b_hh = pt_layer.bias_hh_l0.detach().numpy()
# Helper funcs to reorder gates: PyTorch [r, z, n] -> TensorFlow [z, r, n]
def reorder_gru_weights(w):
    r, z, n = np.split(w, 3, axis=1)
    return np.concatenate([z, r, n], axis=1)

def reorder_gru_bias(b):
    b_r, b_z, b_n = np.split(b, 3)
    return np.concatenate([b_z, b_r, b_n])
w_ih = reorder_gru_weights(w_ih)
w_hh = reorder_gru_weights(w_hh)
b_ih = reorder_gru_bias(b_ih)
b_hh = reorder_gru_bias(b_hh)
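As a quick sanity check of the reordering (my own toy test, not part of the network): swapping the first two gate blocks is its own inverse, so applying the helper twice should return the original array:

w = np.arange(12.0).reshape(2, 6)  # toy array standing in for a (dim, 3*units) kernel
assert np.allclose(reorder_gru_weights(reorder_gru_weights(w)), w)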
reset_after = getattr(tf_layer, 'reset_after', False)
if reset_after:
    # Keras keeps input and recurrent biases separate: shape (2, 3*units)
    bias = np.stack([b_ih, b_hh], axis=0)
else:
    # Keras uses a single fused bias: shape (3*units,)
    bias = b_ih + b_hh
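For reference, the Keras-side shapes I expect here with units=100 and a 400-dim input (the (2, 300) bias shape is specific to reset_after=True):

# kernel:           (400, 300)
# recurrent_kernel: (100, 300)
# bias:             (2, 300) if reset_after=True, else (300,)
print([w.shape for w in tf_layer.get_weights()])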
tf_shapes = [w.shape for w in tf_layer.get_weights()]
new_weights = [w_ih, w_hh, bias]
for i, (target, actual) in enumerate(zip(tf_shapes, [w.shape for w in new_weights])):
    if target != actual:
        raise ValueError(f"Mismatch at weight {i}: expected {target}, got {actual}")
tf_layer.set_weights(new_weights)
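To sanity-check the copy itself, I also run just the two GRU layers on the same random input (a sketch; Keras returns only the last time step by default, so I compare it against the last step of the PyTorch sequence):

x = np.random.rand(1, 5, 400).astype(np.float32)  # (batch, time, features), arbitrary test shape
tf_out = tf_layer(x).numpy()                      # (1, 100): last time step only
with torch.no_grad():
    pt_seq, _ = pt_layer(torch.from_numpy(x))     # (1, 5, 100): full sequence
pt_out = pt_seq[:, -1, :].numpy()
print("max abs diff:", np.abs(tf_out - pt_out).max())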
Running the two full networks on the same input, I get:
TensorFlow output: [[0.4295744]]
PyTorch output: [[0.44837096]]
Max absolute difference: 0.018796563
PS: The networks also contain other Linear and Conv layers. When I remove the GRU layers, the difference is significantly smaller (around 0.00002).
Am I missing something in how I copy the weights? Thanks