Diff - c7e86acfcee30794dc99a0759924bf7b9d43f1ca^! - kernel/common

commit	c7e86acfcee30794dc99a0759924bf7b9d43f1ca	[log] [tgz]
author	David Howells <[email protected]>	Thu Nov 01 13:39:53 2018 +0000
committer	David S. Miller <[email protected]>	Fri Nov 02 23:59:26 2018 -0700
tree	c31ab320a0a156e97a0442c30e8859071a11d178
parent	284fb78ed7572117846f8e1d1d8e3dbfd16880c2 [diff] [blame]

rxrpc: Fix lockup due to no error backoff after ack transmit error

If the network becomes (partially) unavailable, say by disabling IPv6, the
background ACK transmission routine can get itself into a tizzy by
proposing immediate ACK retransmission.  Since we're in the call event
processor, that happens immediately without returning to the workqueue
manager.

The condition should clear after a while when either the network comes back
or the call times out.

Fix this by:

 (1) When re-proposing an ACK on failed Tx, don't schedule it immediately.
     This will allow a certain amount of time to elapse before we try
     again.

 (2) Enforce a return to the workqueue manager after a certain number of
     iterations of the call processing loop.

 (3) Add a backoff delay that increases the delay on deferred ACKs by a
     jiffy per failed transmission to a limit of HZ.  The backoff delay is
     cleared on a successful return from kernel_sendmsg().

 (4) Cancel calls immediately if the opening sendmsg fails.  The layer
     above can arrange retransmission or rotate to another server.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 382196e..bc628ac 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h

@@ -611,6 +611,7 @@ struct rxrpc_call {
 						 * not hard-ACK'd packet follows this.
 						 */
 	rxrpc_seq_t		tx_top;		/* Highest Tx slot allocated. */
+	u16			tx_backoff;	/* Delay to insert due to Tx failure */
 
 	/* TCP-style slow-start congestion control [RFC5681].  Since the SMSS
 	 * is fixed, we keep these numbers in terms of segments (ie. DATA