Net Tick Time

This guide covers a mechanism used by RabbitMQ nodes and CLI tools (both of which are Erlang nodes) to determine whether a peer is available, known as "net ticks" and configured via kernel.net_ticktime. See the Erlang kernel documentation for more details.


Each pair of nodes in a cluster is connected by the transport layer. Periodic tick messages are exchanged between all pairs of nodes to maintain the connections and to detect disconnections. Network interruptions could otherwise go undetected for a fairly long period of time (depending on the transport and OS kernel settings, e.g. for TCP). Fundamentally this is the same problem that heartbeats seek to address in messaging protocols, just between different peers: RabbitMQ cluster nodes and CLI tools.

Nodes and connected CLI tools periodically send each other small data frames. If no data is received from a peer within a given period of time, that peer is considered to be unavailable ("down").
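The detection rule described above can be illustrated with a minimal sketch. This is a simplified model and not the Erlang runtime's actual implementation: the PeerMonitor class, its field names, and the 60-second timeout used in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeerMonitor:
    """Tracks when data was last received from one peer (illustrative only)."""

    timeout_s: float     # hypothetical: how long silence from the peer is tolerated
    last_heard_s: float  # timestamp of the most recent data received from the peer

    def record_data(self, now_s: float) -> None:
        # Any data received from the peer counts as a sign of life.
        self.last_heard_s = now_s

    def is_down(self, now_s: float) -> bool:
        # The peer is considered down once it has been silent longer than the timeout.
        return now_s - self.last_heard_s > self.timeout_s


monitor = PeerMonitor(timeout_s=60.0, last_heard_s=0.0)
monitor.record_data(now_s=30.0)
status_at_80 = monitor.is_down(now_s=80.0)  # 50 seconds of silence
status_at_95 = monitor.is_down(now_s=95.0)  # 65 seconds of silence
```

In this model the peer is still considered available at 80 seconds and is marked down at 95 seconds, because only the second check falls outside the tolerated period of silence.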

When one RabbitMQ node determines that another node has gone down, it logs a message with the other node's name and the reason, for example:

=INFO REPORT==== 23-Sep-2014::16:21:22 ===
node rabbit@cordelia down: net_tick_timeout

In this case the net_tick_timeout tells us that the other node was detected as down due to the net ticktime being exceeded. Another common reason is connection_closed, meaning that the connection was explicitly closed at the TCP level.

Tick Frequency

The frequency of both tick messages and detection of failures is controlled by the net_ticktime configuration setting. Normally four ticks are exchanged between a pair of nodes every net_ticktime seconds. If no communication is received from a node within net_ticktime (± 25%) seconds then the node is considered down and no longer a member of the cluster.
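As a worked example of the arithmetic in the paragraph above, the following sketch computes the tick interval and the detection window for a given net_ticktime. The function name tick_schedule and the dictionary keys are hypothetical; only the "four ticks per net_ticktime" and "± 25%" relationships come from the text.

```python
def tick_schedule(net_ticktime_s: float) -> dict:
    """Derive the tick interval and detection window from net_ticktime (in seconds)."""
    quarter = net_ticktime_s / 4  # four ticks are exchanged per net_ticktime period
    return {
        "tick_interval_s": quarter,
        "min_detection_s": net_ticktime_s - quarter,  # net_ticktime minus 25%
        "max_detection_s": net_ticktime_s + quarter,  # net_ticktime plus 25%
    }


default = tick_schedule(60)   # the default net_ticktime
doubled = tick_schedule(120)  # a doubled net_ticktime
```

With the default of 60 seconds, ticks are sent roughly every 15 seconds and an unresponsive node is detected after 45 to 75 seconds. Doubling the value to 120 seconds moves the tick interval to 30 seconds and the detection window to 90 to 150 seconds.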

Increasing the net_ticktime across all nodes in a cluster will make the cluster more resilient to short network outages, but it will take longer for the remaining nodes to detect crashed nodes. Conversely, reducing the net_ticktime across all nodes in a cluster will reduce detection latency, but will increase the risk of detecting spurious partitions.

The impact of changing the default net_ticktime should be carefully considered. All nodes in a cluster must use the same net_ticktime. The following sample rabbitmq.config configuration demonstrates doubling the default net_ticktime from 60 to 120 seconds:

        [
          {rabbit, [{tcp_listeners, [5672]}]},
          {kernel, [{net_ticktime,  120}]}
        ].


The HTTP API often needs to perform cluster-wide queries. As a result, the management UI can appear unresponsive until a partition is detected and handled. Lowering net_ticktime can help to improve responsiveness during such events, but any decision to change net_ticktime should be made carefully, as emphasised above.