Net Tick Time

This page explains the Erlang net_ticktime configuration setting. See the Erlang kernel documentation for more details.

Overview

Each pair of nodes in a cluster is connected by the transport layer. Periodic tick messages are exchanged between all pairs of nodes to maintain the connections and to detect disconnections. Without them, a network interruption could go undetected for a period that depends on the transport.

When one RabbitMQ node determines that another node has gone down, it logs a message giving the other node's name and the reason, for example:

    =INFO REPORT==== 23-Sep-2014::16:21:22 ===
    node rabbit@cordelia down: net_tick_timeout

In this case, net_tick_timeout tells us that the other node was detected as down because the net_ticktime was exceeded. Another common reason is connection_closed, meaning that the connection was explicitly closed at the TCP level.

Tick Frequency

The frequency of both tick messages and detection of failures is controlled by the net_ticktime configuration setting. Normally four ticks are exchanged between a pair of nodes every net_ticktime seconds. If no communication is received from a node within net_ticktime (± 25%) seconds then the node is considered down and no longer a member of the cluster.
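The arithmetic in the paragraph above can be made concrete with a small sketch. It is only an illustration of the figures stated here (four ticks per net_ticktime, detection within ± 25%); the function name tick_timing is made up for this example and is not part of RabbitMQ or Erlang.

```python
def tick_timing(net_ticktime):
    """Derive the tick interval and the failure-detection window, in seconds.

    Assumes only what the text states: four ticks per net_ticktime period,
    and detection within net_ticktime plus or minus 25%.
    """
    return {
        "tick_interval": net_ticktime / 4,   # one tick every quarter period
        "detect_min": net_ticktime * 0.75,   # earliest point a silent node is considered down
        "detect_max": net_ticktime * 1.25,   # latest point a silent node is considered down
    }


# Compare two example settings.
for value in (60, 120):
    t = tick_timing(value)
    print(
        f"net_ticktime={value}s: tick every {t['tick_interval']:g}s, "
        f"silent node considered down after {t['detect_min']:g} to {t['detect_max']:g}s"
    )
```

With a net_ticktime of 60 seconds this gives a tick every 15 seconds and a detection window of 45 to 75 seconds.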

Increasing the net_ticktime across all nodes in a cluster will make the cluster more resilient to short network outages, but it will take longer for the remaining nodes to detect crashed nodes. Conversely, reducing the net_ticktime across all nodes in a cluster will reduce detection latency, but increases the risk of detecting spurious partitions.

The impact of changing the default net_ticktime should be carefully considered. All nodes in a cluster must use the same net_ticktime. The following sample rabbitmq.config configuration demonstrates doubling the default net_ticktime from 60 to 120 seconds:

    [
        {rabbit, [{tcp_listeners, [5672]}]},
        {kernel, [{net_ticktime, 120}]}
    ].
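Because all nodes must agree on the value, it can be worth checking for drift after a configuration change. The following is a minimal sketch that assumes the net_ticktime of each node has already been collected into a dictionary by some other means; the function name and every node name other than rabbit@cordelia are hypothetical.

```python
from collections import Counter


def mismatched_ticktimes(ticktimes):
    """Return the nodes whose net_ticktime differs from the most common value.

    `ticktimes` maps node name to its configured net_ticktime in seconds.
    An empty result means every node uses the same value.
    """
    if not ticktimes:
        return {}
    # Treat the most frequent value as the intended cluster-wide setting.
    expected, _count = Counter(ticktimes.values()).most_common(1)[0]
    return {node: value for node, value in ticktimes.items() if value != expected}


# Hypothetical cluster in which one node was not updated to 120 seconds.
cluster = {
    "rabbit@cordelia": 120,
    "rabbit@node2": 120,
    "rabbit@node3": 60,
}
print(mismatched_ticktimes(cluster))
```

Treating the most common value as the intended one is a design shortcut for this sketch; comparing against an explicitly chosen target value would be stricter.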

HTTP API

The HTTP API often needs to perform cluster-wide queries. As a result, the UI can appear unresponsive until a partition is detected and handled. Lowering net_ticktime can improve responsiveness during such events, but any decision to change net_ticktime should be made carefully, as emphasised above.