
Clustering Guide

Overview

This guide covers fundamental topics related to RabbitMQ clustering:

  • Requirements for clustering
  • What data is and isn't replicated between cluster nodes
  • What clustering means for clients
  • How clusters are formed
  • How nodes authenticate to each other
  • Node restarts and how nodes rejoin their cluster
  • How to remove a cluster node
and more.

Cluster Formation and Peer Discovery is a closely related guide that focuses on peer discovery and cluster formation automation-related topics. For queue contents (message) replication, see the Mirrored Queues guide.

A RabbitMQ broker is a logical grouping of one or several Erlang nodes, each running the RabbitMQ application and sharing users, virtual hosts, queues, exchanges, bindings, and runtime parameters. Sometimes we refer to the collection of nodes as a cluster.

Cluster Formation

Ways of Forming a Cluster

A RabbitMQ cluster can be formed in a number of ways.

Please refer to the Cluster Formation guide for details.

The composition of a cluster can be altered dynamically. All RabbitMQ brokers start out as running on a single node. These nodes can be joined into clusters, and subsequently turned back into individual brokers again.

Cluster Formation Requirements

RabbitMQ nodes address each other using domain names, either short or fully-qualified (FQDNs). Therefore hostnames of all cluster members must be resolvable from all cluster nodes, as well as machines on which command line tools such as rabbitmqctl might be used.

Hostname resolution can use any of the standard OS-provided methods:

  • DNS records
  • Local host files (e.g. /etc/hosts)
In more restrictive environments, where DNS record or hosts file modification is restricted, impossible or undesired, the Erlang VM can be configured to use alternative hostname resolution methods, such as an alternative DNS server, a local file, a non-standard hosts file location, or a mix of methods. Those methods can work in concert with the standard OS hostname resolution methods.

To use FQDNs, see RABBITMQ_USE_LONGNAME in the Configuration guide.
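For example, in the simplest case each machine's /etc/hosts file could list all cluster members by their short hostnames. The addresses below are placeholders and would be replaced with the real ones:

192.168.0.11 rabbit1
192.168.0.12 rabbit2
192.168.0.13 rabbit3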

Port Access

RabbitMQ nodes bind to ports (open server TCP sockets) in order to accept client and CLI tool connections. Other processes and tools such as SELinux may prevent RabbitMQ from binding to a port. When that happens, the node will fail to start. CLI tools, client libraries and RabbitMQ nodes also open connections (client TCP sockets). Firewalls can prevent nodes and CLI tools from communicating with each other. Make sure the following ports are accessible:

  • 4369: epmd, a peer discovery service used by RabbitMQ nodes and CLI tools
  • 5672, 5671: used by AMQP 0-9-1 and 1.0 clients without and with TLS
  • 25672: used for inter-node and CLI tools communication (Erlang distribution server port) and is allocated from a dynamic range (limited to a single port by default, computed as AMQP port + 20000). Unless external connections on these ports are really necessary (e.g. the cluster uses federation or CLI tools are used on machines outside the subnet), these ports should not be publicly exposed. See networking guide for details.
  • 35672-35682: used by CLI tools (Erlang distribution client ports) for communication with nodes and is allocated from a dynamic range (computed as server distribution port + 10000 through server distribution port + 10010). See networking guide for details.
  • 15672: HTTP API clients, management UI and rabbitmqadmin (only if the management plugin is enabled)
  • 61613, 61614: STOMP clients without and with TLS (only if the STOMP plugin is enabled)
  • 1883, 8883: MQTT clients without and with TLS (only if the MQTT plugin is enabled)
  • 15674: STOMP-over-WebSockets clients (only if the Web STOMP plugin is enabled)
  • 15675: MQTT-over-WebSockets clients (only if the Web MQTT plugin is enabled)
It is possible to configure RabbitMQ to use different ports and specific network interfaces.
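For example, using the new-style configuration file format (the same format as the mnesia_table_loading settings shown later in this guide), the AMQP listener can be moved to a non-default port or bound to a specific interface. The port and address below are illustrative only:

# rabbitmq.conf: accept AMQP 0-9-1 and 1.0 connections on a non-default port
listeners.tcp.default = 5673
# or bind a listener to a single interface (address and port are placeholders)
listeners.tcp.1 = 192.168.0.11:5673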

Nodes in a Cluster

What is Replicated?

All data/state required for the operation of a RabbitMQ broker is replicated across all nodes. An exception to this are message queues, which by default reside on one node, though they are visible and reachable from all nodes. To replicate queues across nodes in a cluster, see the documentation on high availability (note: this guide is a prerequisite for mirroring).
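For example (covered in detail in the mirroring guide), a policy can be declared on any node to mirror matching queues across the cluster; the policy name and queue name pattern below are arbitrary:

# mirror all queues whose names begin with "ha." across all cluster nodes
rabbitmqctl set_policy ha-all "^ha\." '{"ha-mode":"all"}'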

Nodes are Equal Peers

Some distributed systems have leader and follower nodes. This is generally not true for RabbitMQ. All nodes in a RabbitMQ cluster are equal peers: there are no special nodes in RabbitMQ core. This topic becomes more nuanced when queue mirroring and plugins are taken into consideration, but for most intents and purposes, all cluster nodes should be considered equal.

Many CLI tool operations can be executed against any node. An HTTP API client can target any cluster node.

Individual plugins can designate (elect) certain nodes to be "special" for a period of time. For example, federation links are colocated on a particular cluster node. Should that node fail, the links will be restarted on a different node.

In versions older than 3.6.7, RabbitMQ management plugin used a dedicated node for stats collection and aggregation.

How CLI Tools Authenticate to Nodes (and Nodes to Each Other): the Erlang Cookie

RabbitMQ nodes and CLI tools (e.g. rabbitmqctl) use a cookie to determine whether they are allowed to communicate with each other. For two nodes to be able to communicate they must have the same shared secret called the Erlang cookie. The cookie is just a string of alphanumeric characters up to 255 characters in size. It is usually stored in a local file. The file must be only accessible to the owner (e.g. have UNIX permissions of 600 or similar). Every cluster node must have the same cookie.

If the file does not exist, the Erlang VM will try to create one with a randomly generated value when the RabbitMQ server starts up. Using such generated cookie files is appropriate in development environments only. Since each node will generate its own value independently, this strategy is not really viable in a clustered environment. Erlang cookie generation should be done at cluster deployment stage, ideally using automation and orchestration tools such as Chef, Puppet, BOSH, Docker or similar.

On UNIX systems, the cookie will be typically located in /var/lib/rabbitmq/.erlang.cookie (used by the server) and $HOME/.erlang.cookie (used by CLI tools). Note that since the value of `$HOME` varies from user to user, it's necessary to place a copy of the cookie file for each user that will be using the CLI tools. This applies to both non-privileged users and `root`.
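As an illustration, a cookie generated on the first node can be copied to the other machines and to the home directory of each user that runs CLI tools. The hostnames below match the transcript later in this guide and the paths assume a typical Linux package installation:

rabbit1$ scp /var/lib/rabbitmq/.erlang.cookie rabbit2:/var/lib/rabbitmq/.erlang.cookie
rabbit1$ scp /var/lib/rabbitmq/.erlang.cookie rabbit3:/var/lib/rabbitmq/.erlang.cookie
# on every node: the file must be owned by the user running RabbitMQ and readable only by its owner
rabbit2$ chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
rabbit2$ chmod 600 /var/lib/rabbitmq/.erlang.cookie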

On Windows, the cookie file location varies depending on Erlang version used and whether the HOMEDRIVE or HOMEPATH environment variables are set.

With Erlang versions starting with 20.2, the cookie file locations are:

  • %HOMEDRIVE%%HOMEPATH%\.erlang.cookie (usually C:\Users\%USERNAME%\.erlang.cookie for user %USERNAME%) if both the HOMEDRIVE and HOMEPATH environment variables are set
  • %USERPROFILE%\.erlang.cookie (usually C:\Users\%USERNAME%\.erlang.cookie) if HOMEDRIVE and HOMEPATH are not both set
  • For the RabbitMQ Windows service - %USERPROFILE%\.erlang.cookie (usually C:\WINDOWS\system32\config\systemprofile)
The cookie file used by the Windows service account and the user running CLI tools must be synchronised.

On Erlang versions prior to 20.2 (e.g. 19.3 or 20.1), the cookie file locations are:

  • %HOMEDRIVE%%HOMEPATH%\.erlang.cookie (usually C:\Users\%USERNAME%\.erlang.cookie for user %USERNAME%) if both the HOMEDRIVE and HOMEPATH environment variables are set
  • %USERPROFILE%\.erlang.cookie (usually C:\Users\%USERNAME%\.erlang.cookie) if HOMEDRIVE and HOMEPATH are not both set
  • For the RabbitMQ Windows service - %WINDIR%\.erlang.cookie (usually C:\Windows\.erlang.cookie)
The cookie file used by the Windows service account and the user running CLI tools must be synchronised.

As an alternative, you can add the option "-setcookie value" in the RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS environment variable value:

RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="-setcookie cookie-value"
This is the least secure option and generally not recommended.

When the cookie is misconfigured (for example, not identical), RabbitMQ will log errors such as "Connection attempt from disallowed node" and "Could not auto-cluster". When a CLI tool such as rabbitmqctl fails to authenticate with RabbitMQ, the message usually says

* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
* suggestion: is the Erlang distribution using TLS?
An incorrectly placed cookie file or a cookie value mismatch are the most common scenarios for such failures. When a recent Erlang/OTP version is used, authentication failures contain more information and cookie mismatches can be identified better:
* connected to epmd (port 4369) on warp10
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed

* Authentication failed (rejected by the remote node), please check the Erlang cookie
See the CLI Tools guide for more information.

Clustering and Clients

Assuming all cluster members are available, a client can connect to any node and perform any operation. Nodes will route operations to the queue master node transparently to clients.

With all supported messaging protocols a client is only connected to one node at a time.

In case of a node failure, clients should be able to reconnect to a different node, recover their topology and continue operation. For this reason, most client libraries accept a list of endpoints (hostnames or IP addresses) as a connection option. The list of hosts will be used during initial connection as well as connection recovery, if the client supports it. See documentation guides for individual clients to learn more.

There are scenarios where it may not be possible for a client to transparently continue operations after connecting to a different node. They usually involve non-mirrored queues hosted on a failed node.

Clustering and Observability

Client connections, channels and queues will be distributed across cluster nodes. Operators need to be able to inspect and monitor such resources across all cluster nodes.

RabbitMQ CLI tools such as rabbitmq-diagnostics and rabbitmqctl provide commands that inspect resources and cluster-wide state. Some commands focus on the state of a single node (e.g. rabbitmq-diagnostics environment and rabbitmq-diagnostics status), others inspect cluster-wide state. Some examples of the latter include rabbitmqctl list_connections, rabbitmqctl list_mqtt_connections, rabbitmqctl list_stomp_connections, rabbitmqctl list_users, rabbitmqctl list_vhosts and so on.

Such "cluster-wide" commands will often contact one node first, discover cluster members and contact them all to retrieve and combine their respective state. For example, rabbitmqctl list_connections will contact all nodes, retrieve their AMQP 0-9-1 and AMQP 1.0 connections, and display them all to the user. The user doesn't have to manually contact all nodes. Assuming a non-changing state of the cluster (e.g. no connections are closed or opened), two CLI commands executed against two different nodes one after another will produce identical or semantically identical results. "Node-local" commands, however, will not produce identical results since two nodes rarely have identical state: at the very least their node names will be different!

Management UI works similarly: a node that has to respond to an HTTP API request will fan out to other cluster members and aggregate their responses. In a cluster with multiple nodes that have management plugin enabled, the operator can use any node to access management UI. The same goes for monitoring tools that use the HTTP API to collect data about the state of the cluster. There is no need to issue a request to every cluster node in turn.
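For instance, a single request to the nodes endpoint of the HTTP API on any member returns data about every node in the cluster; the credentials and hostname below are placeholders:

curl -s -u guest:guest http://rabbit1:15672/api/nodes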

Node Failure Handling

RabbitMQ brokers tolerate the failure of individual nodes. Nodes can be started and stopped at will, as long as they can contact a cluster member node known at the time of shutdown.

Queue mirroring allows queue contents to be replicated across multiple cluster nodes.

Non-mirrored queues can also be used in clusters. Non-mirrored queue behaviour in case of node failure depends on queue durability.

RabbitMQ clustering has several modes of dealing with network partitions, primarily consistency oriented. Clustering is meant to be used across a LAN. It is not recommended to run clusters that span a WAN. The Shovel or Federation plugins are better solutions for connecting brokers across a WAN. Note that Shovel and Federation are not equivalent to clustering.

Metrics and Statistics

Every node stores and aggregates its own metrics and stats, and provides an API for other nodes to access them. Some stats are cluster-wide, others are specific to individual nodes. The node that responds to an HTTP API request contacts its peers to retrieve their data and then produces an aggregated result.

In versions older than 3.6.7, RabbitMQ management plugin used a dedicated node for stats collection and aggregation.

Disk and RAM Nodes

A node can be a disk node or a RAM node. (Note: disk and disc are used interchangeably). RAM nodes store internal database tables in RAM only. This does not include messages, message store indices, queue indices and other node state.

In the vast majority of cases you want all your nodes to be disk nodes; RAM nodes are a special case that can be used to improve the performance of clusters with high queue, exchange, or binding churn. RAM nodes do not provide higher message rates. When in doubt, use disk nodes only.

Since RAM nodes store internal database tables in RAM only, they must sync them from a peer node on startup. This means that a cluster must contain at least one disk node. It is therefore not possible to manually remove the last remaining disk node in a cluster.

Clustering Transcript with rabbitmqctl

The following is a transcript of manually setting up and manipulating a RabbitMQ cluster across three machines - rabbit1, rabbit2, rabbit3. It is recommended that the example is studied before more automation-friendly cluster formation options are used.

We assume that the user is logged into all three machines, that RabbitMQ has been installed on the machines, and that the rabbitmq-server and rabbitmqctl scripts are in the user's PATH.

This transcript can be modified to run on a single host, as explained in more detail below.

Starting Independent Nodes

Clusters are set up by re-configuring existing RabbitMQ nodes into a cluster configuration. Hence the first step is to start RabbitMQ on all nodes in the normal way:

rabbit1$ rabbitmq-server -detached
rabbit2$ rabbitmq-server -detached
rabbit3$ rabbitmq-server -detached

This creates three independent RabbitMQ brokers, one on each node, as confirmed by the cluster_status command:

rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit2]}]},{running_nodes,[rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
...done.

The node name of a RabbitMQ broker started from the rabbitmq-server shell script is rabbit@shorthostname, where the short node name is lower-case (as in rabbit@rabbit1, above). If you use the rabbitmq-server.bat batch file on Windows, the short node name is upper-case (as in rabbit@RABBIT1). When you type node names, case matters, and these strings must match exactly.

Creating the cluster

In order to link up our three nodes in a cluster, we tell two of the nodes, say rabbit@rabbit2 and rabbit@rabbit3, to join the cluster of the third, say rabbit@rabbit1.

We first join rabbit@rabbit2 in a cluster with rabbit@rabbit1. To do that, on rabbit@rabbit2 we stop the RabbitMQ application and join the rabbit@rabbit1 cluster, then restart the RabbitMQ application. Note that joining a cluster implicitly resets the node, thus removing all resources and data that were previously present on that node.

rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl join_cluster rabbit@rabbit1
Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ...done.
rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.

We can see that the two nodes are joined in a cluster by running the cluster_status command on either of the nodes:

rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.

Now we join rabbit@rabbit3 to the same cluster. The steps are identical to the ones above, except this time we'll cluster to rabbit2 to demonstrate that the node chosen to cluster to does not matter - it is enough to provide one online node and the node will be clustered to the cluster that the specified node belongs to.

rabbit3$ rabbitmqctl stop_app
Stopping node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl join_cluster rabbit@rabbit2
Clustering node rabbit@rabbit3 with rabbit@rabbit2 ...done.
rabbit3$ rabbitmqctl start_app
Starting node rabbit@rabbit3 ...done.

We can see that the three nodes are joined in a cluster by running the cluster_status command on any of the nodes:

rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.

By following the above steps we can add new nodes to the cluster at any time, while the cluster is running.

Restarting Cluster Nodes

Nodes that have been joined to a cluster can be stopped at any time. It is also ok for them to crash. In both cases the rest of the cluster continues operating, and the nodes automatically "catch up" with (sync from) the other cluster nodes when they start up again.

We shut down the nodes rabbit@rabbit1 and rabbit@rabbit3 and check on the cluster status at each step:

rabbit1$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit1 ...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit3]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit3]}]
...done.
rabbit3$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit3 ...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit2]}]
...done.

Now we start the nodes again, checking on the cluster status as we go along:

rabbit1$ rabbitmq-server -detached
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmq-server -detached
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.

It is important to understand the process nodes go through when they are stopped and restarted.

A stopping node picks an online cluster member (only disc nodes will be considered) to sync with after restart. Upon restart the node will try to contact that peer 10 times by default, with 30 second response timeouts. If the peer becomes available within that time window, the node successfully starts, syncs what it needs from the peer and keeps going.

When a node has no online peers during shutdown, it will start without attempting to sync with any known peers. It does not start as a standalone node, however, and peers will be able to rejoin it.

Therefore, when the entire cluster is brought down, the last node to go down must be the first node to be brought online. However, since nodes will try to contact a known peer for up to 5 minutes by default (10 retries with 30 second timeouts), nodes can be restarted in any order within that window. In that case they will rejoin each other one by one successfully. This window of time can be adjusted using two configuration settings:

# wait for 60 seconds instead of 30
mnesia_table_loading_retry_timeout = 60000

# retry 15 times instead of 10
mnesia_table_loading_retry_limit = 15
By adjusting these settings and tweaking the time window in which a known peer has to come back, it is possible to accommodate cluster-wide redeployment scenarios that take longer than 5 minutes to complete (the example above extends the window to 15 retries with 60 second timeouts, i.e. 15 minutes).

In some cases the last node to go offline cannot be brought back up. It can be removed from the cluster using the forget_cluster_node rabbitmqctl command.

Alternatively, the force_boot rabbitmqctl command can be used on a node to make it boot without trying to sync with any peers (as if they were last to shut down). This is usually only necessary if the last node to shut down, or a set of nodes, will never be brought back online.
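A minimal sketch of both options, assuming rabbit@rabbit1 is the node that cannot be brought back and the commands are run on a surviving node (or, for force_boot, on the stopped node that should boot first):

# remove a permanently lost node from the cluster
rabbitmqctl forget_cluster_node rabbit@rabbit1
# or tell a stopped node to boot without waiting for any known peer
rabbitmqctl force_boot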

Breaking Up a Cluster

Sometimes it is necessary to remove a node from a cluster. The operator has to do this explicitly using a rabbitmqctl command.

Some peer discovery mechanisms support node health checks and forced removal of nodes not known to the discovery backend. That feature is opt-in (disabled by default).

We first remove rabbit@rabbit3 from the cluster, returning it to independent operation. To do that, on rabbit@rabbit3 we stop the RabbitMQ application, reset the node, and restart the RabbitMQ application.

rabbit3$ rabbitmqctl stop_app
Stopping node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl reset
Resetting node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl start_app
Starting node rabbit@rabbit3 ...done.


Running the cluster_status command on the nodes confirms that rabbit@rabbit3 is no longer part of the cluster and operates independently:

rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
...done.

We can also remove nodes remotely. This is useful, for example, when having to deal with an unresponsive node. Here we remove rabbit@rabbit1 remotely from rabbit@rabbit2.

rabbit1$ rabbitmqctl stop_app
Stopping node rabbit@rabbit1 ...done.
rabbit2$ rabbitmqctl forget_cluster_node rabbit@rabbit1
Removing node rabbit@rabbit1 from cluster ...
...done.

Note that rabbit1 still thinks it's clustered with rabbit2, and trying to start it will result in an error. We will need to reset it to be able to start it again.

rabbit1$ rabbitmqctl start_app
Starting node rabbit@rabbit1 ...
Error: inconsistent_cluster: Node rabbit@rabbit1 thinks it's clustered with node rabbit@rabbit2, but rabbit@rabbit2 disagrees
rabbit1$ rabbitmqctl reset
Resetting node rabbit@rabbit1 ...done.
rabbit1$ rabbitmqctl start_app
Starting node rabbit@rabbit1 ...
...done.

The cluster_status command now shows all three nodes operating as independent RabbitMQ brokers:

rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit2]}]},{running_nodes,[rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
...done.

Note that rabbit@rabbit2 retains the residual state of the cluster, whereas rabbit@rabbit1 and rabbit@rabbit3 are freshly initialised RabbitMQ brokers. If we want to re-initialise rabbit@rabbit2 we follow the same steps as for the other nodes:

rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl reset
Resetting node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.

Besides rabbitmqctl forget_cluster_node and the automatic cleanup of unknown nodes by some peer discovery plugins, there are no scenarios in which a RabbitMQ node will permanently remove its peer node from a cluster.

Upgrading clusters

You can find instructions for upgrading a cluster in the upgrade guide.

A Cluster on a Single Machine

Under some circumstances it can be useful to run a cluster of RabbitMQ nodes on a single machine. This would typically be useful for experimenting with clustering on a desktop or laptop without the overhead of starting several virtual machines for the cluster.

In order to run multiple RabbitMQ nodes on a single machine, it is necessary to make sure the nodes have distinct node names, data store locations, log file locations, and bind to different ports, including those used by plugins. See RABBITMQ_NODENAME, RABBITMQ_NODE_PORT, and RABBITMQ_DIST_PORT in the Configuration guide, as well as RABBITMQ_MNESIA_DIR, RABBITMQ_CONFIG_FILE, and RABBITMQ_LOG_BASE in the File and Directory Locations guide.

You can start multiple nodes on the same host manually by repeated invocation of rabbitmq-server (rabbitmq-server.bat on Windows). For example:

RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=hare rabbitmq-server -detached
rabbitmqctl -n hare stop_app
rabbitmqctl -n hare join_cluster rabbit@`hostname -s`
rabbitmqctl -n hare start_app

will set up a two node cluster, both nodes as disc nodes. Note that if you have RabbitMQ opening any ports other than AMQP, you'll need to configure those not to clash as well. This can be done via command line:

RABBITMQ_NODE_PORT=5672 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15672}]" RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15673}]" RABBITMQ_NODENAME=hare rabbitmq-server -detached

will start two nodes (which can then be clustered) when the management plugin is installed.

Hostname Changes

RabbitMQ nodes use hostnames to communicate with each other. Therefore, all nodes must be able to resolve the hostnames of all cluster peers. This is also true for tools such as rabbitmqctl.

In addition to that, by default RabbitMQ names the database directory using the current hostname of the system. If the hostname changes, a new empty database is created. To avoid data loss it's crucial to set up a fixed and resolvable hostname. Whenever the hostname changes, the RabbitMQ node must be restarted.

A similar effect can be achieved by using rabbit@localhost as the broker nodename. The impact of this solution is that clustering will not work, because the chosen hostname will not resolve to a routable address from remote hosts. The rabbitmqctl command will similarly fail when invoked from a remote host. A more sophisticated solution that does not suffer from this weakness is to use DNS, e.g. Amazon Route 53 if running on EC2. If you want to use the full hostname for your nodename (RabbitMQ defaults to the short name), and that full hostname is resolvable using DNS, you may want to investigate setting the environment variable RABBITMQ_USE_LONGNAME=true.
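For example, a node can be started with a fully-qualified node name as follows, assuming rabbit1.example.com resolves via DNS:

RABBITMQ_USE_LONGNAME=true RABBITMQ_NODENAME=rabbit@rabbit1.example.com rabbitmq-server -detached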

See the section on hostname resolution for more information.

Firewalled nodes

The case for firewalled clustered nodes exists when nodes are in a data center or on a reliable network, but separated by firewalls. Again, clustering is not recommended over a WAN or when network links between nodes are unreliable.

In the most common configuration you will need to open a number of standard ports:

  • 4369: epmd, a peer discovery service used by RabbitMQ nodes and CLI tools
  • 5672, 5671: used by AMQP 0-9-1 and 1.0 clients without and with TLS
  • 25672: used for inter-node and CLI tools communication (Erlang distribution server port) and is allocated from a dynamic range (limited to a single port by default, computed as AMQP port + 20000). Unless external connections on these ports are really necessary (e.g. the cluster uses federation or CLI tools are used on machines outside the subnet), these ports should not be publicly exposed. See networking guide for details.
  • 35672-35682: used by CLI tools (Erlang distribution client ports) for communication with nodes and is allocated from a dynamic range (computed as server distribution port + 10000 through server distribution port + 10010). See networking guide for details.
  • 15672: HTTP API clients, management UI and rabbitmqadmin (only if the management plugin is enabled)
  • 61613, 61614: STOMP clients without and with TLS (only if the STOMP plugin is enabled)
  • 1883, 8883: MQTT clients without and with TLS (only if the MQTT plugin is enabled)
  • 15674: STOMP-over-WebSockets clients (only if the Web STOMP plugin is enabled)
  • 15675: MQTT-over-WebSockets clients (only if the Web MQTT plugin is enabled)
See RabbitMQ Networking guide for details.
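As an illustration, the following iptables rules (to be adapted to whatever firewall is actually in use) open the ports required for inter-node communication, CLI tools and AMQP clients:

# epmd, AMQP 0-9-1/1.0 with and without TLS, inter-node distribution
iptables -A INPUT -p tcp -m multiport --dports 4369,5671,5672,25672 -j ACCEPT
# Erlang distribution ports used by CLI tools
iptables -A INPUT -p tcp --dport 35672:35682 -j ACCEPT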

Erlang Versions Across the Cluster

All nodes in a cluster must run the same minor version of Erlang: 19.3.4 and 19.3.6 can be mixed but 19.0.1 and 19.3.6 (or 17.5 and 19.3.6) cannot. Compatibility between individual Erlang/OTP patch versions can vary between releases but that's generally rare.

Connecting to Clusters from Clients

A client can connect as normal to any node within a cluster. If that node should fail, and the rest of the cluster survives, then the client should notice the closed connection, and should be able to reconnect to some surviving member of the cluster. Generally, it's not advisable to bake in node hostnames or IP addresses into client applications: this introduces inflexibility and will require client applications to be edited, recompiled and redeployed should the configuration of the cluster change or the number of nodes in the cluster change. Instead, we recommend a more abstracted approach: this could be a dynamic DNS service which has a very short TTL configuration, or a plain TCP load balancer, or some sort of mobile IP achieved with pacemaker or similar technologies. In general, this aspect of managing the connection to nodes within a cluster is beyond the scope of RabbitMQ itself, and we recommend the use of other technologies designed specifically to solve these problems.

Clusters with RAM nodes

RAM nodes keep their metadata only in memory. As RAM nodes don't have to write to disc as much as disc nodes, they can perform better. However, note that since persistent queue data is always stored on disc, the performance improvements will affect only resource management (e.g. adding/removing queues, exchanges, or vhosts), but not publishing or consuming speed.

RAM nodes are an advanced use case; when setting up your first cluster you should simply not use them. You should have enough disc nodes to handle your redundancy requirements, then if necessary add additional RAM nodes for scale.

A cluster containing only RAM nodes is fragile; if the cluster stops you will not be able to start it again and will lose all data. RabbitMQ will prevent the creation of a RAM-node-only cluster in many situations, but it can't absolutely prevent it.

The examples here show a cluster with one disc and one RAM node for simplicity only; such a cluster is a poor design choice.

Creating RAM nodes

We can declare a node as a RAM node when it first joins the cluster. We do this with rabbitmqctl join_cluster as before, but passing the --ram flag:

rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl join_cluster --ram rabbit@rabbit1
Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ...done.
rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.

RAM nodes are shown as such in the cluster status:

rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]},{ram,[rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1]},{ram,[rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.

Changing node types

We can change the type of a node from ram to disc and vice versa. Say we wanted to reverse the types of rabbit@rabbit2 and rabbit@rabbit1, turning the former from a ram node into a disc node and the latter from a disc node into a ram node. To do that we can use the change_cluster_node_type command. The node must be stopped first.

rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl change_cluster_node_type disc
Turning rabbit@rabbit2 into a disc node ...
...done.
rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.
rabbit1$ rabbitmqctl stop_app
Stopping node rabbit@rabbit1 ...done.
rabbit1$ rabbitmqctl change_cluster_node_type ram
Turning rabbit@rabbit1 into a ram node ...
...done.
rabbit1$ rabbitmqctl start_app
Starting node rabbit@rabbit1 ...done.

Getting Help and Providing Feedback

If you have questions about the contents of this guide or any other topic related to RabbitMQ, don't hesitate to ask them on the RabbitMQ mailing list.

Help Us Improve the Docs <3

If you'd like to contribute an improvement to the site, its source is available on GitHub. Simply fork the repository and submit a pull request. Thank you!