Cluster Formation and Peer Discovery
Overview
This guide covers various automation-oriented cluster formation and peer discovery features. For a general overview of RabbitMQ clustering, please refer to the Clustering Guide.
This guide assumes general familiarity with RabbitMQ clustering and focuses on the peer discovery subsystem. For example, it will not cover what ports must be open for inter-node communication, how nodes authenticate to each other, and so on. Besides discovery mechanisms and their configuration, this guide also covers the closely related topics of feature availability during cluster formation, rejoining nodes, the problem of initial cluster formation with nodes booting in parallel, as well as additional health checks offered by some discovery implementations.
The guide also covers the basics of peer discovery troubleshooting.
What is Peer Discovery?
To form a cluster, new ("blank") nodes need to be able to discover their peers. This can be done using a variety of mechanisms (backends). Some mechanisms assume all cluster members are known ahead of time (for example, listed in the config file), others are dynamic (nodes can come and go).
All peer discovery mechanisms assume that newly joining nodes will be able to contact their peers in the cluster and authenticate with them successfully. The mechanisms that rely on an external service (e.g. DNS or Consul) or API (e.g. AWS or Kubernetes) require the service(s) or API(s) to be available and reachable on their standard ports. Inability to reach the services will lead to the node's inability to join the cluster.
Available Discovery Mechanisms
The following mechanisms are built into the core and always available:
- Config file-based (classic_config)
- DNS-based (dns)
Additional peer discovery mechanisms are available via plugins. The following peer discovery plugins ship with supported RabbitMQ versions:
- rabbitmq_peer_discovery_aws (AWS EC2)
- rabbitmq_peer_discovery_k8s (Kubernetes)
- rabbitmq_peer_discovery_consul (Consul)
- rabbitmq_peer_discovery_etcd (etcd)
The above plugins do not need to be installed but, like all plugins, they must be enabled or preconfigured before they can be used.
Since peer discovery plugins must be available on node boot, they must be enabled before the first node boot. The example below uses rabbitmq-plugins' --offline mode:
rabbitmq-plugins --offline enable <plugin name>
A more specific example:
rabbitmq-plugins --offline enable rabbitmq_peer_discovery_k8s
A node with configuration settings that belong to a non-enabled peer discovery plugin will fail to start and report those settings as unknown.
Specifying the Peer Discovery Mechanism
The discovery mechanism to use is specified in the config file, as are various mechanism-specific settings, for example, discovery service hostnames, credentials, and so on. cluster_formation.peer_discovery_backend is the key that controls what discovery module (implementation) is used:
cluster_formation.peer_discovery_backend = classic_config
# The backend can also be specified using its module name. Note that
# module names do not necessarily match plugin names exactly.
# cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
The module has to implement the rabbit_peer_discovery_backend behaviour. Plugins therefore can introduce their own discovery mechanisms.
How Peer Discovery Works
When a node starts and detects it doesn't have a previously initialised database, it will check if there's a peer discovery mechanism configured. If that's the case, it will then perform the discovery and attempt to contact each discovered peer in order. Finally, it will attempt to join the cluster of the first reachable peer.
Depending on the backend (mechanism) used, the process of peer discovery may involve contacting external services, for example, an AWS API endpoint, a Consul node or performing a DNS query. Some backends require nodes to register (tell the backend that the node is up and should be counted as a cluster member): for example, Consul and etcd both support registration. With other backends the list of nodes is configured ahead of time (e.g. config file). Those backends are said to not support node registration.
In some cases node registration is implicit or managed by an external service. AWS autoscaling groups is a good example: AWS keeps track of group membership, so nodes don't have to (or cannot) explicitly register. However, the list of cluster members is not predefined. Such backends usually include a no-op registration step and apply one of the race condition mitigation mechanisms described below.
When the configured backend supports registration, nodes unregister when they stop.
If peer discovery isn't configured, or it repeatedly fails, or no peers are reachable, a node that wasn't a cluster member in the past will initialise from scratch and proceed as a standalone node. Peer discovery progress and outcomes will be logged by the node.
If a node previously was a cluster member, it will try to contact and rejoin its "last seen" peer for a period of time. In this case, no peer discovery will be performed. This is true for all backends.
Cluster Formation and Feature Availability
As a general rule, a cluster that has only been partly formed, that is, one that only a subset of nodes has joined, must be considered fully available by clients.
Individual nodes will accept client connections before the cluster is formed. In such cases, clients should be prepared for certain features not being available. For instance, quorum queues won't be available unless the number of cluster nodes matches or exceeds the quorum of the configured replica count.
Features behind feature flags may also be unavailable until cluster formation completes.
Nodes Rejoining Their Existing Cluster
A new node joining a cluster is just one possible case. Another common scenario is when an existing cluster member temporarily leaves and then rejoins the cluster. While the peer discovery subsystem does not affect the behavior described in this section, it's important to understand how nodes behave when they rejoin their cluster after a restart or failure.
Existing cluster members will not perform peer discovery. Instead they will try to contact their previously known peers.
If a node previously was a cluster member, when it boots it will try to contact its "last seen" peer for a period of time. If the peer is not booted (e.g. when a full cluster restart or upgrade is performed) or cannot be reached, the node will retry the operation a number of times.
Default values are 10 retries and 30 seconds per attempt, respectively, or 5 minutes total. In environments where nodes can take a long and/or uneven time to start, it is recommended that the number of retries is increased.
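Both values can be adjusted in the config file. Below is a minimal sketch, assuming the defaults described above; verify the key names against the config schema of the version in use:
# number of times a booting node will retry contacting its last seen peer (default: 10)
mnesia_table_loading_retry_limit = 10
# how long each attempt waits, in milliseconds (default: 30000, i.e. 30 seconds)
mnesia_table_loading_retry_timeout = 30000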
If a node is reset after losing contact with the cluster, it will behave like a blank node. Note that other cluster members might still consider it to be a cluster member, in which case the two sides will disagree and the node will fail to join. Such reset nodes must also be removed from the cluster using rabbitmqctl forget_cluster_node executed against an existing cluster member.
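For example, run the following against any remaining cluster member to remove the reset node (the node name below is a hypothetical example):
# removes the given node from the cluster membership metadata
rabbitmqctl forget_cluster_node rabbit@node2.eng.example.local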
If a node was explicitly removed from the cluster by the operator and then reset, it will be able to join the cluster as a new member. In this case it will behave exactly like a blank node would.
A node rejoining after a node name or host name change can start as a blank node if its data directory path changes as a result. Such nodes will fail to rejoin the cluster. While the node is offline, its peers can be reset or started with a blank data directory. In that case the recovering node will fail to rejoin its peer as well, since the internal data store's cluster identity would no longer match.
Consider the following scenario:
- A cluster of 3 nodes, A, B and C is formed
- Node A is shut down
- Node B is reset
- Node A is started
- Node A tries to rejoin B but B's cluster identity has changed
- Node B doesn't recognise A as a known cluster member because it's been reset
In this case node B will reject the clustering attempt from A with an appropriate error message in the log:
Node 'rabbit@node1.local' thinks it's clustered with node 'rabbit@node2.local', but 'rabbit@node2.local' disagrees
In this case B can be reset again and then will be able to join A, or A can be reset and will successfully join B.
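Resetting a node uses standard rabbitmqctl commands, run on the node being reset; a minimal sketch:
# stop the RabbitMQ application, leaving the runtime (Erlang VM) running
rabbitmqctl stop_app
# return the node to its blank, just-installed state
rabbitmqctl reset
# start the application again so the node can (re)join a cluster
rabbitmqctl start_app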
How to Configure Peer Discovery
Peer discovery plugins are configured just like the core server and other plugins: using a config file.
cluster_formation.peer_discovery_backend is the key that controls what peer discovery backend will be used. Each backend will also have a number of configuration settings specific to it. The rest of the guide will cover configurable settings specific to a particular mechanism as well as provide examples for each one.
Config File Peer Discovery Backend
Config File Peer Discovery Overview
The most basic way for a node to discover its cluster peers is to read a list of nodes from the config file. The set of cluster members is assumed to be known at deployment time.
Configuration
The peer nodes are listed using the cluster_formation.classic_config.nodes config setting:
cluster_formation.peer_discovery_backend = classic_config
# the backend can also be specified using its module name
# cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = rabbit@hostname1.eng.example.local
cluster_formation.classic_config.nodes.2 = rabbit@hostname2.eng.example.local
DNS Peer Discovery Backend
DNS Peer Discovery Overview
Another built-in peer discovery mechanism as of RabbitMQ 3.7.0 is DNS-based. It relies on a pre-configured hostname ("seed hostname") with DNS A (or AAAA) records and reverse DNS lookups to perform peer discovery. More specifically, this mechanism will perform the following steps:
- Query DNS A records of the seed hostname.
- For each returned DNS record's IP address, perform a reverse DNS lookup.
- Append current node's prefix (e.g. rabbit in rabbit@hostname1.example.local) to each hostname and return the result.
For example, let's consider a seed hostname of discovery.eng.example.local. It has 2 DNS A records that return two IP addresses: 192.168.100.1 and 192.168.100.2. Reverse DNS lookups for those IP addresses return node1.eng.example.local and node2.eng.example.local, respectively. Current node's name is not set and defaults to rabbit@$(hostname).
The final list of nodes discovered will contain two nodes: rabbit@node1.eng.example.local and rabbit@node2.eng.example.local.
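The lookups the node will perform can be previewed by hand with dig, using the hypothetical seed hostname and addresses from the example above:
# forward lookup: list the A records behind the seed hostname
dig +short discovery.eng.example.local
# reverse lookups for each returned address
dig +short -x 192.168.100.1
dig +short -x 192.168.100.2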
Configuration
The seed hostname is set using the cluster_formation.dns.hostname config setting:
cluster_formation.peer_discovery_backend = dns
# the backend can also be specified using its module name
# cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns
cluster_formation.dns.hostname = discovery.eng.example.local
Peer Discovery on AWS (EC2)
AWS Peer Discovery Overview
An AWS (EC2)-specific discovery mechanism is available via a plugin.
As with any plugin, it must be enabled before it can be used. For peer discovery plugins it means they must be enabled or preconfigured before first node boot:
rabbitmq-plugins --offline enable rabbitmq_peer_discovery_aws
The plugin provides two ways for a node to discover its peers:
- Using EC2 instance tags
- Using AWS autoscaling group membership
Both methods rely on AWS-specific APIs (endpoints) and features and thus cannot work in other IaaS environments. Once a list of cluster member instances is retrieved, final node names are computed using instance hostnames or IP addresses.
Configuration and Credentials
Before a node can perform any operations on AWS, it needs to have a set of AWS account credentials configured. This can be done in a couple of ways:
- Via config file
- Using environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
The EC2 Instance Metadata service for the region will also be consulted.
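For instance, the credentials can be exported in the environment the node is started from (the values below are the same placeholders used in the config examples in this section):
# placeholder credentials; export before starting the node
export AWS_ACCESS_KEY_ID=ANIDEXAMPLE
export AWS_SECRET_ACCESS_KEY=WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY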
The following example snippet configures RabbitMQ to use the AWS peer discovery backend and provides information about AWS region as well as a set of credentials:
cluster_formation.peer_discovery_backend = aws
# the backend can also be specified using its module name
# cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws
cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY
If region is left unconfigured, us-east-1 will be used by default.
Sensitive values in configuration file can optionally be encrypted.
If an IAM role is assigned to the EC2 instances running RabbitMQ nodes, a policy has to be used to allow said instances to use the EC2 Instance Metadata Service. When the plugin is configured to use autoscaling group membership, the policy also has to grant access to describe autoscaling group members (instances). Below is an example of a policy that covers both use cases:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingInstances",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
Using Autoscaling Group Membership
When autoscaling-based peer discovery is used, the members of the current node's EC2 instance autoscaling group will be listed and used to produce the list of discovered peers.
To use autoscaling group membership, set the cluster_formation.aws.use_autoscaling_group key to true:
cluster_formation.peer_discovery_backend = aws
cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY
cluster_formation.aws.use_autoscaling_group = true
Using EC2 Instance Tags
When tags-based peer discovery is used, the plugin will list EC2 instances using the EC2 API and filter them by configured instance tags. The resulting instance set will be used to produce the list of discovered peers.
Tags are configured using the cluster_formation.aws.instance_tags key. The example below uses three tags: region, service, and environment.
cluster_formation.peer_discovery_backend = aws
cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY
cluster_formation.aws.instance_tags.region = us-east-1
cluster_formation.aws.instance_tags.service = rabbitmq
cluster_formation.aws.instance_tags.environment = staging
Using Private EC2 Instance IPs
By default peer discovery will use private DNS hostnames to compute node names. This option is most convenient and is highly recommended.
However, it is possible to opt into using private IPs instead by setting the cluster_formation.aws.use_private_ip key to true. For this setup to work, RABBITMQ_NODENAME must be set to the private IP address at node deployment time. RABBITMQ_USE_LONGNAME also has to be set to true, or an IP address won't be considered a valid part of the node name.
cluster_formation.peer_discovery_backend = aws
cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY
cluster_formation.aws.use_autoscaling_group = true
cluster_formation.aws.use_private_ip = true
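A sketch of the environment that must be set at deployment time for this configuration, assuming a hypothetical private IP of 10.0.1.23:
# IP-based node names are only valid when long node names are enabled
export RABBITMQ_USE_LONGNAME=true
# the node name embeds the instance's private IP (hypothetical value)
export RABBITMQ_NODENAME=rabbit@10.0.1.23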
Peer Discovery on Kubernetes
Kubernetes Peer Discovery Overview
A Kubernetes-based discovery mechanism is available via a plugin.
As with any plugin, it must be enabled before it can be used. For peer discovery plugins it means they must be enabled or preconfigured before first node boot:
rabbitmq-plugins --offline enable rabbitmq_peer_discovery_k8s
Important: Prerequisites and Deployment Considerations
With this mechanism, nodes fetch a list of their peers from a Kubernetes API endpoint using a set of configured values: a URI scheme, host, port, as well as the token and certificate paths.
There are several prerequisites and deployment choices that must be taken into account when deploying RabbitMQ to Kubernetes, with this peer discovery mechanism and in general.
Use a Stateful Set
A RabbitMQ cluster deployed to Kubernetes will use a set of pods. The set must be a stateful set.
A headless service must be used to control network identity of the pods (their hostnames), which in turn affects RabbitMQ node names. On the headless service spec, the publishNotReadyAddresses field must be set to true to propagate SRV DNS records for its Pods for the purpose of peer discovery.
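Below is a minimal headless Service sketch illustrating both requirements; the service name, selector label, and port are assumptions for illustration only:
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-nodes
spec:
  # omitting a cluster IP makes the Service headless
  clusterIP: None
  # publish DNS records for Pods that are not yet Ready, so that
  # peers can be discovered while the cluster is still forming
  publishNotReadyAddresses: true
  selector:
    app: rabbitmq    # assumed pod label
  ports:
    - name: amqp
      port: 5672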
In addition, since RabbitMQ nodes resolve their own and peer hostnames during boot, the CoreDNS caching timeout may need to be decreased from the default of 30 seconds to a value in the 5-10 second range.
If a stateless set is used, recreated nodes will not have their persisted data and will start as blank nodes. This can lead to data loss and higher network traffic volume due to more frequent data synchronisation of both quorum queues and classic queue mirrors on newly joining nodes.
Use Persistent Volumes
How storage is configured is generally orthogonal to peer discovery. However, it does not make sense to run a stateful data service such as RabbitMQ with node data directory stored on a transient volume. Use of transient volumes can lead nodes to not have their persisted data after a restart. This has the same consequences as with stateless sets covered above.
Make Sure /etc/rabbitmq is Mounted as Writeable
RabbitMQ nodes and images may need to update a file under /etc/rabbitmq, the default configuration file location on Linux. This may involve configuration file generation performed by the image used, enabled plugins file updates, and so on.
It is therefore highly recommended that /etc/rabbitmq is mounted as writeable and owned by RabbitMQ's effective user (typically rabbitmq).
Use Most Basic Health Checks for RabbitMQ Pod Readiness Probes
A readiness probe that expects the node to be fully booted and have rejoined its cluster peers can prevent a deployment that restarts all RabbitMQ pods and relies on the OrderedReady pod management policy. Deployments that use the Parallel pod management policy will not be affected.
One health check that does not expect a node to be fully booted and have schema tables synced is:
# a very basic check that will succeed for the nodes that are currently waiting for
# a peer to sync schema from
rabbitmq-diagnostics ping
This basic check would allow the deployment to proceed and the nodes to eventually rejoin each other, assuming they are compatible.
See Schema Syncing from Online Peers in the Clustering guide.
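By contrast, a stricter check such as rabbitmq-diagnostics check_running only succeeds once the RabbitMQ application is fully booted, which is what makes checks of that kind unsuitable as readiness probes for deployments that rely on the OrderedReady policy:
# succeeds only after the node has fully booted;
# a node still waiting to sync schema from a peer will fail this check
rabbitmq-diagnostics check_running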
Examples
A minimalistic runnable example of Kubernetes peer discovery mechanism can be found on GitHub.
The example can be run using either MiniKube or Kind.