Cluster Formation and Peer Discovery
Overview
This guide covers various automation-oriented cluster formation and peer discovery features. For a general overview of RabbitMQ clustering, please refer to the Clustering Guide.
This guide assumes general familiarity with RabbitMQ clustering and focuses on the peer discovery subsystem. For example, it will not cover what ports must be open for inter-node communication, how nodes authenticate to each other, and so on. Besides discovery mechanisms and their configuration, this guide also covers closely related topics of feature availability during cluster formation, rejoining nodes, the problem of initial cluster formation with nodes booting in parallel as well as additional health checks offered by some discovery implementations.
The guide also covers the basics of peer discovery troubleshooting.
What is Peer Discovery?
To form a cluster, new ("blank") nodes need to be able to discover their peers. This can be done using a variety of mechanisms (backends). Some mechanisms assume all cluster members are known ahead of time (for example, listed in the config file), others are dynamic (nodes can come and go).
All peer discovery mechanisms assume that newly joining nodes will be able to contact their peers in the cluster and authenticate with them successfully. The mechanisms that rely on an external service (e.g. DNS or Consul) or API (e.g. AWS or Kubernetes) require the service(s) or API(s) to be available and reachable on their standard ports. Inability to reach the services will lead to node's inability to join the cluster.
Available Discovery Mechanisms
The following mechanisms are built into the core and always available:
Additional peer discovery mechanisms are available via plugins. The following peer discovery plugins ship with supported RabbitMQ versions:
The above plugins do not need to be installed but like all plugins they must be enabled or preconfigured before they can be used.
For peer discovery plugins, which must be available on node boot, this means they must be enabled before first node boot.
The example below uses rabbitmq-plugins' --offline
mode:
rabbitmq-plugins --offline enable <plugin name>
A more specific example:
rabbitmq-plugins --offline enable rabbitmq_peer_discovery_k8s
A node with configuration settings that belong a non-enabled peer discovery plugin will fail to start and report those settings as unknown.
Specifying the Peer Discovery Mechanism
The discovery mechanism to use is specified in the config file,
as are various mechanism-specific settings, for example, discovery service hostnames, credentials, and so
on. cluster_formation.peer_discovery_backend
is the key
that controls what discovery module (implementation) is used:
cluster_formation.peer_discovery_backend = classic_config
# The backend can also be specified using its module name. Note that
# module names do not necessarily match plugin names exactly.
# cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
The module has to implement the rabbit_peer_discovery_backend behaviour. Plugins therefore can introduce their own discovery mechanisms.
How Peer Discovery Works
When a node starts and detects it doesn't have a previously initialised database, it will check if there's a peer discovery mechanism configured. If that's the case, it will then perform the discovery and attempt to contact each discovered peer in order. Finally, it will attempt to join the cluster of the first reachable peer.
Depending on the backend (mechanism) used, the process of peer discovery may involve contacting external services, for example, an AWS API endpoint, a Consul node or performing a DNS query. Some backends require nodes to register (tell the backend that the node is up and should be counted as a cluster member): for example, Consul and etcd both support registration. With other backends the list of nodes is configured ahead of time (e.g. config file). Those backends are said to not support node registration.
In some cases node registration is implicit or managed by an external service. AWS autoscaling groups is a good example: AWS keeps track of group membership, so nodes don't have to (or cannot) explicitly register. However, the list of cluster members is not predefined. Such backends usually include a no-op registration step and apply one of the race condition mitigation mechanisms described below.
When the configured backend supports registration, nodes unregister when they stop.
If peer discovery isn't configured, or it repeatedly fails, or no peers are reachable, a node that wasn't a cluster member in the past will initialise from scratch and proceed as a standalone node. Peer discovery progress and outcomes will be logged by the node.
If a node previously was a cluster member, it will try to contact and rejoin its "last seen" peer for a period of time. In this case, no peer discovery will be performed. This is true for all backends.