
Cluster Formation and Peer Discovery

Introduction

This guide covers various automation-oriented cluster formation and peer discovery features. For a general overview of RabbitMQ clustering, please refer to the Clustering Guide.

To form a cluster, new ("blank") nodes need to be able to discover their peers. The following peer discovery mechanisms are built-in:

  • Config file-based (a static list of cluster nodes)
  • DNS-based

Additional peer discovery mechanisms are available via plugins:

  • AWS (EC2)
  • Kubernetes
  • Consul
  • etcd

The discovery mechanism to use is specified in the config file, as are mechanism-specific settings, for example, discovery service hostnames, credentials, and so on. cluster_formation.peer_discovery_backend is the key that controls what discovery module (implementation) is used:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
The module has to implement the rabbit_peer_discovery_backend behaviour. Plugins therefore can introduce their own discovery mechanisms.

When a node starts and detects it doesn't have a previously initialised database, it will check if there's a peer discovery mechanism configured. If that's the case, it will then perform the discovery and attempt to contact each discovered peer in order. Finally, it will attempt to join the cluster of the first reachable peer.

If peer discovery isn't configured, or it fails, or no peers are reachable, a node that wasn't a cluster member in the past will initialise from scratch and proceed as a standalone node.

If a node previously was a cluster member, it will try to contact its "last seen" peer for a period of time. In this case, no peer discovery is performed.

Peer Rejoining Timeout

If a node previously was a cluster member, when it boots it will try to contact its "last seen" peer for a period of time. The defaults are 10 retry attempts with a 30 second timeout each, or 5 minutes total. In environments where nodes can take a long and/or uneven time to start, it is recommended that the number of retries is increased, as in the sketch below.
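
The retry behaviour is tuned in the config file. A minimal sketch, assuming the mnesia_table_loading_retry_limit and mnesia_table_loading_retry_timeout keys described in the Clustering Guide apply to your version (the timeout is assumed to be in milliseconds):

# number of times a booting node retries contacting its "last seen" peer
# (assumed default: 10)
mnesia_table_loading_retry_limit = 20
# how long to wait per attempt, in milliseconds (assumed default: 30000)
mnesia_table_loading_retry_timeout = 30000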

Config File Peer Discovery Backend

The most basic way for a node to discover its cluster peers is to read a list of nodes from the config file.

This is done using the cluster_formation.classic_config.nodes config setting.

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config

cluster_formation.classic_config.nodes.1 = rabbit@hostname1.eng.example.local
cluster_formation.classic_config.nodes.2 = rabbit@hostname2.eng.example.local

The following example demonstrates the same configuration in the classic config format. The 2nd member of the rabbit.cluster_nodes tuple is the node type to use for the current node. In the vast majority of cases all nodes should be disc nodes.

[
 {rabbit, [
           {cluster_nodes, {['rabbit@hostname1.eng.example.local',
                             'rabbit@hostname2.eng.example.local'], disc}}
          ]}
].

DNS Peer Discovery Backend

Another built-in peer discovery mechanism as of RabbitMQ 3.7.0 is DNS-based. It relies on a pre-configured hostname ("seed hostname") with DNS A (or AAAA) records and reverse DNS lookups to perform peer discovery. More specifically, this mechanism will perform the following steps:

  • Query DNS A records of the seed hostname.
  • For each returned DNS record's IP address, perform a reverse DNS lookup.
  • Append the current node's prefix (e.g. rabbit in rabbit@hostname) to each hostname and return the result.

For example, let's consider a seed hostname of discovery.eng.example.local. It has 2 DNS A records that return two IP addresses: 192.168.100.1 and 192.168.100.2. Reverse DNS lookups for those IP addresses return node1.eng.example.local and node2.eng.example.local, respectively. The current node's name is not set and defaults to rabbit@$(hostname). The final list of discovered nodes will contain two nodes: rabbit@node1.eng.example.local and rabbit@node2.eng.example.local.
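
For reference, the DNS data behind this example could look like the following zone file fragment. This is purely illustrative: the values come from the example above, and the zone layout is an assumption about the DNS setup rather than something the plugin requires:

; A records for the seed hostname
discovery.eng.example.local.    IN A      192.168.100.1
discovery.eng.example.local.    IN A      192.168.100.2

; PTR (reverse DNS) records used by the reverse lookups
1.100.168.192.in-addr.arpa.     IN PTR    node1.eng.example.local.
2.100.168.192.in-addr.arpa.     IN PTR    node2.eng.example.local.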

The seed hostname is set using the cluster_formation.dns.hostname config setting:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns

cluster_formation.dns.hostname = discovery.eng.example.local

Peer Discovery on AWS (EC2)

An AWS (EC2)-specific discovery mechanism is available via a plugin. It provides two ways for a node to discover its peers:

  • Using EC2 instance tags
  • Using AWS autoscaling group membership
Both methods rely on AWS-specific APIs (endpoints) and features and thus cannot work in other IaaS environments. Once a list of cluster member instances is retrieved, final node names are computed using instance hostnames or IP addresses.

When the AWS peer discovery mechanism is used, nodes will delay their startup for a randomly picked value to reduce the probability of a race condition during initial cluster formation (see below).

Configuration and Credentials

Before a node can perform any operations on AWS, it needs to have a set of AWS account credentials configured. This can be done in a couple of ways:

  1. Via config file
  2. Using environment variables
EC2 Instance Metadata service for the region will also be consulted.

The following example snippet configures RabbitMQ to use the AWS peer discovery backend and provides information about AWS region as well as a set of credentials:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws

cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY
If the region is left unconfigured, us-east-1 will be used by default. Sensitive values in the configuration file can optionally be encrypted.

If an IAM role is assigned to EC2 instances running RabbitMQ nodes, a policy has to be used to allow said instances to use the EC2 Instance Metadata Service. Below is an example of such a policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingInstances",
        "ec2:DescribeInstances"
      ],
      "Resource": ["*"]
    }
  ]
}

Using Autoscaling Group Membership

When autoscaling-based peer discovery is used, the autoscaling group membership of the current node's EC2 instance will be listed and used to produce the list of discovered peers.

To use autoscaling group membership, set the cluster_formation.aws.use_autoscaling_group key to true:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws

cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY

cluster_formation.aws.use_autoscaling_group = true

Using EC2 Instance Tags

When tags-based peer discovery is used, the plugin will list EC2 instances using the EC2 API and filter them by the configured instance tags. The resulting instance set will be used to produce the list of discovered peers.

Tags are configured using the cluster_formation.aws.instance_tags key. The example below uses three tags: region, service, and environment.

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws

cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY

cluster_formation.aws.instance_tags.region = us-east-1
cluster_formation.aws.instance_tags.service = rabbitmq
cluster_formation.aws.instance_tags.environment = staging

Using Private EC2 Instance IPs

By default, peer discovery will use private DNS hostnames to compute node names. It is possible to opt into using private IPs instead by setting the cluster_formation.aws.use_private_ip key to true:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws

cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = ANIDEXAMPLE
cluster_formation.aws.secret_key = WjalrxuTnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY

cluster_formation.aws.use_autoscaling_group = true
cluster_formation.aws.use_private_ip = true

Peer Discovery on Kubernetes

A Kubernetes-based discovery mechanism is available via a plugin.

Nodes register with Kubernetes on boot and unregister when they leave. This backend relies on randomized startup delay to reduce the probability of a race condition during initial cluster formation (see below).

To use Kubernetes for peer discovery, set the cluster_formation.peer_discovery_backend to rabbit_peer_discovery_k8s:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

# Kubernetes API hostname (or IP address). Default value is kubernetes.default.svc.cluster.local
cluster_formation.k8s.host = kubernetes.default.example.local

It is possible to configure Kubernetes API port and URI scheme:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local
# 443 is used by default
cluster_formation.k8s.port = 443
# https is used by default
cluster_formation.k8s.scheme = https

Kubernetes token file path is configurable via cluster_formation.k8s.token_path:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local
# default value is /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
It must point to a local file that exists and is readable by RabbitMQ.

Certificate and namespace paths are set using cluster_formation.k8s.cert_path and cluster_formation.k8s.namespace_path, respectively:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local
# default value is /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token

# default value is /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# default value is /var/run/secrets/kubernetes.io/serviceaccount/namespace
cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace
Just like with the token path key, both must point to local files that exist and are readable by RabbitMQ.

When a list of peer nodes is computed from a list of pod containers returned by Kubernetes, either hostnames or IP addresses can be used. This is configurable using the cluster_formation.k8s.address_type key:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local

cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace

# should result set use hostnames or IP addresses
# of Kubernetes API-reported containers?
# supported values are "hostname" and "ip"
cluster_formation.k8s.address_type = ip
Supported values are ip and hostname; the former is used by default.

It is possible to append a suffix to peer hostnames returned by Kubernetes using cluster_formation.k8s.hostname_suffix:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local

cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace

# no suffix is appended by default
cluster_formation.k8s.hostname_suffix = rmq.eng.example.local

Service name is rabbitmq by default but can be overridden using the cluster_formation.k8s.service_name key if needed:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s

cluster_formation.k8s.host = kubernetes.default.example.local

cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace

# overrides Kubernetes service name. Default value is "rabbitmq".
cluster_formation.k8s.service_name = rmq-qa

Peer Discovery Using Consul

A Consul-based discovery mechanism is available via a plugin. Consul 0.8.0 and later versions are supported.

Nodes register with Consul on boot and unregister when they leave. Prior to registration, nodes will attempt to acquire a lock in Consul to reduce the probability of a race condition during initial cluster formation (see below). When a node registers with Consul, it will set up a periodic health check for itself (more on this below).

To use Consul for peer discovery, set the cluster_formation.peer_discovery_backend to rabbit_peer_discovery_consul:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

# Consul host (hostname or IP address). Default value is localhost
cluster_formation.consul.host = consul.eng.example.local

It is possible to configure Consul port and URI scheme:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local
# 8500 is used by default
cluster_formation.consul.port = 8500
# http is used by default
cluster_formation.consul.scheme = http

To configure a Consul ACL token, use cluster_formation.consul.acl_token:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local
cluster_formation.consul.acl_token = acl-token-value

Service name (as registered in Consul) defaults to "rabbitmq" but can be overridden:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local
# rabbitmq is used by default
cluster_formation.consul.svc = rabbitmq

Service hostname (address) as registered in Consul will be fetched by peers and therefore must resolve on all nodes. The hostname can be computed by the plugin or specified by the user. When computed automatically, a number of node and OS properties can be used:

  • Hostname (as returned by gethostname(2))
  • Node name (without the rabbit@ prefix)
  • IP address of a NIC (network interface controller)
When cluster_formation.consul.svc_addr_auto is set to false, the service address will be taken as is from cluster_formation.consul.svc_addr. When it is set to true, the other options explained below come into play.

In the following example, the service address reported to Consul is hardcoded to hostname1.rmq.eng.example.local instead of being computed automatically from the environment:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

cluster_formation.consul.svc = rabbitmq
# do not compute service address, it will be specified below
cluster_formation.consul.svc_addr_auto = false
# service address, will be communicated to other nodes
cluster_formation.consul.svc_addr = hostname1.rmq.eng.example.local
# use long RabbitMQ node names?
cluster_formation.consul.use_longname = true

In this example, the service address reported to Consul is parsed from the node name (the rabbit@ prefix will be dropped):

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

cluster_formation.consul.svc = rabbitmq
# do compute service address
cluster_formation.consul.svc_addr_auto = true
# compute service address using node name
cluster_formation.consul.svc_addr_nodename = true
# use long RabbitMQ node names?
cluster_formation.consul.use_longname = true
Note that cluster_formation.consul.svc_addr_nodename is a boolean field.

In the next example, the service address is computed using hostname as reported by the OS instead of node name:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

cluster_formation.consul.svc = rabbitmq
# do compute service address
cluster_formation.consul.svc_addr_auto = true
# compute service address using host name and not node name
cluster_formation.consul.svc_addr_nodename = false
# use long RabbitMQ node names?
cluster_formation.consul.use_longname = true

In the example below, the service address is computed by taking the IP address of a provided NIC, en0:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

cluster_formation.consul.svc = rabbitmq
# do compute service address
cluster_formation.consul.svc_addr_auto = true
# compute service address using the IP address of a NIC, en0
cluster_formation.consul.svc_addr_nic = en0
cluster_formation.consul.svc_addr_nodename = false
# use long RabbitMQ node names?
cluster_formation.consul.use_longname = true

Service port as registered in Consul can be overridden. This is only necessary if RabbitMQ uses a non-standard port for client (technically AMQP 0-9-1 and AMQP 1.0) connections, since the default value is 5672:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local
# 5672 is used by default
cluster_formation.consul.svc_port = 6674

When a node registers with Consul, it will set up a periodic health check for itself. Online nodes will periodically send a health check update to Consul to indicate the service is available. This interval can be configured:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local
# health check interval (node TTL) in seconds
# default: 30
cluster_formation.consul.svc_ttl = 40
A node that fails its health check is considered to be in the warning state by Consul. Such nodes can be automatically unregistered by Consul after a period of time (note: this is a separate interval value from the TTL above). The period cannot be less than 60 seconds:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local
# health check interval (node TTL) in seconds
cluster_formation.consul.svc_ttl = 30
# how soon should nodes that fail their health checks be unregistered by Consul?
# this value is in seconds and must not be lower than 60 (a Consul requirement)
cluster_formation.consul.deregister_after = 90
Please see a section on automatic cleanup of nodes below.

Nodes in the warning state are excluded from peer discovery results by default. It is possible to opt into including them by setting cluster_formation.consul.include_nodes_with_warnings to true:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local
# health check interval (node TTL) in seconds
cluster_formation.consul.svc_ttl = 30
# include node in the warning state into discovery result set
cluster_formation.consul.include_nodes_with_warnings = true

If node names are computed and long node names are used, it is possible to append a suffix to node names retrieved from Consul. The format is .node.{domain_suffix}. This can be useful in environments with DNS naming conventions, e.g. when all service nodes are organised in a separate subdomain. Here's an example:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul

cluster_formation.consul.host = consul.eng.example.local

cluster_formation.consul.svc = rabbitmq
# do compute service address
cluster_formation.consul.svc_addr_auto = true
# compute service address using node name
cluster_formation.consul.svc_addr_nodename = true
# use long RabbitMQ node names?
cluster_formation.consul.use_longname = true
# append a suffix (.node.example.local) to node names retrieved from Consul
cluster_formation.consul.domain_suffix = example.local
With this setup, node names will be computed to rabbit@hostname1.node.example.local instead of rabbit@hostname1.

Peer Discovery Using etcd

An etcd-based discovery mechanism is available via a plugin. etcd v3 and v2 are supported.

Nodes register with etcd on boot and unregister when they leave. Prior to registration, nodes will attempt to acquire a lock in etcd to reduce the probability of a race condition during initial cluster formation (see below).

Nodes contact etcd periodically to refresh their keys. Those that haven't done so in a configurable period of time (node TTL) are cleaned up from etcd. If configured, such nodes can be forcefully removed from the cluster.

To use etcd for peer discovery, set the cluster_formation.peer_discovery_backend to rabbit_peer_discovery_etcd:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

# etcd host (hostname or IP address). Default value is localhost
cluster_formation.etcd.host = etcd.eng.example.local

It is possible to configure etcd port and URI scheme:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

cluster_formation.etcd.host = etcd.eng.example.local
# 2379 is used by default
cluster_formation.etcd.port = 2379
# http is used by default
cluster_formation.etcd.scheme = http

etcd keys used by peer discovery will be prefixed with "rabbitmq" by default:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

cluster_formation.etcd.host = etcd.eng.example.local
# rabbitmq is used by default
cluster_formation.etcd.key_prefix = rabbitmq_discovery

Keys used for node registration will have a TTL interval set on them. Online nodes will periodically refresh their key(s). The TTL value can be configured:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

cluster_formation.etcd.host = etcd.eng.example.local
# node TTL in seconds
# default: 30
cluster_formation.etcd.ttl = 40
Key refreshes will be performed every TTL/2 seconds. It is possible to forcefully remove the nodes that fail to refresh their keys from the cluster. This is covered later in this guide.

When a node tries to acquire a lock on boot and the lock is already taken, it will wait for the lock to become available, with a timeout. The default value is 300 seconds, but it can be configured:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

cluster_formation.etcd.host = etcd.eng.example.local
# lock acquisition timeout in seconds
# default: 300
cluster_formation.etcd.lock_wait_time = 60

If multiple RabbitMQ clusters share an etcd instance, each must use a unique cluster name:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd

cluster_formation.etcd.host = etcd.eng.example.local
# default name: "default"
cluster_formation.etcd.cluster_name = staging

Race Conditions During Initial Cluster Formation

Consider a deployment where the entire cluster is provisioned at once and all nodes start in parallel. In this case there's a natural race condition during node registration, and more than one node can become "first to register" (that is, discover no existing peers and thus start as a standalone node).

Different peer discovery backends use different approaches to minimize the probability of such a scenario. Some use locking (etcd, Consul), others use a technique known as randomized startup delay. With randomized startup delay, nodes will delay their startup for a randomly picked value (between 5 and 60 seconds by default).

Some backends (config file, DNS) rely on a pre-configured set of peers and avoid the issue that way.

Effective delay interval, if used, is logged on node boot.
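
The delay range itself is configurable. A minimal sketch, assuming the cluster_formation.randomized_startup_delay_range keys available in this RabbitMQ series; verify the key names against your version:

# minimum randomized startup delay, in seconds (default: 5)
cluster_formation.randomized_startup_delay_range.min = 5
# maximum randomized startup delay, in seconds (default: 60)
cluster_formation.randomized_startup_delay_range.max = 30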

Node Health Checks and Cleanup

Sometimes a node is a cluster member but not known to the discovery backend. For example, consider a cluster that uses the AWS backend configured to use autoscaling group membership. If an EC2 instance in that group fails and is later re-created, the node that ran on the failed instance will be considered unavailable by the RabbitMQ cluster yet unknown to the discovery backend. Such unknown nodes can be logged or forcefully removed from the cluster.

To log warnings for the unknown nodes, cluster_formation.node_cleanup.only_log_warning should be set to true. This is the default behavior.

To forcefully delete the unknown nodes from the cluster, cluster_formation.node_cleanup.only_log_warning should be set to false. Note that this option should be used with care, in particular with discovery backends other than AWS.
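
For example (a minimal sketch; cluster_formation.node_cleanup.interval is an assumed key name for how often the check runs and should be verified against your version):

# forcefully remove unknown nodes from the cluster; use with care
cluster_formation.node_cleanup.only_log_warning = false
# how often to check for unknown nodes, in seconds (assumed default: 60)
cluster_formation.node_cleanup.interval = 60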

Some backends (Consul, etcd) support node health checks (or TTL). Nodes periodically notify their respective discovery service (e.g. Consul) that they are still available. If no notifications from a node come in after a period of time, the node is considered to be in the warning state. With etcd, such nodes will no longer show up in discovery results. With Consul, they can either be removed (deregistered) or their warning state can be reported. Please see documentation for those backends to learn more.