RabbitMQ 4.1: New Kubernetes Peer Discovery Mechanism
RabbitMQ 4.1 includes a completely redesigned peer discovery plugin for Kubernetes.
No configuration changes should be needed when upgrading to 4.1, so if you want,
you can just stop reading here. If you are interested in the details, read on.
This blog post explains the peer discovery subsystem in general
and the changes to rabbitmq_peer_discovery_k8s in particular.
What Is Peer Discovery?
Say you want a 3-node RabbitMQ cluster. You start 3 instances of RabbitMQ, but then
what? You can manually tell two of them to join the third one using the
rabbitmqctl join_cluster command and voilà, you have a 3-node cluster.
Most users would prefer this process to be automated, however. That's where peer discovery comes in. RabbitMQ ships with a handful of peer discovery plugins for different situations. The simplest one is called classic peer discovery: you put the hostnames of the nodes in the configuration file, and RabbitMQ automatically initiates cluster formation with them upon startup.
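To make the classic approach concrete, here is a minimal Python sketch that renders the relevant rabbitmq.conf lines for a fixed list of hostnames. The helper name render_classic_config and the hostnames are made up for illustration, and the sketch assumes the default "rabbit" node name prefix; only the cluster_formation.* keys are real settings.

```python
def render_classic_config(hostnames: list[str]) -> str:
    """Render rabbitmq.conf lines for classic (static list) peer discovery.

    Hypothetical helper for illustration only; it is not part of RabbitMQ.
    Assumes every node uses the default "rabbit" node name prefix.
    """
    lines = ["cluster_formation.peer_discovery_backend = classic_config"]
    # Node entries are numbered from 1; each value is a full node name.
    for index, host in enumerate(hostnames, start=1):
        lines.append(
            f"cluster_formation.classic_config.nodes.{index} = rabbit@{host}"
        )
    return "\n".join(lines)


# Example hostnames are placeholders.
example = render_classic_config(
    ["host1.example.local", "host2.example.local", "host3.example.local"]
)
```

Because the hostnames are baked into the output, every cluster needs its own copy of this file.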
It is a common misconception that peer discovery is performed every time a node starts. This is not the case: it is only performed when a node starts for the first time (that is, when it has an empty data directory).
However, depending on how you deploy RabbitMQ, the hostnames may not be known upfront. Even if they are, you need a different configuration file for each cluster, which is inconvenient if, for example, you want a quick way to spin up new clusters for testing environments.
In such cases, you can use other peer discovery plugins, which allow nodes to register with some external systems such as Consul or etcd and query these systems for a list of registered nodes. This way you don't need to know the hostnames upfront - the nodes discover each other automatically.
Kubernetes Peer Discovery before RabbitMQ 4.1
Before RabbitMQ 4.1, rabbitmq_peer_discovery_k8s performed peer discovery by querying
the Kubernetes API server for a list of endpoints behind a service (Kubernetes automatically
registers pods of a given StatefulSet as endpoints). However, there were a few issues with
this approach:
- some users reported that occasionally, cluster formation would fail and the pods would form multiple separate clusters; we never received enough data to diagnose this issue and it never occurred in our testing (we tried thousands of times...)
- it required permissions to query the Kubernetes API; not a big deal, but it was unnecessary and some security-conscious users were asking why we needed this
- it was a convoluted way of asking a question we already know the answer to...
Kubernetes Peer Discovery in RabbitMQ 4.1
When deploying RabbitMQ to Kubernetes, you should always use a StatefulSet.
All pods that belong to a StatefulSet are named consistently: the name of the StatefulSet,
followed by a hyphen and an ordinal index.
The ordinal index start is configurable, but it is almost always 0, so let's just assume it is 0.
Given that, a 3-node cluster deployed to Kubernetes will always have nodes with the suffixes -0, -1 and -2.
There's no need to query the Kubernetes API to know this!
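The naming rule is simple enough to write down directly. The following Python sketch lists the pod names a StatefulSet produces; the function name and the "rabbitmq" StatefulSet name are illustrative assumptions, not something the plugin exposes.

```python
def statefulset_pod_names(
    name: str, replicas: int, ordinal_start: int = 0
) -> list[str]:
    """Return the pod names of a StatefulSet: <name>-<ordinal>.

    Illustrative helper only. Ordinals are consecutive and begin at
    ordinal_start, which is 0 unless configured otherwise.
    """
    return [f"{name}-{i}" for i in range(ordinal_start, ordinal_start + replicas)]


# "rabbitmq" is a placeholder StatefulSet name.
pods = statefulset_pod_names("rabbitmq", 3)
```

For a 3-replica StatefulSet called rabbitmq, this yields rabbitmq-0, rabbitmq-1 and rabbitmq-2, with no API call involved.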
The new plugin doesn't perform any Kubernetes API queries. It just assumes that a pod with the -0 suffix
will exist and treats it as the "seed" node. All other nodes will join the cluster by joining
the -0 node. If the -0 node is not up, other nodes will wait forever for it to come up
(they will never form a cluster without the -0 node). Remember that peer discovery only
happens when a node starts for the first time, so "waiting forever for node -0" only
applies to the first time you deploy a given cluster.
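As a rough illustration of that decision, here is a Python sketch of what each pod does on its very first start. It is not the plugin's actual code (the plugin is written in Erlang); the function name and the returned strings are invented for this example, and the ordinal start is assumed to be 0.

```python
def first_boot_action(pod_name: str) -> str:
    """Describe what a node does when it starts with an empty data directory.

    Hypothetical sketch: assumes StatefulSet naming (<name>-<ordinal>)
    and an ordinal start of 0.
    """
    stem, hyphen, ordinal = pod_name.rpartition("-")
    if not hyphen or not ordinal.isdecimal():
        raise ValueError(f"not a StatefulSet pod name: {pod_name!r}")
    if int(ordinal) == 0:
        # The -0 pod is the seed: it does not join anyone.
        return "seed: other nodes will join this one"
    # Every other pod keeps waiting for the seed; there is no fallback.
    return f"join: wait until {stem}-0 is up, then join it"
```

Calling first_boot_action("rabbitmq-2") describes a join of rabbitmq-0, while first_boot_action("rabbitmq-0") identifies the seed.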
Advanced Configuration
For the vast majority of users, this upgrade should be completely transparent. First of all, since peer discovery is only performed when a node starts for the first time, if you upgrade an existing cluster, peer discovery changes won't affect you.
Second, the new plugin accepts, but ignores, all configuration options of the old plugin. You will see some warnings in the logs about deprecated options being used, but you can safely ignore them.
If the default configuration doesn't work for you, there are two settings you can use:
- If you are using an ordinal start other than 0 (and seriously, why would you?!), you should configure the plugin by setting cluster_formation.k8s.ordinal_start = N where N is the ordinal start. When set, all nodes will try to join the -N node, rather than the -0 node.
- Additionally, you can set cluster_formation.k8s.seed_node = rabbit@seed-node-hostname to just say what the seed node is. We don't expect this setting to ever be needed, but it's there if you really need it.
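To show how the two settings could interact, here is a hedged Python sketch that works out the seed node name from a node's own name. Several things in it are assumptions made for illustration: settings are passed as a plain dict keyed by the rabbitmq.conf names, an explicit seed_node is assumed to take precedence over ordinal_start, and the pod name is assumed to be the first label of the node's hostname. The plugin's real logic may differ.

```python
import re


def resolve_seed_node(this_node: str, settings: dict[str, str]) -> str:
    """Work out which node to treat as the seed (illustrative only)."""
    explicit = settings.get("cluster_formation.k8s.seed_node")
    if explicit:
        # Assumption: an explicitly configured seed node wins.
        return explicit

    ordinal_start = int(settings.get("cluster_formation.k8s.ordinal_start", "0"))

    # Split "rabbit@rabbitmq-2.some.domain" into prefix, pod name and domain.
    prefix, at, host = this_node.partition("@")
    pod, dot, domain = host.partition(".")
    match = re.fullmatch(r"(.+)-(\d+)", pod)
    if not at or match is None:
        raise ValueError(f"cannot derive a seed node from {this_node!r}")

    # Swap this pod's ordinal for the ordinal start; keep everything else.
    return f"{prefix}@{match.group(1)}-{ordinal_start}{dot}{domain}"
```

With no settings, a node named rabbit@rabbitmq-2 resolves to rabbit@rabbitmq-0; with ordinal_start set to 1, it resolves to rabbit@rabbitmq-1.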
What If I'm Using The Cluster Operator?
Cluster Operator is the recommended way of deploying RabbitMQ to Kubernetes, so if you are using it - great.
You should be able to continue using it with no changes. You will see the aforementioned warnings in the logs,
because the Cluster Operator allows deploying different RabbitMQ versions, not just 4.1.
Therefore, for the time being, it will continue setting values required by the old version of
rabbitmq_peer_discovery_k8s in the configuration file. Such a configuration works for both 4.1 and older
versions. At some point in the future, Cluster Operator will drop support for RabbitMQ versions older
than 4.1 and we'll remove these settings from the ConfigMap declared by the Cluster Operator.