Menu

Posts Tagged ‘diy’

Deploying RabbitMQ to Kubernetes: What’s Involved?

Monday, August 10th, 2020

Introduction

Over time, we have seen the number of Kubernetes-related queries on our community mailing list and Slack channels soar. In this post we'd like to explain the basics of a DIY deployment of RabbitMQ on Kubernetes: what Kubernetes resources will be necessary, how to make sure RabbitMQ nodes use durable storage, how to approach configuration of sensitive values, and so on.

Deploying a statful data service such as RabbitMQ to Kubernetes is a bit like assembling a jigsaw puzzle. There are multiple pieces involved:

In this post, we will try to cover the key parts as well as mention a couple more steps that are not technically required to run RabbitMQ on Kubernetes, but every production system operator will have to worry about sooner rather than later:

  • How to set up cluster monitoring with Prometheus and Grafana
  • How to deploy a PerfTest instance to do basic functional and load testing of the cluster

This post by no means covers every aspect that may be relevant when deploying RabbitMQ to Kubernetes; our goal is to highlight the most important parts. Deployment- and workload-specific decisions such as what resource limits to apply to RabbitMQ node pod (containers), what kind of durable storage to use, how to approach TLS certificate/key pair rotation, log aggregation, and upgrades are great topics for separate blog posts. Let us know what you'd like to see in a follow-up!

Executable Examples

The files that accompany this post can be found in the DIY RabbitMQ on Kubernetes example repository. This post uses a Google Kubernetes Engine (GKE) cluster but Kubernetes concepts are universal.

To follow along the examples,

This post assumes that the reader is familiar with kubectl usage basics and the tool is set up to work with a GKE cluster.

RabbitMQ Docker Image

We recommend using the community RabbitMQ Docker image. The image is maintained by the Docker Community and is built with the latest versions of RabbitMQ, Erlang and OpenSSL. The image has a variant built with RabbitMQ release candidates for early testing and adoption.

Now let's begin with the first building block of a RabbitMQ cluster running on Kubernetes: picking a namespace to deploy to.

Kubernetes Namespace and Permissions (RBAC)

Every set of Kubernetes objects belongs to a Kubernetes Namespace. RabbitMQ cluster resources are no exception.

We recommend using a dedicated Namespace to keep the RabbitMQ cluster separate from other services that may be deployed in the Kubernetes cluster. Having a dedicated namespace makes logical sense and makes it easy to grant just enough permissions to the cluster nodes. This is a good security practice.

RabbitMQ's Kubernetes peer discovery plugin relies on the Kubernetes API as a data source. On first boot, every node will try to discover their peers using the Kubernetes API and attempt to join them. Nodes that finish booting emit a Kubernetes event to make it easier to discover such events in cluster activity (event) logs.

The plugin requires the following access to Kubernetes resources:

  • get access to the endpoints resource
  • create access to the events resource

Specify a Role, Role Binding and a Service Account to configure this access.

An example namespace, along with RBAC rules can be seen in the rbac.yaml example file.

If following from the example, use the following command to create a namespace and the required RBAC rules. Note that this creates a namespace called test-rabbitmq.

The kubectl examples below will use the test-rabbitmq namespace. This namespace can be set to be the default one for convenience:

Alternatively, --namespace="test-rabbitmq" can be appended to all kubectl commands demonstrated below.

Use a Stateful Set

RabbitMQ requires using a Stateful Set to deploy a RabbitMQ cluster to Kubernetes. The Stateful Set ensures that the RabbitMQ nodes are deployed in order, one at a time. This avoids running into a potential peer discovery race condition when deploying a multi-node RabbitMQ cluster.

There are other, equally important reasons for using a Stateful Set instead of a Deployment: sticky identity, simple network identifiers, stable persistent storage and the ability to perform ordered rolling upgrades.

The Stateful Set definition file is packed with detail such as mounting configuration, mounting credentials, opening ports, etc, which is explained topic-wise in the following sections.

The final Stateful Set file can be found in the under gke directory.

Create a Service For Clustering and CLI Tools

The Stateful Set definition can reference a Service which gives the Pods of the Stateful Set their network identity. Here, we are referring to the v1.StatefulSet.Spec.serviceName property.

This is required by RabbitMQ for clustering, and as mentioned in the Kubernetes documentation, has to be created before the Stateful Set.

RabbitMQ uses port 4369 for port 4369 for node discovery and port 25672 for inter-node communication. Since this Service is used internally and does not need to be exposed, we create a Headless Service. It can be found in the example headless-service.yaml file.

If following from the example, run the following to create a Headless Service for inter-node and CLI tool traffic:

The service now can be observed in the test-rabbitmq namespace:

Use a Persistent Volume for Node Data

In order for RabbitMQ nodes to retain data between Pod restarts, node's data directory must use durable storage. A Persistent Volume must be attached to each RabbitMQ Pod.

If a transient volume is used to back a RabbitMQ node, the node will lose its identity and all of its local data in case of a restart. This includes both schema and durable queue data. Syncing all of this data on every node restart would be highly inefficient. In case of a loss of quorum during a rolling restart, this will also lead to data loss.

In our statefulset.yaml example, we create a Persistent Volume Claim to provision a Persistent Volume.

The Persistent Volume is mounted at /var/lib/rabbitmq/mnesia. This path is used for a RABBITMQ_MNESIA_BASE location: the base directory for all persistent data of a node.

A description of default file paths for RabbitMQ can be found in the RabbitMQ documentation.

Node's data directory base can be changed using the RABBITMQ_MNESIA_BASE variable if needed. Make sure to mount a Persistent Volume at the updated path.

Node Authentication Secret: the Erlang Cookie

RabbitMQ nodes and CLI tools use a shared secret known as the Erlang Cookie, to authenticate to each other. The cookie value is a string of alphanumeric characters up to 255 characters in size. The value must be generated before creating a RabbitMQ cluster since it is needed by the nodes to form a cluster.

With the community Docker image, RabbitMQ nodes will expect the cookie to be at /var/lib/rabbitmq/.erlang.cookie. We recommend creating a Secret and mounting it as a Volume on the Pods at this path.

This is demonstrated in the statefulset.yaml example file.

The secret is expected to have the following key/value pair:

To create a cookie Secret, run

This will create a Secret with a single key, cookie, taken from the file name, and the file contents as its value.

Administrator Credentials

RabbitMQ will seed a default user with well-known credentials on first boot. The username and password of this user are both guest.

This default user can only connect from localhost by default. It is possible to lift this restriction by opting in. This may be useful for testing but very insecure. Instead, an administrative user must be created using generated credentials.

The administrative user credentials should be stored in a Kubernetes Secret, and mounting them onto the RabbitMQ Pods. The RABBITMQ_DEFAULT_USER and RABBITMQ_DEFAULT_PASS environment variables then can be set to the Secret values. The community Docker image will use them to override default user credentials.

Example for reference.

The secret is expected to have the following key/value pair:

To create an administrative user Secret, use

This will create a Secret with two keys, user and pass, taken from the file names, and file contents as their respective values.

Users can be create explicitly using CLI tools as well. See RabbitMQ doc section on user management to learn more.

Node Configuration

There are several ways to configure a RabbitMQ node. The recommended way is to use configuration files.

Configuration files can be expressed as Config Maps, and mounted as a Volume onto the RabbitMQ pods.

To create a Config Map with RabbitMQ configuration, apply our minimal configmap.yaml example:

Use an Init Container

Since Kubernetes 1.9.4, Config Maps are mounted as read-only volumes onto Pods. This is problematic for the RabbitMQ community Docker image: the image can try to update the config file at the time of container startup.

Thus, the path at which the RabbitMQ config is mounted must be read-write. If a read-only file is detected by the Docker image, you'll see the following warning:

While the Docker image does work around the issue, it is not ideal to store the configuration file in /tmp and we recommend instead making the mount path read-write.

As a few other projects in the Kubernetes community, we use an init container to overcome this.

Examples:

Run The Pod As the rabbitmq User

The Docker image runs as the rabbitmq user with uid 999 and writes to the rabbitmq.conf file. Thus, the file permissions on rabbitmq.conf must allow this. A Pod Security Context can be added to the Stateful Set definition to achieve this. Set the runAsUser, runAsGroup and the fsGroup to 999 in the Security Context.

See Security Context in the Stateful Set definition file.

Importing Definitions

RabbitMQ nodes can importi definitions exported from another RabbitMQ cluster. This may also be done at node boot time.

Following from the RabbitMQ documentation, this can be done using the following steps:

  1. Export definitions from the RabbitMQ cluster you wish to replicate and save the file
  2. Create a Config Map with the key being the file name, and the value being the contents of the file (See the rabbitmq.conf Config Map example)
  3. Mount the Config Map as a Volume on the RabbitMQ Pod in the Stateful Set definition
  4. Update the rabbitmq.conf Config Map with load_definitions = /path/to/definitions/file

Readiness Probe

Kubernetes uses a check known as the readiness probe to determine if a pod is ready to serve client traffic. This is effectively a specialized health check defined by the system operator.

When an ordered pod deployment policy is used — and this is the commended option for RabbitMQ clusters — the probe controls when the Kubernetes controller will consider the currently deployed pod to be ready and proceed to deploy the next one. This check, if not chosen appropriately, can deadlock a rolling cluster node restart.

RabbitMQ nodes that belong to a clsuter will attempt to sync schema from their peers on startup. If no peer comes online within a configurable time window (five minutes by default), the node will give up and voluntarily stop. Before the sync is complete, the node won't mark itself as fully booted.

Therefore, if a readiness probe assumes that a node is fully booted and running, a rolling restart of RabbitMQ node pods using such probe will deadlock: the probe will never succeed, and will never proceed to deploy the next pod, which must come online for the original pod to be considered ready by the deployment.

It is therefore recommended to use a very basic RabbitMQ health check for readiness probe:

While this check is not thorough, it allows all pods to be started and re-join the cluster within a certain time period, even when pods are restarted one by one, in order.

This is covered in a dedicated section of the RabbitMQ clustering guide: Restarts and Health Checks (Readiness Probes).

The readiness probe section in the Stateful Set definition file demonstrates how to configure a readiness probe.

Liveness Probe

Similarly to the readiness probe described above, Kubernetes allows for pod health checks using a different health check called the liveness probe. The check determines if a pod must be restarted.

As with all health checks, there is no single solution that can be recommended for all deployments. Health checks can produce false positives, which means reasonably healthy, operational nodes will be restarted or even destroyed and re-created for no reason, reducing system availability.

Moreover, a RabbitMQ node restart won't necessarily address the issue. For example, restarting a node that is in an alarmed state because it is low on available disk space won't help.

All this is to say that liveness probes must be chosen wisely and with false positives and "recoverability by a restart" taken into account. Liveness probes also must use node-local health checks instead of cluster-wide ones.

RabbitMQ CLI tools provide a number of pre-defined health checks that vary in how thorough they are, how intrusive they are and how likely they are to produce false positives in different scenarios, e.g. when the system is under load. The checks are composable and can be combined. The right liveness probe choice is a system-specific decision. When in doubt, start with a simpler, less intrusive and less thorough option such as

The following checks can be reasonable liveness probe candidates:

Note, however, that they will fail for the nodes paused by the "pause minority" partition handliner strategy.

The liveness probe section in the Stateful Set definition file demonstrates how to configure a liveness probe.

Plugins

RabbitMQ supports plugins. Some plugins are essential when running RabbitMQ on Kubernetes, e.g. the Kubernetes-specific peer discovery implementation.

The rabbitmq_peer_discovery_k8s plugin is required to deploy RabbitMQ on Kubernetes. It is quite common to also enable rabbitmq_management plugin in order to get a browser-based management UI and an HTTP API, and rabbitmq_prometheus for monitoring.

Plugins can be enabled in different ways. We recommend mounting the plugins file, enabled_plugins, to the node configuration directory, /etc/rabbitmq. A Config Map can be used to express the value of the enabled_plugins file. It can then be mounted as a Volume onto each RabbitMQ container in the Stateful Set definition.

In our configmap.yaml example file, we demonstrate how to popular the the enabled_plugins file and mount it under the /etc/rabbitmq directory.

Ports

The final consideration for the Stateful Set is the ports to open on the RabbitMQ Pods. Protocols supported by RabbitMQ are all TCP-based and require the protocol ports to be opened on the RabbitMQ nodes. Depending on the plugins that are enabled on a node, the list of required ports can vary.

The example enabled_plugins file mentioned above enables a few plugins: rabbitmq_peer_discovery_k8s (mandatory), rabbitmq_management and rabbitmq_prometheus. Therefore, the service must open several ports relevant for the core server and the enabled plugins:

  • 5672: used by AMQP 0-9-1 and AMQP 1.0 clients
  • 15672: management UI and HTTP API)
  • 15692: Prometheus scraping endpoint)

Deploy the Stateful Set

These are the key components in the Stateful Set file. Please have a look at the file, and if following from the example, deploy the Stateful Set:

This will start spinning up a RabbitMQ cluster. To watch the progress:

Create a Service for Client Connections

If all the steps above succeeded, you should have functioning RabbitMQ cluster deployed on Kubernetes! ? However, having a RabbitMQ cluster on Kubernetes is only useful clients can connect to it.

Time to create a Service to make the cluster accessible to client connections.

The type of the Service depends on your use case. The Kubernetes API reference gives a good overview of the types of Services available.

In the client-service.yaml example file, we have gone with a LoadBalancer Service. This gives us an external IP that can be used to access the RabbitMQ cluter.

For example, this should make it possible to visit the RabbitMQ management UI by visiting {external-ip}:15672, and signing in. Client applications can connect to endpoints such as {external-ip}:5672 (AMQP 0-9-1, AMQP 1.0) or {external-ip}:1883 (MQTT). Please refer to the get started guide to learn how to use RabbitMQ.

If following from the example, run

to create a Service of type LoadBalancer with an external IP address. To find out what the external IP address is, use kubectl get svc:

Resource Usage and Limits

Container resource management is a topic that deserves its own post. Capacity planning recommendations are entirely workload-, environment- and system-specific. Optimal values are usually found via extensive monitoring of the system, trial, and error. However, when picking the limits and resource allocation settings, consider a few RabbitMQ-specific things.

Use the Latest Major Erlang Release

RabbitMQ runs on the Erlang runtime. Recent Erlang/OTP releases have introduced a number of improvements highly relevant to the users who run RabbitMQ on Kubernetes:

  • In Erlang 22, inter-node communication [latency and head-of-line blocking(http://blog.erlang.org/OTP-22-Highlights/) have been significantly reduced. In earlier versions, link congestion was known to make cluster node heartbeat false positives likely.
  • In Erlang 23, the runtime will respect the container CPU quotas when computing the default number of schedulers to start. This means that nodes will respect the Kubernetes-managed CPU resource limits.

Docker community image for RabbitMQ ships with Erlang 23 at the time of writing. Users of custom Docker images are highly recommended to provision Erlang 23 as well.

CPU Resource Usage

RabbitMQ was designed for workloads that involve multiple queues and where a node serves multiple clients at the same time. Nodes will generally use all the CPU cores allowed without any explicit configuration. As the number of cores grows, some tuning may be necessary to reduce CPU context switching.

How CPU time is spent can be monitored via the runtime thread activity metrics which are also exposed via the RabbitMQ Prometheus plugin.

If RabbitMQ pods hover around their CPU resource allowance and experience throttling in environments with a large number of relatively idle clients, the load likely can be reduced with a modest amount of configuration.

Memory Limits

RabbitMQ uses the concept of a runtime memory high watermark. By default a node will use 40% of detected (available) memory as the watermark. When the watermark is crossed, publishers across the entire cluster will be blocked and more aggressive paging out to disk initiated. The watermark value may seem like a memory quota on Kubernetes at first but there is an important difference: RabbitMQ resource alarms assume a node can typically recover from this state. For example, a large backlog of messages will eventually be consumed.

Kubernetes memory limits are enforced by the OOM killer: no recovery is expected. This means that a RabbitMQ node's high memory watermark must be lower than the memory limit imposed on the node container. Kubernetes deployments should use the relative watermark values in the recommended range.

Memory usage breakdown data should be used to determine what consumes most memory on the node.

Disk Usage

We highly recommend overprovisioning the disk space available to RabbitMQ containers. A node that has run out of disk space won't always be able to recover from such an event. Such nodes must be decomissioned and replaced.

Consider Available Network Link Bandwidth

Finally, consider what kind of links and Kubernetes networking options are used for inter-node communication. Network link congestion can be a significant limiting factor to system throughput and affect its availability.

Below is a very simplistic formula to calculate the amount of bandwidth needed by a workload, in bits:

Therefore a workload with average message size of 3 kiB and expected peak message rate of 20K messages a second can consume up to

of bandwidth.

Team RabbitMQ maintains a Grafana dashboard for inter-node communication link metrics.

Using rabbitmq-perf-test to Run a Functional and Load Test of the Cluster

RabbitMQ comes with a load simulation tool, PerfTest, which can be executed from outside of a cluster or deployed to Kubernetes using the perf-test public docker image. Here's an example of how the image can be deployed to a Kubernetes cluster

Here the {username} and {password} are the user credentials, e.g. those set up in the rabbitmq-admin Secret. The {serivce} is the hostname to connect to. We use the name of the client service that will resolve as a hostname when deployed.

The above kubectl run command will start a PerfTest pod which can be observed in

For a functioning RabbitMQ cluster, running kubectl logs -f {perf-test-pod-name} where {perf-test-pod-name} is the name of the pod as reported by kubectl get pods, will produce output similar to this:

To learn more about PerfTest, its settings, capabilities and output, see the PerfTest doc guide.

PerfTest is not meant to be running permanently. To tear down the perf-test pod, use

Monitoring the Cluster

Monitoring is a critically important part of any production deployment.

RabbitMQ comes with in-built support for Prometheus. To enable it, enable the rabbitmq_prometheus plugin. This in turn can be done by adding rabbitmq_promethus to the enabled_plugins Config Map as explained above.

The Prometheus scraping port, 15972, must be open on both the Pod and the client Service.

Node and cluster metrics can be visualised with Grafana.

Alternative Option: the Kubernetes Cluster Operator for RabbitMQ

As this post demonstrates, there are quite a few parts involved in hosting a stateful data services such as RabbitMQ on Kubernetes. It may seem like a daunting task. There are several alternatives to this kind of DIY deployment demonstrated in this post.

Team RabbitMQ at VMware has open sourced a Kubernetes Operator pattern implementation for RabbitMQ. As of August 2020, this is a young project under active development. While it currently has limitations, it is our recommended option over the manual DIY setup demonstrated in this post.

See RabbitMQ Cluster Operator for Kubernetes to learn more. The project is developed in the open at rabbitmq/cluster-operator on GitHub. Give it a try and let us know how it goes. Besides GitHub, two great venues for providing feedback to the team behind the Operator are the RabbitMQ mailing list and the #kubernetes channel in RabbitMQ community Slack.