Production Checklist

Introduction

Data services such as RabbitMQ often have many tunable parameters. Some configurations make a lot of sense for development but are not really suitable for production. No single configuration fits every use case. It is, therefore, important to assess your configuration before going into production. This guide aims to help with that.

Virtual Hosts, Users, Permissions

Virtual Hosts

In a single-tenant environment, for example, when your RabbitMQ cluster is dedicated to powering a single system in production, using the default virtual host (/) is perfectly fine.

In multi-tenant environments, use a separate vhost for each tenant/environment, e.g. project1_development, project1_production, project2_development, project2_production, and so on.

Users

For production environments, delete the default user (guest). By default, this user can only connect from localhost, because its credentials are well known. Instead of enabling remote connections for it, consider using a separate user with administrative permissions and a generated password.
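
As a minimal sketch of the replacement step, the Python snippet below generates a random password and builds the rabbitmqctl invocations that would create an administrative user and remove guest. The function name, the user name, and the choice to grant full permissions on a single vhost are illustrative assumptions; the commands are only returned as lists, not executed.

```python
import secrets


def admin_bootstrap_commands(username, vhost="/"):
    """Return (password, commands) for replacing the default guest user.

    Hypothetical helper: it only builds argument lists and leaves it to
    the operator to review and run them (for example via subprocess.run).
    """
    # 24 random bytes rendered as 48 hex characters. Hex output never
    # starts with "-", so it cannot be mistaken for a command-line option.
    password = secrets.token_hex(24)
    commands = [
        ["rabbitmqctl", "add_user", username, password],
        ["rabbitmqctl", "set_user_tags", username, "administrator"],
        # Assumption: the new user gets configure/write/read on everything
        # in the given vhost.
        ["rabbitmqctl", "set_permissions", "-p", vhost, username, ".*", ".*", ".*"],
        ["rabbitmqctl", "delete_user", "guest"],
    ]
    return password, commands
```

Creating the new user before deleting guest keeps at least one administrative account available at every step.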

It is recommended to use a separate user per application. For example, if you have a mobile app, a Web app, and a data aggregation system, you'd have 3 separate users. This makes a number of things easier:

  • Correlating client connections with applications
  • Using fine-grained permissions
  • Credentials roll-over (e.g. periodically or in case of a breach)

In case there are many instances of the same application, there's a trade-off between better security (having a set of credentials per instance) and convenience of provisioning (sharing a set of credentials between some or all instances). For IoT applications that involve many clients performing the same or similar function and having fixed IP addresses, it may make sense to authenticate using X.509 certificates or source IP address ranges.

Resource Limits

RabbitMQ uses resource-driven alarms to throttle publishers when consumers do not keep up. It is important to evaluate resource limit configurations before going into production.

Memory

By default, RabbitMQ will use up to 40% of available RAM. For nodes that are dedicated to running RabbitMQ, it is often reasonable to raise the limit. However, care should be taken, as the OS and file system caches need RAM to operate as well. Failing to leave enough RAM for them will cause a severe throughput drop due to OS swapping, or can even result in the RabbitMQ process being terminated by the OS.

Below are some basic guidelines for determining what RAM limit is recommended:

  • At least 128 MB
  • 65% of the configured RAM limit when the limit is up to 4 GB of RAM
  • 70% of the configured RAM limit when the limit is between 4 and 8 GB of RAM
  • 75% of the configured RAM limit when the limit is between 8 and 16 GB of RAM
  • 80% of the configured RAM limit when the limit is above 16 GB of RAM

Limits higher than 0.85 (85%) can be dangerous and are not recommended.
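
The tiers above can be sketched as a small lookup. This is one reading of the guidelines, not an official formula: it assumes the tiers are keyed on the amount of RAM in question, that each upper bound is inclusive, and that 1 GB means 1024^3 bytes. The function name is hypothetical.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def recommended_memory_limit(ram_bytes):
    """Return (fraction, limit_bytes) following the tiers listed above.

    Assumptions: tier boundaries are inclusive on the upper end, and the
    128 MB minimum is applied as a floor on the resulting byte value.
    """
    if ram_bytes <= 4 * GB:
        fraction = 0.65
    elif ram_bytes <= 8 * GB:
        fraction = 0.70
    elif ram_bytes <= 16 * GB:
        fraction = 0.75
    else:
        fraction = 0.80
    limit_bytes = max(int(ram_bytes * fraction), 128 * MB)
    return fraction, limit_bytes
```

For example, recommended_memory_limit(8 * GB) selects the 70% tier, and anything above 16 GB selects 80%, which stays below the 0.85 ceiling mentioned above.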

Free Disk Space

Some free disk space should be available to avoid disk space alarms. By default, RabbitMQ requires 50 MiB of free disk space at all times. This improves developer experience on many popular Linux distributions, which may place the /var directory on a small partition. However, this value is not recommended for production environments, since they may have significantly higher RAM limits. Below are some basic guidelines for determining how much free disk space is recommended:

  • At least 2 GB
  • 50% of the configured RAM limit when the limit is between 1 and 8 GB of RAM
  • 40% of the configured RAM limit when the limit is between 8 and 32 GB of RAM
  • 30% of the configured RAM limit when the limit is above 32 GB of RAM
The rabbit.disk_free_limit configuration setting can be set to {mem_relative, N} to make it calculated as a percentage of the RAM limit. For example, use {mem_relative, 0.5} for 50%, {mem_relative, 0.25} for 25%, and so on.
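
A rough sketch of the guidelines above, assuming inclusive upper bounds, 1 GB = 1024^3 bytes, and that the 50% tier also applies below 1 GB (where the 2 GB minimum dominates anyway). The helper names are hypothetical; the second one only renders the {mem_relative, N} term as text.

```python
GB = 1024 ** 3


def recommended_disk_free(ram_limit_bytes):
    """Return (fraction, free_bytes) for a configured RAM limit in bytes."""
    if ram_limit_bytes <= 8 * GB:
        fraction = 0.5
    elif ram_limit_bytes <= 32 * GB:
        fraction = 0.4
    else:
        fraction = 0.3
    # "At least 2 GB" is applied as a floor on the absolute value.
    free_bytes = max(int(ram_limit_bytes * fraction), 2 * GB)
    return fraction, free_bytes


def mem_relative_term(fraction):
    """Render the value as the {mem_relative, N} term described above."""
    return f"{{mem_relative, {fraction}}}"
```

Note that for small RAM limits the 2 GB floor is larger than the relative value, so an absolute limit may be the better fit there.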

Open File Handles Limit

Operating systems limit the maximum number of concurrently open file handles, which includes network sockets. Make sure that you have limits set high enough to allow for the expected number of concurrent connections and queues.

Make sure your environment allows for at least 50K open file descriptors for the effective RabbitMQ user, including in development environments.

As a rule of thumb, multiply the 95th percentile number of concurrent connections by 2 and add the total number of queues to calculate the recommended open file handle limit. Values as high as 500K are reasonable and won't consume a lot of hardware resources, and therefore are recommended for production setups. See the Networking guide for more information.
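
The rule of thumb can be written out as follows. This is a sketch under stated assumptions: the 95th percentile is computed with the nearest-rank method over whatever connection-count samples you have collected, and the 50K minimum from above is applied as a floor. Both function names are hypothetical.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile of a non-empty collection of numbers."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("at least one sample is required")
    # Integer form of ceil(0.95 * n), avoiding floating point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def recommended_fd_limit(connection_samples, total_queues, floor=50_000):
    """2 x p95(concurrent connections) + queues, never below the floor."""
    p95 = percentile_95(connection_samples)
    return max(2 * p95 + total_queues, floor)
```

On Linux and other Unix-like systems, the result can be compared with the limit in effect for the current process using resource.getrlimit(resource.RLIMIT_NOFILE) from the Python standard library.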

Security Considerations

Users and Permissions

See the section on vhosts, users, and credentials above.

Erlang Cookie

On Linux and BSD systems, it is necessary to restrict access to the Erlang cookie to the users that will run RabbitMQ and tools such as rabbitmqctl.
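
One way to audit this is to check that the cookie file grants no permissions to group or other users. The sketch below assumes a Unix-like system; the function name is hypothetical, and the cookie location varies by installation, so the path must be supplied by the caller.

```python
import os
import stat


def cookie_is_private(path):
    """Return True if only the file owner has any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 covers every group and other permission bit.
    return (mode & 0o077) == 0
```

A file with mode 0400 or 0600 passes this check, while 0640 or 0644 does not. The check says nothing about who the owner is, which should be verified separately.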

TLS

We recommend using TLS connections when possible, at least to encrypt traffic. Peer verification (authentication) is also recommended. Development and QA environments can use self-signed TLS certificates. Self-signed certificates can be appropriate in production environments when RabbitMQ and all applications run on a trusted network or are isolated using technologies such as VMware NSX.

While RabbitMQ tries to offer a secure TLS configuration by default (e.g. SSLv3 is disabled), we recommend evaluating what TLS versions and cipher suites are enabled. Please see our TLS guide for more information.
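
The same evaluation applies on the client side. As an illustration only, the sketch below builds a Python standard-library TLS context that verifies the server certificate and refuses protocol versions older than TLS 1.2. The TLS 1.2 minimum and the function name are assumptions made for this example, not a RabbitMQ requirement; how the context is passed to a client library depends on that library.

```python
import ssl


def strict_client_context(ca_file=None):
    """Build a client TLS context with peer verification enabled.

    ca_file is an optional path to a CA bundle (for example, the CA that
    signed a self-signed server certificate). When omitted, the system
    default trust store is used.
    """
    # create_default_context enables certificate and hostname verification.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Assumption for this sketch: reject anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```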

Networking Configuration

Production environments may require network configuration tuning. Please refer to the Networking Guide for details.

Automatic Connection Recovery

Some client libraries, for example the Java, .NET, and Ruby clients, support automatic connection recovery after network failures. If your client provides this feature, it is recommended to use it instead of developing your own recovery mechanism.

Clustering Considerations

Cluster Size

When determining cluster size, it is important to take several factors into consideration:

  • Expected throughput
  • Expected replication (number of mirrors)
  • Data locality

Since clients can connect to any node, RabbitMQ may need to perform inter-node routing of messages and internal operations. Try making consumers and producers connect to the same node, if possible: this will reduce inter-node traffic. Equally helpful would be making consumers connect to the node that currently hosts the queue master (this can be inferred using the HTTP API). When data locality is taken into consideration, total cluster throughput can reach non-trivial volumes.

For most environments, mirroring to more than half of the cluster nodes is sufficient. It is recommended to use clusters with an odd number of nodes (3, 5, and so on).
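
"More than half" can be read as the smallest majority of the cluster. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def majority_mirror_count(cluster_size):
    """Smallest number of nodes that is more than half of the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1
```

With 3 nodes this gives 2 and with 5 nodes it gives 3. A 4-node cluster also needs 3, so the extra node does not reduce the number of replicas required for a majority, which is one reason odd cluster sizes are preferred.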

Partition Handling Strategy

It is important to pick a partition handling strategy before going into production. When in doubt, use the autoheal strategy.