Data services such as RabbitMQ often have many tunable parameters. Some configurations make a lot of sense for development but are not really suitable for production. No single configuration fits every use case. It is, therefore, important to assess your configuration before going into production. This guide aims to help with that.
In a single-tenant environment, for example, when your RabbitMQ cluster is dedicated to powering a single system in production, using the default virtual host (/) is perfectly fine.
In multi-tenant environments, use a separate vhost for each tenant/environment, e.g. project1_development, project1_production, project2_development, project2_production, and so on.
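A naming scheme like this is easy to generate programmatically. The following Python sketch builds one vhost name per project and environment pair. The project and environment names are only the examples from the text, and the `tenant_vhosts` helper is hypothetical:

```python
from itertools import product


def tenant_vhosts(projects, environments):
    """One vhost name per (project, environment) pair, e.g. "project1_production"."""
    return [f"{project}_{env}" for project, env in product(projects, environments)]


if __name__ == "__main__":
    for name in tenant_vhosts(["project1", "project2"], ["development", "production"]):
        print(name)
```

Each resulting name could then be created with `rabbitmqctl add_vhost <name>`.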
For production environments, delete the default user (guest). By default, the guest user can only connect from localhost, because it has well-known credentials. Instead of enabling remote connections for it, consider using a separate user with administrative permissions and a generated password.
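A minimal Python sketch of replacing guest, assuming the node is administered locally with rabbitmqctl. The helper `admin_setup_commands` and the `ops-admin` user name are hypothetical, and the script only builds the command lines so they can be reviewed before anything runs. Passing a password as a command-line argument may expose it in the process list, so treat this as a starting point rather than a hardened procedure:

```python
import secrets


def admin_setup_commands(username, vhost="/"):
    """Build (not run) rabbitmqctl commands that replace guest with a new administrator.

    Returns the generated password and one argument list per command.
    """
    password = secrets.token_urlsafe(32)  # random, not a well-known credential
    commands = [
        ["rabbitmqctl", "add_user", username, password],
        ["rabbitmqctl", "set_user_tags", username, "administrator"],
        # configure, write and read permissions on the vhost
        ["rabbitmqctl", "set_permissions", "-p", vhost, username, ".*", ".*", ".*"],
        # delete guest only after the replacement administrator exists
        ["rabbitmqctl", "delete_user", "guest"],
    ]
    return password, commands


if __name__ == "__main__":
    _, cmds = admin_setup_commands("ops-admin")  # hypothetical user name
    for cmd in cmds[1:]:  # skip add_user so the password is not printed
        print(" ".join(cmd))
```

The administrator is created before guest is deleted, so the node is never left without an administrative account.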
It is recommended to use a separate user per application. For example, if you have a mobile app, a Web app, and a data aggregation system, you'd have 3 separate users. This makes a number of things easier:
RabbitMQ uses resource-driven alarms to throttle publishers when consumers do not keep up. It is important to evaluate resource limit configurations before going into production.
By default, RabbitMQ will use up to 40% of available RAM. For nodes dedicated to running RabbitMQ, it is often reasonable to raise the limit. However, care should be taken, as the OS and file system caches need RAM to operate as well. Failing to leave them enough RAM can cause a severe throughput drop due to OS swapping, or even result in the OS terminating the RabbitMQ process.
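The trade-off can be made concrete with a small calculation. This Python sketch splits a node's RAM between the memory alarm threshold (the fraction set by `vm_memory_high_watermark.relative` in rabbitmq.conf) and the remainder available to the OS and file system caches. The 16 GiB node and the 0.6 value are illustrative assumptions, not recommendations, and `memory_budget` is a hypothetical helper:

```python
GiB = 1024 ** 3


def memory_budget(total_ram_bytes, watermark=0.4):
    """Return (RabbitMQ memory limit, RAM left for the OS and caches) in bytes."""
    if not 0 < watermark < 1:
        raise ValueError("watermark must be a fraction between 0 and 1")
    limit = int(total_ram_bytes * watermark)
    return limit, total_ram_bytes - limit


if __name__ == "__main__":
    for wm in (0.4, 0.6):  # 0.4 matches the default above; 0.6 is only an example
        limit, headroom = memory_budget(16 * GiB, wm)
        print(f"watermark={wm}: limit {limit / GiB:.1f} GiB, left for OS {headroom / GiB:.1f} GiB")
```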
Below are some basic guidelines for determining what RAM limit is recommended:
Some free disk space should be available to avoid disk space alarms. By default, RabbitMQ requires 50 MiB of free disk space at all times. This improves the developer experience on many popular Linux distributions, which may place the /var directory on a small partition. However, this value is not recommended for production environments, which may have significantly higher RAM limits. Below are some basic guidelines for determining how much free disk space is recommended:
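Because the threshold should scale with RAM, one approach is to express it as a multiple of total RAM, which is what `disk_free_limit.relative` in rabbitmq.conf does. This Python sketch assumes a multiplier of 1.5 purely for illustration. The helpers `required_free_disk` and `has_enough_free_disk` are hypothetical, and never going below the 50 MiB default is this sketch's own choice:

```python
import shutil

MiB = 1024 ** 2
GiB = 1024 ** 3
DEFAULT_DISK_FREE_LIMIT = 50 * MiB  # RabbitMQ's default free disk requirement


def required_free_disk(total_ram_bytes, multiplier=1.5):
    """Free disk threshold as a multiple of total RAM (the multiplier is an assumption).

    This sketch never returns less than the 50 MiB default.
    """
    return max(int(total_ram_bytes * multiplier), DEFAULT_DISK_FREE_LIMIT)


def has_enough_free_disk(path, total_ram_bytes, multiplier=1.5):
    """Compare free space on the file system holding `path` with the threshold."""
    return shutil.disk_usage(path).free >= required_free_disk(total_ram_bytes, multiplier)


if __name__ == "__main__":
    # "/" is a stand-in; check the partition that holds the RabbitMQ data directory
    print(has_enough_free_disk("/", 16 * GiB))
```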
Operating systems limit the maximum number of concurrently open file handles, which includes network sockets. Make sure that you have limits set high enough to allow for the expected number of concurrent connections and queues.
Make sure your environment allows for at least 50K open file descriptors for the effective RabbitMQ user, including in development environments.
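On Linux and other Unix-like systems, the limit that applies to the current process can be read with Python's standard `resource` module. Run this sketch as the effective RabbitMQ user, since limits are per user and per process. Service managers such as systemd may also set their own limit (`LimitNOFILE`), which a check from an interactive shell will not reflect. The 50K threshold comes from the text above; the helper names are hypothetical:

```python
import resource

RECOMMENDED_MIN_FDS = 50_000  # minimum suggested above, including development


def open_file_limits():
    """Return the (soft, hard) open file descriptor limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_minimum(soft_limit, minimum=RECOMMENDED_MIN_FDS):
    # RLIM_INFINITY means the OS enforces no limit
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft={soft} hard={hard} ok={meets_minimum(soft)}")
```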
As a rule of thumb, multiply the 95th percentile number of concurrent connections by 2 and add the total number of queues to calculate the recommended open file handle limit. Values as high as 500K are not excessive and won't consume a lot of hardware resources, and are therefore recommended for production setups. See the Networking guide for more information.
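The rule of thumb translates directly into arithmetic. In this Python sketch, `percentile_nearest_rank` (the nearest-rank method, one of several percentile definitions) and `recommended_fd_limit` are hypothetical helpers. Connection samples would come from your own monitoring, and the 50K floor is the minimum recommended above:

```python
import math

MIN_FDS = 50_000  # minimum recommended above


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def recommended_fd_limit(p95_connections, total_queues, floor=MIN_FDS):
    """2 x p95 concurrent connections + total queues, never below the floor."""
    return max(2 * p95_connections + total_queues, floor)


if __name__ == "__main__":
    # Hypothetical workload: 30,000 concurrent connections at p95 and 5,000 queues
    print(recommended_fd_limit(30_000, 5_000))
```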
On Linux and BSD systems, it is necessary to restrict access to the Erlang cookie to only the users that will run RabbitMQ and tools such as rabbitmqctl.
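A Python sketch for checking and tightening the cookie file's permissions. The path depends on the installation (often the RabbitMQ user's home directory, for example /var/lib/rabbitmq/.erlang.cookie with Linux packages), so treat it as an assumption. The hypothetical helpers only inspect mode bits; they do not verify which user owns the file:

```python
import os
import stat


def cookie_is_owner_only(path):
    """True if neither group nor others have any permission bits on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def restrict_cookie(path):
    """Make the cookie readable by its owner only (mode 0400)."""
    os.chmod(path, stat.S_IRUSR)


if __name__ == "__main__":
    cookie = "/var/lib/rabbitmq/.erlang.cookie"  # assumed location; varies by install
    if os.path.exists(cookie):
        print(cookie_is_owner_only(cookie))
```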
We recommend using TLS connections when possible, at least to encrypt traffic. Peer verification (authentication) is also recommended. Development and QA environments can use self-signed TLS certificates. Self-signed certificates can be appropriate in production environments when RabbitMQ and all applications run on a trusted network or are isolated using technologies such as VMware NSX.
While RabbitMQ tries to offer a secure TLS configuration by default (e.g. SSLv3 is disabled), we recommend evaluating which TLS versions and cipher suites are enabled. Please see our TLS guide for more information.
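On the client side, the same recommendations (encryption, peer verification and a modern minimum protocol version) can be expressed with Python's standard `ssl` module. This is a sketch only: the file paths are hypothetical, using TLS 1.2 as the floor is an assumption to adapt to your own policy, and how the context is handed to a connection depends on the AMQP client library in use:

```python
import ssl


def make_client_tls_context(ca_file=None, cert_file=None, key_file=None):
    """TLS context for a client connecting to RabbitMQ.

    create_default_context() already requires a valid server certificate
    (CERT_REQUIRED) and checks the hostname; ca_file can point at a private
    or self-signed CA bundle.
    """
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    if cert_file:
        # client certificate, for servers that verify peers (mutual TLS)
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


if __name__ == "__main__":
    ctx = make_client_tls_context()  # system CA store; pass ca_file for a private CA
    print(ctx.minimum_version, ctx.verify_mode)
```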
Production environments may require network configuration tuning. Please refer to the Networking Guide for details.
When determining cluster size, it is important to take several factors into consideration:
Since clients can connect to any node, RabbitMQ may need to perform inter-node routing of messages and internal operations. Try making consumers and producers connect to the same node, if possible: this will reduce inter-node traffic. Equally helpful would be making consumers connect to the node that currently hosts the queue master (which can be identified using the HTTP API). When data locality is taken into consideration, total cluster throughput can be considerably higher.
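The HTTP API reports, for each queue, the node that hosts it (the `node` field returned by `GET /api/queues/{vhost}/{name}`). This Python sketch queries it with the standard library only. The management URL, credentials and the `orders` queue name are assumptions, and the vhost must be percent-encoded (the default vhost `/` becomes `%2F`):

```python
import base64
import json
import urllib.parse
import urllib.request


def queue_url(vhost, queue, base="http://localhost:15672"):
    """HTTP API URL for a single queue; vhost and name are percent-encoded."""
    quoted_vhost = urllib.parse.quote(vhost, safe="")
    quoted_queue = urllib.parse.quote(queue, safe="")
    return f"{base}/api/queues/{quoted_vhost}/{quoted_queue}"


def queue_master_node(vhost, queue, user, password, base="http://localhost:15672"):
    """Return the name of the node currently hosting the queue (e.g. "rabbit@host1")."""
    request = urllib.request.Request(queue_url(vhost, queue, base))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["node"]
```

A consumer could call `queue_master_node("/", "orders", user, password)` at startup and prefer a connection to that node, falling back to any node if it is unavailable.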
For most environments, mirroring to more than half of cluster nodes is sufficient. It is recommended to use clusters with an odd number of nodes (3, 5, and so on).
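The preference for odd cluster sizes follows from majority arithmetic: adding a fourth node to a three-node cluster does not increase the number of node failures that can be tolerated while a majority remains. This Python sketch computes, for each cluster size, the smallest number of nodes that is more than half of the cluster and the failures a majority can survive. The helper names are hypothetical:

```python
def mirror_count(cluster_size):
    """Smallest number of nodes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def tolerated_node_failures(cluster_size):
    """Nodes that can fail while a strict majority remains available."""
    return (cluster_size - 1) // 2


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} nodes: mirror to {mirror_count(n)}, tolerate {tolerated_node_failures(n)} failure(s)")
```

Going from 3 to 4 nodes, or from 5 to 6, raises the majority without raising the number of tolerated failures.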
It is important to pick a partition handling strategy before going into production. When in doubt, use the autoheal strategy.