Memory and Disk Alarms
Overview
During operation, RabbitMQ nodes consume varying amounts of memory and disk space depending on the workload. When usage spikes, both memory use and free disk space can reach potentially dangerous levels. In the case of memory, the node can be killed by the operating system's low-on-memory process termination mechanism (known as the "OOM killer" on Linux, for example). In the case of free disk space, the node can run out of disk space, which means it won't be able to perform many internal operations.
To reduce the likelihood of these scenarios, RabbitMQ has two configurable resource watermarks. When either one is reached, RabbitMQ will block connections that publish messages.
More specifically, RabbitMQ will block connections that publish messages in order to avoid being killed by the OS (out-of-memory killer) or exhausting all available free disk space:
- When memory use goes above the configured watermark (limit)
- When free disk space drops below the configured watermark (limit)
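The two conditions above can be expressed as a small check. This is a minimal sketch, assuming a memory limit given as a fraction of total RAM and a free-disk limit given in bytes; the Watermarks class and the active_alarms function are hypothetical names used only for illustration, not part of RabbitMQ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Watermarks:
    """Hypothetical watermark settings, named for illustration only."""
    memory_fraction: float  # memory limit as a fraction of total RAM
    disk_free_limit: int    # minimum free disk space, in bytes


def active_alarms(total_ram: int, used_memory: int, free_disk: int, wm: Watermarks) -> set[str]:
    """Return the resource alarms that the given readings would trigger."""
    alarms: set[str] = set()
    if used_memory > total_ram * wm.memory_fraction:
        alarms.add("memory")  # memory use went above the watermark
    if free_disk < wm.disk_free_limit:
        alarms.add("disk")    # free disk space dropped below the watermark
    return alarms


GiB = 1024 ** 3
```

For example, with 16 GiB of RAM and a fraction of 0.4, the memory alarm would fire once the node uses more than 6.4 GiB. In rabbitmq.conf the real settings that play these roles are vm_memory_high_watermark.relative and disk_free_limit.absolute; the configuration reference describes their defaults and the other forms they accept.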
Nodes will temporarily block publishing connections by suspending reading from client connections. Connections that are only used to consume messages will not be blocked; deliveries to them continue as usual.
Connection heartbeat monitoring will be deactivated, too.
All network connections will show in rabbitmqctl and the management UI as either "blocking", meaning they have not attempted to publish and can thus continue, or "blocked", meaning they have published and are now paused. Compatible clients will be notified when they are blocked.
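The difference between "blocking" and "blocked" can be pictured as a small state machine. The sketch below is a simplified, hypothetical model (ConnState and ConnectionStateModel are illustrative names): when an alarm is raised, every connection becomes "blocking"; a connection that then attempts to publish becomes "blocked", while a consume-only connection stays "blocking" and keeps working. "running" stands for a connection with no alarm in effect.

```python
from enum import Enum


class ConnState(Enum):
    RUNNING = "running"    # no alarm in effect
    BLOCKING = "blocking"  # alarm in effect, connection has not published yet
    BLOCKED = "blocked"    # alarm in effect, connection published and is paused


class ConnectionStateModel:
    """Hypothetical model of how one connection's state reacts to alarms."""

    def __init__(self) -> None:
        self.state = ConnState.RUNNING

    def alarm_raised(self) -> None:
        if self.state is ConnState.RUNNING:
            self.state = ConnState.BLOCKING

    def alarm_cleared(self) -> None:
        self.state = ConnState.RUNNING

    def attempt_publish(self) -> None:
        # Only publishing while an alarm is in effect pauses the connection;
        # without an alarm the state is left unchanged.
        if self.state is ConnState.BLOCKING:
            self.state = ConnState.BLOCKED
```

A producer and a consumer that both see the alarm end up in different states: the producer becomes "blocked" after its next publish, while the consumer stays "blocking".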
Client Notifications
Modern client libraries support the connection.blocked and connection.unblocked notifications (a protocol extension), so applications can tell when their connections are blocked and when they are unblocked again.
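On the application side, handling these notifications usually means remembering that the connection is blocked instead of piling more publishes onto it. The sketch below assumes a hypothetical BlockAwarePublisher wrapper whose on_connection_blocked and on_connection_unblocked methods would be registered with the client library's own hooks; the send callable stands in for the library's publish call. Buffering in memory is only one possible policy.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


class BlockAwarePublisher:
    """Hypothetical application-side wrapper (not a client library API).

    `send` stands in for the client's real publish call; the two on_*
    methods are meant to be registered with the client's blocked and
    unblocked hooks.
    """

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send
        self.blocked = False
        self.block_reason: str | None = None
        self._pending: deque[bytes] = deque()

    def on_connection_blocked(self, reason: str) -> None:
        # connection.blocked carries a human-readable reason from the broker
        self.blocked = True
        self.block_reason = reason

    def on_connection_unblocked(self) -> None:
        self.blocked = False
        self.block_reason = None
        # Flush messages held back while the connection was blocked, in order
        while self._pending:
            self._send(self._pending.popleft())

    def publish(self, body: bytes) -> None:
        if self.blocked:
            self._pending.append(body)  # hold locally instead of writing to a paused socket
        else:
            self._send(body)
```

Real clients expose these hooks under their own names, for example addBlockedListener in the Java client, or add_on_connection_blocked_callback and add_on_connection_unblocked_callback in pika. An unbounded buffer trades the blocked socket for growing application memory, so a production version would cap it or fail fast.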
Alarms in Clusters
When running RabbitMQ in a cluster, the memory and disk alarms are cluster-wide; if one node goes over the limit then all nodes will block connections.
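Because alarms are cluster-wide, whether publishers are blocked does not depend on which node they are connected to. A minimal sketch, assuming the per-node alarm state is already known; the node names and the nodes_in_alarm and publishing_blocked functions are hypothetical.

```python
from __future__ import annotations


def nodes_in_alarm(node_alarms: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return only the nodes that currently report a memory or disk alarm."""
    return {node: alarms for node, alarms in node_alarms.items() if alarms}


def publishing_blocked(connected_node: str, node_alarms: dict[str, set[str]]) -> bool:
    """Alarms are cluster-wide: one node in alarm blocks publishers on every node."""
    if connected_node not in node_alarms:
        raise KeyError(f"unknown node: {connected_node}")
    return bool(nodes_in_alarm(node_alarms))
```

A disk alarm on a single node is enough to block publishers connected to any node in the cluster.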
The intent here is to stop producers but let consumers continue unaffected. However, since the protocol permits producers and consumers to operate on the same channel, and on different channels of a single connection, this logic is necessarily imperfect. In practice this does not pose problems for most applications, since the throttling is observable merely as a delay. Nevertheless, other design considerations permitting, it is advisable to use each connection for either producing or consuming, but not both.
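The imperfection can be illustrated with a simplified model of socket reads. It assumes, as described above, that the node stops reading from a connection once it has published during an alarm, so every frame the client sends on it afterwards waits, including acknowledgements from a consumer sharing that connection. SocketModel is a hypothetical class; basic.publish and basic.ack are AMQP 0-9-1 method names. The model treats the publish that triggers blocking as read, which simplifies the real behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SocketModel:
    """Hypothetical, simplified model of how the node reads one client connection."""
    blocked: bool = False
    read: list[str] = field(default_factory=list)     # frames the node has processed
    delayed: list[str] = field(default_factory=list)  # frames left unread in the socket

    def client_sends(self, frame: str, alarm: bool) -> None:
        if self.blocked:
            self.delayed.append(frame)  # reading is suspended: every frame waits
            return
        self.read.append(frame)
        if alarm and frame == "basic.publish":
            self.blocked = True         # publishing during an alarm pauses the connection

    def alarm_cleared(self) -> None:
        self.blocked = False
        self.read.extend(self.delayed)  # the backlog is read once the alarm clears
        self.delayed.clear()
```

On a shared connection the consumer's basic.ack is stuck behind the blocked publisher until the alarm clears; on a dedicated consumer connection it is read immediately.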
Effects on Data Safety
When an alarm is in effect, publishing connections will be blocked by TCP back pressure. In practice this means that publish operations will eventually time out or fail outright. Application developers must be prepared to handle such failures and use publisher confirms to keep track of which messages have been successfully handled and processed by RabbitMQ.
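Tracking publisher confirms can be as simple as remembering every unconfirmed message by its delivery tag. The sketch below is a hypothetical helper, not any client library's API: it assumes the channel is in confirm mode, where delivery tags start at 1 and increase by one per publish, and that the application calls on_ack and on_nack when the broker's basic.ack and basic.nack arrive. Messages that stay unconfirmed longer than a chosen timeout, for instance because an alarm blocked the connection, are reported so the application can decide what to do with them.

```python
from __future__ import annotations

import time
from typing import Callable


class ConfirmTracker:
    """Hypothetical helper tracking unconfirmed publishes on one confirm-mode channel."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._next_tag = 1
        # delivery tag -> (message body, time it was published)
        self._outstanding: dict[int, tuple[bytes, float]] = {}

    def record_publish(self, body: bytes) -> int:
        """Call right after publishing; returns the expected delivery tag."""
        tag = self._next_tag
        self._next_tag += 1
        self._outstanding[tag] = (body, self._clock())
        return tag

    def _take(self, tag: int, multiple: bool) -> list[bytes]:
        # With multiple=True the confirm covers every tag up to and including `tag`.
        tags = [t for t in self._outstanding if t <= tag] if multiple else [tag]
        return [self._outstanding.pop(t)[0] for t in tags if t in self._outstanding]

    def on_ack(self, tag: int, multiple: bool = False) -> None:
        """The broker has taken responsibility for these messages."""
        self._take(tag, multiple)

    def on_nack(self, tag: int, multiple: bool = False) -> list[bytes]:
        """Return the bodies the broker rejected so the caller can retry or log them."""
        return self._take(tag, multiple)

    def timed_out(self) -> list[int]:
        """Tags still unconfirmed after timeout_s seconds."""
        now = self._clock()
        return [t for t, (_, sent_at) in self._outstanding.items() if now - sent_at > self.timeout_s]

    @property
    def pending(self) -> int:
        return len(self._outstanding)
```

The clock parameter exists so the timeout logic can be exercised without waiting; by default it uses time.monotonic.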
Running Out of File Descriptors
When the server is close to using all the file descriptors that the OS has made available to it, it will refuse client connections. See the Networking guide to learn more.
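The refusal amounts to keeping some headroom below the process's descriptor limit. A minimal sketch, assuming a hypothetical reserved margin; the function names and numbers are illustrative and do not reflect RabbitMQ's actual thresholds. On Unix-like systems, Python's resource module can report the current soft limit.

```python
from __future__ import annotations


def accepts_client_connection(open_fds: int, fd_limit: int, reserved: int = 32) -> bool:
    """Hypothetical policy: refuse new client connections once no more than
    `reserved` descriptors remain available."""
    return fd_limit - open_fds > reserved


def soft_fd_limit() -> int | None:
    """Return this process's soft descriptor limit, or None where the
    Unix-only resource module is unavailable."""
    try:
        import resource
    except ImportError:
        return None
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

Raising the OS limit (and the limit of the service manager that starts the node) is the usual remedy when connections are refused for this reason.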
Transient Flow Control
When clients attempt to publish faster than the server can accept their messages, they go into transient flow control.
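One way to picture transient flow control is a credit scheme: a sender may only pass on as many messages as it holds credit for, and the receiver hands out more credit as it processes messages. The simulation below is a simplified illustration in that spirit, with hypothetical class names and parameters; it is not RabbitMQ's implementation. A publisher that tries to send three messages per tick to a receiver that processes one per tick is throttled down to the receiver's rate once its initial credit is used up.

```python
from collections import deque


class CreditSender:
    """Hypothetical sender that may only send while it holds credit."""

    def __init__(self, credit: int) -> None:
        self.credit = credit

    @property
    def in_flow(self) -> bool:
        return self.credit == 0  # out of credit: must wait for the receiver

    def try_send(self, receiver: "CreditReceiver", msg: str) -> bool:
        if self.in_flow:
            return False
        self.credit -= 1
        receiver.queue.append(msg)
        return True


class CreditReceiver:
    """Hypothetical receiver that grants credit back in batches as it processes."""

    def __init__(self, grant_every: int) -> None:
        self.queue: deque = deque()
        self.processed = 0
        self.grant_every = grant_every

    def process_one(self, sender: CreditSender) -> None:
        if not self.queue:
            return
        self.queue.popleft()
        self.processed += 1
        if self.processed % self.grant_every == 0:
            sender.credit += self.grant_every


def simulate(ticks: int, attempts_per_tick: int = 3, initial_credit: int = 4, grant_every: int = 2) -> list:
    """Sender attempts several sends per tick; the receiver handles one per tick."""
    sender, receiver = CreditSender(initial_credit), CreditReceiver(grant_every)
    sent_per_tick = []
    for _ in range(ticks):
        sent = sum(sender.try_send(receiver, "msg") for _ in range(attempts_per_tick))
        receiver.process_one(sender)
        sent_per_tick.append(sent)
    return sent_per_tick


if __name__ == "__main__":
    print(simulate(6))  # [3, 1, 2, 0, 2, 0]
```

The zeros in the per-tick output are the ticks during which the sender is "in flow" and has to wait for credit; over time it sends at about the receiver's rate.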