Reasoning About Memory Footprint
Overview
Operators need to be able to reason about a node's memory use, both absolute and relative ("what uses the most memory"). This is an important aspect of system monitoring.
This guide focuses on reasoning about a node's reported (monitored) memory footprint. It is accompanied by a few closely related guides.
RabbitMQ provides tools that report and help analyse node memory use:
- rabbitmq-diagnostics memory_breakdown
- rabbitmq-diagnostics status includes the above breakdown as a section
- Prometheus and Grafana-based monitoring makes it possible to observe memory breakdown over time
- Management UI provides the same breakdown on the node page as rabbitmq-diagnostics status
- HTTP API provides the same information as the management UI, useful for monitoring
- rabbitmq-top and rabbitmq-diagnostics observer provide a more fine-grained, top-like, per-Erlang-process view
Obtaining a node memory breakdown should be the first step when reasoning about node memory use.
Note that all measurements are somewhat approximate, based on values returned by the underlying runtime or the kernel at a specific point in time, usually within a 5-second time window.
Running RabbitMQ in Containers and on Kubernetes
When a RabbitMQ node runs in an environment where cgroups are used, namely in various containerized environments and on Kubernetes, certain aspects related to memory limits and the kernel page cache must be taken into account, in particular in clusters where streams and super streams are used.
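To see what memory limit a containerized node is actually subject to, the cgroup filesystem can be inspected directly. This is a minimal sketch, assuming cgroups v2 (the unified hierarchy mounted at /sys/fs/cgroup); cgroups v1 uses different file names:

# Effective memory limit of the cgroup; prints "max" when no limit is set
cat /sys/fs/cgroup/memory.max
# Current memory usage of the cgroup, which includes the kernel page cache
cat /sys/fs/cgroup/memory.current

Because the page cache counts towards the cgroup's usage but is not part of the memory footprint reported by the node itself, stream-heavy workloads can show cgroup memory usage well above what the node reports.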
Memory Use Breakdown
A RabbitMQ node can report its memory usage breakdown. The breakdown is provided as a list of categories (shown below) and the memory footprint of that category.
Each category is a sum of runtime-reported memory footprint of every process or table of that kind. This means that the connections category is the sum of memory used by all connection processes, the channels category is the sum of memory used by all channel processes, the ETS tables category is the sum of memory used by all in-memory tables on the node, and so on.
How Memory Breakdown Works
Memory use breakdown reports allocated memory distribution on the target node, by category:
- Connections (further split into four categories: readers, writers, channels, other)
- Quorum queue replicas
- Stream replicas
- Classic queue message store and indices
- Binary heap references
- Node-local metrics (management plugin stats database)
- Internal schema database tables
- Plugins, including protocols that transfer messages, such as Shovel and Federation, and their internal queues
- Memory allocated but not yet used
- Code (bytecode, module metadata)
- ETS (in memory key/value store) tables
- Atom tables
- Other
Generally there is no overlap between the categories (no double accounting for the same memory). Plugins and runtime versions may affect this.
Producing Memory Use Breakdown Using CLI Tools
A common way of producing a memory use breakdown is via rabbitmq-diagnostics memory_breakdown.
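For example, the report below can be produced on a locally running node as follows (the --unit option is assumed to be available here; it switches the output from bytes to the given unit):

# report per-category memory use of the local node, in gigabytes
rabbitmq-diagnostics memory_breakdown --unit gb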
quorum_queue_procs: 0.4181 gb (28.8%)
binary: 0.4129 gb (28.44%)
allocated_unused: 0.1959 gb (13.49%)
connection_other: 0.1894 gb (13.05%)
plugins: 0.0373 gb (2.57%)
other_proc: 0.0325 gb (2.24%)
code: 0.0305 gb (2.1%)
quorum_ets: 0.0303 gb (2.09%)
connection_readers: 0.0222 gb (1.53%)
other_system: 0.0209 gb (1.44%)
connection_channels: 0.017 gb (1.17%)
mgmt_db: 0.017 gb (1.17%)
metrics: 0.0109 gb (0.75%)
other_ets: 0.0073 gb (0.5%)
connection_writers: 0.007 gb (0.48%)
atom: 0.0015 gb (0.11%)
mnesia: 0.0006 gb (0.04%)
msg_index: 0.0002 gb (0.01%)
queue_procs: 0.0002 gb (0.01%)
reserved_unallocated: 0.0 gb (0.0%)
| Report Field | Category | Details |
|---|---|---|
| total | | Total amount as reported by the effective memory calculation strategy (see above) |
| connection_readers | Connections | Processes responsible for connection parsing and most of connection state. Most of their memory is attributed to TCP buffers. The more client connections a node has, the more memory will be used by this category. See the Networking guide for more information. |
| connection_writers | Connections | Processes responsible for serialisation of outgoing protocol frames and writing to client connection sockets. The more client connections a node has, the more memory will be used by this category. See the Networking guide for more information. |
| connection_channels | Channels | The more channels client connections use, the more memory will be used by this category. |
| connection_other | Connections | Other memory related to client connections. |
| quorum_queue_procs | Queues | Quorum queue processes, both currently elected leaders and followers. Memory footprint can be capped on a per-queue basis. See the Quorum Queues guide for more information. |
| queue_procs | Queues | Classic queue leaders, indices and messages kept in memory. The greater the number of messages enqueued, the more memory will generally be attributed to this section. However, this greatly depends on queue type and properties. See Memory, Classic Queues for more information. |
| metrics | Stats DB | Node-local metrics. The more connections, channels and queues a node hosts, the more stats there are to collect and keep. See the management plugin guide for more information. |
| stats_db | Stats DB | Aggregated and pre-computed metrics, inter-node HTTP API request cache and everything else related to the stats DB. See the management plugin guide for more information. |
| binary | Binaries | Runtime binary heap. Most of this section is usually message bodies and properties (metadata). |
| plugins | Plugins | Plugins such as Shovel, Federation, or protocol implementations such as STOMP can accumulate messages in memory. |
| allocated_unused | Preallocated Memory | Allocated by the runtime but not yet used. |
| reserved_unallocated | Preallocated Memory | Allocated/reserved by the kernel but not by the runtime. |
| mnesia | Internal Database | Virtual hosts, users, permissions, queue metadata and state, exchanges, bindings, runtime parameters and so on. |
| quorum_ets | Internal Database | Raft implementation's WAL and other memory tables. Most of these are periodically moved to disk. |
| other_ets | Internal Database | Some plugins can use ETS tables to store their state. |
| code | Code | Bytecode and module metadata. This should only consume a double-digit percentage of memory on blank/empty nodes. |
| other | Other | All other processes that RabbitMQ cannot categorise. |
Producing Memory Use Breakdown Using Management UI
Management UI can be used to produce a memory use breakdown chart. This information is available on the node metrics page, which can be accessed from the Overview page.
On the node metrics page, scroll down to the memory breakdown buttons.
Memory and binary heap breakdowns can be expensive to calculate and are therefore produced on demand, when the Update button is pressed.
It is also possible to display a breakdown of binary heap use by various parts of the system (e.g. connections, queues).
Producing Memory Use Breakdown Using HTTP API and curl
It is possible to produce a memory use breakdown over the HTTP API by issuing a GET request to the /api/nodes/{node}/memory endpoint.
curl -s -u guest:guest http://127.0.0.1:15672/api/nodes/rabbit@mercurio/memory | python -m json.tool
{
"memory": {
"atom": 1041593,
"binary": 5133776,
"code": 25299059,
"connection_channels": 1823320,
"connection_other": 150168,
"connection_readers": 83760,
"connection_writers": 113112,
"metrics": 217816,
"mgmt_db": 266560,
"mnesia": 93344,
"msg_index": 48880,
"other_ets": 2294184,
"other_proc": 27131728,
"other_system": 21496756,
"plugins": 3103424,
"queue_procs": 2957624,
"total": 89870336
}
}
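Since the values above are reported in bytes, it can be convenient to convert them to a larger unit before comparing categories. This is a sketch assuming the jq utility is installed; credentials and node name match the curl example above:

# Convert each category from bytes to whole MiB for easier comparison
curl -s -u guest:guest http://127.0.0.1:15672/api/nodes/rabbit@mercurio/memory | \
  jq '.memory | with_entries(.value |= (. / 1048576 | floor))'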
It is also possible to retrieve a relative breakdown by issuing a GET request to the /api/nodes/{node}/memory/relative endpoint.
Note that reported relative values are rounded to integers. This endpoint is
intended to be used for relative comparison (identifying top contributing categories),
not precise calculations.
curl -s -u guest:guest http://127.0.0.1:15672/api/nodes/rabbit@mercurio/memory/relative | python -m json.tool
{
"memory": {
"allocated_unused": 32,
"atom": 1,
"binary": 5,
"code": 22,
"connection_channels": 2,
"connection_other": 1,
"connection_readers": 1,
"connection_writers": 1,
"metrics": 1,
"mgmt_db": 1,
"mnesia": 1,
"msg_index": 1,
"other_ets": 2,
"other_proc": 21,
"other_system": 19,
"plugins": 3,
"queue_procs": 4,
"reserved_unallocated": 0,
"total": 100
}
}
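To identify the top contributing categories programmatically, the relative breakdown can be sorted. This is a sketch assuming the jq utility is installed; credentials and node name match the curl example above:

# Show the five categories with the largest relative share, excluding the total
curl -s -u guest:guest http://127.0.0.1:15672/api/nodes/rabbit@mercurio/memory/relative | \
  jq '.memory | to_entries | map(select(.key != "total")) | sort_by(-.value) | .[0:5]'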