High Availability in RabbitMQ: solving part of the puzzle

In RabbitMQ 2.6.0 we introduced Highly Available queues. These necessitated a new extension to AMQP, and a fair amount of documentation, but to date, little has been written on how they work.

High Availability (HA) is a typically over-used term and means different things to different people. In the context of RabbitMQ, there are a number of aspects of high availability, some of which this work does solve and some of which it does not. The things it does not solve include:

  1. Maintaining connections to a RabbitMQ broker or node: using some sort of TCP load-balancer or proxy is the best route here, though other solutions such as dynamically updating DNS entries or just pre-loading your clients with a list of addresses to connect to may work just as well.

  2. Recovery from failure: if a client is disconnected because the node it was connected to has failed, neither side can be sure what was processed. A publishing client may have had messages accepted and passed on by the broker without ever receiving confirmation for them. Likewise, a consuming client may have issued acknowledgements without knowing whether they reached the broker and were processed before the failure occurred. In short, you still need to make sure your consuming clients can identify and deal with duplicate messages.

  3. Auto-healing from network partitions or splits. RabbitMQ makes use of the Erlang distributed database Mnesia. This database itself does not cope with network partitions: it very much chooses Consistency and Availability, not Partition tolerance, from the CAP triangle. As RabbitMQ depends on Mnesia, RabbitMQ itself has the same properties. Thus the HA work in RabbitMQ can prevent queues from disappearing in the event of a node failure, but does not have anything to say about automatically rejoining the failed node when it is repaired: this still requires manual intervention.
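To make the second point concrete, here is a minimal sketch of a consumer that tolerates redelivered messages. It assumes the publisher attaches a unique identifier to every message; the `DedupingConsumer` class, the `message_id` argument and the bounded memory size are illustrative choices for this sketch, not part of RabbitMQ or any client library.

```python
from collections import OrderedDict


class DedupingConsumer:
    """Remembers recently seen message ids so a redelivery is processed once.

    Hypothetical helper: assumes each message carries a unique id chosen
    by the publisher.
    """

    def __init__(self, handler, max_remembered=10000):
        self._handler = handler
        self._seen = OrderedDict()  # message id -> None, oldest first
        self._max = max_remembered

    def on_message(self, message_id, body):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(body)
        self._seen[message_id] = None
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # forget the oldest id
        return True


processed = []
consumer = DedupingConsumer(processed.append)
consumer.on_message("m1", "create order 17")
consumer.on_message("m2", "charge card")
# After a failover the broker may redeliver m2 because its ack was lost.
consumer.on_message("m2", "charge card")
print(processed)  # ['create order 17', 'charge card']
```

Bounding the memory of seen ids is a trade-off: a duplicate that arrives after its id has been forgotten will be processed again, so the bound should comfortably exceed the number of messages that can be in flight during a failure.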

None of these problems is new, and RabbitMQ's HA work does not attempt to address them. Instead, it focuses solely on preventing queues from being bound to a single node in a cluster.

Previously, a queue existed on only one node: if that node failed, the queue became unavailable. The HA work solves this by mirroring a queue on other nodes: all actions that occur on the queue's master are intercepted and applied in the same order to each of the slaves within the mirror.

This requires:

  1. The ability to intercept all actions being performed on a queue. Fortunately, the code abstractions we already have make this fairly easy.

  2. The ability for those actions to be communicated reliably, consistently and in order to all the slaves within the mirror. For this we have written a new guaranteed multicast module (also known as atomic broadcast).

  3. The ability to reliably detect the loss of a node in such a way that no messages sent from that node reach a subset of the slaves: to ensure the members of the mirrored queue stay in sync with each other, it's crucial that in the event of the failure of the master, any messages that the master was in the process of sending to the slaves either fail completely or succeed completely (this is really the atomic in atomic broadcast).
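The all-or-nothing property in the third requirement can be illustrated with a toy model. The sketch below uses the textbook relay approach, in which every member forwards a message to its peers the first time it sees it; this is an assumption made for illustration and is not a description of how RabbitMQ's multicast module is implemented. The `Member`, `make_group` and `broadcast` names are hypothetical, messages are assumed to be unique, and ordering between different messages is not addressed here.

```python
class Member:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.delivered = []   # messages applied, in delivery order
        self.peers = []       # the other members of the mirror

    def receive(self, msg):
        if not self.alive or msg in self.delivered:
            return
        self.delivered.append(msg)
        # Relay on first receipt, so that a message seen by one survivor
        # ends up being seen by all of them.
        for peer in self.peers:
            peer.receive(msg)


def make_group(names):
    members = [Member(n) for n in names]
    for m in members:
        m.peers = [p for p in members if p is not m]
    return members


def broadcast(sender, msg, crash_after=None):
    """Send msg to each peer; optionally crash the sender part-way through."""
    sender.delivered.append(msg)
    for sent, peer in enumerate(sender.peers):
        if crash_after is not None and sent == crash_after:
            sender.alive = False  # simulated node failure
            return
        peer.receive(msg)


# Case 1: the master dies after reaching only s1; s1's relay fills in the rest.
master, s1, s2, s3 = make_group(["master", "s1", "s2", "s3"])
broadcast(master, "a")
broadcast(master, "b", crash_after=1)
succeeded = [s.delivered for s in (s1, s2, s3)]

# Case 2: the master dies before reaching anyone; no survivor sees the message.
master2, t1, t2 = make_group(["master", "t1", "t2"])
broadcast(master2, "c", crash_after=0)
failed = [t.delivered for t in (t1, t2)]

print(succeeded)  # [['a', 'b'], ['a', 'b'], ['a', 'b']]
print(failed)     # [[], []]
```

In both cases the surviving members agree with each other, which is the property that keeps the mirror in sync across a master failure.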

In addition, all communication between the members of the mirror is asynchronous. The advantage is that the master is not slowed down if one of the slaves starts struggling; the disadvantage is the complexity of the possible interleavings of actions when the master fails and a slave is promoted.
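A small single-threaded simulation shows what asynchronous mirroring means in practice. It is only a sketch under simplifying assumptions: the `Master` and `Slave` classes and the per-slave `inbox` are hypothetical names, and a real broker would run each member concurrently rather than stepping it by hand.

```python
from collections import deque


class Slave:
    def __init__(self):
        self.inbox = deque()   # actions sent by the master, not yet applied
        self.contents = []     # this slave's copy of the queue

    def step(self, max_actions):
        """Apply up to max_actions pending actions, oldest first."""
        for _ in range(min(max_actions, len(self.inbox))):
            action, arg = self.inbox.popleft()
            if action == "publish":
                self.contents.append(arg)
            elif action == "consume":
                self.contents.pop(0)


class Master:
    def __init__(self, slaves):
        self.slaves = slaves
        self.contents = []

    def _mirror(self, action, arg=None):
        # Fire and forget: the master never waits for a slave to catch up.
        for slave in self.slaves:
            slave.inbox.append((action, arg))

    def publish(self, msg):
        self.contents.append(msg)
        self._mirror("publish", msg)

    def consume(self):
        msg = self.contents.pop(0)
        self._mirror("consume")
        return msg


fast, slow = Slave(), Slave()
master = Master([fast, slow])
for m in ("m1", "m2", "m3"):
    master.publish(m)
master.consume()

fast.step(10)   # the fast slave applies everything
slow.step(1)    # the struggling slave applies a single action
print(master.contents, fast.contents, slow.contents, len(slow.inbox))
# ['m2', 'm3'] ['m2', 'm3'] ['m1'] 3

slow.step(10)   # once it catches up, it reaches the same state
print(slow.contents)  # ['m2', 'm3']
```

The slow slave lags behind without holding the master back, yet it still applies the actions in the master's order. The cost is visible in the intermediate state: if the master failed while the slow slave's inbox was non-empty, the mirror would have to reason about actions that are still in flight.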

When the master fails, a slave is chosen for promotion. The slave chosen is the eldest slave, in the belief that it is the most likely to have contents that match those of the failed master queue. This is important because there is currently no eager synchronisation of mirrored queues. If you create a mirrored queue, send messages into it, and then add another node which mirrors that queue, the slave on the new node will not receive the existing messages. Only new messages published to the queue are sent to all current members of the mirrored queue. As consumers drain messages from the head of the queue, the messages that are not fully mirrored are gradually eliminated. Consequently, by promoting the eldest slave, you minimise the number of messages at the head of the queue that may have been known only to the failed master.
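The effect of missing eager synchronisation, and the reason for promoting the eldest slave, can be seen in a toy model. This is a sketch under stated assumptions rather than RabbitMQ's implementation: the `MirroredQueue` class and its methods are hypothetical, messages are assumed to be unique, and mirroring is modelled as instantaneous to keep the example short.

```python
class MirroredQueue:
    def __init__(self):
        self.master = []   # messages held by the master, head first
        self.slaves = []   # (name, messages) pairs, eldest slave first

    def add_slave(self, name):
        # No eager synchronisation: a new slave starts empty.
        self.slaves.append((name, []))

    def publish(self, msg):
        self.master.append(msg)
        for _, messages in self.slaves:
            messages.append(msg)

    def consume(self):
        msg = self.master.pop(0)
        for _, messages in self.slaves:
            if messages and messages[0] == msg:
                messages.pop(0)
        return msg

    def unsynchronised(self, name):
        """Messages on the master that the named slave has never seen."""
        messages = dict(self.slaves)[name]
        return [m for m in self.master if m not in messages]

    def fail_master(self):
        """Promote the eldest slave; return its name and the messages lost."""
        name, messages = self.slaves.pop(0)
        lost = [m for m in self.master if m not in messages]
        self.master = messages
        return name, lost


q = MirroredQueue()
q.publish("m1")
q.publish("m2")
q.add_slave("A")          # joins after m1 and m2 were published
q.publish("m3")
q.add_slave("B")          # joins later still
q.publish("m4")
print(q.unsynchronised("A"), q.unsynchronised("B"))
# ['m1', 'm2'] ['m1', 'm2', 'm3']

q.consume()               # m1
q.consume()               # m2
print(q.unsynchronised("A"), q.unsynchronised("B"))
# [] ['m3']

print(q.fail_master())    # ('A', [])
```

After the two messages at the head have been consumed, the eldest slave A holds everything the master holds, so promoting it loses nothing, whereas promoting B at the same moment would have lost `m3`.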

3 Responses to “High Availability in RabbitMQ: solving part of the puzzle”

  1. joe miller Says:

    Can you elaborate on what kind of manual intervention is required to re-introduce a master to a cluster after it has been repaired? such as: delete the mnesia db on the old master, then start it?

    Does this also have implications for graceful shutdowns of entire clusters, such as for upgrades or planned reboots of nodes? Does a strict ordering need to be observed?

  2. Matthew Sackman Says:

    @joemiller.me

    The manual intervention will be a stop_app/reset/cluster/start_app cycle. However, that'll demand an already-running RabbitMQ which may not be possible - it might refuse to start up. In that case then yes, manually moving the database dir out of the way, then it'll start up, then the stop_app/reset/cluster/start_app cycle.

    Graceful shutdowns should be ok - Rabbit/Mnesia essentially ensure these days that nodes must be started up in the reverse order to that in which they were stopped. However, bear in mind that in the event of a graceful shutdown of the entire cluster, you'll probably find that all the mirrored queues have their masters ending up on the first node that starts (i.e. the last node to be stopped). This is almost certainly not what you want from a load-balancing pov, and right now, there's no way to solve this issue - there is no means to manually (or automatically) select some other slave as the new master in order to rebalance the load.

  3. htoma Says:

    Hi,

    What happens when the master is shut down and then comes back to life? Will it become a slave that syncs to the new master?

    If I have a TCP load balancer or a reverse proxy in front of my cluster, what will happen when the master is down? Is there a way to determine the new master, or does it not really matter and I can talk to any of the slaves?