Menu

This Month in RabbitMQ, June 2020 Recap

July 30th, 2020 by Michael Klishin

This month in RabbitMQ features the release of the RabbitMQ Cluster Kubernetes Operator, benchmarks and cluster sizing case studies by Jack Vanlightly (@vanlightly), and a write up of RabbitMQ cluster migration by Tobias Schoknecht (@tobischo), plus lots of other tutorials by our vibrant community!

Project Updates

Community Writings and Resources

Learn More

Ready to learn more? Check out these upcoming opportunities to learn more about RabbitMQ

  • Udemy is offering “Learn RabbitMQ: In-Depth Concepts from Scratch with Demos” for $13.99, an 85% discount. You can’t afford not to learn RabbitMQ at this price!

Disaster Recovery and High Availability 101

July 7th, 2020 by Jack Vanlightly

In this post I am going to cover perhaps the most commonly asked question I have received regarding RabbitMQ in the enterprise.

How can I make RabbitMQ highly available and what architectures/practices are recommended for disaster recovery?
RabbitMQ offers features to support high availability and disaster recovery but before we dive straight in I’d like to prepare the ground a little. First I want to go over Business Continuity Planning and frame our requirements in those terms. From there we need to set some expectations about what is possible. There are fundamental laws such as the speed of light and the CAP theorem which both have serious impacts on what kind of DR/HA solution we decide to go with.

Finally we’ll look at the RabbitMQ features available to us and their pros/cons.

Read the rest of this entry »

This Month in RabbitMQ, May 2020 Recap

June 30th, 2020 by Michael Klishin

This month, Jack Vanlightly continues his blog series on Quorum Queues in RabbitMQ. Also, be sure to watch the replay of his related webinar.

Finally, Episode 5 of TGI RabbitMQ is out -- Gerhard Lazu walks us through how to run RabbitMQ on Kubernetes. Don’t miss!

Project Updates

  • RabbitMQ 3.8.4 was released in late May, the first release to feature Erlang 23 compatibility. Three weeks later 3.8.5 followed with complete Erlang 23 support.
  • Docker community-maintained RabbitMQ image has adopted Erlang 23 in less than two weeks since its release
  • rabbit-hole, the most popular Go RabbitMQ HTTP API client, has reached version 2.2.0
  • Merged an impressive pull request from GitHub user @joseliber that fixed the generation of password-encrypted certificates in the tls-gen project. This project is used by RabbitMQ, its client libraries, and other projects to easily generate self-signed certificates.

    Read the rest of this entry »

How quorum queues deliver locally while still offering ordering guarantees

June 23rd, 2020 by Jack Vanlightly

The team was recently asked about whether and how quorum queues can offer the same message ordering guarantees as classic queues given that they will deliver messages from a local queue replica (leader or follower) when possible. Mirrored queues always deliver from the master (the leader), so delivering from any queue replica sounds like it could impact those guarantees. 

That is the subject of this post. Be warned, this post is a technical deep dive for the curious and the distributed systems enthusiast. We’ll take a look at how quorum queues can deliver messages from any queue replica, leader or follower, without additional coordination (extra to Raft) but maintaining message ordering guarantees.

Read the rest of this entry »

Cluster Sizing Case Study – Quorum Queues Part 2

June 18th, 2020 by Jack Vanlightly

In the last post we started a sizing analysis of our workload using quorum queues. We focused on the happy scenario that consumers are keeping up meaning that there are no queue backlogs and all brokers in the cluster are operating normally. By running a series of benchmarks modelling our workload at different intensities we identified the top 5 cluster size and storage volume combinations in terms of cost per 1000 msg/s per month.

  1. Cluster: 7 nodes, 8 vCPUs (c5.2xlarge), gp2 SDD. Cost: $54
  2. Cluster: 9 nodes, 8 vCPUs (c5.2xlarge), gp2 SDD. Cost: $69
  3. Cluster: 5 nodes, 8 vCPUs (c5.2xlarge), st1 HDD. Cost: $93
  4. Cluster: 5 nodes, 16 vCPUs (c5.4xlarge), gp2 SDD. Cost: $98
  5. Cluster: 7 nodes, 16 vCPUs (c5.4xlarge), gp2 SDD. Cost: $107
 

There are more tests to run to ensure these clusters can handle things like brokers failing and large backlogs accumulating during things like outages or system slowdowns.

All quorum queues are declared with the following properties:

  • x-quorum-initial-group-size=3
  • x-max-in-memory-length=0
 

The x-max-in-memory-length property forces the quorum queue to remove message bodies from memory as soon as it is safe to do. You can set it to a longer limit, this is the most aggressive - designed to avoid large memory growth at the cost of more disk reads when consumers do not keep up. Without this property message bodies are kept in memory at all times which can place memory growth to the point of memory alarms setting off which severely impacts the publish rate - something we want to avoid in this workload case study.

Read the rest of this entry »

Cluster Sizing Case Study – Quorum Queues Part 1

June 18th, 2020 by Jack Vanlightly

In a first post in this sizing series we covered the workload, the tests, and the cluster and storage volume configurations on AWS ec2. In this post we’ll run a sizing analysis with quorum queues. We also ran a sizing analysis on mirrored queues.

In this post we'll run the increasing intensity tests that will measure our candidate cluster sizes at varying publish rates, under ideal conditions. In the next post we'll run resiliency tests that measure whether our clusters can handle our target peak load under adverse conditions.

All quorum queues are declared with the following properties:

  • x-quorum-initial-group-size=3 (replication factor)
  • x-max-in-memory-length=0
 

The x-max-in-memory-length property forces the quorum queue to remove message bodies from memory as soon as it is safe to do. You can set it to a longer limit, this is the most aggressive - designed to avoid large memory growth at the cost of more disk reads when consumers do not keep up. Without this property message bodies are kept in memory at all times which can place memory growth to the point of memory alarms setting off which severely impacts the publish rate - something we want to avoid in this workload case study.

Read the rest of this entry »

Cluster Sizing Case Study – Mirrored Queues Part 2

June 18th, 2020 by Jack Vanlightly

In the last post we started a sizing analysis of our workload using mirrored queues. We focused on the happy scenario that consumers are keeping up meaning that there are no queue backlogs and all brokers in the cluster are operating normally. By running a series of benchmarks modelling our workload at different intensities we identified the top 5 cluster size and storage volume combinations in terms of cost per 1000 msg/s per month.

  1. Cluster: 5 nodes, 8 vCPUs, gp2 SDD. Cost: $58
  2. Cluster: 7 nodes, 8 vCPUs, gp2 SDD. Cost: $81
  3. Cluster: 5 nodes, 8 vCPUs, st1 HDD. Cost: $93
  4. Cluster: 5 nodes, 16 vCPUs, gp2 SDD. Cost: $98
  5. Cluster: 9 nodes, 8 vCPUs, gp2 SDD. Cost: $104
 

There are more tests to run to ensure these clusters can handle things like brokers failing and large backlogs accumulating during things like outages or system slowdowns.

Read the rest of this entry »

Cluster Sizing Case Study – Mirrored Queues Part 1

June 18th, 2020 by Jack Vanlightly

In a first post in this sizing series we covered the workload, cluster and storage volume configurations on AWS ec2. In this post we’ll run a sizing analysis with mirrored queues.

The first phase of our sizing analysis will be assessing what intensities each of our clusters and storage volumes can handle easily and which are too much.

All tests use the following policy:

  • ha-mode: exactly
  • ha-params: 2
  • ha-sync-mode: manual
Read the rest of this entry »

Cluster Sizing and Other Considerations

June 18th, 2020 by Jack Vanlightly

This is the start of a short series where we look at sizing your RabbitMQ clusters. The actual sizing wholly depends on your hardware and workload, so rather than tell you how many CPUs and how much RAM you should provision, we’ll create some general guidelines and use a case study to show what things you should consider.

Common Questions

What is the best combination of VM size and VM count for your RabbitMQ cluster? Should you scale up and go for three nodes with 32 CPU threads? Or should you scale out and go for 9 nodes with 8 CPU threads? What type of disk offers the best value for money? How much memory do you need? Which hardware configuration is better for throughput, latency, cost of ownership?

First of all, there is no single answer. If you run in the cloud then there are fewer options but if you run on-premise then the sheer number of virtualisation, storage and networking products and configurations out there makes this an impossible question.

While there is no single sizing guide with hard numbers, we can go through a sizing analysis and hopefully that will help you with your own sizing.

Read the rest of this entry »

How to run benchmarks

June 4th, 2020 by Jack Vanlightly

There can be many reasons to do benchmarking:

  • Sizing and capacity planning
  • Product assessment (can RabbitMQ handle my load?)
  • Discover best configuration for your workload
In this post we’ll take a look at the various options for running RabbitMQ benchmarks. But before we do, you’ll need a way to see the results and look at system metrics.

Read the rest of this entry »