Menu

Posts Tagged ‘management’

Simplifying rolling upgrades between minor versions with feature flags

Tuesday, April 23rd, 2019

In this post we will cover feature flags, a new subsystem in RabbitMQ 3.8. Feature flags will allow a rolling cluster upgrade to the next minor version, without requiring all nodes to be stopped before upgrading.

Minor Version Upgrades Today: RabbitMQ 3.6.x to 3.7.x

It you had to upgrade a cluster from RabbitMQ 3.6.x to 3.7.x, you probably had to use one of the following solutions:

  • Deploy a new cluster alongside the existing one (this strategy is known as the blue-green deployment), then migrate data & clients to the new cluster
  • Stop all nodes in the existing cluster, upgrade the last node that was stopped first, then continue upgrading all other nodes, one-by-one

Blue-green deployment strategy is low risk but also fairly complex to automate. On the other hand, a cluster-wide shutdown affects availability. Feature flags are meant to provide a 3rd option by making rolling cluster upgrades possible and reasonably easy to automate.

The Feature Flags Subsystem

Feature flags indicate a RabbitMQ node's capabilities to its cluster peers. Previously nodes used versions to assess compatibility with cluster versions. There are many ways in which nodes in a distributed system can become incompatible, including Erlang and dependency versions. Many of those aspects are not reflected in a set of version numbers. Feature flags is a better approach as it can reflect more capabilities of a node, whether it is a particular feature or internal communication protocol revision. In fact, with some message protocols RabbitMQ supports clients have a mechanism for clients to indicate their capabilities. This allows client libraries to evolve and be upgraded independently of RabbitMQ nodes.

For example, RabbitMQ 3.8.0 introduces a new queue type, quorum queues. To implement them, an internal data structure and a database schema were modified. This impacts the communication with other nodes because the data structure is exchanged between nodes, and internal data store schema is replicated to all nodes.

Without the feature flags subsystem, it would be impossible to have a RabbitMQ 3.8.0 node inside a cluster where other nodes are running RabbitMQ 3.7.x. Indeed, the 3.7.x nodes would be unable to understand the data structure or the database schema from 3.8.0 node. The opposite is also true. That's why RabbitMQ today prevents this from happening by comparing versions and by denying clustering when versions are considered incompatible (the policy considers different minor/major versions to be incompatible).

New in RabbitMQ 3.8.0 is the feature flags subsystem: when a single node in a 3.7.x cluster is upgraded to 3.8.0 and restarted, it will not immediately enable the new features or migrate its database schema because the feature flags subsystem told it not to. It could determine this because RabbitMQ 3.7.x supports no feature flags at all, therefore new features or behaviours in RabbitMQ 3.8.0 cannot be used before all nodes in the cluster are upgraded.

After a partial upgrade of a cluster to RabbitMQ 3.8.0, all nodes are acting as 3.7.x nodes with regards to incompatible features, even the 3.8.0 one. In this situation, quorum queues are unavailable. Operator must finish the upgrade by upgrading all nodes. When that is done, the operator can decide to enable the new feature flags provided by RabbitMQ 3.8.0: one of them enables quorum queues. This is done using RabbitMQ CLI tools or management UI and supposed to be performed by deployment automation tools. The idea is that the operator needs to confirm that her cluster doesn't have any remaining RabbitMQ 3.7.x nodes that might rejoin the cluster at a later point.

Once a new feature flag is enabled, it is impossible to add a node that runs an older version to that cluster.

Demo with RabbitMQ 3.8.0

Let's go through a complete upgrade of a RabbitMQ 3.7.x cluster. We will take a look at the feature flags in the process.

We have the following 2-node cluster running RabbitMQ 3.7.12:

We now upgrade node A to RabbitMQ 3.8.0 and restart it. Here is what the management overview page looks like after the node is restarted:

We can see the difference of versions in the list of nodes: their version is displayed just below their node name.

The list of feature flags provided by RabbitMQ 3.8.0 is now available in the management UI on node A:

This page will not exist on node B because it is still running RabbitMQ 3.7.12.

On node A, we see that the quorum_queue feature flag is marked as Unavailable. The reason is that node B (still running RabbitMQ 3.7.12) does not known about quorum_queue feature flag, therefore node A is not allowed to use that new feature flag. This feature flag cannot be enabled until all nodes in the cluster support it.

For instance, we could try to declare a quorum queue on node A, but it is denied:

After node B is upgraded, feature flags are available and they can be enabled. We proceed and enable quorum_queue by clicking the Enable button:

Now, we can declare a quorum queue:

To Learn More

The Feature Flags subsystem documentation describes in greater details how it works and what operators and plugin developers should pay attention to.

Note that feature flags are not a guarantee that a cluster shutdown will never be required for upgrades in the future: the ability to implement a change using a feature flag depends on the nature of the change, and is decided on a case-by-case basis. Correct behaviour of a distributed system may require that all of its components behave in a certain way, and sometimes that means that they have to be upgraded in lockstep.

Please give feature flags a try and let us know what you think on the RabbitMQ mailing list!

rabbitmq-tracing – a UI for the firehose

Friday, September 9th, 2011

While the firehose is quite a cool feature, I always thought that it was a shame we didn't have a simple GUI to go on top and make it accessible to system administrators. So I wrote one. You can download it here.

(more…)

Management plugin – preview release

Tuesday, September 7th, 2010

The previously mentioned management plugin is now in a state where it's worth looking at and testing. In order to make this easy, I've made a special once-only binary release just for the management plugin (in future we'll make binary releases of it just like the other plugins). Download all the .ez files from here and install them as described here, then let us know what you think. (Update 2010-09-22: Note that the plugins referenced in this blog post are for version 2.0.0 of RabbitMQ. We've now released 2.1.0 - for this and subsequent versions you can get the management plugin from here).

After installation, point your browser at http://server-name:55672/mgmt/. You will need to authenticate as a RabbitMQ user (on a fresh installation the user "guest" is created with password "guest"). From here you can manage exchanges, queues, bindings, virtual hosts, users and permissions. Hopefully the UI is fairly self-explanatory.

The management UI is implemented as a single static HTML page which makes background queries to the HTTP API. As such it makes heavy use of Javascript. It has been tested with recent versions of Firefox, Chromium and Safari, and with versions of Microsoft Internet Explorer back to 6.0. Lynx users should use the HTTP API directly 🙂

The management plugin will create an HTTP-based API at http://server-name:55672/api/. Browse to that location for more information on the API. For convenience the documentation can also be obtained from our Mercurial server.

WARNING: The management plugin is still at an early stage of development. You should be aware of the following limitations:

  • Permissions are only enforced sporadically. If a user can authenticate with the HTTP API, they can do anything.
  • Installing the management plugin will turn on fine-grained statistics in the server. This can slow a CPU-bound server by 5-10%.
  • All sorts of other features may be missing or buggy. See the TODO file for more information.
Note: if you want to build the plugin yourself, you should be aware that right now the Erlang client does not work in the default branch, so you need a mix of versions. The following commands should work:
hg clone http://hg.rabbitmq.com/rabbitmq-public-umbrella
cd rabbitmq-public-umbrella
make checkout
hg update -r rabbitmqv200 -R rabbitmq-server
hg update -r rabbitmqv200 -R rabbitmq-codegen
hg update -r rabbitmqv20_0 -R rabbitmq-erlang-client
hg clone http://hg.rabbitmq.com/rabbitmq-management
make
cd rabbitmq-management
make
Of course this will be fixed soon. (Ignore the above, this is fixed.)

Finally, this post would not be complete without some screenshots...

Management, monitoring and statistics

Friday, August 6th, 2010

For a long time the management and monitoring capabilities built into RabbitMQ have consisted of rabbitmqctl. While it's a reasonable tool for management (assuming you like the command line), rabbitmqctl has never been very powerful as a monitoring tool. So we're going to build something better.

Of course, plenty of people don't like command line tools and so several people have built alternate means of managing RabbitMQ. Alice / Wonderland and Spring AMQP are two that leap to mind. However there isn't much standardisation, and people don't always find it very convenient to talk to Rabbit via epmd. So we're going to build something easier.

With that in mind, I'd like to announce the existence of RabbitMQ Management. This is a plugin to provide management and monitoring via a RESTful interface, with a web GUI. Think of it as being like a super-Alice, and also an integration point for any other tools that people want to build.

So RabbitMQ Management will allow easier management and better monitoring. How better? Well, as part of the upcoming rabbit release we've added a statistics gathering feature to the broker. This can count messages as they're published, routed, delivered and acked, on a per channel / queue / exchange basis, so we can determine things like which channels are publishing fast or consuming slowly, which queues are being published to from which exchanges, which connections and hosts are busiest, and so on.

Of course, doing all this extra bookkeeping comes at a cost; when statistics gathering is turned on the server can run around 10% slower (assuming it's CPU-bound; which usually means handling transient messages. If it's IO bound the performance impact will likely be less). At the moment statistics gathering is turned on automatically when the management plugin is installed but we'll make it configurable.

With that, I'd like to add a warning: it's currently at a very early stage of development, and really should not be trusted for anything other than experimentation and playing with. The REST API will change, the UI will change, the plugin might crash your server, and the TODO is currently almost hilariously long. But hopefully this gives you an idea of what we're going to do.