Posts Tagged ‘performance’

RabbitMQ Performance Measurements, part 2

Wednesday, April 25th, 2012

Welcome back! Last time we talked about flow control and latency; today let's talk about how different features affect the performance we see. Here are some simple scenarios. As before, they're all variations on the theme of one publisher and one consumer publishing as fast as they can.


RabbitMQ Performance Measurements, part 1

Tuesday, April 17th, 2012

So today I would like to talk about some aspects of RabbitMQ's performance. There are a huge number of variables that feed into the overall level of performance you can get from a RabbitMQ server, and today we're going to try tweaking some of them and seeing what we can see.


Performance of Queues: when less is more

Thursday, October 27th, 2011

Since the new persister arrived in RabbitMQ 2.0.0 (yes, it's not so new anymore), Rabbit has had a relatively good story to tell about coping with queues that grow and grow and grow and reach sizes that preclude them from being able to be held in RAM. Rabbit starts writing out messages to disk fairly early on, and continues to do so at a gentle rate so that by the time RAM gets really tight, we've done most of the hard work already and thus avoid sudden bursts of writes. Provided your message rates aren't too high or too bursty, this should all happen without any real impact on any connected clients.

Some recent discussion with a client made us return to what we'd thought was a fairly solved problem and has prompted us to make some changes. (more…)

Very fast and scalable topic routing – part 2

Monday, March 28th, 2011

In our previous blog post we talked about a few approaches to topic routing optimization and described the two more important of these in brief. In this post, we will talk about a few things we tried when implementing the DFA, as well as some performance benchmarking we have done on the trie and the DFA. (more…)

Very fast and scalable topic routing – part 1

Tuesday, September 14th, 2010

Among other things, lately we have been preoccupied with improving RabbitMQ's routing performance. In particular we have looked into speeding up topic exchanges by using a few well-known algorithms as well as some other tricks. We were able to reach solutions many times faster than our current implementation.

First, a little about the problem we are trying to solve. Here is a quote from the AMQP 0-9-1 spec:

The topic exchange type works as follows:
  • 1. A message queue binds to the exchange using a routing pattern, P.
  • 2. A publisher sends the exchange a message with the routing key R.
  • 3. The message is passed to the message queue if R matches P.
The routing key used for a topic exchange MUST consist of zero or more words delimited by dots. Each word may contain the letters A-Z and a-z and digits 0-9.

The routing pattern follows the same rules as the routing key with the addition that * matches a single word, and # matches zero or more words. Thus the routing pattern *.stock.# matches the routing keys usd.stock and eur.stock.db but not stock.nasdaq.

Our goal is to match messages (routing keys) against bindings (patterns) in a fast and scalable manner.

Here is a list of approaches that we tried out:

  • 1. Caching messages' topics on a per-word basis. This is what the AMQP spec suggests and there are some studies on this already.
  • 2. Indexing patterns on a per-word basis. This is similar with 1, except we prepare the patterns beforehand, rather than preparing for topic keys that have been previously sent.
  • 3. Trie implementation. Arrange the words in the patterns in a trie structure and follow a route down the trie to see if a particular topic matches.
  • 4. A deterministic finite automate (DFA) implementation. This is a well-known approach for string matching, in general.
Each of these approaches have pros and cons. We generally aimed for:
  • good complexity in both space and time, to make it scalable
  • ease of implementation
  • good performance for the commonly used situations
  • good worst-case performance
  • making it quick in the simple cases (where scalability in number of bindings is not a concern)
From the start, we were able to beat the current implementation by a factor of 3 times (in all cases) just by being more careful when splitting the keys into words (not repeating splitting both the pattern and the topic for each pattern, every time).

We found approaches 1 and 2 to be particularly unfit for the needs. They were the slowest, they do not have a good complexity, because they involve intersecting sets for each level, and they can not be adapted to include functionality for "#". Thus, we concentrated our attention on approaches 3 and 4.

The trie

Here is an example of a trie structure, if we were to add patterns "a.b.c", "a.*.b.c", "a.#.c", "b.b.c":

In order to match a pattern (say for example "a.d.d.d.c"), we start at root and follow the topic string down the tree word by word. We can go deeper either through an exact match, a "*" or a "#". In the case of the "#" we can go deeper with all the versions of the tail of the topic. For our example, we would go through "#" with "d.d.d.c", "d.d.c", "d.c", "c" and "".

The trie implementation has a number of advantages: good size complexity; adding a new binding is cheap; and it is the easiest to implement; but, also the disadvantage that it backtracks for "*" and "#", in order to find all possible matches.


This approach is based on constructing an NFA that accepts the patterns of the bindings, and from it constructing the equivalent DFA and using it instead. Since we are also interested in which pattern matches and not only if it matches or not, we cannot merge the tails of the patterns in the NFA.

To construct the DFA, we modeled the behaviour of "#" like this:

For example, the patterns "a.b.c", "a.*.b.c", "a.#.c", "b.b.c" would be represented in an NFA like this:

The nodes 11, 4, 6 and 8 would have information attached to them which would point to the respective bindings.

In order to convert the NFA to a DFA, we tried various approaches and went as far as generating source code for the structures behind the graphs, to make it as fast as possible. The best solution we ended up with was building the DFA on the fly, the same way it is built in good regular expressions compilers (see for example this article).

The advantage of the DFA approach is that there is no need to backtrack, once the DFA has been built. On the other hand, there are quite a number of disadvantages: it occupies significantly more memory than the trie; there is a significant cost for adding new bindings, since the entire DFA has to be dropped and rebuilt; and it is more complex and therefore harder to implement and maintain.

In the following articles we will present more details about the two structures, how they performed in benchmarks, their space and time complexities and the details behind the DFA optimizations that we have tried.

To be continued.