RabbitMQ

An Open Source Messaging Broker That
Just Works

Alexis Richardson
Matthias Radestock
Tony Garnock-Jones


Rabbit Technologies Ltd.


[email protected]

Copyright © 2008 Rabbit Technologies Ltd.

Part II

presented by Matthias Radestock

A Brief Introduction to Erlang/OTP

20 years of history

  • Erlang is not some PhD student's recent pet project - it was born out of real commercial concerns 20 years ago, and has seen continuous development and commercial use over that time.
  • Because for the first few years initial deployments were few and all within Ericsson, the designers of the language had the opportunity and incentive to evolve the language and correct any mistakes.
  • Even now the language and platform are still evolving, but in a very controlled way - Ericsson do eat their own dogfood.

recent surge in interest

  • Massive growth in community over the last couple of years
  • probably because of multi-core, grid / cloud, internet

Erlang Processes

Erlang systems are built out of communicating processes. Here's how processes are created ...

Communication and Concurrency

The Open Telecom Platform

Questions?

Why Erlang/OTP?

Why Erlang/OTP?

AMQP architecture (revisited)

  • concentrate on server for rest of the talk
  • similarities between structure of client and server, in particular when it comes to protocol handling

AMQP architecture (refined)

  • refine relationship between connection and the channels it carries
  • introduction of framing - AMQP commands (create queue, delete queue, publish message, etc) are chopped up into frames for interleaving on transport - useful for e.g. large messages

RabbitMQ server design

  • domain of discourse of AMQP is very similar to Erlang/OTP
  • actual implementation is identical to architecture
  • each of the nodes corresponds to a separate Erlang module, and process (taking into account the multiplicities)
  • only two minor exceptions - mux, which is handled as part of gen_tcp, and mnesia, which is a collection of processes

Why Erlang/OTP?

  • naturalness of the from the AMQP logical architecture to the Erlang-based design of RabbitMQ mapping results in small, readable codebase where there is a direct correspondence between AMQP features and code fragments
  • this makes the code easy to understand, modify and extend - keep up with evolution of spec, try experimental features, build extensions (recall that this is a major focus for RabbitMQ)
So how small is the code base, really? Here are some stats ...

Look, it is tiny!

from left to right:
  • networking stack on top of gen_tcp - mostly generic, i.e. not RabbitMQ-specific
  • reader - dealing with AMQP connection management, demux, error handling
  • framing channel and writer - codec; mostly auto-generated from spec
  • channel, amqqueue, exchanges/routing (on top of mnesia) - the AMQP model; bulk of the code

also
  • clustering - more about that later
  • persistence - messages can be stored durably on disc and survive broker restarts
  • management - CLI for administering RabbitMQ
  • deprecated code is realms and tickets; will disappear from spec soon
summary - ~5000 LOC
but this is not just down to the naturalness of the mapping between the AMQP architecture and RabbitMQ design. Erlang language features play a big part too. Some examples of that next ...

Binaries

decode_method('basic.publish',
              <<F0:16/unsigned,
                F1Len:8/unsigned, F1:F1Len/binary,
                F2Len:8/unsigned, F2:F2Len/binary,
                F3Bits:8>>) ->
    F3 = (F3Bits band 1) /= 0,
    F4 = (F3Bits band 2) /= 0,
    #'basic.publish'{ticket      = F0,
                     exchange    = F1,
                     routing_key = F2,
                     mandatory   = F3,
                     immediate   = F4};
This is some of the auto-generated code; decoding an AMQP method frame into an Erlang data structure
  • length prefixed strings

It is hard to see how this could be any more concise.

Going the other way - packing terms into a binary blob, rather than unpacking - uses exactly the same syntax, and is just as smooth.

List Comprehension and HOF


upmap(F, L) ->
    Parent = self(),
    Ref = make_ref(),
    [receive {Ref, Result} -> Result end
     || _ <- [spawn(fun() -> Parent ! {Ref, F(X)} end)
              || X <- L]].

    
  • Functional programming at work
  • While a lot of code for a presentation, it is not a lot for what it does!
  • upmap - unordered parallel map

Why Erlang/OTP?

Exploiting Parallelism

three different areas of parallelism that arise naturually from AMQP and that RabbitMQ is exploiting, thanks to Erlang's fine-grained parallelism
  • 8-stage pipeline end-to-end between producer and consumer
  • parallelism across connections and channels - with queues acting as synchronisation points (queues are about the only stateful part of the core AMQP model)
  • queues as active processes

Clustering

logical view remains unchanged, but under the covers
  • multiple nodes share routing information
  • queues are distributed
  • messages are routed efficiently between nodes

Why Erlang/OTP?

  • for free, almost
  • the Erlang shell in conjunction with the myriad of OTP tools to inspect, trace, profile, debug, modify a running system
  • augmented by our own command line tools
...on to some examples of both...

Where has all the memory gone?

([email protected])1> [{_, Pid} | _] =
               lists:reverse(
                 lists:sort(
                   [{process_info(P, memory), P}
                    || P <- processes()])).
[{{memory,16434364},<0.160.0>}, ...]
([email protected])2> process_info(Pid, dictionary).
{dictionary,
    [{'$ancestors',
      [rabbit_amqqueue_sup,rabbit_sup,<0.106.0>]},
     {'$initial_call',
      {gen,init_it,
       [gen_server,<0.138.0>,<0.138.0>,
        rabbit_amqqueue_process,
        {amqqueue,
         {resource,<<"/">>,queue,<<"test queue">>},
         false,false,[],[],none},
        []]}}]}
just tell the story

Setting up a RabbitMQ cluster

rabbit2$ rabbitmqctl stop_app
Stopping node [email protected] ...done.
rabbit2$ rabbitmqctl reset
Resetting node [email protected] ...done.
rabbit2$ rabbitmqctl cluster [email protected]
Clustering node [email protected] with [[email protected]] ...
done.
rabbit2$ rabbitmqctl start_app
Starting node [email protected] ...done.

rabbit2$ rabbitmqctl status
Status of node [email protected] ...
[...,
 {nodes,[[email protected],[email protected]]},
 {running_nodes,[[email protected],[email protected]]}]
done.
don't talk through it

Why Erlang/OTP?

You do not need to know anything about Erlang to use RabbitMQ

Making Erlang disappear

  • two aspects - one to do with the protocol, the other with the operational side
  • users write client code, typically using one the available broker-neutral libs; they don't need to care what the server is written in
  • one still needs to get a server up and running, configure it and look after it. Need to hide Erlang there too. Hence ...

Why Erlang/OTP? - Summary


protocol handling is an Erlang/OTP sweet spot

(more detail here)

Part III

presented by Tony Garnock-Jones

Web messaging & the future

Messaging and queueing are ubiquitous. Everybody's doing it.

Everybody's doing it; but noone's got it right -- yet.

The same basic ideas of messaging and queueing keep coming up over and over and over again. There's the problem of idempotency to deal with. There's the problem of ensuring your request is heard to deal with. There're all of the issues of network failure to deal with. Lots of variations on the theme are already in wide use -- not just the email infrastructure, but all the tools listed above and more have been pressed into service as ways of getting messages or lists of work-items delivered between separate parties.

Building our XMPP-AMQP gateway taught us a lot. We're starting to be able to see the shape of the essence of messaging.

Messaging at internet scale

Delegation at the application level, not the transport level. This is what SMTP got wrong: relaying is transport-level delegation. Lifting responsibility-transfer to the application level fixes the issue: the application (not the transport) makes smart decisions about whether to delegate responsibility or not.

Threefold nature of shared queues

Transfer from the ends

Browsing from the side

Responsibility transfer (1/2)

Responsibility transfer (2/2)

Synchronisation

Layering is vital here.

  • The simplest choice of operations at the low levels assembles to support the next layer
  • Proper layering makes implementation easy, as well as pinning down corner-cases in the semantics

The holy grail (1/3)

WebDAV: and by extension, other (networked) file systems
Some of these things -- duplicate detection, for instance -- are application-level issues; others include the decision about how durably to store messages: persistent storage to disk is an option that can be switched on or off on a per-queue basis
http://wiki.secondlife.com/wiki/Chttp
http://www.mnot.net/drafts/draft-nottingham-http-poe-00.txt

The holy grail (2/3)

Protocol Injection Browsing Subscription Responsibility Transfer Duplicate detection Federated / Global Peer-to-peer / Symmetric (vs. Client-Server)
IDEAL Y Y Y Optional Optional Y Y
AMQP Y N [6] Y Y [7] N N Partial [8]
Twitter Y Y Y N ? N N
TCP Y N Y Y Y N Y
XMPP Y N N N [1] N Y Y
Atompub Y Y N Y [3] N Y N
Pubsub Y N Y Partial [4] N Y N
Atom/RSS N Y N N -- Y Y
POE, CHTTP Y N N Y Y Y N
WebDAV Y Y N Y -- Y N
SMTP Y N N Y N Y N
IMAP Partial Y Y Partial [3] -- N N
Mailman N Y Y N -- N N
POP3 N Y N Partial [3] -- N N
Bittorrent N Y N N Y N N
HTTP N N N Y N Y N
Comet N N N N N N N

Talk about: Resp xfer; federation; peer-to-peer

1	There are XEPs that touch on this area
2	The "offline storage" modules often provided act as a kind of buffer
3	Deletion can act as an acceptance of responsibility
4	There's an acknowledgement for publication only, not delivery
5	Aggregators sort of do this
6	Queue browsing is being worked on in the AMQP WG
7	AMQP 0-10 introduces a form of responsibility transfer
8	0-10's message transfers are quasi-symmetric; the wire-protocol is symmetric; the model is not
9	Management and monitoring is being worked on in the AMQP WG

The holy grail (3/3)

Protocol Unicast Routing Multicast / Complex Routing Relaying Queueing / Buffering Presence / Lifecycle Events Management / Monitoring Trust model
IDEAL Y Y N Y Y Y ???
AMQP Y Y N Y N Y [9] --
Twitter Y Y N Y N -- --
TCP N N N Y N Some --
XMPP Y N N N [2] Y N DNS
Atompub N N N Y N -- --
Pubsub N Y N N N N XMPP
Atom/RSS N N [5] N [5] Y N -- --
POE, CHTTP N N N N N -- --
WebDAV N N N N N N --
SMTP Y N Implied N N N complicated
IMAP N N N Y N N --
Mailman N Y N N N N --
POP3 N N N Y N N --
Bittorrent N N N N N N --
HTTP N N N N N N --
Comet N N N N N N --

Talk about: relaying; presence

1	There are XEPs that touch on this area
2	The "offline storage" modules often provided act as a kind of buffer
3	Deletion can act as an acceptance of responsibility
4	There's an acknowledgement for publication only, not delivery
5	Aggregators sort of do this
6	Queue browsing is being worked on in the AMQP WG
7	AMQP 0-10 introduces a form of responsibility transfer
8	0-10's message transfers are quasi-symmetric; the wire-protocol is symmetric; the model is not
9	Management and monitoring is being worked on in the AMQP WG

Data languages (1/3)

AMQP's current wire protocols and data languages perform well, but are missing a few features that will be required as AMQP grows to internet scale.

The ideal data language:

Human-readable concrete syntax is a debuggable representation

Data languages (2/3)

Language Schemas Multiple concrete syntaxes Parseable without schema Metacircular Extensible Unbounded integers (bignums) Good support for binary data
IDEAL Y Optional Y Y Y Y Y
ASN.1 Y Y Y N Y Y Y
Protocol Buffers Y N Y N [5] Y Y Y
XML Y Y Y Y Y Y N
JSON N N Y N Y Y N
SPKI N N N [2] N Y Y Y
AMQP Y N N N N N Y
1	Ordering of key-value pairs within tables is missing; otherwise, it's more-or-less canonical as it stands
2	The "display type" attached to a byte-string compensates partially for this, but numbers are not distinct from byte-strings
3	SPKI doesn't care what interpretation you place on the bytes it packages
4	XML schemas don't line up with common programming-language data types very well (not even RelaxNG)
5	But it could be!
6	There are good tools, but they're commercial. C and C++ are well served; Java less so; other languages still less so.

Data languages (3/3)

Language Canonical form defined Multiple programming-language support In wide use already Unicode support "Efficient" concrete syntax Good tools widely available already Simple to use, understand, implement
IDEAL Y Y Y Y Y Y Y
ASN.1 Y Y Y Y Y N [6] N
Protocol Buffers N Y Y Y Y Y Y
XML Y Y Y Y N Y N [4]
JSON N Y Y Y N Y Y
SPKI Y Y N N [3] Y N Y
AMQP N [1] Y N Y Y N N
1	Ordering of key-value pairs within tables is missing; otherwise, it's more-or-less canonical as it stands
2	The "display type" attached to a byte-string compensates partially for this, but numbers are not distinct from byte-strings
3	SPKI doesn't care what interpretation you place on the bytes it packages
4	XML schemas don't line up with common programming-language data types very well (not even RelaxNG)
5	But it could be!
6	There are good tools, but they're commercial. C and C++ are well served; Java less so; other languages still less so.

Learning from XMPP and AMQP

Implementing our XMPP-AMQP gateway provided a good foundation for the preceding analysis. XMPP and AMQP complement each other: the areas where they differ give strong design and layering clues.

What next for AMQP?

(Conclusion of this segment) We've been looking at the ways messaging is in use on the internet today, and incorporating those patterns and techniques into the development of RabbitMQ and of AMQP itself.

What next for RabbitMQ?


www.rabbitmq.com

RabbitMQ is available; it's in use by customers; grab a copy and try it out! We also have a public server running; join the mailing list to report any problems or suggestions.