
Troubleshooting Network Connectivity

Introduction

This guide accompanies the one on networking and focuses on troubleshooting network connections. For connections that use TLS there is a separate guide on troubleshooting TLS.

Methodology

Troubleshooting network connectivity issues is a broad topic; entire books have been written about it. This guide provides starting points for the most common issues.

Networking protocols are layered, and so are problems with them. An effective troubleshooting strategy typically uses the process of elimination to pinpoint the issue (or multiple issues), starting at higher levels. For messaging technologies specifically, the following steps are often effective and sufficient:

  • Verify client configuration
  • Verify server configuration using rabbitmqctl status (specifically the listeners section) and rabbitmqctl environment
  • Check server logs (see above)
  • Verify hostname resolution
  • Verify TCP port connectivity
  • Verify IP routing
  • If needed, take and analyze a traffic dump (traffic capture)
These steps, when performed in sequence, usually help identify the root cause of the vast majority of networking issues. Troubleshooting tools and techniques for levels lower than the Internet (networking) layer are outside of the scope of this guide.

Verify Client Configuration

All developers and operators have been there: typos, outdated values, issues in provisioning tools, mixed-up public and private key paths, and so on. Step one is to double-check application and client library configuration.
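One way to double-check a client's connection settings is to print the scheme, host and port its URI actually points to, so those values can be carried into the later steps. The following is a minimal POSIX shell sketch, assuming the client is configured with a URI of the form amqp[s]://user:pass@host:port/vhost. The function name parse_amqp_uri is hypothetical, and the default ports it falls back to (5672 for amqp, 5671 for amqps) are the AMQP listener ports described later in this guide.

```shell
# Hypothetical helper: prints "scheme host port" for an AMQP URI so the values
# can be compared with what the server is expected to listen on.
# Assumes URIs of the form amqp[s]://[user[:pass]@]host[:port][/vhost][?query];
# bracketed IPv6 literals are not handled by this sketch.
parse_amqp_uri() {
  uri=$1
  case $uri in
    amqp://*)  scheme=amqp;  default_port=5672 ;;  # plain AMQP listener
    amqps://*) scheme=amqps; default_port=5671 ;;  # TLS-enabled AMQP listener
    *) echo "not an AMQP URI: $uri" >&2; return 1 ;;
  esac
  rest=${uri#*://}                # drop "scheme://"
  authority=${rest%%/*}           # drop "/vhost" and anything after it
  authority=${authority%%\?*}     # drop a query string, if any
  hostport=${authority##*@}       # drop "user:pass@", if present
  host=${hostport%%:*}
  case $hostport in
    *:*) port=${hostport##*:} ;;  # explicit port in the URI
    *)   port=$default_port ;;    # fall back to the scheme's default port
  esac
  echo "$scheme $host $port"
}
```

For example, parse_amqp_uri 'amqp://guest:guest@rabbit.example.com/%2F' (a made-up URI) prints amqp rabbit.example.com 5672.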

Verify Server Configuration

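Verifying server configuration helps prove that RabbitMQ is running with the expected set of networking-related settings. It also verifies that the node is actually running. The recommended steps are to run rabbitmqctl status, which confirms that the node is running and lists its listeners, and rabbitmqctl environment, which shows the effective configuration.

The listeners section of rabbitmqctl status output will look something like this: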

 % ...
 {listeners,
     [{clustering,25672,"::"},
      {amqp,5672,"::"},
      {'amqp/ssl',5671,"::"},
      {http,15672,"::"}]}
 % ...
In this example, there are 4 TCP listeners on the node:
  • Inter-node and CLI tool communication port, 25672
  • AMQP 0-9-1 (and 1.0, if enabled) listener for non-TLS connections, 5672
  • AMQP 0-9-1 (and 1.0, if enabled) listener for TLS-enabled connections, 5671
  • HTTP API, 15672
All listeners are bound to all available interfaces.

Inspecting TCP listeners used by a node helps spot non-standard port configuration, protocol plugins (e.g. MQTT) that are supposed to be configured but aren't, cases when the node is limited to only a few network interfaces, and so on. If a port is not in the listener list, the node cannot accept any connections on it.
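When a node has many listeners, it can help to reduce the status output to one line per listener. A rough sketch, assuming the Erlang-term listener format shown above (other RabbitMQ versions may format status output differently). list_listeners is a hypothetical helper name, and grep -o is a GNU/BSD extension rather than POSIX.

```shell
# Hypothetical helper: reads rabbitmqctl status output on stdin and prints
# "protocol port address" for every {Protocol,Port,"Address"} tuple found.
# Assumes the listener format shown above, e.g. {'amqp/ssl',5671,"::"}.
list_listeners() {
  grep -o "{[a-z0-9'/_-]*,[0-9]*,\"[^\"]*\"}" |
    sed -e "s/^{//" -e "s/}\$//" -e "s/'//g" -e 's/"//g' |  # strip braces and quotes
    tr ',' ' '                                               # one space-separated line per listener
}
```

Usage: rabbitmqctl status | list_listeners. With the listeners shown above, one of the output lines is amqp 5672 ::.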

Hostname Resolution

It is very common for applications to use hostnames, or URIs with hostnames, when connecting to RabbitMQ. dig and nslookup are commonly used tools for troubleshooting hostname resolution.
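dig and nslookup query DNS servers directly, whereas most applications resolve names through the system resolver, which may also consult /etc/hosts. On glibc-based Linux systems, getent hosts goes through that same system resolver, so comparing its answer with dig +short for the same name can reveal a mismatch. A minimal sketch, assuming getent is available; check_resolution is a hypothetical name.

```shell
# Hypothetical helper: checks that a hostname resolves through the system
# resolver (the path most client libraries use) and prints the address(es).
check_resolution() {
  host=$1
  # getent prints "address name [aliases...]"; keep only the address column
  addrs=$(getent hosts "$host" | awk '{ print $1 }')
  if [ -n "$addrs" ]; then
    echo "$host resolves to: $addrs"
  else
    echo "$host does not resolve" >&2
    return 1
  fi
}
```

Usage: check_resolution rabbit.example.com (a placeholder hostname).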

Port Access

Besides hostname resolution and IP routing issues, TCP port inaccessibility for outside connections is a common reason for failing client connections. telnet is a commonly used, very minimalistic tool for testing TCP connections to a particular hostname and port.

The following example uses telnet to connect to host localhost on port 5672. A node with stock defaults is running on localhost and nothing blocks access to the port, so the connection succeeds. 12345 is then entered as input, followed by Enter. Since 12345 is not a valid AMQP protocol header, the server responds with the protocol header it supports (displayed as AMQP followed by non-printable bytes) and closes the TCP connection:

telnet localhost 5672
# => Trying ::1...
# => Connected to localhost.
# => Escape character is '^]'.
12345 # enter this and hit Enter to send
# => AMQP	Connection closed by foreign host.
After a telnet connection succeeds, use Control + ] and then Control + D to quit it. The following example connects to localhost on port 5673. The connection fails (it is refused by the OS) since no process is listening on that port.
telnet localhost 5673
# => Trying ::1...
# => telnet: connect to address ::1: Connection refused
# => Trying 127.0.0.1...
# => telnet: connect to address 127.0.0.1: Connection refused
# => telnet: Unable to connect to remote host

Failed or timing out telnet connections strongly suggest that a proxy, load balancer or firewall is blocking incoming connections on the target port. They can also mean that the RabbitMQ process is not running on the target node, or that it uses a non-standard port. Those scenarios should be eliminated at the step that double-checks server listener configuration.
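Whether a connection is refused or times out is itself a useful signal: a refusal usually means the host answered but nothing accepted the connection on that port (or a firewall rule actively rejected it), while a timeout usually means packets are being silently dropped somewhere along the way. The sketch below tells these cases apart without an interactive telnet session. It assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are installed; probe_port is a hypothetical name, and because it matches the English "refused" error message it forces LC_ALL=C.

```shell
# Hypothetical helper: probes host:port and prints open, refused, timeout,
# or the underlying error. Optional third argument: timeout in seconds.
probe_port() {
  host=$1
  port=$2
  secs=${3:-5}
  # bash opens a TCP connection when a file under /dev/tcp/HOST/PORT is opened
  err=$(LC_ALL=C timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' probe "$host" "$port" 2>&1)
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "open"
  elif [ "$rc" -eq 124 ]; then
    # timeout(1) exits with 124: no answer at all, packets likely dropped
    echo "timeout"
    return 1
  else
    case $err in
      *refused*) echo "refused" ;;  # host answered, nothing listening (or a REJECT rule)
      *) echo "error: $err" ;;      # e.g. the hostname did not resolve
    esac
    return 1
  fi
}
```

Usage: probe_port rabbit.example.com 5672 (placeholder hostname). A refused result points back at the listener checks, while a timeout points at firewalls, proxies or routing.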

There are a great number of firewall, proxy and load balancer tools and products. iptables is a commonly used firewall on Linux. There is no shortage of iptables tutorials on the Web.

Open ports and TCP and UDP connections of a node can be inspected using netstat, ss or lsof. rabbitmqctl status can be used to list configured ports as well.

The following example uses lsof to display OS processes that listen on port 5672 and use IPv4:

lsof -n -i4TCP:5672 | grep LISTEN
Similarly, for programs that use IPv6:
lsof -n -i6TCP:5672 | grep LISTEN
The same checks for port 1883 (the default MQTT port):
lsof -n -i4TCP:1883 | grep LISTEN
lsof -n -i6TCP:1883 | grep LISTEN
If the above commands produce no output then no local OS processes listen on the given port.

The following example uses ss to display listening TCP sockets that use IPv4 and their OS processes:

ss --tcp -f inet --listening --numeric --processes
Similarly, for TCP sockets that use IPv6:
ss --tcp -f inet6 --listening --numeric --processes
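To check several ports at once, the same ss query can be looped over a list of ports. A minimal sketch for Linux, assuming ss from iproute2; check_listeners is a hypothetical name. Omitting -f makes ss report both IPv4 and IPv6 sockets, and the sport filter restricts output to one local port.

```shell
# Hypothetical helper: reports, for each port given, whether any local TCP
# socket (IPv4 or IPv6) is listening on it.
check_listeners() {
  for port in "$@"; do
    # ss always prints a header line; any additional line is a matching socket
    if [ "$(ss --tcp --listening --numeric "sport = :$port" | wc -l)" -gt 1 ]; then
      echo "$port: listening"
    else
      echo "$port: nothing listening"
    fi
  done
}
```

Usage, with the ports discussed in this guide: check_listeners 5672 5671 25672 15672 1883.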

For the list of ports used by RabbitMQ and its various plugins, see above. Generally all ports used for external connections must be allowed by the firewalls and proxies.

IP Routing

Messaging protocols supported by RabbitMQ use TCP and require IP routing between clients and RabbitMQ hosts to be functional. There are several tools and techniques that can be used to verify IP routing between two hosts. traceroute and ping are two common options available for many operating systems. Most routing table inspection tools are OS-specific.

Note that ping uses ICMP and traceroute sends ICMP or UDP probes by default (depending on the implementation), while RabbitMQ client libraries and inter-node connections use TCP. Therefore a successful ping run alone does not guarantee successful client connectivity.

Both traceroute and ping have Web-based and GUI tools built on top.
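On Linux, these checks can be combined with a look at the kernel routing decision and a TCP-based traceroute, which follows the same protocol and port as clients. A sketch assuming iproute2 (ip route get), iputils ping and the Linux traceroute package, whose -T option sends TCP SYN probes and usually requires root; check_route is a hypothetical name and expects an IP address rather than a hostname.

```shell
# Hypothetical helper for Linux: shows the route the kernel would use to reach
# an IP address, then runs ICMP and TCP-based path checks towards it.
check_route() {
  addr=$1
  port=${2:-5672}
  echo "== kernel route to $addr"
  ip route get "$addr" || return 1        # outgoing interface and source address
  echo "== ICMP reachability"
  ping -c 3 -W 2 "$addr"                  # 3 probes, 2 second reply timeout each
  echo "== TCP path to port $port"
  traceroute -T -p "$port" "$addr"        # TCP SYN probes to the RabbitMQ port
}
```

Usage: check_route 192.0.2.10 5672 (a documentation-range address standing in for a RabbitMQ host).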

Capturing Traffic

All network activity can be inspected, filtered and analyzed using a traffic capture.

tcpdump and its GUI sibling Wireshark are the industry standards for capturing traffic, filtering and analysis. Both support all protocols supported by RabbitMQ. See the Using Wireshark with RabbitMQ guide for an overview.
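A capture is typically taken with tcpdump on the client or server host and then analyzed in Wireshark. A minimal sketch, assuming tcpdump is installed and run as root (or with the CAP_NET_RAW capability); the -i any pseudo-interface is Linux-specific, and capture_port is a hypothetical name.

```shell
# Hypothetical helper: writes all TCP traffic on a port to a pcap file that
# can be opened in Wireshark. Stop the capture with Control + C.
capture_port() {
  port=${1:-5672}
  file=${2:-rabbitmq-$port.pcap}
  # -i any: all interfaces; -n: no name resolution;
  # -s 0: capture whole packets; -w: write raw packets to a file
  tcpdump -i any -n -s 0 -w "$file" "tcp port $port"
}
```

Usage: capture_port 5672 amqp.pcap, reproduce the problem, stop the capture, then open amqp.pcap in Wireshark or print a summary with tcpdump -n -r amqp.pcap.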

TLS Connections

For connections that use TLS there is a separate guide on troubleshooting TLS.

When adopting TLS it is important to make sure that clients connect to the correct port (see the list of ports above) and that they are instructed to use TLS (perform a TLS upgrade). A client that is not configured to use TLS will successfully connect to a TLS-enabled server port, but its connection will then time out, since it never performs the TLS upgrade that the server expects.

A TLS-enabled client connecting to a non-TLS port will successfully connect and try to perform a TLS upgrade, which the server does not expect, thus triggering a protocol parser exception. Such exceptions will be logged by the server.
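openssl s_client can confirm from the client side whether a port actually speaks TLS: the handshake should complete against a TLS listener (5671 by default) and fail against a plain AMQP listener (5672 by default). A minimal sketch, assuming the openssl command-line tool is installed; check_tls is a hypothetical name, and it only checks that a handshake completes, not that the certificate chain is trusted.

```shell
# Hypothetical helper: attempts a TLS handshake against host:port and reports
# whether it completed. stdin is closed so s_client exits after the handshake.
check_tls() {
  host=$1
  port=${2:-5671}
  if openssl s_client -connect "$host:$port" </dev/null >/dev/null 2>&1; then
    echo "$host:$port completed a TLS handshake"
  else
    echo "$host:$port did not complete a TLS handshake" >&2
    return 1
  fi
}
```

Usage: check_tls rabbit.example.com 5671 (placeholder hostname). Running it without the redirections shows the certificate chain and verification result, which the separate TLS troubleshooting guide covers in more depth.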