Publish-Subscribe Pattern 1/2

This series of two posts presents an overview of the Publish-Subscribe pattern and analyses the features of some Pub/Sub implementations.

What is Pub/Sub? 1

The publish-subscribe pattern is a messaging pattern in which publishers publish messages to topics and subscribers subscribe to topics to receive messages. In this pattern, publishers and subscribers are not directly connected to each other (loose coupling).

Pub-sub principles

Hence, publishers and subscribers focus more on what they want to talk about than who they talk to. Publishers broadcast information and subscribers select relevant topics of their interest. This many-to-many exchange mode is what differentiates the pub/sub pattern from the more usual one-to-one server-client patterns.
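
To make the decoupling concrete, here is a minimal in-memory sketch of a topic-based broker. It is only an illustration: the class and method names (`InMemoryBroker`, `subscribe`, `publish`) are ours and do not come from any particular library, and a real broker would sit behind a network connection.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, bytes], None]


class InMemoryBroker:
    """Toy topic-based broker (hypothetical): agents only share topic names."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        """Notify every current subscriber of the topic; return how many there were."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


broker = InMemoryBroker()
received_by_display: List[bytes] = []
received_by_logger: List[bytes] = []

# Two independent subscribers on the same topic (many-to-many exchange).
broker.subscribe("sensors/temperature", lambda topic, payload: received_by_display.append(payload))
broker.subscribe("sensors/temperature", lambda topic, payload: received_by_logger.append(payload))

notified = broker.publish("sensors/temperature", b"21.5")
unheard = broker.publish("sensors/humidity", b"40")  # nobody subscribed
```

The publisher never names its recipients: the logger was added without touching the publishing code, and a publication on a topic nobody listens to simply goes nowhere.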

Benefits

Pub/sub is a good pattern for highly scalable systems over distributed networks where thousands of agents exchange messages seamlessly. For example, it is easy to add a logging agent to an existing network. It is also possible to add nodes that bridge data to other protocols (e.g. cloud services). However, as publishers and subscribers are not connected to each other, some interactions are difficult or even impossible. As pub/sub exchanges are one-way, a publisher can only assert that a message has been sent, not that it has been received. Publishers also do not know whether subscribers are able to keep up with intense message streams.

Characteristics

The main characteristics of pub/sub protocols can be categorized as follows:

Transport

How are messages distributed? Broker or broker-less? Above which Internet protocol?

  • Broker-based transport centralizes communications: each agent is connected to the broker, which receives notifications from publishers and distributes them to subscribers according to their filtering criteria. This may create an undesirable single point of failure in the network.

  • Broker-less transport usually relies on broadcast/multicast capabilities of the physical network to distribute messages.

  • Peer-to-peer transport where each subscriber is directly connected to publishers, or other systems where data is shared amongst agents of a swarm (e.g., Distributed Hash Table).

Payload format

Is the protocol data-agnostic? Are data organised and typed as attribute/value pairs? Payload size limits?

Security 2

Is security available on this protocol? To what extent? Which agent is responsible for it?

  • Transport-based security: the connection between the agents and the broker is secured (e.g. TLS over TCP/IP). In this case, the broker decrypts incoming notifications and encrypts them separately for each notified subscriber. Hence, data are not protected against indiscreet brokers. The broker authenticates the other agents.

  • Payload based security: the content of the payload is encrypted. This may involve third party encryption algorithms and/or key exchange algorithms. The broker, if any, is unaware of the encrypted data.

Quality of Service (QoS)

Are there means to verify message delivery? Are there properties on delays?

Filtering

Topic-based, content-based (values in the messages are used to apply filters)?

  • Filtering is done by the broker, if any, based on connected subscriber preferences. When there is no broker, each subscriber receives all the messages and filters them, before deciding to keep or drop them.

  • Content-based filtering cannot be done when the payload is not organized or typed. Also, when the payload is encrypted, either the broker must be able to decrypt it in order to filter it, or the encryption must be homomorphic.
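
The difference between the two filtering styles can be sketched as follows. The message layout (a dict with `topic` and `payload` keys) and the helper names are assumptions made for this illustration only. Note how the content filter has nothing to work with when the payload is an opaque blob.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Predicate = Callable[[Message], bool]


def by_topic(wanted: str) -> Predicate:
    """Topic-based filter: only the topic name is inspected."""
    return lambda message: message["topic"] == wanted


def by_content(attribute: str, minimum: float) -> Predicate:
    """Content-based filter: needs a structured (attribute/value) payload."""

    def predicate(message: Message) -> bool:
        payload = message["payload"]
        if not isinstance(payload, dict):
            return False  # opaque or encrypted blob: nothing to filter on
        value = payload.get(attribute)
        return isinstance(value, (int, float)) and value >= minimum

    return predicate


messages: List[Message] = [
    {"topic": "sensors/temperature", "payload": {"celsius": 19.0}},
    {"topic": "sensors/temperature", "payload": {"celsius": 31.5}},
    {"topic": "sensors/temperature", "payload": b"\x8f\x02\x11"},  # opaque blob
    {"topic": "sensors/humidity", "payload": {"percent": 40}},
]

on_topic = [m for m in messages if by_topic("sensors/temperature")(m)]
hot_readings = [m for m in on_topic if by_content("celsius", 30.0)(m)]
```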

Queueing

Pub/sub is an asynchronous messaging system.

  • Broker-based systems decide whether undelivered messages are dropped or queued.

  • It is more difficult to defer delivery on broker-less systems.

Discovery

Discover other agents? Topics? Discover other configurations?

Configuration

Is configuration mandatory? Is it exchanged between agents?

Footprint

How many services? Possible lightweight implementations?

Implementations

Open-source? Multiple implementations? Which languages?

Some pub/sub protocols

Pub/sub protocols define roles for their agents (publisher, subscriber, broker) and define the communications between them. They form a message-oriented middleware.

MQTT

The Message Queuing Telemetry Transport is an open standard 3 4.

Transport

Broker based, TCP/IP.

Payload

Blob (MQTT is data-agnostic in v3, MIME style types can be used in v5), up to 256 MB. Applications need third party agreements on how to decode data from the payload.
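
The 256 MB limit follows from the way MQTT encodes the packet length: the "Remaining Length" field of the fixed header uses at most four bytes, each carrying 7 bits of length plus a continuation bit. The sketch below encodes and decodes that field; the function names are ours, not those of any library. Note that the remaining length covers the variable header as well as the payload, so the usable payload is slightly smaller than the maximum.

```python
MAX_REMAINING_LENGTH = 128 ** 4 - 1  # 268,435,455 bytes, i.e. just under 256 MiB


def encode_remaining_length(length: int) -> bytes:
    """Encode a length as an MQTT variable-length integer (1 to 4 bytes)."""
    if not 0 <= length <= MAX_REMAINING_LENGTH:
        raise ValueError("length does not fit in four bytes")
    encoded = bytearray()
    while True:
        length, digit = divmod(length, 128)
        if length > 0:
            digit |= 0x80  # continuation bit: more bytes follow
        encoded.append(digit)
        if length == 0:
            return bytes(encoded)


def decode_remaining_length(data: bytes) -> int:
    """Decode the variable-length integer found at the start of `data`."""
    value = 0
    multiplier = 1
    for index, byte in enumerate(data):
        if index >= 4:
            raise ValueError("more than four length bytes")
        value += (byte & 0x7F) * multiplier
        if byte & 0x80 == 0:
            return value
        multiplier *= 128
    raise ValueError("truncated length field")


largest = encode_remaining_length(MAX_REMAINING_LENGTH)
```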

Security

  • Transport can be protected with TLS between brokers and agents. This provides authenticity, confidentiality, and integrity between agents and broker. The broker is aware of the content of the messages.

  • Authentication between agents and the broker with user:password (plain text if not in a TLS tunnel). Agents choose their session ID.

  • Payload encryption/signature may be added by third party applications.
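
As an illustration of payload-level protection, the sketch below signs a payload with an HMAC before publication, so that subscribers can detect tampering by the broker or anyone else on the path. It assumes that publisher and subscriber already share a secret key, which MQTT itself does not provide; the function names are ours. It gives integrity and authenticity only: confidentiality would require encrypting the payload as well, typically with a third-party library.

```python
import hashlib
import hmac

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_payload(key: bytes, payload: bytes) -> bytes:
    """Return tag || payload; the broker relays it as an opaque blob."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify_payload(key: bytes, signed: bytes) -> bytes:
    """Check the tag and return the original payload, or raise ValueError."""
    if len(signed) < TAG_SIZE:
        raise ValueError("message too short to carry a signature")
    tag, payload = signed[:TAG_SIZE], signed[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch")
    return payload


shared_key = b"pre-shared demo key"  # assumption: distributed out of band
wire_message = sign_payload(shared_key, b"door=open")
recovered = verify_payload(shared_key, wire_message)

# A modified payload is rejected by the subscriber.
tampered = wire_message[:-1] + b"X"
try:
    verify_payload(shared_key, tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```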

QoS

Publish messages are acknowledged by the receiver when a QoS level above 0 is used. To be clear, let A and B be two agents and D the broker. Given that B subscribes to a topic published by A, two connections are involved in the QoS process: A -> D and D -> B, each with its own QoS level. A specifies the QoS level of its publications to D, and B specifies the desired QoS level of its subscription with D. Hence, when QoS is used, D acknowledges to A that it received the publication, and B acknowledges to D that it received the notification. If B uses a lower QoS than the one chosen by A, the lower QoS level is used between D and B. It is important to note that publishers are not aware of who has subscribed to the topic or who received the notifications. Three QoS levels are defined:

  • 0: best effort, fire and forget, no acknowledgment,

  • 1: at least once, messages are acknowledged, but duplicates may be delivered when a message is retransmitted,

  • 2: exactly once, messages are acknowledged (with a 4-message exchange), no duplicate.
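
The practical difference between the levels shows up when an acknowledgment is lost and the sender retransmits. The following is a deliberately simplified simulation, not the real MQTT handshake: it models only lost acknowledgments, and the QoS 2 receiver simply remembers packet identifiers instead of running the 4-message exchange. All names are ours.

```python
from typing import List, Set


def effective_qos(publish_qos: int, subscription_qos: int) -> int:
    """QoS used between broker and subscriber: the lower of the two levels."""
    return min(publish_qos, subscription_qos)


class Receiver:
    def __init__(self, qos: int) -> None:
        self.qos = qos
        self.delivered: List[bytes] = []
        self._seen_packet_ids: Set[int] = set()

    def on_publish(self, packet_id: int, payload: bytes) -> None:
        if self.qos == 2:
            if packet_id in self._seen_packet_ids:
                return  # duplicate suppressed: exactly once
            self._seen_packet_ids.add(packet_id)
        self.delivered.append(payload)


def send(receiver: Receiver, packet_id: int, payload: bytes, lost_acks: int) -> int:
    """Transmit until an acknowledgment gets through; return the transmission count.

    The first `lost_acks` acknowledgments are assumed lost. QoS 0 never waits
    for an acknowledgment, so it always transmits exactly once.
    """
    transmissions = 0
    while True:
        transmissions += 1
        receiver.on_publish(packet_id, payload)
        if receiver.qos == 0 or transmissions > lost_acks:
            return transmissions


at_least_once = Receiver(qos=1)
exactly_once = Receiver(qos=2)
sent_qos1 = send(at_least_once, packet_id=1, payload=b"on", lost_acks=1)
sent_qos2 = send(exactly_once, packet_id=1, payload=b"on", lost_acks=1)
```

With one lost acknowledgment, both senders transmit twice, but only the QoS 1 receiver hands the message to the application twice.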

Filtering

Topic-based only.

Queueing

Broker implementations may support queueing. Agents may open “persistent” sessions. Undelivered messages addressed to disconnected subscribers with unclosed sessions shall be saved by the broker and delivered upon subscriber’s reconnection.
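
The behaviour of persistent sessions can be sketched with a toy broker that keeps a backlog per client identifier. This is an illustration under simplifying assumptions: class and method names are ours, QoS levels are ignored (a real broker would typically only store messages sent with QoS 1 or 2), and nothing is written to disk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Handler = Callable[[str, bytes], None]


@dataclass
class Session:
    persistent: bool
    topics: Set[str] = field(default_factory=set)
    backlog: List[Tuple[str, bytes]] = field(default_factory=list)
    handler: Optional[Handler] = None  # None while the client is offline


class SessionBroker:
    """Toy broker (hypothetical) that queues messages for offline persistent sessions."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def connect(self, client_id: str, persistent: bool, handler: Handler) -> None:
        session = self._sessions.get(client_id)
        if session is None or not persistent:
            session = Session(persistent=persistent)  # start from a clean state
            self._sessions[client_id] = session
        session.handler = handler
        pending, session.backlog = session.backlog, []
        for topic, payload in pending:  # replay what was missed while offline
            handler(topic, payload)

    def subscribe(self, client_id: str, topic: str) -> None:
        self._sessions[client_id].topics.add(topic)

    def disconnect(self, client_id: str) -> None:
        session = self._sessions[client_id]
        if session.persistent:
            session.handler = None  # keep subscriptions and backlog
        else:
            del self._sessions[client_id]

    def publish(self, topic: str, payload: bytes) -> None:
        for session in self._sessions.values():
            if topic not in session.topics:
                continue
            if session.handler is None:
                session.backlog.append((topic, payload))
            else:
                session.handler(topic, payload)


inbox: List[bytes] = []


def keep(topic: str, payload: bytes) -> None:
    inbox.append(payload)


broker = SessionBroker()
broker.connect("phone", persistent=True, handler=keep)
broker.subscribe("phone", "chat/room1")
broker.disconnect("phone")
broker.publish("chat/room1", b"hello")  # stored: the subscriber is offline
received_while_offline = list(inbox)
broker.connect("phone", persistent=True, handler=keep)  # backlog is replayed
```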

Discovery

Neither topics nor encodings are discoverable. However, wildcards can be used in topic names to subscribe to otherwise unknown topics. Only upon receiving notifications on these topics does a subscriber learn that they are in use.
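
MQTT defines two wildcards for topic filters: `+` matches exactly one topic level and `#` matches any number of trailing levels. The matcher below is a sketch of these rules as we understand them from the specification (including the rule that topics starting with `$` are not matched by a filter that starts with a wildcard); the function name is ours, and client libraries usually ship their own equivalent.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` is selected by the subscription `topic_filter`."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # Topics such as $SYS/... are hidden from filters starting with a wildcard.
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False

    for index, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level of the filter.
            return index == len(filter_levels) - 1
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    return len(filter_levels) == len(topic_levels)


single_level = topic_matches("home/+/temperature", "home/kitchen/temperature")
too_deep = topic_matches("home/+/temperature", "home/kitchen/oven/temperature")
everything_under_home = topic_matches("home/#", "home/garage/door")
```

Subscribing to `#` is therefore the closest thing to topic discovery: the subscriber sees every application topic on which something is actually published.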

Configuration

Broker address and QoS levels must be preconfigured. Payload content is never described by the protocol. Topics are not registered before they are used, hence it is difficult to make an exhaustive list of existing topics.

Footprint

14 message types with very few optional fields. MQTT is compatible with lightweight agent implementations. v5 has more features than v3, but the footprint should be similar.

Implementations

Wikipedia 5 lists 11 broker implementations and 8 agent implementations in various languages (C, C++, Java, Python, JavaScript, Go). This list is not exhaustive (Apache ActiveMQ uses MQTT as one of its protocols) but shows the wide adoption of the MQTT protocol (Eclipse, Apache, etc.).

We did a quick test implementation of an instant messaging app. The broker used was Eclipse’s Mosquitto, which was easy to set up. The test app is built on Eclipse’s Paho Python 3 API, easily installed in a virtual env. Paho is a client library with wrappers in Java, Python, JavaScript, Go, C, C++, Rust, C#, and a version dubbed “Embedded C/C++”.

MQTT-SN

MQTT for Sensor Networks v1.2 is a variant of the MQTT protocol. It is close to MQTT, but adapted to wireless communication with low bandwidth, small payloads, and frequent link failures. Its main differences with MQTT are:

  • Transport: requires only a transport layer with bi-directional data transfer to a gateway (Raw Ethernet, UDP, Bluetooth, RS 232, etc.).

  • A gateway translates MQTT-SN messages back and forth to an MQTT broker (there is no MQTT-SN broker).

  • Short topic names (2 bytes) can be used as aliases for classical string topic names.

  • Gateways may advertise themselves, easing 0-configuration setups of MQTT-SN agents.

  • A keep-alive procedure which is compatible with sleeping agents.
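
The topic aliasing mentioned above can be pictured as a small registry on the gateway side: an agent registers a long topic name once and then uses the 2-byte identifier in its publications. The sketch below is an illustration only. The class is hypothetical, a real gateway tracks registrations per client, and treating 0x0000 and 0xFFFF as unavailable is an assumption to check against the specification.

```python
import struct
from typing import Dict


class TopicRegistry:
    """Hypothetical gateway-side mapping between topic names and 16-bit ids."""

    FIRST_ID = 0x0001
    LAST_ID = 0xFFFE  # assumption: 0x0000 and 0xFFFF are not handed out

    def __init__(self) -> None:
        self._ids_by_name: Dict[str, int] = {}
        self._names_by_id: Dict[int, str] = {}
        self._next_id = self.FIRST_ID

    def register(self, name: str) -> int:
        """Return the id for `name`, allocating a new one on first use."""
        if name in self._ids_by_name:
            return self._ids_by_name[name]
        if self._next_id > self.LAST_ID:
            raise OverflowError("no topic id left")
        topic_id = self._next_id
        self._next_id += 1
        self._ids_by_name[name] = topic_id
        self._names_by_id[topic_id] = name
        return topic_id

    def resolve(self, topic_id: int) -> str:
        """Translate an id back to the full topic name for the MQTT broker."""
        return self._names_by_id[topic_id]


def encode_topic_id(topic_id: int) -> bytes:
    """Two bytes on the wire, most significant byte first."""
    return struct.pack("!H", topic_id)


registry = TopicRegistry()
long_name = "building/floor3/room12/temperature"
alias = registry.register(long_name)
on_the_wire = encode_topic_id(alias)
bytes_saved = len(long_name.encode("utf-8")) - len(on_the_wire)
```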

It is important to note that TLS cannot be used to secure the transport layer, since MQTT-SN does not require a TCP transport. However, Datagram Transport Layer Security (DTLS) may be used on top of UDP. This tunneling technique is not mentioned in the MQTT-SN specification.

Of the implementations listed by Wikipedia, five support MQTT-SN, including Eclipse Paho (which provides a gateway and a C client API).

DDS

The Data Distribution Service is an open specification 6.

It seems that the DDS specification only specifies abstractions of an application. Hence, it is not meant to produce interoperable software. It is meant to define what a Publisher and a Subscriber are, what their functions are, and how they shall be named in their APIs. This standard does not specify a protocol. The page What’s in the DDS Standard provides a good overview of the complexity of DDS.

The DDSI/RTPS 7 (Real-time Publish-Subscribe Protocol, DDS Interoperability Wire Protocol) is a 200-page specification of the reference transport layer for DDS, and it is what provides interoperability between implementations. The transport requirements are as follows: unicast addresses with a port, the ability to send and receive datagrams on this port, and the ability to drop incomplete or corrupted messages. DDSI provides discovery and QoS features on top of such a transport.

The following analysis is based on the DDSI for transport “details”:

Transport

Broker-less, DDSI suggests UDP/IP. Multicast or peer-to-peer unicast.

Payload

Typed data-objects (Topics).

Security

  • Transport (DTLS over UDP/IP if unicast).

  • Security plugins: the DDS Security specification 8 introduces interfaces for pluggable modules for security. They define key exchange protocols, encryption and signature schemes, etc. In combination with the DDSI, encryption is assured by AES-GCM (128 or 256 bits), key exchange is assured by Diffie-Hellman with DSA or RSA PKI. I think (to be verified) that keys are exchanged based on certificates of the agents in the domain. This would be similar to configured PKI, as seen in TLS or classical OPC-UA.

QoS

Various (reliability, deadline, etc.), implementation depends on the chosen transport. The distributed object model specifies which QoS type may be used by the application. It is the DDS implementation’s responsibility to implement them, even if the transport does not implement them.

Filtering

DataReader based (type/topic based).

Queueing

Unknown; probably no queueing, because a DataReader may not be connected to a DataWriter if they do not share a common domain at a given time.

Discovery

Uses “well-known” calculated port number (implies configuration) over unicast or multicast.
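
As an illustration of these "calculated" ports, here is a sketch of the default port mapping as we understand it from the DDSI-RTPS specification: ports are derived from the domain identifier and the participant identifier. The constants below (port base 7400, domain gain 250, participant gain 2, offsets 0, 10, 1 and 11) are the commonly cited defaults; implementations allow them to be tuned, and they should be checked against the version of the specification in use. Class and function names are ours.

```python
from dataclasses import dataclass

# Commonly cited DDSI-RTPS defaults (assumption: not overridden by configuration).
PORT_BASE = 7400
DOMAIN_GAIN = 250
PARTICIPANT_GAIN = 2
OFFSET_DISCOVERY_MULTICAST = 0
OFFSET_DISCOVERY_UNICAST = 10
OFFSET_USER_MULTICAST = 1
OFFSET_USER_UNICAST = 11


@dataclass(frozen=True)
class RtpsPorts:
    discovery_multicast: int
    discovery_unicast: int
    user_multicast: int
    user_unicast: int


def rtps_ports(domain_id: int, participant_id: int) -> RtpsPorts:
    """Compute the well-known ports of one participant in one domain."""
    base = PORT_BASE + DOMAIN_GAIN * domain_id
    per_participant = PARTICIPANT_GAIN * participant_id
    return RtpsPorts(
        discovery_multicast=base + OFFSET_DISCOVERY_MULTICAST,
        discovery_unicast=base + OFFSET_DISCOVERY_UNICAST + per_participant,
        user_multicast=base + OFFSET_USER_MULTICAST,
        user_unicast=base + OFFSET_USER_UNICAST + per_participant,
    )


first_participant = rtps_ports(domain_id=0, participant_id=0)
other_domain = rtps_ports(domain_id=1, participant_id=2)
```

Because every agent can compute these ports from the domain identifier alone, the only configuration that has to be shared beforehand is the domain itself.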

Configuration

Topics/data-writer discovery is available in DDSI.

Footprint

Unknown. The interface is rather large. The QoS features are varied and implementing them does not seem to be an easy task. All methods of the DDS model must be implemented by each of the agents in a domain. The Vortex Lite implementation claims memory usage as low as 400 KB.

Implementations

  • ADLINK’s Vortex line: OpenSplice is open source, with APIs in Ada, C, C++, Java, JavaScript, etc.; Vortex Lite has APIs in C99/C++.

  • RTI Connext DDS is proprietary (with APIs in C, C++, C#, Java); RTI Micro targets embedded I/O modules.

  • OpenDDS is an open-source implementation of the OMG specification (with APIs in C++ and Java).

Note

The following post will continue the analysis of protocols with OPC-UA PubSub and draw some conclusions.
