Publish/subscribe
From Wikipedia, the free encyclopedia
Publish/subscribe (or pub/sub) is an asynchronous messaging paradigm where senders (publishers) of messages are not programmed to send their messages to specific receivers (subscribers). Rather, published messages are characterized into classes, without knowledge of what (if any) subscribers there may be. Subscribers express interest in one or more classes, and only receive messages that are of interest, without knowledge of what (if any) publishers there are. This decoupling of publishers and subscribers can allow for greater scalability and a more dynamic network topology.
Pub/sub is a sibling of the Message Queue paradigm, and is typically one part of a larger Message-Oriented Middleware solution. Most messaging systems support in their API (e.g. JMS) both the pub/sub and Message Queue models.
Contents |
[edit] Message filtering
In the pub/sub model, subscribers typically receive only a sub-set of the total messages published. The process of selecting messages for reception and processing is called filtering. There are two common forms of filtering: topic-based and content-based.
In a topic-based system, messages are published to "topics" or named logical channels. Subscribers in a topic-based system will receive all messages published to the topics to which they subscribe, and all subscribers to a topic will receive the same messages. The publisher is responsible for defining the classes of messages to which subscribers can subscribe.
In a content-based system, messages are only delivered to a subscriber if the attributes or content of those messages match constraints defined by the subscriber. The subscriber is responsible for classifying the messages.
Some systems support a hybrid of the two; publishers post messages to a topic while subscribers register content-based subscriptions to one or more topics.
[edit] Topologies
In many pub/sub systems, publishers post messages to an intermediary broker and subscribers register subscriptions with that broker, letting the broker perform the filtering. The broker normally performs a store-and-forward function to route messages from publishers to subscribers.
Other pub/sub systems eliminate the message broker by distributing the routing and filtering functions among the publishers and subscribers, sometimes with the help of daemons. Especially high-performance options involve mapping pub/sub topics to multicast groups (using protocols such as IP multicast, PSYC or even IRC), so that data can be sent directly from publisher to subscribers with a single message per wire. Further content-based filtering can then be performed in the subscriber, before messages are passed up to the application layer.
[edit] History
The first publish-subscribe system was the "news" subsystem in the Isis Toolkit and was described in the 1987 ACM Symposium on Operating Systems Principles conference, in a paper "Exploiting Virtual Synchrony in Distributed Systems. 123-138". The publish-subscribe technology described there was invented by Frank Schmuck, who probably should get the credit as the first person to ever invent a fully functional publish-subscribe solution.
[edit] Advantages
Loosely-coupled: Publishers are loosely coupled to subscribers, and needn't even know of their existence. With the topic being the focus, publishers and subscribers are allowed to remain ignorant of system topology. Each can continue to operate normally regardless of the other. In the traditional tightly-coupled client-server paradigm, the client cannot post messages to the server while the server process is not running, nor can the server receive messages unless the client is running. Many pub/sub systems decouple not only the locations of the publishers and subscribers, but also decouple them temporally. A common strategy with such pub/sub systems is to take down a publisher to allow the subscriber to work through the backlog (a form of bandwidth throttling).
Scalable: For relatively small installations, pub/sub provides the opportunity for better scalability than traditional client-server, through parallel operation, message caching, tree-based or network-based routing, etc. However, as systems scale up to become datacenters with thousands of servers sharing the pub/sub infrastructure, this benefit is often lost; in fact, scalability for pub/sub products under high load in large deployments is very much a research challenge.
[edit] Disadvantages
The most serious problems with pub/sub systems are a side-effect of their main advantage: the decoupling of publisher from subscriber. The problem is that it can be hard to specify stronger properties that the application might need on an end-to-end basis:
- As a first example, many pub/sub systems will try to deliver messages for a little while, but then give up. If an application actually needs a stronger guarantee (such as: messages will always be delivered or, if delivery cannot be confirmed, the publisher will be informed), the pub/sub system probably won't have a way to provide that property.
- Another example arises when a publisher "assumes" that a subscriber is listening. Suppose that we use a pub/sub system to log problems in a factory: any application that senses an error publishes an appropriate message, and the messages are displayed on a console by the logger daemon, which subscribes to the errors "topic". If the logger happens to crash, publishers won't have any way to see this, and all the error messages will vanish.
As noted above, while pub/sub scales very well with small installations, a major difficulty is that the technology often scales poorly in larger ones. These manifest themselves as instabilities in throughput (load surges followed by long silence periods), slowdowns as more and more applications use the system (even if they are communicating on disjoint topics), and so-called IP broadcast storms, which can shut down a local area network by saturating it with overhead messages that choke out all normal traffic, even traffic unrelated to pub/sub.
For pub/sub systems that use brokers (servers), the agreement for a broker to send messages to a subscriber is in-band, and can be subject to security problems. Brokers might be fooled into sending notifications to the wrong client, amplifying denial of service requests against the client. Brokers themselves could be overloaded as they allocate resources to track created subscriptions.
Even with systems that do not rely on brokers, a subscriber might be able to receive data that it is not authorized to receive. An unauthorized publisher may be able to introduce incorrect or damaging messages into the pub/sub system. This is especially true with systems that broadcast or multicast their messages. Encryption (e.g. SSL/TLS) can be the only strong defense against unauthorized access.
[edit] Middleware that implement pub/sub
- Apache ActiveMQ
- Tervela
- TIBCO Rendezvous -- payloads with structured text. A local pub/sub daemon has to exist on each client machine.
- TIBCO Messaging Appliance - TIBCO Rendezvous protocol implemented in FPGA hardware.
- TIBCO SmartSockets -- payloads with structured text. A "cloud" of federated pub/sub daemons on different machines with no local daemons.
- Solace Systems 3230/3260 Content Networking Router -- Payloads are routed either by topic or content attributies.
- Internet Communications Engine ICE -- payloads consist of objects. pub/sub daemon(s) on different node(s) with no local daemon.
- Tuxedo (software) -- TUXEDO is a Transaction Processing System that provides scalable and transactional Publish/Subscribe capabilities.
- XMPP Pubsub -- The XMPP protocol offers support for publish/subscribe. This extension is specified in XEP-0060.
- Data_Distribution_Service -- The OMG Data Distribution Service (DDS) for Real-Time Systems addresses the performance requirements and hard real-time requirements of distributed data-centric applications. OSS implementation: OpenDDS
[edit] Articles
- Publish/subscribe and SOA: How EDA extends SOA and why it is important Jack van Hoof
- Virtual synchrony: a distributed execution model that can be used with pub/sub infrastructures to increase their fault-tolerance and consistency guarantees.
- Majumdar, Saugat; Kulkarni, Dhananjay; Ravishankar, Chinya (2007). "Addressing Click Fraud in Content Delivery Systems". Infocom, IEEE.
- Book "Distributed Event-Based Systems", 2007 by Gero Muehl, Ludger Fiege, and Peter R. Pietzuch, Springer-Verlag, Germany.
[edit] See also
- Event-driven programming
- Observer Pattern
- DDS
- For an open source example, see Distributed Publish/Subscribe Event System and A REconfigurable Dispatching System (REDS)
- An another open source example written in python: see Python Message Service
- Push-pull strategy