Modeling distributed pub-sub patterns

Modeling distributed Publish-Subscribe (Pub-Sub) patterns involves designing a system where publishers send messages (events, data, notifications) to a topic, and subscribers receive messages from that topic. This pattern decouples producers (publishers) from consumers (subscribers), making it highly scalable and flexible. Let’s dive into how to model such a system, breaking it down into key components and considerations.

Key Components of Pub-Sub Systems

Publisher:
- A publisher is any entity that sends messages to a topic. In a distributed Pub-Sub system, multiple publishers can exist.
- Publishers do not need to know the identity or number of subscribers. They only need to know the topic to which they will publish messages.
Subscriber:
- A subscriber is an entity that expresses interest in receiving messages from a specific topic.
- Subscribers can either pull messages from the broker when they are ready for them or have the broker push messages to them as they arrive.
- Subscribers may have different filtering requirements (e.g., filtering based on content, time, or other metadata).
Topic:
- A topic acts as a channel or category to which messages are sent by publishers.
- Topics help organize the flow of information, ensuring messages are directed appropriately to interested subscribers.
- Topics may be further divided into subtopics for more granular control of message delivery.
Message Broker:
- The message broker is the intermediary responsible for routing messages from publishers to subscribers.
- It ensures that messages are delivered to subscribers based on their topic subscriptions.
- Common message brokers include Apache Kafka, RabbitMQ, and Google Cloud Pub/Sub.
Message:
- A message is the actual payload sent by the publisher. It typically includes data, metadata (such as timestamps), and potentially headers.
- Messages vary in size and structure depending on the system’s requirements. Some systems encode messages in a structured serialization format (e.g., JSON, Avro, or Protocol Buffers); others treat the payload as opaque raw bytes.
Subscription:
- A subscription is the registration through which a subscriber declares its interest in receiving messages for a particular topic.
- Subscriptions can be durable (the broker retains the subscription and holds messages while the subscriber is offline, delivering them when it reconnects) or non-durable (the broker discards the subscription when the subscriber disconnects, so messages published in the meantime are never delivered to it).
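
To make these roles concrete, here is a minimal single-process sketch. The class names (`Message`, `Subscription`, `Broker`) and the in-memory backlog are illustrative assumptions, not the API of Kafka, RabbitMQ, or any other broker. It shows that a publisher only names a topic, and how durable and non-durable subscriptions behave differently when a subscriber goes offline.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class Message:
    """The payload plus metadata (headers, timestamp)."""
    topic: str
    payload: dict
    headers: Dict[str, str] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


class Subscription:
    """A subscriber's registered interest in one topic (hypothetical model)."""

    def __init__(self, name: str, topic: str, durable: bool,
                 handler: Callable[[Message], None]) -> None:
        self.name = name
        self.topic = topic
        self.durable = durable
        self.handler = handler
        self.connected = True
        # Messages held for a durable subscriber while it is offline.
        self.backlog: Deque[Message] = deque()


class Broker:
    """Routes published messages to subscriptions by topic."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Subscription]] = defaultdict(list)

    def subscribe(self, sub: Subscription) -> None:
        self._subs[sub.topic].append(sub)

    def disconnect(self, sub: Subscription) -> None:
        sub.connected = False
        if not sub.durable:
            # Non-durable: the broker forgets the subscription entirely.
            self._subs[sub.topic].remove(sub)

    def reconnect(self, sub: Subscription) -> None:
        sub.connected = True
        if sub not in self._subs[sub.topic]:
            self._subs[sub.topic].append(sub)  # non-durable: starts from scratch
        # Durable: replay what was published while the subscriber was away.
        while sub.backlog:
            sub.handler(sub.backlog.popleft())

    def publish(self, message: Message) -> None:
        # The publisher names only a topic; it never sees who is subscribed.
        for sub in list(self._subs[message.topic]):
            if sub.connected:
                sub.handler(message)
            else:
                sub.backlog.append(message)


received: Dict[str, list] = {"durable": [], "transient": []}
broker = Broker()
durable = Subscription("d", "alerts", durable=True,
                       handler=lambda m: received["durable"].append(m.payload))
transient = Subscription("t", "alerts", durable=False,
                         handler=lambda m: received["transient"].append(m.payload))
broker.subscribe(durable)
broker.subscribe(transient)

broker.disconnect(durable)
broker.disconnect(transient)
broker.publish(Message("alerts", {"id": 1}))  # both subscribers are offline
broker.reconnect(durable)
broker.reconnect(transient)
print(received)  # {'durable': [{'id': 1}], 'transient': []}
```

A real broker persists the backlog to disk and tracks each subscriber’s position (such as an offset) rather than holding messages in memory, but the routing logic has the same shape.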

Types of Pub-Sub Patterns

There are several variations of the Pub-Sub pattern, each suitable for different use cases in distributed systems:

Topic-based Pub-Sub:
- In topic-based systems, publishers send messages to a specific topic, and subscribers receive messages from topics they are subscribed to.
- This pattern is great for scenarios where messages are categorized by topic (e.g., a sports news website publishing updates on specific teams or games).
- Examples: Apache Kafka, Google Cloud Pub/Sub, and MQTT are primarily topic-based.
Content-based Pub-Sub:
- In content-based Pub-Sub systems, subscribers express interest in messages based on content attributes, rather than a fixed topic.
- Subscribers can filter messages by specific attributes (e.g., a user might subscribe to receive only “weather updates for their city”).
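- Example: DDS (Data Distribution Service) supports content-based filtering through content-filtered topics, which evaluate an SQL-like filter expression against each published sample.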
Hybrid Pub-Sub:
- Hybrid Pub-Sub systems combine topic-based and content-based filtering.
- A subscriber can filter messages by both topic and content, allowing for highly specific message delivery.
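
The three styles differ only in what a subscription matches on. The sketch below uses a hypothetical `Filter` class rather than any particular broker’s filter syntax: a subscription is an optional topic plus an optional content predicate. Leave out the predicate for topic-based, leave out the topic for content-based, and set both for hybrid. Production systems typically use a declarative filter language that the broker can evaluate and index, rather than arbitrary functions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Event = Dict[str, Any]


@dataclass
class Filter:
    """A subscription's matching rule (hypothetical, not a real broker API)."""
    topic: Optional[str] = None                          # None matches any topic
    predicate: Optional[Callable[[Event], bool]] = None  # None matches any content

    def matches(self, topic: str, event: Event) -> bool:
        if self.topic is not None and self.topic != topic:
            return False
        if self.predicate is not None and not self.predicate(event):
            return False
        return True


def in_sf(event: Event) -> bool:
    return event.get("city") == "San Francisco"


topic_based = Filter(topic="weather")
content_based = Filter(predicate=in_sf)
hybrid = Filter(topic="weather", predicate=in_sf)

oakland = {"city": "Oakland", "temp_c": 18}
sf = {"city": "San Francisco", "temp_c": 16}

print(topic_based.matches("weather", oakland))  # True: topic matches
print(content_based.matches("traffic", sf))     # True: content matches on any topic
print(hybrid.matches("traffic", sf))            # False: wrong topic
print(hybrid.matches("weather", sf))            # True: topic and content match
```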

Architectural Design

When modeling distributed Pub-Sub systems, there are several architectural design considerations to keep in mind:

Scalability:
- A distributed Pub-Sub system must handle a large number of publishers, subscribers, and messages without degradation in performance.
- One approach is to split each topic into partitions, each of which can be stored and processed independently by a different node.
- Horizontal scaling is typically achieved by running the broker as a cluster and spreading partitions, and therefore load, across multiple servers.
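
A common way to assign messages to partitions is to hash a message key, so that all messages with the same key (for example, the same city) land in the same partition and stay in order. The hypothetical `Partitioner` below is an illustration under stated assumptions: it uses CRC32 for keyed messages and round-robin for unkeyed ones, whereas Kafka’s default partitioner, for instance, hashes keys with murmur2 and handles unkeyed messages differently.

```python
import itertools
import zlib
from typing import Optional


class Partitioner:
    """Assigns messages to partitions (hypothetical helper, not a client API)."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        self.num_partitions = num_partitions
        self._next = itertools.count()

    def partition_for(self, key: Optional[str]) -> int:
        if key is None:
            # No key: spread load round-robin; no ordering across messages.
            return next(self._next) % self.num_partitions
        # Same key -> same partition, which preserves per-key ordering.
        # CRC32 is stable across processes, unlike Python's salted str hash().
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


p = Partitioner(num_partitions=4)
sf = p.partition_for("San Francisco")
print(all(p.partition_for("San Francisco") == sf for _ in range(10)))  # True
print([p.partition_for(None) for _ in range(6)])  # [0, 1, 2, 3, 0, 1]
```

With modulo hashing, changing `num_partitions` remaps most keys to different partitions, which is one reason partition counts tend to be chosen up front.
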
Fault Tolerance and Reliability:
- The system must ensure that messages are delivered even in the face of failures. This involves message persistence and replication strategies to maintain availability.
- Many brokers require subscribers to acknowledge each message after processing it and redeliver unacknowledged messages according to a retry policy, so a message is not lost if a subscriber crashes before finishing it.
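
Acknowledgment and retry can be sketched as a loop that keeps redelivering a message until the handler acknowledges it or an attempt limit is reached. In this hypothetical `deliver_with_retries` helper, returning `True` stands in for an acknowledgment and an exception counts as a missing one. Messages that exhaust their retries go to a dead-letter list, a common companion to retry policies, instead of being retried forever. Real brokers detect missing acknowledgments with timeouts and track them per consumer.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


def deliver_with_retries(messages: List[str],
                         handler: Callable[[str], bool],
                         max_attempts: int = 3) -> Tuple[List[str], List[str]]:
    """Redeliver each message until it is acknowledged or attempts run out.

    Returns (acknowledged, dead_lettered). Hypothetical helper: returning True
    stands in for an acknowledgment; real brokers rely on ack timeouts.
    """
    pending: Deque[Tuple[str, int]] = deque((m, 1) for m in messages)
    acked: List[str] = []
    dead_letter: List[str] = []
    while pending:
        msg, attempt = pending.popleft()
        try:
            ok = handler(msg)
        except Exception:
            ok = False  # a crashed handler never acknowledges
        if ok:
            acked.append(msg)
        elif attempt < max_attempts:
            pending.append((msg, attempt + 1))  # redeliver later
        else:
            dead_letter.append(msg)  # park it instead of retrying forever
    return acked, dead_letter


attempts: Dict[str, int] = {}


def flaky_handler(msg: str) -> bool:
    attempts[msg] = attempts.get(msg, 0) + 1
    if msg == "c":
        raise RuntimeError("poison message")  # always fails
    if msg == "b":
        return attempts[msg] >= 2  # succeeds on the second delivery
    return True


acked, dead = deliver_with_retries(["a", "b", "c"], flaky_handler)
print(acked, dead, attempts)  # ['a', 'b'] ['c'] {'a': 1, 'b': 2, 'c': 3}
```
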
Event Delivery Semantics:
- Pub-Sub systems can offer different delivery guarantees, such as:
  - At-most-once: The message is delivered no more than once, but may be lost.
  - At-least-once: The message is guaranteed to be delivered, but might be delivered multiple times.
  - Exactly-once: The message’s effects are applied once and only once, even if delivery is retried.
- The chosen guarantee dictates the underlying implementation; exactly-once, for example, is usually built from at-least-once delivery plus message deduplication or transactional support.
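
Because exactly-once processing is usually layered on at-least-once delivery, the subscriber side often needs a deduplication step. This sketch assumes each message carries a unique ID assigned by the publisher (a hypothetical convention) and uses a hypothetical `DedupingConsumer` class that remembers a bounded window of recently seen IDs.

```python
from collections import OrderedDict
from typing import Callable, Dict


class DedupingConsumer:
    """Skips redelivered messages so each message ID is processed once.

    Assumes publishers attach a unique ID to every message (hypothetical
    convention) and remembers only the most recent `capacity` IDs.
    """

    def __init__(self, process: Callable[[str, dict], None],
                 capacity: int = 10_000) -> None:
        self._process = process
        self._capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def handle(self, message_id: str, payload: dict) -> bool:
        """Process the message unless it is a duplicate; return True if processed."""
        if message_id in self._seen:
            return False  # duplicate: acknowledge it, but skip the side effects
        self._process(message_id, payload)
        self._seen[message_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the oldest remembered ID
        return True


totals: Dict[str, int] = {}


def add_rainfall(message_id: str, payload: dict) -> None:
    totals[payload["city"]] = totals.get(payload["city"], 0) + payload["mm"]


consumer = DedupingConsumer(add_rainfall)
# "m1" arrives twice, as at-least-once delivery allows (e.g., after a lost ack).
deliveries = [("m1", {"city": "SF", "mm": 5}),
              ("m1", {"city": "SF", "mm": 5}),
              ("m2", {"city": "SF", "mm": 3})]
for message_id, payload in deliveries:
    consumer.handle(message_id, payload)
print(totals)  # {'SF': 8}
```

Two limits apply: a duplicate that arrives after its ID has been evicted from the window is processed again, and a crash between processing and recording the ID can still cause a repeat. Systems that need stronger guarantees store the seen IDs in the same transaction as the side effect.
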
Latency:
- In real-time systems, low latency is crucial. Optimizations such as in-memory message delivery, low-latency networking, and efficient serialization of messages may be employed to minimize delay.
- However, latency must often be balanced with other factors, such as reliability and throughput.
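
Batching is one concrete example of this trade-off: grouping messages into fewer, larger sends raises throughput but makes each message wait a little longer. The hypothetical `Batcher` below flushes when a batch is full or when its oldest message has waited `max_wait_s` seconds. This is loosely analogous to the Kafka producer’s `batch.size` (which is measured in bytes) and `linger.ms` settings, but it is a simplified sketch, not how any client is implemented. A real producer would call `poll()` from a background timer; here it is called by hand, with an injectable clock so the behavior is deterministic.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Buffers messages; sends when the batch is full or has waited too long.

    Hypothetical sketch: max_batch counts messages (Kafka's batch.size counts
    bytes) and max_wait_s plays a role similar to linger.ms.
    """

    def __init__(self, send: Callable[[List[bytes]], None], max_batch: int = 100,
                 max_wait_s: float = 0.005,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._buffer: List[bytes] = []
        self._first_at: Optional[float] = None  # arrival time of oldest buffered message

    def add(self, message: bytes) -> None:
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(message)
        self.poll()

    def poll(self) -> None:
        """Flush if the batch is full or stale; call this periodically."""
        if not self._buffer:
            return
        full = len(self._buffer) >= self._max_batch
        stale = self._clock() - self._first_at >= self._max_wait_s
        if full or stale:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []  # new list, so the sent batch is not mutated later
            self._first_at = None


now = [0.0]  # fake clock, in seconds
sent: List[List[bytes]] = []
batcher = Batcher(sent.append, max_batch=3, max_wait_s=0.010, clock=lambda: now[0])
for m in (b"a", b"b", b"c", b"d"):
    batcher.add(m)  # the third message fills the batch and triggers a send
now[0] = 0.020      # 20 ms later, b"d" has waited longer than max_wait_s
batcher.poll()
print(sent)  # [[b'a', b'b', b'c'], [b'd']]
```

Raising `max_batch` or `max_wait_s` favors throughput; lowering them favors latency.
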
Security:
- A distributed Pub-Sub system must secure communication in transit, typically with TLS (the successor to the now-deprecated SSL).
- Authentication and authorization mechanisms must be in place to ensure that only authorized publishers and subscribers interact with the system.
- Additionally, message integrity (detecting tampering) and privacy (keeping payloads confidential, e.g., through end-to-end encryption) need to be addressed, especially in sensitive applications.
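
For message integrity, one common technique is to attach a keyed hash (HMAC) to each message so that subscribers can detect tampering. The sketch below uses Python’s standard `hmac` and `hashlib` modules; the `sign`/`verify` helpers, the envelope layout, and the hard-coded key are illustrative assumptions. An HMAC provides integrity and authenticity between parties that share the key, not privacy: keeping payloads confidential requires encryption as well.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in practice it would come from a secrets manager.
SECRET_KEY = b"example-shared-secret"


def sign(payload: dict, key: bytes = SECRET_KEY) -> dict:
    """Serialize the payload canonically and attach an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "hmac": tag}


def verify(envelope: dict, key: bytes = SECRET_KEY) -> dict:
    """Return the payload if the tag is valid; raise ValueError otherwise."""
    body = envelope["body"].encode("utf-8")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, envelope["hmac"]):
        raise ValueError("message failed integrity check")
    return json.loads(body)


envelope = sign({"city": "San Francisco", "temp_c": 17})
print(verify(envelope))  # {'city': 'San Francisco', 'temp_c': 17}

tampered = dict(envelope, body=envelope["body"].replace("17", "40"))
try:
    verify(tampered)
except ValueError as err:
    print(err)  # message failed integrity check
```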

Example Workflow

Let’s walk through an example scenario in which a distributed Pub-Sub system is used:

Setup:
- A weather monitoring service (publisher) produces real-time weather updates for various cities, with one topic per city.
- Users (subscribers) express interest in receiving weather updates for specific cities (e.g., “San Francisco”).
Message Flow:
- The publisher sends a message to the “San Francisco” topic with real-time data about the weather in San Francisco.
- The message broker receives the weather data and forwards it to all subscribers subscribed to the “San Francisco” topic.
Scalability and Fault Tolerance:
- As more cities are monitored and the number of subscribers increases, the city topics are spread across multiple broker nodes, and a high-traffic topic such as “San Francisco” can be split into partitions hosted on different nodes.
- If one broker fails, another broker holding a replica of its data can take over without losing messages that were already replicated and acknowledged.
Subscriber Behavior:
- A connected subscriber receives the weather update shortly after it is published. If the subscriber is offline, a durable subscription delivers the update when it reconnects, while a non-durable subscription means the update is missed.
- The subscriber may perform filtering on the message (e.g., only show alerts for extreme weather conditions).
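
The whole workflow fits in a short sketch. `WeatherBroker` is a hypothetical toy broker with one topic per city, and the 60 km/h wind threshold that defines “extreme weather” here is an arbitrary assumption chosen for illustration. The second subscriber applies its filter locally, as described above.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class WeatherBroker:
    """Toy single-process broker with one topic per city (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, city: str, handler: Handler) -> None:
        self._subscribers[city].append(handler)

    def publish(self, city: str, reading: dict) -> int:
        """Forward a reading to every subscriber of the city's topic."""
        handlers = self._subscribers.get(city, [])
        for handler in handlers:
            handler(reading)
        return len(handlers)  # number of subscribers the broker forwarded to


def extreme_only(show: Handler, threshold_kph: float = 60.0) -> Handler:
    """Wrap a handler with a subscriber-side filter for high winds."""
    def handler(reading: dict) -> None:
        if reading["wind_kph"] >= threshold_kph:
            show(reading)
    return handler


broker = WeatherBroker()
all_updates: List[dict] = []
alerts: List[dict] = []
broker.subscribe("San Francisco", all_updates.append)
broker.subscribe("San Francisco", extreme_only(alerts.append))

broker.publish("San Francisco", {"temp_c": 16, "wind_kph": 20})
broker.publish("San Francisco", {"temp_c": 12, "wind_kph": 75})
broker.publish("Chicago", {"temp_c": -5, "wind_kph": 90})  # no subscribers yet

print(len(all_updates), len(alerts))  # 2 1
```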

Conclusion

Modeling distributed Pub-Sub patterns requires careful attention to the scalability, fault tolerance, and real-time requirements of the system. The choice between topic-based, content-based, or hybrid Pub-Sub depends on the use case and the level of flexibility required for message delivery. Designing the system with reliability, event delivery semantics, and security in mind from the start helps produce a robust and performant solution.
