Kafka and JMS are both messaging systems. Java Message Service (JMS) is an API provided by Java for implementing messaging in your application. JMS supports two messaging models: queues and publish/subscribe (topics). With queues, once the first consumer consumes a message, the message is deleted from the queue and no other consumer can take it. With topics, multiple consumers receive each message, but this is much harder to scale. Let's look at the differences between Apache Kafka and JMS.
Kafka is a generalization of these two concepts: it allows scaling between members of the same consumer group, but it also allows broadcasting the same message to many different consumer groups. Kafka also provides automatic rebalancing when a consumer joins or leaves the consumer group.
WHAT IS KAFKA?
Apache Kafka is an open-source software platform that utilises stream-processing to provide a low-latency, high-throughput platform that handles real-time data feeds.
Kafka was originally developed by LinkedIn and used as a scalable messaging platform for the social media network’s central data pipeline to accommodate its growing membership. It was later donated to the Apache Software Foundation.
Kafka is written in Java and Scala. It connects to external systems for data import and export through Kafka Connect, and it provides Kafka Streams, a Java stream-processing library. The design of Kafka is heavily influenced by transaction logs.
Apache Kafka also works as a distributed messaging system that follows the publish-subscribe model. It also acts as a robust queue capable of handling a high volume of data, and users can pass messages between endpoints with it.
Kafka is suitable for both online and offline message consumption. Furthermore, Kafka messages persist on disk and are replicated within the cluster to prevent data loss.
COMPONENTS OF APACHE KAFKA
- Topic: This is a category name where messages are published. These are always multi-subscriber, which means a topic can have zero, one, or many consumers that subscribe to the data.
- Partition: Partitions are created by splitting topics. Each topic has at least one partition. Messages persist on partitions in an immutable sequential order. A partition is implemented as a set of equal-sized segment files. Because a topic can have many partitions, it can hold more data than fits on a single server.
- Leader: This is the node responsible for all reads/writes for a given partition. Each partition has one server acting as its leader.
- Follower: This is a node that replicates the leader, fetching messages from it much as a normal consumer would. If the leader fails, one of the followers automatically becomes the new leader.
- Broker: Brokers are systems that maintain the published data. Each broker may have zero or more partitions per topic.
- Producer: Producers publish data on the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This can be done in a round-robin manner or following some semantic partition function.
- Consumer: Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. If all the consumer instances have the same consumer group, then the records will effectively be load-balanced over the consumer instances. If all the consumer instances have different consumer groups, each record will be broadcast to all the consumer processes.
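To make the producer's partition choice concrete, here is a minimal Python sketch of the two strategies mentioned above: round-robin for records without a key, and a semantic (key-based) function for keyed records. This is a toy model, not Kafka's client code: the class name `ToyPartitioner` is hypothetical, and `zlib.crc32` is only a stand-in for whatever hash the real client uses.

```python
import zlib
from itertools import count
from typing import Optional


class ToyPartitioner:
    """Chooses a partition for each record (toy model, not the Kafka client)."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._round_robin = count()  # counter used for records without a key

    def partition_for(self, key: Optional[str]) -> int:
        if key is None:
            # No key: spread records evenly across the partitions.
            return next(self._round_robin) % self.num_partitions
        # Keyed record: a stable hash keeps equal keys on the same partition.
        # crc32 is an assumption for this sketch, not Kafka's actual hash.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = ToyPartitioner(num_partitions=3)
print([partitioner.partition_for(None) for _ in range(6)])  # [0, 1, 2, 0, 1, 2]
print(partitioner.partition_for("user-42") == partitioner.partition_for("user-42"))  # True
```

The key-based branch is what makes per-key ordering possible: all records for one key land in the same partition, and a partition is read in order.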
KAFKA MESSAGE DELIVERY MODEL
Kafka organizes messages into topics. Producers write data to topics, and consumers read from those topics. As Kafka is distributed, topics are split into partitions, which are replicated across various nodes (leader and followers). When consuming from a topic, we can also configure a group with multiple consumers. Each consumer in a group reads messages from a particular subset of the partitions of the topics the group subscribes to. This ensures that every message is delivered to one consumer in the group, and that all messages carrying the same key go to the same consumer.
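As a rough illustration of this group behaviour, here is a toy round-robin assignment in Python. The function name `assign_partitions` and the consumer names are made up for this sketch; real Kafka assignment strategies are configurable and more involved.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread the partitions of a topic over the consumers of ONE group."""
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Two groups subscribe to the same topic: each group sees every partition,
# but inside a group each partition has exactly one owner.
print(assign_partitions(partitions, ["a1", "a2"]))  # {'a1': [0, 2], 'a2': [1, 3]}
print(assign_partitions(partitions, ["b1"]))        # {'b1': [0, 1, 2, 3]}

# A new consumer joins group A: recomputing the assignment is a "rebalance".
print(assign_partitions(partitions, ["a1", "a2", "a3"]))  # {'a1': [0, 3], 'a2': [1], 'a3': [2]}
```

Within a group this behaves like a queue (work is divided), while across groups it behaves like a topic (every group receives everything).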
What makes Kafka distinctive is that it handles each topic partition as a log, which is simply an ordered sequence of messages. Every message within a given partition is assigned a unique offset. Kafka doesn't mark individual messages as read by particular consumers; a consumer's progress is just its offset in the log. Instead of holding only unread messages, Kafka retains all messages for a pre-specified amount of time.
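The following Python sketch models a single partition as an append-only log with time-based retention. It is an in-memory toy under simplifying assumptions: the class `PartitionLog` and its methods are hypothetical names, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from typing import List, Tuple


class PartitionLog:
    """Toy model of one partition: ordered records, each with a unique offset."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._records: List[Tuple[int, float, str]] = []  # (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value: str, now: float) -> int:
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1  # offsets only grow and are never reused
        return offset

    def read_from(self, offset: int) -> List[Tuple[int, str]]:
        # Reading does not remove anything; the caller just supplies its position.
        return [(o, v) for o, _, v in self._records if o >= offset]

    def expire(self, now: float) -> None:
        # Records are dropped by age, regardless of whether anyone has read them.
        self._records = [r for r in self._records if now - r[1] < self.retention_seconds]


log = PartitionLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=10)
log.append("c", now=50)

print(log.read_from(0))  # slow consumer at offset 0: [(0, 'a'), (1, 'b'), (2, 'c')]
print(log.read_from(2))  # fast consumer at offset 2: [(2, 'c')]

log.expire(now=65)       # 'a' is now older than the retention period
print(log.read_from(0))  # [(1, 'b'), (2, 'c')]
```

Because reading is non-destructive, any number of consumers can replay the same data independently until retention removes it.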
WHAT IS KAFKA USED FOR? SOME OF THE KAFKA USE CASES
MESSAGING: Kafka works well as a replacement for a more traditional message broker where distributed systems require very high throughput. Kafka is also well suited to large-scale message processing applications because it offers high throughput, built-in partitioning, replication, and fault tolerance.
WEBSITE ACTIVITY TRACKING: Different site activities (page views, searches, or other actions) can be published to central topics. One topic is used per activity type. These feeds can be subscribed to for a range of use cases, including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.
STATISTICAL METRICS: Kafka is also often used for operational monitoring data, for example by aggregating statistics from distributed applications into centralized feeds.
LOG AGGREGATION: Kafka is a good substitute for a log aggregation solution. Log aggregation typically means physically collecting log files off servers and putting them in a central place, such as a file server or HDFS, for processing. Kafka abstracts away the details of files and provides a cleaner abstraction of log or event data as a stream of messages. This allows easier support for multiple data sources, lower-latency processing, and distributed data consumption.
STREAM PROCESSING: Apache Kafka has a library called Kafka Streams, a lightweight but powerful stream processing library that supports data processing pipelines which consist of multiple stages. In this case, the processor consumes raw input data from the Kafka topic, aggregates it and enriches it, or transforms it into new topics for further consumption.
EVENT SOURCING: Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.
COMMIT LOG: Kafka serves the purpose of an external commit log for a distributed system. The log helps in replicating data between nodes and applying a re-sync mechanism to restore data for the failed nodes. It is the log compaction feature in Kafka that helps support this usage.
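The log compaction idea mentioned under COMMIT LOG can be pictured with a small Python sketch: keep only the most recent record for each key, so a restarting node can rebuild its latest state without replaying every change. The function `compact` and the sample records are illustrative assumptions, not Kafka's implementation.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving offset order."""
    latest_offset: Dict[str, int] = {}
    for offset, key, _ in log:
        latest_offset[key] = offset  # later records overwrite earlier ones
    return [record for record in log if latest_offset[record[1]] == record[0]]


log = [
    (0, "node-1", "up"),
    (1, "node-2", "up"),
    (2, "node-1", "down"),
    (3, "node-1", "up"),
]

compacted = compact(log)
print(compacted)  # [(1, 'node-2', 'up'), (3, 'node-1', 'up')]

# A failed node can re-sync its state from the compacted log alone.
state = {key: value for _, key, value in compacted}
print(state)  # {'node-2': 'up', 'node-1': 'up'}
```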
ADVANTAGES OF KAFKA
High-throughput: Kafka is capable of handling high-velocity and high-volume data without requiring large hardware. It can sustain throughput of thousands of messages per second.
Low Latency: Kafka can handle messages with very low latency in the milliseconds range.
Fault-Tolerant: Kafka has the inherent capability to be resistant to node failure within a cluster.
Durability: Kafka persists messages on disk and replicates them across brokers, so messages are unlikely to be lost even when a node fails.
Scalable: You can scale out Kafka by adding extra nodes without incurring any downtime. Moreover, message handling remains transparent and seamless to clients as the cluster grows.
Distributed: Kafka has a distributed architecture which makes it scalable along with the capabilities like replication and partitioning.
Consumer Friendly: Kafka is versatile and can be tailored to a variety of consumers. Moreover, Kafka can integrate with consumers written in various languages.
DISADVANTAGES OF KAFKA
No Complete Set of Monitoring Tools: Kafka does not ship with a full set of monitoring and management tools. This is a point of concern for many enterprise support staff when choosing Kafka.
Issues with Message Tweaking: Kafka doesn't run optimally when messages need to be modified; performance drops significantly in that case. It performs well when messages are left unchanged.
Does not support wildcard topic selection: Kafka has no broker-side wildcard routing, and producers must use the exact topic name. Although consumers can subscribe with a regular-expression pattern, this still leaves Kafka unable to address certain use cases.
Reduced Performance due to compression: While Kafka doesn't have issues with message sizes as such, compressing and decompressing messages on their way between producers, the broker, and consumers costs CPU time, which may affect throughput and overall performance.
Lacks some Messaging Paradigms: Kafka does not natively support some messaging paradigms, such as request/reply and classic point-to-point queues. This can be problematic for specific use cases.
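The compression trade-off listed above can be pictured with a small Python sketch. It uses the standard-library `zlib` module purely as a stand-in codec and an invented batch of JSON events; it is not how Kafka itself compresses data, and the actual sizes and timings depend on your payloads.

```python
import json
import zlib

# A hypothetical batch of similar events, as a producer might send them.
messages = [{"user_id": i, "event": "page_view", "page": "/home"} for i in range(1000)]

raw = json.dumps(messages).encode("utf-8")

# Compressing saves bytes on the network and on disk...
compressed = zlib.compress(raw, level=6)

# ...but someone has to spend CPU time to decompress before processing.
restored = json.loads(zlib.decompress(compressed))

print(len(raw), len(compressed))  # the compressed batch is smaller
print(restored == messages)       # True: the round trip is lossless
```

Repetitive payloads like these compress well, so the saving in bytes is paid for with extra work on each side of the transfer.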
WHAT IS JMS?
JMS stands for Java Message Service, a Java-oriented middleware API and the first enterprise messaging API to gain industry-wide support. The primary function of JMS is to send messages between two or more clients.
JMS is a messaging standard that allows Java components to create, read, send, and receive messages. At the same time, it enables communication between the different components of a distributed application that is loosely coupled, asynchronous, and reliable.
Components of JMS application:
JMS provider: A JMS provider, commonly known as Message-Oriented Middleware (MOM), is a messaging system that implements the JMS interfaces and provides additional functionality, such as administrative and control features.
JMS client: A Java application that produces and receives messages.
JMS producer: A JMS producer works as a JMS client that creates and sends messages.
JMS consumer: This is a JMS client that receives messages.
JMS message: An object that contains the data during the transfer between JMS clients.
JMS Destination: This is either a JMS topic or a queue that works as a destination for messages between the consumer and producer.
JMS queue: This is a staging area during message transmission. Messages from the producer wait here until they are read by one of the consumers. Unlike the usual concept of a queue, there is no guarantee that messages are processed in order here. The queue only guarantees that each message is processed just once.
JMS topic: A JMS topic distributes each published message to multiple subscribers.
MESSAGE DELIVERY MODELS IN JMS
Messaging models are basically programming models, and JMS follows an asynchronous messaging model between heterogeneous systems. It supports two types of messaging models:
- Point to Point Messaging model (P2P)
- Publish-Subscribe model (Pub-Sub)
1. POINT-TO-POINT MESSAGING MODEL (P2P MODEL):
In the P2P model, the JMS queue is used as a destination. The model follows a message routing pattern where messages are routed to individual consumers.
2. PUBLISH-AND-SUBSCRIBE MODEL (PUB/SUB MODEL):
The publish-subscribe model uses the JMS topic as the destination. In this model, publishers and subscribers do not know about each other. However, consumers can register to receive messages published to a particular JMS topic.
A JMS consumer that subscribes to a topic can consume all the messages published to that topic. However, this is time-bound: a subscriber can only consume messages published after it subscribes. If a message is published to the topic before the subscription exists, or while the subscriber is inactive, that message is not delivered to the subscriber (unless it holds a durable subscription, which JMS also supports). Unlike a queue, the topic does not store messages for later consumers.
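The difference between the two JMS models can be sketched with a toy in-memory version in Python. The classes `ToyQueue` and `ToyTopic` are hypothetical names for this illustration only; they mimic the delivery semantics described above (with non-durable subscribers) and are not the JMS API.

```python
from collections import deque
from typing import Deque, List, Optional


class ToyQueue:
    """Point-to-point: each message is handed to exactly one consumer."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # Receiving removes the message, so no other consumer can take it.
        return self._messages.popleft() if self._messages else None


class ToyTopic:
    """Publish-subscribe: every current subscriber gets its own copy."""

    def __init__(self) -> None:
        self._inboxes: List[List[str]] = []

    def subscribe(self) -> List[str]:
        inbox: List[str] = []
        self._inboxes.append(inbox)
        return inbox

    def publish(self, message: str) -> None:
        # The topic keeps nothing: only subscribers present right now receive it.
        for inbox in self._inboxes:
            inbox.append(message)


queue = ToyQueue()
queue.send("order-1")
print(queue.receive())  # order-1 (first consumer gets it)
print(queue.receive())  # None (nothing left for a second consumer)

topic = ToyTopic()
topic.publish("early")  # published before anyone subscribed: lost
inbox_a = topic.subscribe()
inbox_b = topic.subscribe()
topic.publish("late")
print(inbox_a, inbox_b)  # ['late'] ['late']
```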
WHAT IS JMS USED FOR?
JMS has multiple capabilities.
- It makes it easy to develop applications that follow an asynchronous messaging pattern for business data and events.
- It defines an enterprise messaging API that efficiently and easily supports a wide range of enterprise messaging products.
- It supports Java applications for enterprise messaging systems.
- It is a message-oriented middleware (MOM) that provides a low-level abstraction between database and application adapters, business process automation, and event processing.
- It provides a common set of messaging concepts and facilities.
- It needs minimum work to implement the provider.
- It maximizes the portability of messaging applications.
- It provides the client interface for both pub-sub and P2P domains.
ADVANTAGES OF USING JMS
Supports Asynchronous Communication: As JMS follows an asynchronous messaging pattern, users can expect the JMS queue to perform well and provide high throughput. Apart from this, a JMS queue can stream messages to consumers, and messages can be handled in memory while a consumer is connected. JMS can also send many messages per second, even thousands, by utilizing multiple threads and processes.
Great Industry Support: The JMS specification is widely available in message brokers. JMS was the first messaging API that had substantial industry support when applied to enterprise applications.
Reliability: Messaging in JMS is reliable as it ensures the delivery of messages to the intended consumer once sent by the producer. It also excludes duplicate delivery of messages.
Standardised Messaging API: The standard schemes and conventions for JMS have been widely accepted by other vendors, allowing JMS to address any portability issues more efficiently while facilitating simple application development.
Java’s Simplicity: The JMS API is easy to learn. Thus, developers will be able to write portable, messaging enterprise applications at a faster and more efficient rate.
Loose Coupling: JMS can decouple otherwise unrelated systems across system boundaries. The systems don't have to share a common database.
Can be Processed by Message Driven Beans: As Message-Driven Beans are based on JMS, developers can implement asynchronous Enterprise JavaBeans for scalable and robust applications.
Interoperability Between Different Providers: JMS’ high interoperability allows two distinct applications to communicate with one another despite using different messaging providers.
DISADVANTAGES OF JMS
JMS is Java-based. In multi-tiered applications using microservices, where multiple languages and frameworks are used, this can become a hindrance.
In JMS, the API is specified, but the message format is not. Producers and consumers only have to use the same API, so the format on the wire depends on the provider. This is a limitation of JMS.
Differences between Kafka and JMS:
|Feature||JMS||Apache Kafka|
|Order of Messages||There is no guarantee that the messages will be received in order.||The receiving of messages follows the order in which they are sent to the partition.|
|Filters||The JMS API provides message selectors, which allow consumers to specify which messages they are interested in according to specific criteria. The filtering is performed by the JMS provider before messages reach the consumer.||There is no concept of a filter at the broker level, so consumers cannot specify selection criteria for the messages they pick up. Filtering can happen only at the consumer level.|
|Persistence of Messages||It provides either in-memory or disk-based storage of messages.||It stores messages for a specified period, whether or not they have been picked up by a consumer.|
|Push vs. Pull of Messages||The provider pushes messages from queues and topics to the consumers.||The consumers pull messages from the broker.|
|Programming Style||JMS follows an imperative programming style.||Apache Kafka follows a reactive programming style.|
|Message Programming Type||JMS-based services are push-type in nature: the providers push the messages to the consumers.||Apache Kafka is a pull-type messaging platform: consumers pull the messages from the broker.|
|Storage||JMS provides disk-based or in-memory storage. Once a message is read, it is permanently deleted.||Messages are stored for a defined amount of time in Apache Kafka, irrespective of whether they are received by the consumers or not.|
|Partitioning of Topics||JMS-based tools do not segregate topics into sequential partitions, which leads to lower throughput.||Apache Kafka lets you segregate topics into independent partitioned logs, which ensures high throughput.|
|Content Segregation||JMS has no such provision, so you need to compartmentalize the messages yourself as required.||Apache Kafka keeps messages in the order in which they were sent, at the partition level.|
|Load Balancing||Load balancing can be designed by implementing some clustering mechanism. Thus, once the producer sends the messages, the load is distributed across the cluster.||Load balancing happens automatically: the Kafka nodes publish metadata indicating which servers are up and running in the cluster and where the leader for each partition is. Thus, the client can send messages to the appropriate partition.|
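The Filters row of the table can be illustrated with a short Python sketch. Real JMS selectors are SQL-like expressions over message headers and properties; here a plain Python function stands in for the selector, and both helper functions are hypothetical names used only for this comparison.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]


def provider_side_select(messages: List[Message],
                         selector: Callable[[Message], bool]) -> List[Message]:
    """JMS style: the provider applies the selector, so only matches are delivered."""
    return [m for m in messages if selector(m)]


def consumer_side_filter(messages: List[Message],
                         predicate: Callable[[Message], bool]) -> Tuple[List[Message], List[Message]]:
    """Kafka style: everything is delivered, and the consumer discards what it does not want."""
    delivered = list(messages)
    kept = [m for m in delivered if predicate(m)]
    return delivered, kept


messages = [
    {"type": "order", "priority": 9},
    {"type": "order", "priority": 2},
    {"type": "refund", "priority": 7},
]


def is_urgent(message: Message) -> bool:
    return message["priority"] > 5


print(len(provider_side_select(messages, is_urgent)))  # 2 messages reach the consumer

delivered, kept = consumer_side_filter(messages, is_urgent)
print(len(delivered), len(kept))  # 3 2: all three were transferred, two were kept
```

The end result is the same set of messages; what differs is where the filtering work happens and how much data is transferred to the consumer.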
Apache Kafka and JMS are both efficient tools; the key is to understand the circumstances in which one of them performs better than the other.
Apache Kafka is more suitable for handling large volumes of data due to its scalability and high availability, while JMS systems are used when you need to work with multi-node clusters and highly complicated systems.
Also, Apache Kafka is used when higher throughput is required (more than 100K messages per second), and JMS is used when lower throughput is sufficient.
JMS-based tools provide an HTTP API and CLI-based operators, which give JMS systems faster deployment and ease of operation.
Apache Kafka uses partition-based operation with a CLI and is more useful in cases that require multiple changes in a short span.