Jeff Wootton

How does SDS compare to Kafka?

I’ve been getting this question a lot lately, especially about Kafka, but it has always been a common question, particularly from people who are new to “event processing”, “complex event processing” and “streaming analytics”. But first, a comment on why I put all those terms in quotes, because I think it’s relevant to this discussion.

One challenge for SAP HANA smart data streaming (SDS) is that there is no widely accepted industry term for this technology. Various labels apply, including all those that I put in quotes. We tend to refer to it as event stream processing, but the term Forrester uses, streaming analytics, has definite merit. One of the things I like about the term “streaming analytics” is that it helps distinguish this technology from messaging technology.


So, back to the original question. Specifically, what I’m getting asked a lot lately is: “How does SDS compare to Kafka?” So here goes…


First the easy bit:

Apache Kafka is a message broker. Think JMS, MQ, AMQP, etc. I won’t get into the differences between various messaging technologies here. There are differences between message queues and message buses, and in the types of messaging patterns they support. But all messaging technologies address the same problem: delivering messages from producers to consumers. Producers send messages to the broker, and the broker either holds them in a queue where consumers can read them or, in the case of a bus, delivers them to all subscribers.

The key point here is that the message broker simply delivers the messages unchanged from producers to consumers.  Think of the post office (though with the added ability of one-to-many distribution).  Or Twitter, which is really just a message broker.
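
To make the “delivers messages unchanged” point concrete, here is a minimal sketch of a toy publish/subscribe broker in Python. `ToyBroker`, its methods and the `sensor-readings` topic are hypothetical illustrations, not Kafka's API (a real broker like Kafka also persists, partitions and replicates messages). The only point is that the broker fans each message out to every subscriber without looking inside it.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = dict                          # opaque payload; the broker never inspects it
Subscriber = Callable[[Message], None]  # callback invoked on delivery


class ToyBroker:
    """Hypothetical in-memory pub/sub broker (illustration only, not Kafka's API)."""

    def __init__(self) -> None:
        # topic name -> list of subscriber callbacks
        self._subscribers: Dict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Subscriber) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Message) -> None:
        # One-to-many delivery: every subscriber to the topic receives the
        # same message, passed along exactly as the producer sent it.
        for callback in self._subscribers[topic]:
            callback(message)


if __name__ == "__main__":
    broker = ToyBroker()
    inbox_a: List[Message] = []
    inbox_b: List[Message] = []
    broker.subscribe("sensor-readings", inbox_a.append)
    broker.subscribe("sensor-readings", inbox_b.append)

    reading = {"sensor": "pump-7", "temp_c": 81.5}
    broker.publish("sensor-readings", reading)
    print(inbox_a == inbox_b == [reading])  # True
```

Nothing in `publish` changes the message: what comes out is exactly what went in, just delivered to more places.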

HANA SDS is designed for streaming analytics. It receives messages (we tend to talk about events, but the information is delivered as messages) and applies business logic to analyze or otherwise transform those raw messages into useful information. Bottom line: in most cases, the output events (messages) from SDS are different from the input events. The focus of SDS is not simply on delivering the events but on analyzing or transforming them.

In terms of analyzing and transforming the data, just to add a bit of clarity, here are a few of the common things that SDS is used for:

  • filter the incoming data to only look at data of interest. This can be simple value based filtering, but can extend to complex, dynamic filtering logic
  • aggregate the incoming data.  This can be used to change the data frequency by sampling the data, or can be used to monitor trends, current positions, etc
  • watch for patterns of events – situation detection.  This is typically used for alerting or real-time response to emerging situations.  Predictive maintenance is an example, as is fraud detection
  • transform and enrich the data, getting it into the desired structure and adding context or other information to make it meaningful
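
The four kinds of processing listed above can be sketched as small Python generator functions chained together. This is an illustrative sketch only, assuming dict-shaped sensor events; the function names, the `ASSET_INFO` reference table, the 90 °C threshold and the window size are all hypothetical, and this is not how an SDS project is actually defined. It just shows the shape of the logic. Note that, unlike the broker, each stage emits events that differ from the ones it received.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set

Event = dict

# Hypothetical reference data used for enrichment (e.g. an asset registry).
ASSET_INFO: Dict[str, str] = {"pump-7": "Plant A", "pump-9": "Plant B"}


def filter_events(events: Iterable[Event], sensors: Set[str]) -> Iterator[Event]:
    """Filter: pass through only events from the sensors of interest."""
    return (e for e in events if e["sensor"] in sensors)


def tumbling_average(events: Iterable[Event], window: int) -> Iterator[Event]:
    """Aggregate: per sensor, emit one average for every `window` readings."""
    batches: Dict[str, List[float]] = defaultdict(list)
    for e in events:
        batch = batches[e["sensor"]]
        batch.append(e["temp_c"])
        if len(batch) == window:
            yield {"sensor": e["sensor"], "avg_temp_c": sum(batch) / window}
            batch.clear()


def detect_overheating(events: Iterable[Event], threshold: float,
                       run_length: int) -> Iterator[Event]:
    """Pattern: alert once a sensor reads above `threshold` `run_length` times in a row."""
    streak: Dict[str, int] = defaultdict(int)
    for e in events:
        if e["temp_c"] > threshold:
            streak[e["sensor"]] += 1
            if streak[e["sensor"]] == run_length:
                yield {"alert": "overheating", "sensor": e["sensor"], "ts": e["ts"]}
        else:
            streak[e["sensor"]] = 0  # pattern broken; start counting again


def enrich(events: Iterable[Event], reference: Dict[str, str]) -> Iterator[Event]:
    """Enrich: add context (here, the site) looked up from reference data."""
    for e in events:
        yield {**e, "site": reference.get(e["sensor"], "unknown")}


if __name__ == "__main__":
    readings = [
        {"sensor": "pump-7", "ts": 1, "temp_c": 70.0},
        {"sensor": "fan-2", "ts": 1, "temp_c": 30.0},
        {"sensor": "pump-7", "ts": 2, "temp_c": 92.0},
        {"sensor": "pump-7", "ts": 3, "temp_c": 95.0},
        {"sensor": "pump-7", "ts": 4, "temp_c": 97.0},
    ]
    pumps = list(filter_events(readings, {"pump-7"}))
    # -> [{'alert': 'overheating', 'sensor': 'pump-7', 'ts': 4, 'site': 'Plant A'}]
    print(list(enrich(detect_overheating(pumps, 90.0, 3), ASSET_INFO)))
    # -> [{'sensor': 'pump-7', 'avg_temp_c': 81.0}, {'sensor': 'pump-7', 'avg_temp_c': 96.0}]
    print(list(tumbling_average(pumps, 2)))
```

The alert and the averages are new events derived from the readings; none of them existed in the input stream.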

So SDS is often used in conjunction with a message broker. Inputs may flow from the producers to SDS via a message broker, and the outputs of SDS may be distributed to their destinations via a message broker. In fact, I would guess that as many as half of SDS (and SAP Event Stream Processor, or ESP) deployments use messaging technology, most often as an input channel to SDS/ESP, but in some cases for distributing output as well.
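
The broker → SDS → broker arrangement can be sketched with Python's standard `queue.Queue` standing in for the inbound and outbound message channels. This is a hypothetical illustration: `run_processor`, the `None` end-of-stream marker and the `flag_large_payment` rule (a nod to the fraud-detection example) are assumptions, not part of SDS or any broker's API.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, Optional

Event = dict


def run_processor(inbound: queue.Queue, outbound: queue.Queue,
                  logic: Callable[[Event], Iterable[Event]]) -> None:
    """Read events from the inbound channel, apply `logic`, and publish
    whatever it produces to the outbound channel. A `None` message is used
    here as a hypothetical end-of-stream marker."""
    while True:
        event: Optional[Event] = inbound.get()
        if event is None:
            outbound.put(None)  # tell downstream consumers the stream has ended
            return
        for result in logic(event):
            outbound.put(result)


def flag_large_payment(event: Event) -> Iterator[Event]:
    """Hypothetical rule: emit an alert only for payments over 10,000."""
    if event["amount"] > 10_000:
        yield {"alert": "large-payment", "account": event["account"]}


if __name__ == "__main__":
    inbound: queue.Queue = queue.Queue()   # stands in for the input topic
    outbound: queue.Queue = queue.Queue()  # stands in for the output topic
    worker = threading.Thread(target=run_processor,
                              args=(inbound, outbound, flag_large_payment))
    worker.start()

    for amount in (250, 12_000, 40):       # producers publish raw payments
        inbound.put({"account": "acct-1", "amount": amount})
    inbound.put(None)
    worker.join()

    results = []
    while (item := outbound.get()) is not None:  # consumer reads the alerts
        results.append(item)
    print(results)  # [{'alert': 'large-payment', 'account': 'acct-1'}]
```

The channels only move data; the decision about what to emit lives entirely in the processing step, which is the division of labor described above.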

One analogy I heard long ago (apologies to the originator; I can’t remember where I heard it) compares the combination of event processing and messaging to the central nervous system: messaging is the nerves, transmitting “data” from the fingertips to the brain, and the event processing engine is the brain, making sense of all those messages.

While it’s tempting to stop there, I do need to acknowledge that there is overlap.  Yes, SDS can be used to “route” messages from producer to consumer, but the semantics are very different. We’re definitely seeing interest in using SDS to apply rules to incoming data to determine which data gets put into HANA in-memory tables, which data goes into HANA extended tables, and which data goes to Hadoop, but in most of those cases SDS is doing more to the data than just delivering it to the desired destination(s).
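
Here is a minimal sketch of the kind of rule-driven routing described above, assuming hypothetical rules: anomalous readings go to an in-memory table for real-time use, normal readings from critical assets go to an extended table, and everything else is archived in Hadoop. The `Destination` labels, the `Reading` fields and the 90 °C threshold are all made up for illustration, and no data is actually written anywhere. Note that `route` also derives an `anomaly` flag, so even here the processor is doing more than delivery.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Tuple


class Destination(Enum):
    # Labels only: this sketch decides where data should go but writes nothing.
    HANA_IN_MEMORY = "HANA in-memory table"
    HANA_EXTENDED = "HANA extended table"
    HADOOP = "Hadoop"


@dataclass(frozen=True)
class Reading:
    sensor: str
    temp_c: float
    critical_asset: bool  # hypothetical attribute from an asset registry


def route(reading: Reading, threshold: float = 90.0) -> Tuple[Destination, dict]:
    """Apply hypothetical tiering rules and return (destination, row to store)."""
    anomaly = reading.temp_c > threshold
    # The row is derived, not just forwarded: it carries a computed flag.
    row = {**asdict(reading), "anomaly": anomaly}
    if anomaly:
        return Destination.HANA_IN_MEMORY, row   # needed for real-time response
    if reading.critical_asset:
        return Destination.HANA_EXTENDED, row    # kept for analysis, less urgent
    return Destination.HADOOP, row               # low-cost raw history


if __name__ == "__main__":
    for r in (Reading("pump-7", 95.0, True),
              Reading("pump-7", 60.0, True),
              Reading("fan-2", 30.0, False)):
        destination, row = route(r)
        print(destination.value, row)
```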

Bottom line: while there’s some overlap, they were designed for different purposes. So, in general, use the best tool for the job. The technology world is full of overlapping technologies, and that’s generally the best rule to follow. And in many cases, use them together.


      6 Comments
      Manjunath Baburao

      Concise and well written Jeff!

      Former Member

      Thanks for such a short & nice explanation to a complex topic.

      Former Member

      Hi Jeff, great article! Though I feel the comparison is a little out of place: based on my limited understanding, Apache Kafka/Confluent Kafka are messaging systems with a known lack of support for transformation or enrichment. So from that perspective, I believe the comparison is a little out of context, and a better comparison for SDS would have been Apache Flink or Apache Spark Streaming. Let me know if this is a wrong assumption. Looking forward to more great articles from you!

      Jeff Wootton
      Blog Post Author

      You're absolutely right; that's the point I was trying to make. People who ask how SDS (now called HANA streaming analytics) compares to Kafka don't understand one or both of them. Kafka is the mail service, delivering the unopened letters. HANA streaming analytics analyzes the incoming mail. And yes, Flink and, to a lesser extent, Spark Streaming are two of the more appropriate comparisons. But we continue to get the SDS vs. Kafka question a lot, simply due to a lack of understanding.

      Mihir L Kiri

      Hi Jeff Wootton,

      My client wants to use EQUALUM as a data provisioning tool for SAP HANA. I have never seen a forum post where EQUALUM is used in conjunction with HANA DB. Can you shed some light on such a setup?

      Regards,

      Mihir

       

      Jeff Wootton
      Blog Post Author

      Sorry - I'm not familiar with EQUALUM.