2. Spring Cloud Stream Main Concepts

Spring Cloud Stream provides a number of abstractions and primitives that simplify writing message-driven microservices. In this section we will provide an overview of:

2.1 Application structure

A Spring Cloud Stream application consists of a middleware-neutral core that communicates with the outside world through input and output channels. The channels are managed and injected into it by the framework, and a Binder connects them to the external brokers. Different Binder implementations exist for different types of middleware, such as Kafka, Rabbit MQ, Redis or Gemfire, and an extensible API allows you to write your own Binder. There is also TestSupportBinder that leaves the channel as-is so a test author can interact with the channels directly and easily assert on what is received.

Figure 2.1. Spring Cloud Stream Application

SCSt with binder

Spring Cloud Stream uses Spring Boot for configuration, and the Binder makes it possible for Spring Cloud Stream applications to be flexible in terms of how it connects to the middleware. For example, deployers can dynamically choose the destinations that these channels connect to at runtime (e.g. Kafka topics or Rabbit MQ exchanges). This can be done through external configuration properties in any form that is supported by Spring Boot (application arguments, environment variables, application.yml files, etc). Taking the sink example from the previous section, providing the spring.cloud.stream.bindings.input.destination=raw-sensor-data property to the application will cause it to read from the raw-sensor-data Kafka topic, or from a queue bound to the raw-sensor-data exchange in Rabbit MQ. See Section 4.2, “Binding properties” for more information on the available binder properties you can configure. You are also able to configure middleware specific properties, see ??? for more information.

Spring Cloud Stream will automatically detect and use a binder that is found on the classpath, so you can easily use different types of middleware with the same code, just by including a different binder at build time. For more complex use cases, Spring Cloud Stream also provides the ability of packaging multiple binders within the same application and choosing what type of binder should be used at runtime, and even if multiple binders should be used at runtime for different channels.

2.1.1 Fat JAR

Spring Cloud Stream applications can be run in standalone mode from your IDE for testing. To run in production you can create an executable (or "fat") JAR using the standard Spring Boot tooling provided for Maven or Gradle.

2.2 Persistent publish subscribe and consumer groups

Communication between different applications follows a publish-subscribe pattern, with data being broadcast through shared topics. This can be seen in the following picture, which shows a typical deployment for a set of interacting Spring Cloud Stream applications.

Figure 2.2. Spring Cloud Stream Application topologies

SCSt with binder

Data reported by sensors to an HTTP endpoint is sent to a common destination named raw-sensor-data, from where it is independently processed by a microservice that computes time windowed averages, as well as by a microservice that ingests the raw data into HDFS. In order to do so, both applications will declare the topic as their input at runtime. The publish-subscribe communication model reduces the complexity of both the producer and the consumer, and allows adding new applications to the topology without disrupting the existing flow. For example, downstream from the average calculator we can have a component that calculates the highest temperature values in order to display and monitor them. Later on, we can add an application that interprets the very same flow of averages for fault detection. The fact that all the communication is done through shared topics rather than point to point queues reduces the coupling between microservices.

While the concept of publish-subscribe messaging is not new, Spring Cloud Stream takes the extra step of making it an opinionated choice for its application model. It also makes it easy for users to work with it across different platform by using the native support of the middleware.

2.2.1 Consumer Groups

While the publish subscribe model ensures that it is easy to connect multiple application by sharing a topic, it is equally important to be able to scale up by creating multiple instances of a given application. When doing so, the different instances would find themselves in a competing consumer relationship with each other: only one of the instances is expected to handle the message. Spring Cloud Stream models this behavior through the concept of a consumer group, which is similar to (and inspired by) the notion of consumer groups in Kafka. Each consumer binding can specify a group name such as spring.cloud.stream.bindings.input.group=hdfsWrite or spring.cloud.stream.bindings.input.group=average, as shown in the picture. All groups that subscribe to a given destination will receive a copy of the published data, but only one member of the group will receive a given message from that destination. By default, when a group is not specified, Spring Cloud Stream assigns the application to an anonymous, independent, single-member consumer group that will be in a publish-subscribe relationship with all the other consumer groups.

Figure 2.3. Spring Cloud Stream Consumer Groups

SCSt groups

2.2.2 Durability

Consistent with the opinionated application model of Spring Cloud Stream, consumer group subscriptions are durable. This is to say that the binder implementation will ensure that group subscriptions are persistent and, once at least one subscription for a group has been created, that group will receive messages, even if they are sent while all the applications of the group were stopped. Anonymous subscriptions are non-durable by nature. For some binder implementations (e.g. Rabbit) it is possible to have non-durable group subscriptions.

In general, it is preferable to always specify a consumer group when binding an application to a given destination. When scaling up a Spring Cloud Stream application, a consumer group must be specified for each of its input bindings, in order to prevent its instances from receiving duplicate messages (unless that behavior is desired, which is a less common use case).

2.3 Partitioning

Spring Cloud Stream provides support for partitioning data between multiple instances of a given application. In a partitioned scenario, one or more producer application instances will send data to multiple consumer application instances, ensuring that data with common characteristics is processed by the same consumer instance. The physical communication medium (e.g. the broker topic) is viewed as structured into multiple partitions. This happens regardless of whether the broker type is naturally partitioned (e.g. Kafka) or not (e.g. Rabbit), Spring Cloud Stream provides a common abstraction for implementing partitioned processing use cases in a uniform fashion.

Figure 2.4. Spring Cloud Stream Partitioning

SCSt partitioning

Partitioning is a critical concept in stateful processing, where ensuring that all the related data is processed together is critical for either performance or consistency. For example, in the time-windowed average calculation example, it is important that measurements from the same sensor land in the same application instance.

Setting up a partitioned processing scenario requires configuring both the data producing and the data consuming end.