7 Streams and state

 

This chapter covers

  • Adding stateful operations to Kafka Streams
  • Using state stores in Kafka Streams
  • Enriching event streams with joins
  • Learning how timestamps drive Kafka Streams

In the last chapter, we dove headfirst into the Kafka Streams DSL and built a processing topology to handle the streaming requirements of purchase activity. Although you created a nontrivial processing topology, it was one-dimensional in that all transformations and operations were stateless. You considered each transaction in isolation, without regard to other events occurring at the same time or within certain time boundaries before or after the transaction. You also dealt only with individual streams, ignoring any possibility of gaining additional insight by joining streams together.

In this chapter, you’ll extract the maximum amount of information from your Kafka Streams application. To get this level of information, you’ll need to use state. State is nothing more than the ability to recall information you’ve seen before and connect it to current information. You can use state in different ways. We’ll look at one example when we explore the stateful operations provided by the Kafka Streams DSL, such as the accumulation of values.
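To make this concrete before we dig into the details, here is a minimal sketch of the kind of stateful operation this chapter builds toward: accumulating a running total of purchase amounts per customer, kept in a state store. The topic names, store name, and serdes here are placeholders chosen for illustration, not the chapter's actual example.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class RunningTotalSketch {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Hypothetical topic: purchase amounts keyed by customer ID
        KStream<String, Double> purchases = builder.stream(
            "purchase-amounts",
            Consumed.with(Serdes.String(), Serdes.Double()));

        // The stateful step: accumulate a running total per customer.
        // Kafka Streams keeps the totals in a local state store,
        // backed by a changelog topic for fault tolerance.
        KTable<String, Double> totals = purchases
            .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
            .reduce(Double::sum,
                    Materialized.as("customer-purchase-totals"));

        // Emit each updated total downstream
        totals.toStream().to(
            "purchase-totals",
            Produced.with(Serdes.String(), Serdes.Double()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "running-total-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Unlike the stateless operations of the previous chapter, the reduce step remembers the last total it produced for each key and combines it with each new value, which is exactly the "recall and connect" behavior described above.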

7.1 Stateful vs. stateless

7.2 Adding stateful operations to Kafka Streams

7.2.1 Group-by details

7.2.2 Aggregation vs. reducing

7.2.3 Repartitioning the data

7.2.4 Proactive repartitioning

7.2.5 Repartitioning to increase the number of tasks

7.2.6 Using Kafka Streams optimizations

7.3 Stream-stream joins

7.3.1 Implementing a stream-stream join

7.3.2 Join internals

7.3.3 ValueJoiner

7.3.4 JoinWindows

7.3.5 Co-partitioning

7.3.6 StreamJoined

7.3.7 Other join options

7.3.8 Outer joins

7.3.9 Left-outer join

7.4 State stores in Kafka Streams

7.4.1 Changelog topics restoring state stores

7.4.2 Standby tasks