Architecting Microservices Applications with Instant Analytics

Tim Berglund, Sr. Director, Developer Experience, Confluent
Rachel Pedreschi, Sr. Director, Global Field Engineering, Imply Data

What the heck is Apache Kafka and Why Should I Care?

[Diagram sequence: a producer writes to a partitioned topic (partition 0, partition 1, partition 2). Consumer A reads the topic, then consumer B is added, then further instances of consumer A appear across successive slides.]
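The partitioned-topic diagrams can be read as instances of one consumer sharing the partitions of a topic between them. Below is a minimal sketch of that idea in plain Java with no Kafka dependency. The class name, the member names, and the simple round-robin rule are all assumptions made for illustration; Kafka's actual partition assignors are configurable and more involved than this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not part of the Kafka API.
final class PartitionAssignmentSketch {

    // Toy round-robin rule: partition p goes to member (p mod memberCount),
    // so every partition has exactly one owner within the group.
    static Map<String, List<Integer>> assign(int partitionCount, List<String> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one member");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = members.get(p % members.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // One instance of consumer A owns all three partitions.
        System.out.println(assign(3, List.of("A-1")));
        // Adding instances spreads the same partitions over more members.
        System.out.println(assign(3, List.of("A-1", "A-2")));
        System.out.println(assign(3, List.of("A-1", "A-2", "A-3")));
    }
}
```

With three partitions, a fourth member in this model would receive an empty list, which is why adding instances beyond the partition count does not add parallelism.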

[Diagram: stream processing apps alongside RDBMS, NoSQL, and DWH/Hadoop systems.]

[Diagram: three instances of consumer A, followed by three Streams Application instances.]

public static void main(String[] args) {
    Properties streamsConfiguration = getProperties(SCHEMA_REGISTRY_URL);

    // All Avro serdes share the same Schema Registry configuration.
    final Map<String, String> serdeConfig =
        Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
                                 SCHEMA_REGISTRY_URL);
    final SpecificAvroSerde<Movie> movieSerde = getMovieAvroSerde(serdeConfig);
    final SpecificAvroSerde<Rating> ratingSerde = getRatingAvroSerde(serdeConfig);
    final SpecificAvroSerde<RatedMovie> ratedMovieSerde = new SpecificAvroSerde<>();
    // Configure the serde created on the previous line (false = value serde).
    ratedMovieSerde.configure(serdeConfig, false);

    // Build the topology: average rating per movie, joined with movie titles.
    StreamsBuilder builder = new StreamsBuilder();
    KTable<Long, Double> ratingAverage = getRatingAverageTable(builder);
    getRatedMoviesTable(builder, ratingAverage, movieSerde);
    Topology topology = builder.build();

    KafkaStreams streams = new KafkaStreams(topology, streamsConfiguration);
    // Close the streams application cleanly when the JVM shuts down.
    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    streams.start();
}
private static SpecificAvroSerde<Rating> getRatingAvroSerde(Map<String, String> serdeConfig) {
    final SpecificAvroSerde<Rating> ratingSerde = new SpecificAvroSerde<>();
    ratingSerde.configure(serdeConfig, false);
    return ratingSerde;
}

public static SpecificAvroSerde<Movie> getMovieAvroSerde(Map<String, String> serdeConfig) {
    final SpecificAvroSerde<Movie> movieSerde = new SpecificAvroSerde<>();
    movieSerde.configure(serdeConfig, false);
    return movieSerde;
}
public static KTable<Long, String> getRatedMoviesTable(StreamsBuilder builder,
                                                       KTable<Long, Double> ratingAverage,
                                                       SpecificAvroSerde<Movie> movieSerde) {
    // Parse raw movie strings, re-key by movie ID, and write Avro to "movies".
    builder.stream("raw-movies", Consumed.with(Serdes.Long(), Serdes.String()))
        .mapValues(Parser::parseMovie)
        .map((key, movie) -> new KeyValue<>(movie.getMovieId(), movie))
        .to("movies", Produced.with(Serdes.Long(), movieSerde));

    // Read "movies" back as a table backed by the "movies-store" state store.
    KTable<Long, Movie> movies = builder.table("movies",
        Materialized
            .<Long, Movie, KeyValueStore<Bytes, byte[]>>as("movies-store")
            .withValueSerde(movieSerde)
            .withKeySerde(Serdes.Long()));

    // Join the average rating with the movie on movie ID to get "title=average".
    KTable<Long, String> ratedMovies = ratingAverage
        .join(movies, (avg, movie) -> movie.getTitle() + "=" + avg);
    ratedMovies.toStream().to("rated-movies", Produced.with(Serdes.Long(), Serdes.String()));
    return ratedMovies;
}
public static KTable<Long, Double> getRatingAverageTable(StreamsBuilder builder) {
    KStream<Long, String> rawRatings = builder.stream("raw-ratings",
        Consumed.with(Serdes.Long(), Serdes.String()));

    // Parse each raw rating and re-key it by movie ID.
    KStream<Long, Rating> ratings = rawRatings.mapValues(Parser::parseRating)
        .map((key, rating) -> new KeyValue<>(rating.getMovieId(), rating));
    KStream<Long, Double> numericRatings = ratings.mapValues(Rating::getRating);

    // Keep a running count and a running sum of ratings per movie.
    KGroupedStream<Long, Double> ratingsById = numericRatings.groupByKey();
    KTable<Long, Long> ratingCounts = ratingsById.count();
    KTable<Long, Double> ratingSums = ratingsById.reduce((v1, v2) -> v1 + v2);

    // Average = sum / count, materialized as "average-ratings".
    KTable<Long, Double> ratingAverage = ratingSums.join(ratingCounts,
        (sum, count) -> sum / count.doubleValue(),
        Materialized.as("average-ratings"));

    ratingAverage.toStream()
        /*.peek((key, value) -> { // debug only
            System.out.println("key = " + key + ", value = " + value);
        })*/
        .to("average-ratings");
    return ratingAverage;
}
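To make the arithmetic in the topology above easier to follow, here is a minimal plain-Java sketch of the same idea: keep a running sum and a running count per movie ID, divide to get the average, then attach a title. It uses only `java.util` and no Kafka classes. The class name, method names, and the sample title are hypothetical and chosen only for illustration; the sketch does not model Kafka Streams state stores, serdes, or repartitioning.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the sum/count/join steps of the topology.
final class RatingAverageSketch {
    private final Map<Long, Double> sums = new LinkedHashMap<>();
    private final Map<Long, Long> counts = new LinkedHashMap<>();

    // Plays the role of reduce() and count(): update both running
    // aggregates for the movie and return its new average.
    double addRating(long movieId, double rating) {
        double sum = sums.merge(movieId, rating, Double::sum);
        long count = counts.merge(movieId, 1L, Long::sum);
        return sum / count;
    }

    // Current average for a movie, or null if it has no ratings yet.
    Double average(long movieId) {
        Double sum = sums.get(movieId);
        if (sum == null) {
            return null;
        }
        return sum / counts.get(movieId);
    }

    // Plays the role of the join with the movies table: produces
    // "title=average", or null when the movie has no average yet.
    String ratedMovie(long movieId, String title) {
        Double avg = average(movieId);
        return avg == null ? null : title + "=" + avg;
    }

    public static void main(String[] args) {
        RatingAverageSketch sketch = new RatingAverageSketch();
        sketch.addRating(1L, 8.0);
        sketch.addRating(1L, 6.0);
        sketch.addRating(2L, 9.0);
        // "Example Movie" is a made-up title for illustration.
        System.out.println(sketch.ratedMovie(1L, "Example Movie"));
    }
}
```

Each new rating updates the average immediately, which loosely mirrors how the joined table in the topology is updated whenever either the sum or the count changes.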

CREATE TABLE movie_ratings AS
    SELECT title,
           SUM(rating)/COUNT(rating) AS avg_rating,
           COUNT(rating) AS num_ratings
    FROM ratings
    LEFT OUTER JOIN movies
        ON ratings.movie_id = movies.movie_id
    GROUP BY title;

[Diagram: a producer, a consumer, and a KSQL cluster containing two KSQL servers.]
[Diagram: microservices (orders, shipping, users, warehouse) with an order web UI and a users web UI. A second build of the slide adds Druid and product analytics.]

What the heck is Apache Druid and Why Should I Care?

[Diagram: data sources feed an ETL step (some code) into a data warehouse (usually an RDBMS), which serves analytics, reporting, data mining, and querying.]
[Diagram: data, ELT, and map/reduce around a data lake (HDFS, RDBMS / NoSQL), with a data warehouse, an ML/AI engine, and a search system, serving reporting and analytics.]
[Diagram: data sources, a message bus, a data lake, and streaming OLAP.]

Typical Big Data++ Challenges
● Scale: when data is large, we need a lot of servers
● Speed: aiming for sub-second response times
● Complexity: the data is too fine-grained to precompute everything
● High dimensionality: tens or hundreds of dimensions
● Concurrency: many users and tenants
● Freshness: load from streams

Search platform
● Real-time ingestion
● Flexible schema
● Full text search
OLAP
● Batch ingestion
● Efficient storage
● Fast analytic queries
Timeseries database
● Optimized storage for time-based datasets
● Time-based functions
Druid: a high-performance analytics database for event-driven data
Druid Use Cases in the Wild
1. Digital Advertising - Publishers, Advertisers, Exchanges
2. User Event Analytics - Clickstream, QoS, Usage
3. Network Telemetry
4. Lots and Lots of Data - IoT, Product Analytics, Fraud

Gratuitous Customer Quote
“The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.”
Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html

You can Druid too!
Druid community site: https://druid.apache.org/
Imply distribution: https://imply.io/get-started
Contribute: https://github.com/apache/druid
