Architecting Microservices Applications with Instant Analytics
Tim Berglund, Sr. Director, Developer Experience, Confluent
Rachel Pedreschi, Sr. Director, Global Field Engineering, Imply Data
What the heck is Apache Kafka and Why Should I Care?
[Diagram: a sequence of key/value (K, V) records]
[Diagram: a producer writes to a Partitioned Topic (partition 0, partition 1, partition 2)]
[Diagram: consumer A and consumer B read the Partitioned Topic; additional instances of consumer A appear in successive builds]
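One way to read the partitioned-topic diagram: a producer maps each record key to one partition, so records that share a key land in the same partition and keep their relative order. The sketch below is a simplification and not Kafka's own code. It uses Java's `String.hashCode()`, whereas Kafka's default partitioner hashes the serialized key bytes with murmur2. The class name `KeyPartitioner` and the sample keys are made up for illustration.

```java
// Hypothetical illustration of key-based partitioning; not Kafka's implementation.
class KeyPartitioner {
    private final int numPartitions;

    KeyPartitioner(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    // Map a key to a partition index in [0, numPartitions).
    // floorMod keeps the result non-negative even when hashCode() is negative.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        KeyPartitioner partitioner = new KeyPartitioner(3);
        String[] keys = {"movie-1", "movie-2", "movie-1", "movie-3"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitioner.partitionFor(key));
        }
    }
}
```

Both occurrences of `movie-1` map to the same partition, which is the property that per-key ordering depends on.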
[Diagram: stream processing, apps, RDBMS, NoSQL, and DWH/Hadoop systems]
[Diagram: Partitioned Topic (partition 0, partition 1, partition 2) read by consumer B and six instances of consumer A]
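The build with many consumer A instances hints at a limit worth spelling out: within one consumer group, each partition is read by at most one instance, so instances beyond the partition count sit idle. The sketch below assumes the consumer A boxes are instances of a single group and uses a plain round-robin assignment. Kafka's real assignors differ in detail, and `GroupAssignment`, `assign`, and the instance names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin assignment of partitions to the instances of one consumer group.
class GroupAssignment {
    // Returns instance name -> list of partition numbers that instance owns.
    static Map<String, List<Integer>> assign(List<String> instances, int numPartitions) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String instance : instances) {
            owned.put(instance, new ArrayList<>());
        }
        if (instances.isEmpty()) {
            return owned;
        }
        // Each partition gets exactly one owner within the group.
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = instances.get(partition % instances.size());
            owned.get(owner).add(partition);
        }
        return owned;
    }

    public static void main(String[] args) {
        List<String> instances = new ArrayList<>();
        for (int i = 1; i <= 6; i++) {
            instances.add("consumer-A-" + i);
        }
        // With 3 partitions and 6 instances, three instances own no partitions.
        System.out.println(assign(instances, 3));
    }
}
```

Under these assumptions, adding instances raises parallelism only up to the number of partitions in the topic.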
[Diagram: three instances of a Streams Application]
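import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.state.KeyValueStore;
import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde;

// Movie, Rating, RatedMovie, Parser, getProperties and SCHEMA_REGISTRY_URL are not shown on the slides.
public static void main(String[] args) {
    Properties streamsConfiguration = getProperties(SCHEMA_REGISTRY_URL);
    // Every Avro serde needs the Schema Registry URL.
    final Map<String, String> serdeConfig = Collections.singletonMap(
        AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, SCHEMA_REGISTRY_URL);
    final SpecificAvroSerde<Movie> movieSerde = getMovieAvroSerde(serdeConfig);
    final SpecificAvroSerde<Rating> ratingSerde = getRatingAvroSerde(serdeConfig);
    final SpecificAvroSerde<RatedMovie> ratedMovieSerde = new SpecificAvroSerde<>();
    ratedMovieSerde.configure(serdeConfig, false); // false = serde for record values, not keys
    StreamsBuilder builder = new StreamsBuilder();
    KTable<Long, Double> ratingAverage = getRatingAverageTable(builder);
    getRatedMoviesTable(builder, ratingAverage, movieSerde);
    Topology topology = builder.build();
    KafkaStreams streams = new KafkaStreams(topology, streamsConfiguration);
    // Close the Streams instance cleanly when the JVM shuts down.
    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    streams.start();
}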
// Build a value serde for Rating records, configured with the Schema Registry URL.
private static SpecificAvroSerde<Rating> getRatingAvroSerde(Map<String, String> serdeConfig) {
    final SpecificAvroSerde<Rating> ratingSerde = new SpecificAvroSerde<>();
    ratingSerde.configure(serdeConfig, false);
    return ratingSerde;
}

// Build a value serde for Movie records, configured the same way.
public static SpecificAvroSerde<Movie> getMovieAvroSerde(Map<String, String> serdeConfig) {
    final SpecificAvroSerde<Movie> movieSerde = new SpecificAvroSerde<>();
    movieSerde.configure(serdeConfig, false);
    return movieSerde;
}
public static KTable<Long, String> getRatedMoviesTable(StreamsBuilder builder,
                                                       KTable<Long, Double> ratingAverage,
                                                       SpecificAvroSerde<Movie> movieSerde) {
    // Parse raw movie text, re-key by movie ID, and write Avro records to "movies".
    builder.stream("raw-movies", Consumed.with(Serdes.Long(), Serdes.String()))
        .mapValues(Parser::parseMovie)
        .map((key, movie) -> new KeyValue<>(movie.getMovieId(), movie))
        .to("movies", Produced.with(Serdes.Long(), movieSerde));

    // Read "movies" back as a table backed by the "movies-store" state store.
    KTable<Long, Movie> movies = builder.table("movies",
        Materialized.<Long, Movie, KeyValueStore<Bytes, byte[]>>as("movies-store")
            .withValueSerde(movieSerde)
            .withKeySerde(Serdes.Long()));

    // Join each average with its movie to produce "title=average".
    KTable<Long, String> ratedMovies = ratingAverage
        .join(movies, (avg, movie) -> movie.getTitle() + "=" + avg);
    ratedMovies.toStream().to("rated-movies", Produced.with(Serdes.Long(), Serdes.String()));
    return ratedMovies;
}
public static KTable<Long, Double> getRatingAverageTable(StreamsBuilder builder) {
    KStream<Long, String> rawRatings = builder.stream("raw-ratings",
        Consumed.with(Serdes.Long(), Serdes.String()));
    // Parse each rating and re-key it by movie ID.
    KStream<Long, Rating> ratings = rawRatings.mapValues(Parser::parseRating)
        .map((key, rating) -> new KeyValue<>(rating.getMovieId(), rating));
    KStream<Long, Double> numericRatings = ratings.mapValues(Rating::getRating);
    // No serdes are passed here, so grouping and the output topic rely on the
    // default serdes in the Streams configuration (not shown on the slides).
    KGroupedStream<Long, Double> ratingsById = numericRatings.groupByKey();
    KTable<Long, Long> ratingCounts = ratingsById.count();
    KTable<Long, Double> ratingSums = ratingsById.reduce((v1, v2) -> v1 + v2);
    // Average = running sum / running count, joined per movie ID.
    KTable<Long, Double> ratingAverage = ratingSums.join(ratingCounts,
        (sum, count) -> sum / count.doubleValue(),
        Materialized.as("average-ratings"));
    ratingAverage.toStream()
        /*.peek((key, value) -> { // debug only
            System.out.println("key = " + key + ", value = " + value);
        })*/
        .to("average-ratings");
    return ratingAverage;
}
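The topology above computes each movie's average by keeping a running sum and a running count per key and dividing one by the other. The same arithmetic in plain Java, with no Kafka dependency, might look like the sketch below. The class `RunningAverage` and its `add` method are hypothetical stand-ins for the `reduce`, `count`, and `join` steps, and the sample ratings are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory analogue of the sum/count/join steps; not Kafka Streams code.
class RunningAverage {
    private final Map<Long, Double> sums = new HashMap<>();
    private final Map<Long, Long> counts = new HashMap<>();

    // Fold one rating into the per-movie sum and count, then return the updated average.
    double add(long movieId, double rating) {
        double sum = sums.merge(movieId, rating, Double::sum);
        long count = counts.merge(movieId, 1L, Long::sum);
        return sum / count;
    }

    public static void main(String[] args) {
        RunningAverage averages = new RunningAverage();
        averages.add(1L, 8.0);
        averages.add(2L, 5.0);
        System.out.println("movie 1 average = " + averages.add(1L, 6.0));
    }
}
```

Unlike this sketch, the Streams version keeps the sum and the count in separate tables, and the join emits an updated average as either table changes.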
CREATE TABLE movie_ratings AS
  SELECT title,
         SUM(rating)/COUNT(rating) AS avg_rating,
         COUNT(rating) AS num_ratings
  FROM ratings
  LEFT OUTER JOIN movies
    ON ratings.movie_id = movies.movie_id
  GROUP BY title;
[Diagram: a producer and a consumer alongside a KSQL Cluster of two KSQL Servers]
[Diagram: orders, shipping, users, and warehouse services with an order web UI and a users web UI; a second build adds Druid for product analytics]
What the heck is Apache Druid and Why Should I Care?
[Diagram: Data Sources → ETL (some code) → Data Warehouse (usually an RDBMS) → Analytics, Reporting, Data mining, Querying]
[Diagram: data feeding a Data Lake and a Data Warehouse (HDFS, RDBMS / NoSQL) through ELT and map/reduce, with Reporting and Analytics, an ML/AI Engine, and a Search system on top]
[Diagram: Data Sources → Message bus → Data Lake and Streaming OLAP]
Typical Big Data++ Challenges
● Scale: when data is large, we need a lot of servers
● Speed: aiming for sub-second response time
● Complexity: data too fine-grained to precompute results
● High dimensionality: tens or hundreds of dimensions
● Concurrency: many users and tenants
● Freshness: load from streams
Search platform
● Real-time ingestion
● Flexible schema
● Full text search
OLAP
● Batch ingestion
● Efficient storage
● Fast analytic queries
Timeseries database
● Optimized storage for time-based datasets
● Time-based functions
High performance analytics database for event-driven data
Druid Use Cases in the Wild
1. Digital Advertising - Publishers, Advertisers, Exchanges
2. User Event Analytics - Clickstream, QoS, Usage
3. Network Telemetry
4. Lots and Lots of Data - IoT, Product Analytics, Fraud
Gratuitous Customer Quote
“The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.”
Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html
[Diagram: the orders, shipping, users, and warehouse services with the order web UI and users web UI, plus Druid for product analytics]
You can Druid too!
Druid community site: https://druid.apache.org/
Imply distribution: https://imply.io/get-started
Contribute: https://github.com/apache/druid