SlideShare a Scribd company logo
1
User behavior analysis with
Session Windows and Apache
Kafka’s Streams API
Michael G. Noll
Product Manager
2
Attend the whole series!
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent
Monitoring and Alerting Apache Kafka with Confluent Control
Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product Marketing
3
Kafka Streams API: to build real-time apps that power your core business
Key benefits
• Makes your Java apps highly scalable,
elastic, fault-tolerant, stateful,
distributed
• No additional cluster
• Easy to run as a service
• Supports large aggregations and joins
• Security and permissions fully
integrated from Kafka
Example Use Cases
• Microservices
• Reactive applications
• Continuous queries
• Continuous transformations
• Event-triggered processes
Streams
API
App Instance 1
Kafka
Cluster
Streams
API
App Instance N
Your
App ...
4
Use case examples
Industry Use case examples
Travel Build applications with the Kafka Streams API to make real-time decisions to find
best suitable pricing for individual customers, to cross-sell additional services,
and to process bookings and reservations
Finance Build applications to aggregate data sources for real-time views of potential
exposures and for detecting and minimizing fraudulent transactions
Logistics Build applications to track shipments fast, reliably, and in real-time
Retail Build applications to decide in real-time on next best offers, personalized
promotions, pricing, and inventory management
Automotive,
Manufacturing
Build applications to ensure their production lines perform optimally, to gain real-
time insights into supply chains, and to monitor telemetry data from connected
cars to decide if an inspection is needed
And many more …
5
Some public use cases in the wild
• Why Kafka Streams: towards a real-time streaming architecture, by Sky Betting and Gaming
• http://engineering.skybettingandgaming.com/2017/01/23/streaming-architectures/
• Applying Kafka’s Streams API for social messaging at LINE Corp.
• http://developers.linecorp.com/blog/?p=3960
• Production pipeline at LINE, a social platform based in Japan with 220+ million users
• Microservices and Reactive Applications at Capital One
• https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-kafka-streams
• Containerized Kafka Streams applications in Scala, by Hive Streaming
• https://www.madewithtea.com/processing-tweets-with-kafka-streams.html
• Geo-spatial data analysis
• http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/
• Language classification with machine learning
• https://dzone.com/articles/machine-learning-with-kafka-streams
6
Kafka Summit NYC, May 09
Here, the community will share
latest Kafka Streams use cases.
http://kafka-summit.org/
7
Agenda
• Why are session windows so important?
• Recap: What is windowing?
• Session windows – example use case
• Session windows – how they work
• Session windows – API
8
Why are session windows so important?
• We want to analyze user behavior, which is a very common use case area
• To analyze user behavior on newspapers, social platforms, video sharing sites, booking sites, etc.
• AND tailor the analysis to the individual user
• Specifically, analyses of the type “how many X in one go?” – how many movies watched in one go?
• Achieved through a per-user sessionization step on the input data.
• AND this tailoring must be convenient and scalable
• Achieved through automating the sessionization step, i.e. auto-discovery of sessions
• Session-based analyses can range from simple metrics (e.g. count of user visits on a news
website or social platform) to more complex metrics (e.g. customer conversion funnel and event
flows).
9
What is windowing?
• Aggregations such as “counting things” are key-based operations
• Before you can aggregate your input data, it must first be grouped by key
event-time8 AM7 AM6 AM event-time
Alice
Bob
Dave
8 AM7 AM6 AM
10
What is windowing?
• Aggregations such as “counting things” are key-based operations
Alice: 10 movies
Bob: 11 movies
Dave: 8 movies
“Let me COUNT how many movies each user has watched (IN TOTAL)”
event-time
Alice
Bob
Dave
Feb 7Feb 6Feb 5
11
What is windowing?
• Windowing allows you to further “sub-group” the input data for each user
event-time
Alice
Bob
Dave
“Let me COUNT how many movies each user has watched PER DAY”
Alice: 4 movies
Bob: 3 movies
Dave: 2 movies
Feb 5
Feb 7Feb 6Feb 5
12
What is windowing?
• Windowing allows you to further “sub-group” the input data for each user
event-time
Alice
Bob
Dave
Alice: 1 movie
Bob: 2 movies
Dave: 4 movies
Feb 6
Feb 7Feb 6Feb 5
“Let me COUNT how many movies each user has watched PER DAY”
13
What is windowing?
• Windowing allows you to further “sub-group” the input data for each user
event-time
Alice
Bob
Dave
Alice: 4 movies
Bob: 4 movies
Dave: 1 movie
Feb 7
Feb 7Feb 6Feb 5
“Let me COUNT how many movies each user has watched PER DAY”
14
Session windows: use case
• Session windows allow for “how many X in one go?” analyses, tailored to each key
• Sessions are auto-discovered from the input data (we see how later)
event-time
Alice
Bob
Dave
Alice: 1, 4, 1, 4 movies
(4 sessions)
Bob: 4, 6 movies
(2 sessions)
Dave: 3, 5 movies
(2 sessions)
Feb 7Feb 6Feb 5
“Let me COUNT how many movies each user has watched PER SESSION”
15
Comparing results
• Let’s compare how results differ
Alice
Bob
Dave
IN TOTAL
10
11
8
PER DAY
3.0 (avg)
3.0 (avg)
2.3 (avg)
time windows
PER SESSION
2.5 (avg)
5.0 (avg)
4.0 (avg)
session windowsno windows
16
Comparing results
• Let’s compare how results differ if we our task was to rank the top users
Alice
Bob
Dave
IN TOTAL
#2
#1
#3
PER DAY
#1
#1
#3
time windows
PER SESSION
#3
#1
#2
session windowsno windows
17Confidential
Session windows: how they work
18
Session windows: how they work
• Definition of a session in Kafka Streams API is based on a configurable period of inactivity
• Example: “If Alice hasn’t watched another movie in the past 3 hours, then next movie = new
session!”
Inactivity period
19
Auto-discovering sessions, per user
event-time
Alice
Bob
Dave
… …
… …
… …
20
Auto-discovering sessions, per user
event-time
Alice
Bob
Dave
… …
… …
… …
Example: How many movies does Alice watch on average per session?”
Inactivity period (e.g. 3 hours)
21
Auto-discovering sessions, per user
event-time
Alice
Bob
Dave
… …
… …
… …
Example: How many movies does Alice watch on average per session?”
22
Late-arriving data is handled transparently
• Handling of late-arriving data is important because, in practice, a lot of data arrives late
23
Late-arriving data: example
Users with mobile phones enter
airplane, lose Internet connectivity
Emails are being written
during the 8h flight
Internet connectivity is restored,
phones will send queued emails now,
though with an 8h delay
Bob  writes  Alice  an  
email  at  2  P.M.
Bob’s  email  is  finally  
being  sent  at  10  P.M.
24
Late-arriving data is handled transparently
• Handling of late-arriving data is important because, in practice, a lot of data arrives late
• Good news: late-arriving data is handled transparently and efficiently for you
• Also, in your applications, you can define a grace period after which late-arriving data will be
discarded (default: 1 day), and you can define this granularly per windowed operation
• Example: “I want to sessionize the input data based on 15-min inactivity periods, and late-arriving
data should be discarded if it is more than 12 hours late”
25
Late-arriving data is handled transparently
event-time
Alice
Bob
Dave
… …
… …
… …
• Late-arriving data may (1) create new sessions or (2) merge existing sessions
26
Sessions potentially merge as new events arrive
Session Window
27
Sessions potentially merge as new events arrive
Session Window
28
Late-arriving data is handled transparently
event-time
Alice
Bob
Dave
… …
… …
… …
29
Late-arriving data is handled transparently
event-time
Alice
Bob
Dave
… …
… …
… …
30Confidential
Session windows: API
31Confidential
Session windows: API in Confluent 3.2 / Apache Kafka 0.10.2
//  A  session  window  with  an  inactivity  gap  of  3h;  discard  data  that  is  12h late
SessionWindows.with(TimeUnit.HOURS.toMillis(3)).until(TimeUnit.HOURS.toMillis(12));
Defining a session window
//  Key  (String)  is  user,  value  (Avro  record)  is  the  movie  view  event  for  that  user.
KStream<String,  GenericRecord>  movieViews =  ...;
//  Count  movie  views  per  session,  per  user
KTable<Windowed<String>,  Long>  sessionizedMovieCounts =
movieViews
.groupByKey(Serdes.String(),  genericAvroSerde)        
.count(SessionWindows.with(TimeUnit.HOURS.toMillis(3)),  "views-­‐per-­‐session");
Full example: aggregating with session windows
More details with documentation and examples at:
http://docs.confluent.io/current/streams/developer-guide.html#session-windows
https://github.com/confluentinc/examples
32Confidential
Attend the whole series!
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent
Monitoring and Alerting Apache Kafka with Confluent Control
Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product Marketing
UP
NEXT
33
Why Confluent? More than just enterprise software
Confluent Platform
The only enterprise open
source streaming platform
based entirely on Apache
Kafka
Professional Services
Best practice consultation for
future Kafka deployments and
optimize for performance and
scalability of existing ones
Enterprise Support
24x7 support for the entire
Apache Kafka project, not just
a portion of it
Complete support across the entire adoption lifecycle
Kafka Training
Comprehensive hands-on
courses for developers and
operators from the Apache
Kafka experts
34
Get Started with Apache Kafka Today!
https://www.confluent.io/downloads/
THE place to start with Apache Kafka!
Thoroughly tested and quality
assured
More extensible developer
experience
Easy upgrade path to
Confluent Enterprise
35
Discount code: kafcom17
  Use the Apache Kafka community discount code to get $50 off
  www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28
Presented by

More Related Content

What's hot (20)

PDF
Kafka streams windowing behind the curtain
confluent
 
PPTX
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
DataWorks Summit/Hadoop Summit
 
PDF
SeaweedFS introduction
chrislusf
 
PDF
Iceberg: a fast table format for S3
DataWorks Summit
 
PPTX
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
PDF
Making Apache Spark Better with Delta Lake
Databricks
 
PDF
VictoriaLogs: Open Source Log Management System - Preview
VictoriaMetrics
 
PDF
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
 
PPTX
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
PPTX
Kafka presentation
Mohammed Fazuluddin
 
PDF
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
PDF
Apache Iceberg: An Architectural Look Under the Covers
ScyllaDB
 
PDF
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Databricks
 
PPTX
Real-time Analytics with Presto and Apache Pinot
Xiang Fu
 
PDF
Cassandra serving netflix @ scale
Vinay Kumar Chella
 
PPTX
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
PPTX
YugaByte DB Internals - Storage Engine and Transactions
Yugabyte
 
PPTX
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
PDF
Battle of the Stream Processing Titans – Flink versus RisingWave
Yingjun Wu
 
ODP
Stream processing using Kafka
Knoldus Inc.
 
Kafka streams windowing behind the curtain
confluent
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
DataWorks Summit/Hadoop Summit
 
SeaweedFS introduction
chrislusf
 
Iceberg: a fast table format for S3
DataWorks Summit
 
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Making Apache Spark Better with Delta Lake
Databricks
 
VictoriaLogs: Open Source Log Management System - Preview
VictoriaMetrics
 
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
 
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Kafka presentation
Mohammed Fazuluddin
 
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Apache Iceberg: An Architectural Look Under the Covers
ScyllaDB
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Databricks
 
Real-time Analytics with Presto and Apache Pinot
Xiang Fu
 
Cassandra serving netflix @ scale
Vinay Kumar Chella
 
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
YugaByte DB Internals - Storage Engine and Transactions
Yugabyte
 
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Yingjun Wu
 
Stream processing using Kafka
Knoldus Inc.
 

Viewers also liked (7)

PPTX
Open Metadata and Governance with Apache Atlas
DataWorks Summit
 
PDF
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
Spark Summit
 
PPTX
Introducing Exactly Once Semantics To Apache Kafka
Apurva Mehta
 
PPTX
No data loss pipeline with apache kafka
Jiangjie Qin
 
PPTX
Avro Tutorial - Records with Schema for Kafka and Hadoop
Jean-Paul Azar
 
PDF
Intro to Pinot (2016-01-04)
Jean-François Im
 
PDF
Pinot: Realtime Distributed OLAP datastore
Kishore Gopalakrishna
 
Open Metadata and Governance with Apache Atlas
DataWorks Summit
 
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
Spark Summit
 
Introducing Exactly Once Semantics To Apache Kafka
Apurva Mehta
 
No data loss pipeline with apache kafka
Jiangjie Qin
 
Avro Tutorial - Records with Schema for Kafka and Hadoop
Jean-Paul Azar
 
Intro to Pinot (2016-01-04)
Jean-François Im
 
Pinot: Realtime Distributed OLAP datastore
Kishore Gopalakrishna
 
Ad

Similar to user Behavior Analysis with Session Windows and Apache Kafka's Streams API (20)

PDF
Using Apache Kafka to Analyze Session Windows
confluent
 
PDF
Streaming analytics better than batch – when and why by Dawid Wysakowicz and ...
Big Data Spain
 
PDF
Streaming analytics better than batch when and why - (Big Data Tech 2017)
GetInData
 
PDF
How to Build Streaming Apps with Confluent II
confluent
 
PPTX
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Michael Noll
 
PDF
APAC Kafka Summit - Best Of
confluent
 
PDF
Kafka Vienna Meetup 020719
Patrik Kleindl
 
PDF
Stream Processing with Flink and Stream Sharing
confluent
 
PDF
Event streaming: A paradigm shift in enterprise software architecture
Sina Sojoodi
 
PDF
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Kai Wähner
 
PDF
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
confluent
 
PDF
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
ucelebi
 
PPTX
Kafka Streams: The Stream Processing Engine of Apache Kafka
Eno Thereska
 
PDF
Kafka Streams
Cristiano Altmann
 
PDF
Streaming Analytics for Financial Enterprises
Databricks
 
PPTX
Kafka Streams for Java enthusiasts
Slim Baltagi
 
PDF
Streaming Analytics
Neera Agarwal
 
PDF
Kafka Streams Windows: Behind the Curtain
Neil Buesing
 
PDF
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 
PDF
Introduction to apache kafka, confluent and why they matter
Paolo Castagna
 
Using Apache Kafka to Analyze Session Windows
confluent
 
Streaming analytics better than batch – when and why by Dawid Wysakowicz and ...
Big Data Spain
 
Streaming analytics better than batch when and why - (Big Data Tech 2017)
GetInData
 
How to Build Streaming Apps with Confluent II
confluent
 
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Michael Noll
 
APAC Kafka Summit - Best Of
confluent
 
Kafka Vienna Meetup 020719
Patrik Kleindl
 
Stream Processing with Flink and Stream Sharing
confluent
 
Event streaming: A paradigm shift in enterprise software architecture
Sina Sojoodi
 
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Kai Wähner
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
confluent
 
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
ucelebi
 
Kafka Streams: The Stream Processing Engine of Apache Kafka
Eno Thereska
 
Kafka Streams
Cristiano Altmann
 
Streaming Analytics for Financial Enterprises
Databricks
 
Kafka Streams for Java enthusiasts
Slim Baltagi
 
Streaming Analytics
Neera Agarwal
 
Kafka Streams Windows: Behind the Curtain
Neil Buesing
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 
Introduction to apache kafka, confluent and why they matter
Paolo Castagna
 
Ad

More from confluent (20)

PDF
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
confluent
 
PPTX
Webinar Think Right - Shift Left - 19-03-2025.pptx
confluent
 
PDF
Migration, backup and restore made easy using Kannika
confluent
 
PDF
Five Things You Need to Know About Data Streaming in 2025
confluent
 
PDF
Data in Motion Tour Seoul 2024 - Keynote
confluent
 
PDF
Data in Motion Tour Seoul 2024 - Roadmap Demo
confluent
 
PDF
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
confluent
 
PDF
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
confluent
 
PDF
Data in Motion Tour 2024 Riyadh, Saudi Arabia
confluent
 
PDF
Build a Real-Time Decision Support Application for Financial Market Traders w...
confluent
 
PDF
Strumenti e Strategie di Stream Governance con Confluent Platform
confluent
 
PDF
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
confluent
 
PDF
Building Real-Time Gen AI Applications with SingleStore and Confluent
confluent
 
PDF
Unlocking value with event-driven architecture by Confluent
confluent
 
PDF
Il Data Streaming per un’AI real-time di nuova generazione
confluent
 
PDF
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
PDF
Break data silos with real-time connectivity using Confluent Cloud Connectors
confluent
 
PDF
Building API data products on top of your real-time data infrastructure
confluent
 
PDF
Speed Wins: From Kafka to APIs in Minutes
confluent
 
PDF
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
confluent
 
Webinar Think Right - Shift Left - 19-03-2025.pptx
confluent
 
Migration, backup and restore made easy using Kannika
confluent
 
Five Things You Need to Know About Data Streaming in 2025
confluent
 
Data in Motion Tour Seoul 2024 - Keynote
confluent
 
Data in Motion Tour Seoul 2024 - Roadmap Demo
confluent
 
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
confluent
 
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
confluent
 
Data in Motion Tour 2024 Riyadh, Saudi Arabia
confluent
 
Build a Real-Time Decision Support Application for Financial Market Traders w...
confluent
 
Strumenti e Strategie di Stream Governance con Confluent Platform
confluent
 
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
confluent
 
Building Real-Time Gen AI Applications with SingleStore and Confluent
confluent
 
Unlocking value with event-driven architecture by Confluent
confluent
 
Il Data Streaming per un’AI real-time di nuova generazione
confluent
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
confluent
 
Building API data products on top of your real-time data infrastructure
confluent
 
Speed Wins: From Kafka to APIs in Minutes
confluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 

Recently uploaded (20)

PDF
Powering GIS with FME and VertiGIS - Peak of Data & AI 2025
Safe Software
 
PPTX
3uTools Full Crack Free Version Download [Latest] 2025
muhammadgurbazkhan
 
PPTX
Human Resources Information System (HRIS)
Amity University, Patna
 
PDF
Understanding the Need for Systemic Change in Open Source Through Intersectio...
Imma Valls Bernaus
 
PDF
Linux Certificate of Completion - LabEx Certificate
VICTOR MAESTRE RAMIREZ
 
PDF
Salesforce CRM Services.VALiNTRY360
VALiNTRY360
 
PPT
MergeSortfbsjbjsfk sdfik k
RafishaikIT02044
 
PPTX
Perfecting XM Cloud for Multisite Setup.pptx
Ahmed Okour
 
PDF
Revenue streams of the Wazirx clone script.pdf
aaronjeffray
 
PDF
Capcut Pro Crack For PC Latest Version {Fully Unlocked} 2025
hashhshs786
 
DOCX
Import Data Form Excel to Tally Services
Tally xperts
 
PDF
Unlock Efficiency with Insurance Policy Administration Systems
Insurance Tech Services
 
PPTX
Platform for Enterprise Solution - Java EE5
abhishekoza1981
 
PDF
HiHelloHR – Simplify HR Operations for Modern Workplaces
HiHelloHR
 
PDF
Letasoft Sound Booster 1.12.0.538 Crack Download+ Product Key [Latest]
HyperPc soft
 
PPTX
Fundamentals_of_Microservices_Architecture.pptx
MuhammadUzair504018
 
PPTX
MailsDaddy Outlook OST to PST converter.pptx
abhishekdutt366
 
PDF
GetOnCRM Speeds Up Agentforce 3 Deployment for Enterprise AI Wins.pdf
GetOnCRM Solutions
 
PPTX
Equipment Management Software BIS Safety UK.pptx
BIS Safety Software
 
PPTX
The Role of a PHP Development Company in Modern Web Development
SEO Company for School in Delhi NCR
 
Powering GIS with FME and VertiGIS - Peak of Data & AI 2025
Safe Software
 
3uTools Full Crack Free Version Download [Latest] 2025
muhammadgurbazkhan
 
Human Resources Information System (HRIS)
Amity University, Patna
 
Understanding the Need for Systemic Change in Open Source Through Intersectio...
Imma Valls Bernaus
 
Linux Certificate of Completion - LabEx Certificate
VICTOR MAESTRE RAMIREZ
 
Salesforce CRM Services.VALiNTRY360
VALiNTRY360
 
MergeSortfbsjbjsfk sdfik k
RafishaikIT02044
 
Perfecting XM Cloud for Multisite Setup.pptx
Ahmed Okour
 
Revenue streams of the Wazirx clone script.pdf
aaronjeffray
 
Capcut Pro Crack For PC Latest Version {Fully Unlocked} 2025
hashhshs786
 
Import Data Form Excel to Tally Services
Tally xperts
 
Unlock Efficiency with Insurance Policy Administration Systems
Insurance Tech Services
 
Platform for Enterprise Solution - Java EE5
abhishekoza1981
 
HiHelloHR – Simplify HR Operations for Modern Workplaces
HiHelloHR
 
Letasoft Sound Booster 1.12.0.538 Crack Download+ Product Key [Latest]
HyperPc soft
 
Fundamentals_of_Microservices_Architecture.pptx
MuhammadUzair504018
 
MailsDaddy Outlook OST to PST converter.pptx
abhishekdutt366
 
GetOnCRM Speeds Up Agentforce 3 Deployment for Enterprise AI Wins.pdf
GetOnCRM Solutions
 
Equipment Management Software BIS Safety UK.pptx
BIS Safety Software
 
The Role of a PHP Development Company in Modern Web Development
SEO Company for School in Delhi NCR
 

user Behavior Analysis with Session Windows and Apache Kafka's Streams API

  • 1. 1 User behavior analysis with Session Windows and Apache Kafka’s Streams API Michael G. Noll Product Manager
  • 2. 2 Attend the whole series! Simplify Governance for Streaming Data in Apache Kafka Date: Thursday, April 6, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Gwen Shapira, Product Manager, Confluent Using Apache Kafka to Analyze Session Windows Date: Thursday, March 30, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Michael Noll, Product Manager, Confluent Monitoring and Alerting Apache Kafka with Confluent Control Center Date: Thursday, March 16, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Nick Dearden, Director, Engineering and Product Data Pipelines Made Simple with Apache Kafka Date: Thursday, March 23, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Ewen Cheslack-Postava, Engineer, Confluent https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/ What’s New in Apache Kafka 0.10.2 and Confluent 3.2 Date: Thursday, March 9, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Clarke Patterson, Senior Director, Product Marketing
  • 3. 3 Kafka Streams API: to build real-time apps that power your core business Key benefits • Makes your Java apps highly scalable, elastic, fault-tolerant, stateful, distributed • No additional cluster • Easy to run as a service • Supports large aggregations and joins • Security and permissions fully integrated from Kafka Example Use Cases • Microservices • Reactive applications • Continuous queries • Continuous transformations • Event-triggered processes Streams API App Instance 1 Kafka Cluster Streams API App Instance N Your App ...
  • 4. 4 Use case examples Industry Use case examples Travel Build applications with the Kafka Streams API to make real-time decisions to find best suitable pricing for individual customers, to cross-sell additional services, and to process bookings and reservations Finance Build applications to aggregate data sources for real-time views of potential exposures and for detecting and minimizing fraudulent transactions Logistics Build applications to track shipments fast, reliably, and in real-time Retail Build applications to decide in real-time on next best offers, personalized promotions, pricing, and inventory management Automotive, Manufacturing Build applications to ensure their production lines perform optimally, to gain real- time insights into supply chains, and to monitor telemetry data from connected cars to decide if an inspection is needed And many more …
  • 5. 5 Some public use cases in the wild • Why Kafka Streams: towards a real-time streaming architecture, by Sky Betting and Gaming • http://engineering.skybettingandgaming.com/2017/01/23/streaming-architectures/ • Applying Kafka’s Streams API for social messaging at LINE Corp. • http://developers.linecorp.com/blog/?p=3960 • Production pipeline at LINE, a social platform based in Japan with 220+ million users • Microservices and Reactive Applications at Capital One • https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-kafka-streams • Containerized Kafka Streams applications in Scala, by Hive Streaming • https://www.madewithtea.com/processing-tweets-with-kafka-streams.html • Geo-spatial data analysis • http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/ • Language classification with machine learning • https://dzone.com/articles/machine-learning-with-kafka-streams
  • 6. 6 Kafka Summit NYC, May 09 Here, the community will share latest Kafka Streams use cases. http://kafka-summit.org/
  • 7. 7 Agenda • Why are session windows so important? • Recap: What is windowing? • Session windows – example use case • Session windows – how they work • Session windows – API
  • 8. 8 Why are session windows so important? • We want to analyze user behavior, which is a very common use case area • To analyze user behavior on newspapers, social platforms, video sharing sites, booking sites, etc. • AND tailor the analysis to the individual user • Specifically, analyses of the type “how many X in one go?” – how many movies watched in one go? • Achieved through a per-user sessionization step on the input data. • AND this tailoring must be convenient and scalable • Achieved through automating the sessionization step, i.e. auto-discovery of sessions • Session-based analyses can range from simple metrics (e.g. count of user visits on a news website or social platform) to more complex metrics (e.g. customer conversion funnel and event flows).
  • 9. 9 What is windowing? • Aggregations such as “counting things” are key-based operations • Before you can aggregate your input data, it must first be grouped by key event-time8 AM7 AM6 AM event-time Alice Bob Dave 8 AM7 AM6 AM
  • 10. 10 What is windowing? • Aggregations such as “counting things” are key-based operations Alice: 10 movies Bob: 11 movies Dave: 8 movies “Let me COUNT how many movies each user has watched (IN TOTAL)” event-time Alice Bob Dave Feb 7Feb 6Feb 5
  • 11. 11 What is windowing? • Windowing allows you to further “sub-group” the input data for each user event-time Alice Bob Dave “Let me COUNT how many movies each user has watched PER DAY” Alice: 4 movies Bob: 3 movies Dave: 2 movies Feb 5 Feb 7Feb 6Feb 5
  • 12. 12 What is windowing? • Windowing allows you to further “sub-group” the input data for each user event-time Alice Bob Dave Alice: 1 movie Bob: 2 movies Dave: 4 movies Feb 6 Feb 7Feb 6Feb 5 “Let me COUNT how many movies each user has watched PER DAY”
  • 13. 13 What is windowing? • Windowing allows you to further “sub-group” the input data for each user event-time Alice Bob Dave Alice: 4 movies Bob: 4 movies Dave: 1 movie Feb 7 Feb 7Feb 6Feb 5 “Let me COUNT how many movies each user has watched PER DAY”
  • 14. 14 Session windows: use case • Session windows allow for “how many X in one go?” analyses, tailored to each key • Sessions are auto-discovered from the input data (we see how later) event-time Alice Bob Dave Alice: 1, 4, 1, 4 movies (4 sessions) Bob: 4, 6 movies (2 sessions) Dave: 3, 5 movies (2 sessions) Feb 7Feb 6Feb 5 “Let me COUNT how many movies each user has watched PER SESSION”
  • 15. 15 Comparing results • Let’s compare how results differ Alice Bob Dave IN TOTAL 10 11 8 PER DAY 3.0 (avg) 3.0 (avg) 2.3 (avg) time windows PER SESSION 2.5 (avg) 5.0 (avg) 4.0 (avg) session windowsno windows
  • 16. 16 Comparing results • Let’s compare how results differ if we our task was to rank the top users Alice Bob Dave IN TOTAL #2 #1 #3 PER DAY #1 #1 #3 time windows PER SESSION #3 #1 #2 session windowsno windows
  • 18. 18 Session windows: how they work • Definition of a session in Kafka Streams API is based on a configurable period of inactivity • Example: “If Alice hasn’t watched another movie in the past 3 hours, then next movie = new session!” Inactivity period
  • 19. 19 Auto-discovering sessions, per user event-time Alice Bob Dave … … … … … …
  • 20. 20 Auto-discovering sessions, per user event-time Alice Bob Dave … … … … … … Example: How many movies does Alice watch on average per session?” Inactivity period (e.g. 3 hours)
  • 21. 21 Auto-discovering sessions, per user event-time Alice Bob Dave … … … … … … Example: How many movies does Alice watch on average per session?”
  • 22. 22 Late-arriving data is handled transparently • Handling of late-arriving data is important because, in practice, a lot of data arrives late
  • 23. 23 Late-arriving data: example Users with mobile phones enter airplane, lose Internet connectivity Emails are being written during the 8h flight Internet connectivity is restored, phones will send queued emails now, though with an 8h delay Bob  writes  Alice  an   email  at  2  P.M. Bob’s  email  is  finally   being  sent  at  10  P.M.
  • 24. 24 Late-arriving data is handled transparently • Handling of late-arriving data is important because, in practice, a lot of data arrives late • Good news: late-arriving data is handled transparently and efficiently for you • Also, in your applications, you can define a grace period after which late-arriving data will be discarded (default: 1 day), and you can define this granularly per windowed operation • Example: “I want to sessionize the input data based on 15-min inactivity periods, and late-arriving data should be discarded if it is more than 12 hours late”
  • 25. 25 Late-arriving data is handled transparently event-time Alice Bob Dave … … … … … … • Late-arriving data may (1) create new sessions or (2) merge existing sessions
  • 26. 26 Sessions potentially merge as new events arrive Session Window
  • 27. 27 Sessions potentially merge as new events arrive Session Window
  • 28. 28 Late-arriving data is handled transparently event-time Alice Bob Dave … … … … … …
  • 29. 29 Late-arriving data is handled transparently event-time Alice Bob Dave … … … … … …
  • 31. 31Confidential Session windows: API in Confluent 3.2 / Apache Kafka 0.10.2 //  A  session  window  with  an  inactivity  gap  of  3h;  discard  data  that  is  12h late SessionWindows.with(TimeUnit.HOURS.toMillis(3)).until(TimeUnit.HOURS.toMillis(12)); Defining a session window //  Key  (String)  is  user,  value  (Avro  record)  is  the  movie  view  event  for  that  user. KStream<String,  GenericRecord>  movieViews =  ...; //  Count  movie  views  per  session,  per  user KTable<Windowed<String>,  Long>  sessionizedMovieCounts = movieViews .groupByKey(Serdes.String(),  genericAvroSerde)         .count(SessionWindows.with(TimeUnit.HOURS.toMillis(3)),  "views-­‐per-­‐session"); Full example: aggregating with session windows More details with documentation and examples at: http://docs.confluent.io/current/streams/developer-guide.html#session-windows https://github.com/confluentinc/examples
  • 32. 32Confidential Attend the whole series! Simplify Governance for Streaming Data in Apache Kafka Date: Thursday, April 6, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Gwen Shapira, Product Manager, Confluent Using Apache Kafka to Analyze Session Windows Date: Thursday, March 30, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Michael Noll, Product Manager, Confluent Monitoring and Alerting Apache Kafka with Confluent Control Center Date: Thursday, March 16, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Nick Dearden, Director, Engineering and Product Data Pipelines Made Simple with Apache Kafka Date: Thursday, March 23, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Ewen Cheslack-Postava, Engineer, Confluent https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/ What’s New in Apache Kafka 0.10.2 and Confluent 3.2 Date: Thursday, March 9, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Clarke Patterson, Senior Director, Product Marketing UP NEXT
  • 33. 33 Why Confluent? More than just enterprise software Confluent Platform The only enterprise open source streaming platform based entirely on Apache Kafka Professional Services Best practice consultation for future Kafka deployments and optimize for performance and scalability of existing ones Enterprise Support 24x7 support for the entire Apache Kafka project, not just a portion of it Complete support across the entire adoption lifecycle Kafka Training Comprehensive hands-on courses for developers and operators from the Apache Kafka experts
  • 34. 34 Get Started with Apache Kafka Today! https://www.confluent.io/downloads/ THE place to start with Apache Kafka! Thoroughly tested and quality assured More extensible developer experience Easy upgrade path to Confluent Enterprise
  • 35. 35 Discount code: kafcom17  Use the Apache Kafka community discount code to get $50 off  www.kafka-summit.org Kafka Summit New York: May 8 Kafka Summit San Francisco: August 28 Presented by