Designing Scalable and Extendable Data Pipeline for Call of Duty Games
Yaroslav Tkachenko
Senior Data Engineer at Activision
• Data lake size (AWS S3): 1+ PB
• Number of topics in the biggest cluster (Apache Kafka): 600+
• Messages per second (Apache Kafka): 10k+
Scaling the data pipeline even further
• Volume: well-known industry techniques
• Games: using previous experience
• Use-cases: completely unpredictable
[Figure label: Complexity]
[Figure: a Kafka topic split into Partition 1, Partition 2 and Partition 3, each a numbered sequence of offsets, with a producer or consumer attached]
Kafka topics are partitioned and replicated
We need to keep the number of topics and partitions low
More topics mean more operational burden.
The number of partitions in a fixed cluster is not infinite.
Autoscaling Kafka is impossible; scaling is hard.
Topic naming convention
$env.$source.$title.$category-$version
prod.glutton.1234.telemetry_match_event-v1
[Figure labels: "glutton" is the producer; "1234" is the unique game id, e.g. “CoD WW2 on PSN”]
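The convention above can be made concrete with a small parser. This is only a sketch: the `TopicName` class and its regular expression are hypothetical, and it assumes that `$env`, `$source` and `$title` never contain dots and that `$version` is `v` followed by digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original pipeline: parses topic names
// of the form $env.$source.$title.$category-$version.
final class TopicName {
    // Assumption: env, source and title contain no dots; version is "v" + digits.
    private static final Pattern CONVENTION =
        Pattern.compile("^([^.]+)\\.([^.]+)\\.([^.]+)\\.(.+)-(v\\d+)$");

    final String env;
    final String source;
    final String title;
    final String category;
    final String version;

    private TopicName(String env, String source, String title,
                      String category, String version) {
        this.env = env;
        this.source = source;
        this.title = title;
        this.category = category;
        this.version = version;
    }

    // Splits a topic name into its parts, or fails if it does not match.
    static TopicName parse(String topic) {
        Matcher m = CONVENTION.matcher(topic);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "Topic does not follow the convention: " + topic);
        }
        return new TopicName(m.group(1), m.group(2), m.group(3),
                             m.group(4), m.group(5));
    }

    public static void main(String[] args) {
        TopicName t = parse("prod.glutton.1234.telemetry_match_event-v1");
        System.out.println(t.env + " " + t.source + " " + t.title
            + " " + t.category + " " + t.version);
    }
}
```

Every consumer that needs the environment, producer or game has to run something like this against the topic name, which hints at why metadata in names is fragile.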
A proper solution was invented decades ago.
Think about databases.
A messaging system IS a form of database
Data topic = Database + Table.
Data topic = Namespace + Data type.
Compare this:
telemetry.matches
user.logins
marketplace.purchases
with this:
prod.glutton.1234.telemetry_match_event-v1
dev.user_login_records.4321.all-v1
prod.marketplace.5678.purchase_event-v1
Each approach has pros and cons
• Topics that use metadata for their
names are obviously easier to track
and monitor (and even consume).
• As a consumer, I can consume
exactly what I want, instead of
consuming a single large topic and
extracting required values.
• These dynamic fields can and will
change. Producers (sources) and
consumers will change.
• Very efficient utilization of topics
and partitions.
• Finally, it’s impossible to enforce
any constraints with a topic name,
so you can always end up with dev
data in a prod topic and vice versa.
After removing necessary metadata from the topic names,
stream processing becomes mandatory.
Stream processing becomes mandatory
Measuring → Validating → Enriching → Filtering & routing
Refinery
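The four stages can be sketched in plain Java without any Kafka dependency. Everything here is an assumption made for illustration: the `PipelineMessage` and `OperationalPipeline` classes, the header keys, and the rule that `message_name` doubles as the target topic. It is not a description of how Refinery is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a message: metadata headers plus an opaque payload.
final class PipelineMessage {
    final Map<String, String> headers;
    final byte[] payload;

    PipelineMessage(Map<String, String> headers, byte[] payload) {
        this.headers = new HashMap<>(headers); // defensive copy
        this.payload = payload;
    }
}

// Hypothetical processor applying the four stages to one message at a time.
final class OperationalPipeline {
    private final AtomicLong seen = new AtomicLong();
    private final String expectedEnv;

    OperationalPipeline(String expectedEnv) {
        this.expectedEnv = expectedEnv;
    }

    long messagesSeen() {
        return seen.get();
    }

    // Returns the target topic, or empty if the message is dropped.
    Optional<String> process(PipelineMessage message, long nowMillis) {
        // Measuring: count every message, including ones dropped later.
        seen.incrementAndGet();

        // Validating: required metadata must be present.
        String env = message.headers.get("env");
        String name = message.headers.get("message_name");
        if (env == null || name == null || name.isEmpty()) {
            return Optional.empty();
        }

        // Enriching: stamp the ingestion time onto the message.
        message.headers.put("ingested_at", Long.toString(nowMillis));

        // Filtering: drop data that belongs to another environment.
        if (!expectedEnv.equals(env)) {
            return Optional.empty();
        }

        // Routing: assumes message_name looks like "telemetry.matches"
        // and can be used directly as the data-type topic name.
        return Optional.of(name);
    }
}
```

Because the environment check lives in the processor rather than in the topic name, the constraint is enforced on every message instead of relying on producers to pick the right topic.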
Having a single
message schema for a
topic is more than
just a nice-to-have.
Number of supported message formats: 8
[Figure: a stream processor receiving JSON, Protobuf, Avro, custom and further unidentified ("?") message formats]
// Application.java
props.put("value.deserializer", "com.example.CustomDeserializer");

// CustomDeserializer.java
public class CustomDeserializer implements Deserializer<???> {
    @Override
    public ??? deserialize(String topic, byte[] data) {
        ???
    }
}
Custom deserialization
Message envelope anatomy
Message = Header / Metadata + Body / Payload
• Header / Metadata: ID, env, timestamp, source, game, ...
• Body / Payload: Event
Unified message envelope
syntax = "proto2";

message MessageEnvelope {
    optional bytes message_id = 1;
    optional uint64 created_at = 2;
    optional uint64 ingested_at = 3;
    optional string source = 4;
    optional uint64 title_id = 5;
    optional string env = 6;
    optional UserInfo resource_owner = 7;
    optional SchemaInfo schema_info = 8;
    optional string message_name = 9;
    optional bytes message = 100;
}
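With a single envelope type, a consumer always deserializes the same outer message and only then decides how to decode the payload bytes. The sketch below illustrates that dispatch step. `Envelope` is a hand-written stand-in for the generated `MessageEnvelope` class, `EnvelopeDispatcher` is hypothetical, and the `payloadFormat` field is an assumption, since the contents of `SchemaInfo` are not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hand-written stand-in for the generated MessageEnvelope class;
// only a few of its fields are modelled.
final class Envelope {
    final String env;
    final String messageName;
    final String payloadFormat; // assumption: would be derived from schema_info
    final byte[] message;

    Envelope(String env, String messageName, String payloadFormat, byte[] message) {
        this.env = env;
        this.messageName = messageName;
        this.payloadFormat = payloadFormat;
        this.message = message;
    }
}

// Hypothetical dispatcher: picks a payload decoder from envelope metadata.
final class EnvelopeDispatcher {
    private final Map<String, Function<byte[], Object>> decoders = new HashMap<>();

    void register(String format, Function<byte[], Object> decoder) {
        decoders.put(format, decoder);
    }

    Object decode(Envelope envelope) {
        Function<byte[], Object> decoder = decoders.get(envelope.payloadFormat);
        if (decoder == null) {
            throw new IllegalArgumentException(
                "No decoder registered for format: " + envelope.payloadFormat);
        }
        return decoder.apply(envelope.message);
    }

    public static void main(String[] args) {
        EnvelopeDispatcher dispatcher = new EnvelopeDispatcher();
        // A trivial decoder for demonstration; real ones would parse JSON, Avro, etc.
        dispatcher.register("utf8-text",
            bytes -> new String(bytes, StandardCharsets.UTF_8));
        Envelope e = new Envelope("prod", "telemetry.matches", "utf8-text",
            "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(dispatcher.decode(e));
    }
}
```

The Kafka deserializer then has one concrete return type, the envelope, and payload formats become a lookup rather than a guess.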
Schema Registry
• API to manage message schemas
• Single source of truth for all producers and consumers
• It should be impossible to send a message to the pipeline
without registering its schema in the Schema Registry!
• A good Schema Registry supports immutability, versioning and
basic validation
• Activision uses a custom Schema Registry implemented with
Python and Cassandra
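The three properties named above (immutability, versioning, basic validation) can be illustrated with a tiny in-memory registry. This is a sketch only: `InMemorySchemaRegistry` and its methods are hypothetical and unrelated to Activision's Python and Cassandra implementation, and schemas are treated as opaque strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory registry; schemas are stored as opaque strings.
final class InMemorySchemaRegistry {
    private final Map<String, List<String>> versionsByName = new HashMap<>();

    // Registers a new version and returns its 1-based version number.
    // Immutability: existing versions are never updated or deleted.
    synchronized int register(String messageName, String schemaDefinition) {
        // Basic validation: reject empty names and empty definitions.
        if (messageName == null || messageName.isEmpty()) {
            throw new IllegalArgumentException("Message name must not be empty");
        }
        if (schemaDefinition == null || schemaDefinition.trim().isEmpty()) {
            throw new IllegalArgumentException("Schema must not be empty");
        }
        // Versioning: each registration appends a new version.
        List<String> versions =
            versionsByName.computeIfAbsent(messageName, k -> new ArrayList<>());
        versions.add(schemaDefinition);
        return versions.size();
    }

    synchronized String get(String messageName, int version) {
        List<String> versions = versionsByName.get(messageName);
        if (versions == null || version < 1 || version > versions.size()) {
            throw new IllegalArgumentException(
                "Unknown schema: " + messageName + " v" + version);
        }
        return versions.get(version - 1);
    }

    // A pipeline entry point could call this to refuse unregistered messages.
    synchronized boolean isRegistered(String messageName) {
        return versionsByName.containsKey(messageName);
    }
}
```

An ingestion endpoint could check `isRegistered` before accepting a message, which is one way to make it impossible to send data without a registered schema.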
Summary: scaling and extending the data pipeline
Games
• By using a unified message envelope
and topic names, adding a new game
becomes almost effortless
• “Operational” stream processing
makes it possible
• Still flexible enough: each game can
use its own message payload format
via Schema Registry
Use-cases
• Topic names express data types, not
producers or consumers
• Stream filtering & routing allows
low-cost experiments
• Data catalog built on top of Schema
Registry promotes data discovery
Thanks!
@sap1ens
