Cosmos DB – Kafka Connectors
Abinav Rameesh
Program Manager, Cosmos DB
01 Kafka Connect Overview
02 Kafka Integration Use Cases for Cosmos DB
03 Cosmos DB Source & Sink Architecture Overview
04 Demo
05 Taking It Further
What is a Connector?
Confluent Platform offers 120+ pre-built connectors to help you quickly and reliably integrate with Apache Kafka®.
Connectors import and export data from some of the most commonly used data systems.
Connectors run either as a managed resource on Confluent Cloud or as a self-managed resource on a self-managed Kafka cluster.
Kafka Connect runs on the Java virtual machine (JVM) as a process known as a worker. Each worker can execute multiple connectors.
Connect Architecture
• Connectors are responsible for the interaction between Kafka Connect and the external technology being integrated.
• Converters handle the serialization and deserialization of data.
• Transformations optionally apply one or more modifications to the data passing through the pipeline.
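The three roles above can be pictured as stages of a pipeline. What follows is a minimal, library-free Python sketch of the source-side flow (connector, then transformations, then converter). The function names and the record shape are illustrative assumptions; they are not Kafka Connect's actual Java interfaces.

```python
import json
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def connector_poll() -> List[Record]:
    """Stand-in for a source connector reading from an external system."""
    return [{"id": "1", "city": "Seattle"}, {"id": "2", "city": "Oslo"}]


def add_origin(record: Record) -> Record:
    """Stand-in for a transformation: adds a static field to each record."""
    return {**record, "origin": "cosmos"}


def json_converter(record: Record) -> bytes:
    """Stand-in for a converter: serializes a record to bytes for Kafka."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def run_pipeline(
    records: Iterable[Record],
    transforms: List[Callable[[Record], Record]],
    converter: Callable[[Record], bytes],
) -> List[bytes]:
    """Apply every transformation in order, then serialize each record."""
    out = []
    for record in records:
        for transform in transforms:  # transformations are optional and ordered
            record = transform(record)
        out.append(converter(record))
    return out


messages = run_pipeline(connector_poll(), [add_origin], json_converter)
```

Passing an empty list of transformations leaves the records unchanged before serialization, which mirrors the "optional" nature of that stage.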
Kafka and Cosmos DB
SHARDING
Kafka: Kafka is horizontally partitioned, with brokers serving as leaders and followers, each owning its own logical range of data.
Cosmos DB: Cosmos DB is horizontally partitioned, with each partition made up of a replica set.
SCALING
Kafka: Kafka can be scaled seamlessly by simply increasing the number of brokers in the cluster.
Cosmos DB: Cosmos DB can be scaled elastically by simply increasing the throughput for the dataset.
CDC
Kafka: Kafka provides a Consumer library to retrieve changes from each of the physical partitions of the topic.
Cosmos DB: Cosmos DB provides a Change Feed Processor, which contains built-in logic to retrieve changes from the physical partitions for the container.
TUNABILITY
Kafka: Kafka performance can be tuned for batch sizes, memory thresholds, polling frequencies, etc.
Cosmos DB: Cosmos DB can be tuned for query performance, RU consumption, batch sizes for writes and reads, etc.
MANAGED
Kafka: Kafka can be self-provisioned or fully managed through Confluent Cloud.
Cosmos DB: Cosmos DB is a fully managed service.
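The CDC row describes the same idea on both sides: changes are read partition by partition, and the reader remembers how far it has read in each one. Below is a toy Python sketch of that pattern. `PartitionedLog` and `ChangeReader` are hypothetical names invented for illustration; they are neither the Kafka Consumer API nor the Cosmos DB Change Feed Processor.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class PartitionedLog:
    """Toy store: each partition keeps its own ordered list of changes."""

    def __init__(self, partition_count: int) -> None:
        self.partitions: List[List[str]] = [[] for _ in range(partition_count)]

    def append(self, partition: int, change: str) -> None:
        self.partitions[partition].append(change)


class ChangeReader:
    """Remembers the read position for every partition (hypothetical helper)."""

    def __init__(self, log: PartitionedLog) -> None:
        self.log = log
        self.positions: Dict[int, int] = defaultdict(int)

    def poll(self, max_per_partition: int = 100) -> List[Tuple[int, str]]:
        """Return unread changes from every partition and advance positions."""
        batch = []
        for pid, changes in enumerate(self.log.partitions):
            start = self.positions[pid]
            new = changes[start:start + max_per_partition]
            batch.extend((pid, change) for change in new)
            self.positions[pid] = start + len(new)  # advance only past what was read
        return batch


log = PartitionedLog(partition_count=2)
log.append(0, "booking-1 created")
log.append(1, "booking-2 created")

reader = ChangeReader(log)
first = reader.poll()   # both initial changes
log.append(0, "booking-1 updated")
second = reader.poll()  # only the change appended after the first poll
```

The batch size and polling cadence in a sketch like this correspond loosely to the tuning knobs listed in the TUNABILITY row.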
01 SOURCE, SINK
Source and Sink Connectors for Cosmos DB facilitate seamless integration without the need to write complex application code to migrate data to and from Kafka.
02 ZERO CODE
Only configurations are needed to point to the Cosmos DB account and Kafka cluster, with additional customization options.
03 DATA FORMATS
JSON and Avro data formats are supported, with additional format options to come based on user feedback.
04 FULLY MANAGED
Both fully managed (through Confluent Cloud) and self-managed (using the connector directly) options are available.
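To make the "zero code" and "data formats" points concrete, here is a hedged Python sketch that assembles a sink connector configuration as the JSON payload a self-managed Kafka Connect worker accepts on its REST `/connectors` endpoint. The `connect.cosmos.*` property names and the connector class are recalled from the open-source connector's documentation and should be verified against the version in use. The endpoint, key, Schema Registry URL, and all names passed in are placeholders, and `sink_connector_payload` is a hypothetical helper.

```python
import json
from typing import Dict


def sink_connector_payload(
    name: str,
    endpoint: str,
    key: str,
    database: str,
    topic: str,
    container: str,
    data_format: str = "json",
) -> Dict[str, object]:
    """Build a Kafka Connect payload for a Cosmos DB sink (names are assumptions)."""
    config = {
        "connector.class": "com.azure.cosmos.kafka.connect.sink.CosmosDBSinkConnector",
        "tasks.max": "1",
        "topics": topic,
        "connect.cosmos.connection.endpoint": endpoint,
        "connect.cosmos.master.key": key,
        "connect.cosmos.databasename": database,
        # Maps a Kafka topic to a Cosmos DB container as "topic#container".
        "connect.cosmos.containers.topicmap": f"{topic}#{container}",
    }
    if data_format == "json":
        config["value.converter"] = "org.apache.kafka.connect.json.JsonConverter"
        config["value.converter.schemas.enable"] = "false"
    elif data_format == "avro":
        config["value.converter"] = "io.confluent.connect.avro.AvroConverter"
        # Placeholder Schema Registry address; replace with a real one.
        config["value.converter.schema.registry.url"] = "http://localhost:8081"
    else:
        raise ValueError(f"unsupported data format: {data_format}")
    return {"name": name, "config": config}


payload = sink_connector_payload(
    name="cosmos-sink-example",
    endpoint="https://<account>.documents.azure.com:443/",  # placeholder
    key="<primary-key>",                                     # placeholder
    database="travel",
    topic="bookings",
    container="bookings-container",
)
body = json.dumps(payload, indent=2)
```

On a self-managed cluster, a body like this would typically be sent with an HTTP POST to the Connect worker's `/connectors` endpoint; on Confluent Cloud the equivalent settings are entered through the managed connector's own configuration flow.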
Cosmos DB – Kafka Use Cases
Bookings
Forecasting
Analytics
Flight
Recommendations
Marketing
Revenue Optimizer
Cosmos DB – Kafka Connector Architecture
Managed Kafka Connect Cluster (Source Connector)
Change Feed Processor: reading from Cosmos DB’s physical partitions
Kafka Producer: writing to the Kafka topic’s physical partitions
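The write half of the source flow can be sketched as follows: each change read from Cosmos DB is routed to a topic partition chosen from its key, so changes for the same key stay in order. This is a toy illustration only. It uses CRC32 as a stand-in hash (Kafka's default partitioner uses murmur2), and keying by the document `id` is an assumption rather than the connector's documented behavior.

```python
import zlib
from typing import Dict, List

Change = Dict[str, str]


def choose_partition(key: str, partition_count: int) -> int:
    """Pick a partition from the key. CRC32 is a stand-in, not Kafka's hash."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def publish_changes(changes: List[Change], partition_count: int) -> Dict[int, List[Change]]:
    """Append each change to the in-memory 'topic partition' chosen by its id."""
    topic: Dict[int, List[Change]] = {p: [] for p in range(partition_count)}
    for change in changes:
        topic[choose_partition(change["id"], partition_count)].append(change)
    return topic


changes = [
    {"id": "flight-7", "status": "booked"},
    {"id": "flight-9", "status": "booked"},
    {"id": "flight-7", "status": "cancelled"},
]
topic = publish_changes(changes, partition_count=3)
```

Because the partition depends only on the key, both `flight-7` changes land in the same partition in the order they were read.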
Cosmos DB – Kafka Connector Architecture
Managed Kafka Connect Cluster (Sink Connector)
Kafka Consumer: pulling from the topic’s brokers
Cosmos DB Java Client: issuing writes to the Cosmos container, writing to Cosmos DB’s physical partitions
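The sink direction can be sketched in the same spirit: a batch pulled from the topic is written to the container keyed by document `id`. Kafka Connect sinks commonly deliver at least once, so a batch may be replayed; writing by upsert keeps a replay harmless. The in-memory dictionary stands in for a Cosmos container, and skipping records without an `id` is an assumption made for this sketch, not the connector's documented behavior.

```python
from typing import Dict, Iterable


def upsert_batch(container: Dict[str, dict], records: Iterable[dict]) -> int:
    """Write each record into the container keyed by its "id" field."""
    written = 0
    for record in records:
        doc_id = record.get("id")
        if doc_id is None:
            continue  # assumption: records without an id are skipped
        container[str(doc_id)] = dict(record)  # insert or overwrite (upsert)
        written += 1
    return written


container: Dict[str, dict] = {}
batch = [
    {"id": "b-1", "status": "booked"},
    {"id": "b-2", "status": "booked"},
    {"id": "b-1", "status": "cancelled"},
    {"status": "no id"},
]
first_count = upsert_batch(container, batch)
snapshot = dict(container)
second_count = upsert_batch(container, batch)  # replaying the same batch
```

After the replay the container holds the same two documents as before, which is the property that makes redelivery safe.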
Demo
Taking It Further

Azure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft
