Introducing Kafka Connect and Implementing Custom Connectors
Kobi Hikri
Kafka Connect
Agenda
• What (is Kafka Connect)
• The World of Data
• Why (do we need) Kafka Connect
• When (would we consider using) Kafka Connect
• How Kafka Connect
• Setting up the infrastructure
• Interacting with Kafka Connect
• Implementing a custom source connector
• Working with connectors
• What will not be covered in this talk (available on request)
• Kafka Connect Transformations
• Working with custom schemas
What is Kafka Connect
• A Kafka integration tool
• Allows importing data into Kafka
• And exporting data out of Kafka
• Runs as a cluster of one or more system processes
• Each such process is referred to as a Worker
• Workers are grouped together in Worker Groups
• Connector instances are managed by the Connect cluster
• Tasks are configured by connectors and are distributed across workers
• Data is pulled from external systems into Kafka by Source Tasks (configured by Source Connectors)
• Or pushed from Kafka to external systems by Sink Tasks (configured by Sink Connectors)
[Diagram: Workers arranged in two Worker Groups]
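The idea that tasks are distributed across workers can be illustrated with a small sketch. This is not Kafka Connect's actual assignment logic, which is handled by its own rebalance protocol; it is a plain round-robin model in Python, and the worker and connector names (`worker-1`, `jdbc-source`, `s3-sink`) are hypothetical.

```python
from itertools import cycle


def connector_tasks(name, tasks_max):
    """Return the task ids a connector would ask the cluster to run."""
    return [f"{name}-{index}" for index in range(tasks_max)]


def assign_tasks(workers, tasks):
    """Spread tasks over workers in round-robin order (illustrative only)."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment = {worker: [] for worker in workers}
    for worker, task in zip(cycle(workers), tasks):
        assignment[worker].append(task)
    return assignment


# Hypothetical cluster: three workers, two connectors.
workers = ["worker-1", "worker-2", "worker-3"]
tasks = connector_tasks("jdbc-source", 4) + connector_tasks("s3-sink", 2)

# Initial placement of the six tasks.
before = assign_tasks(workers, tasks)

# If worker-2 goes away, the same tasks are spread over the survivors.
survivors = [worker for worker in workers if worker != "worker-2"]
after = assign_tasks(survivors, tasks)

print(before)
print(after)
```

In this model, losing a worker does not lose any task: the same task list is simply reassigned to the remaining workers, which is the property that makes the integration fault tolerant.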
The World of Data
Data is scattered all over!
A jigsaw and plenty of glue are needed
Why Kafka Connect
• In one word
• Integration
• Of different data-oriented technologies
• Of a multitude of changing APIs and communication protocols
• In more than one word
• Fault-tolerant, scalable integration with various technologies
When Kafka Connect
• Implementing a Kafka Producer or Consumer within the desired data source / target is impossible or expensive (e.g. the source code is unavailable)
• We can modify the source code of the source / target system, but prefer a scalable and fault-tolerant integration with it (e.g. one that is not limited in bandwidth by a direct connection between the technology and our Kafka cluster)
How Kafka Connect
Demo Time
Docker Compose
Kafka Connect REST API
Custom Source Connector
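The demo itself is not captured in the slides. As a rough sketch of what interacting with the Kafka Connect REST API can look like, the Python snippet below builds (but does not send) a request that registers a connector via `POST /connectors`. The URL assumes a worker listening on Connect's default REST port 8083 on localhost; the connector class `com.example.DemoSourceConnector`, the connector name, and the `topic` setting are hypothetical placeholders, not taken from the talk.

```python
import json
import urllib.request

# Assumption: a Connect worker reachable on the default REST port.
CONNECT_URL = "http://localhost:8083"


def build_create_connector_request(name, connector_class, tasks_max, extra_config=None):
    """Build a POST /connectors request; the caller decides when to send it."""
    config = {
        "connector.class": connector_class,
        "tasks.max": str(tasks_max),  # config values are sent as strings
    }
    config.update(extra_config or {})
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{CONNECT_URL}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_connector_request(
    name="demo-source",                                 # hypothetical name
    connector_class="com.example.DemoSourceConnector",  # hypothetical class
    tasks_max=2,
    extra_config={"topic": "demo-topic"},               # hypothetical setting
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Against a running worker, the request could be submitted with `urllib.request.urlopen(request)`, and `GET /connectors/demo-source/status` would then report the state of the connector and its tasks.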
