Introducing Kafka Connect and Implementing Custom Connectors


The document introduces Kafka Connect, a tool for integrating data into and out of Kafka, designed to facilitate the management of data across various technologies. It outlines the architecture of Kafka Connect, emphasizing its fault tolerance and scalability, particularly in situations where direct integration is impractical. Additionally, it details the implementation of custom connectors and the setting up of necessary infrastructure to utilize Kafka Connect effectively.


Introducing Kafka Connect and Implementing Custom Connectors
Kobi Hikri
Kafka Connect

Agenda
• What (is Kafka Connect)
• The World of Data
• Why (do we need it) Kafka Connect
• When (will we consider using) Kafka Connect
• How Kafka Connect
  • Setting up the infrastructure
  • Interacting with Kafka Connect
  • Implementing a custom source connector
  • Working with connectors
• What will not be covered in this talk (will be covered per request)
  • Kafka Connect Transformations
  • Working with custom schemas

What is Kafka Connect
• A Kafka integration tool
  • Allows importing data into Kafka
  • And exporting data out of Kafka
• Runs as a cluster of one or more system processes
  • Each such process is referred to as a Worker
  • Workers are grouped together in Worker Groups
• Connector instances are managed by the Connect cluster
• Tasks are configured by connectors and are distributed across workers
  • Data is pulled from external systems by Source Tasks (configured by Source Connectors)
  • Or pushed to external systems by Sink Tasks (configured by Sink Connectors)
[Figure: Workers grouped into two Worker Groups]
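
To make "tasks are configured by connectors" concrete, here is a minimal plain-Java sketch of how a connector might split its work into per-task configurations. It deliberately avoids the Kafka Connect library so that it runs on its own; the class `TaskConfigSketch`, the `tables` setting, and the table names are hypothetical, and round-robin grouping is just one possible strategy, not necessarily what any real connector does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in (not the Kafka Connect API) for a connector splitting work into tasks. */
class TaskConfigSketch {

    /**
     * Splits the given source tables (hypothetical units of work) round-robin
     * into at most maxTasks groups and returns one config map per task.
     */
    static List<Map<String, String>> taskConfigs(List<String> tables, int maxTasks) {
        if (maxTasks < 1) {
            throw new IllegalArgumentException("maxTasks must be >= 1");
        }
        // Never create more tasks than there are units of work.
        int taskCount = Math.min(maxTasks, tables.size());
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < tables.size(); i++) {
            groups.get(i % taskCount).add(tables.get(i));
        }
        // Each task receives only its own slice, under the hypothetical "tables" key.
        List<Map<String, String>> configs = new ArrayList<>();
        for (List<String> group : groups) {
            configs.add(Map.of("tables", String.join(",", group)));
        }
        return configs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> configs =
                taskConfigs(List.of("orders", "customers", "payments"), 2);
        for (int i = 0; i < configs.size(); i++) {
            System.out.println("task " + i + " -> " + configs.get(i));
        }
    }
}
```

In the sketch, the connector only decides how the work is divided. Placing the resulting tasks on Workers is left to the Connect cluster, as the slide describes.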

The World of Data
Data is scattered all over!
A jigsaw and plenty of glue are needed

Why Kafka Connect
• In one word
  • Integration
    • Of different data-oriented technologies
    • Of a multitude of changing APIs and communication protocols
• In more than one word
  • Fault-tolerant, scalable integration with various technologies

When Kafka Connect
• Implementing a Kafka Producer or Consumer within the desired data source / target is impossible or expensive (e.g. the source code is unavailable)
• We can modify the source code of the source / target system, but prefer a scalable and fault-tolerant integration with it (e.g. so as not to be limited in bandwidth by the direct connection between the technology and our Kafka cluster)

How Kafka Connect
Demo Time
Docker Compose
Kafka Connect REST API
Custom Source Connector
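
The demo itself is not captured in the slides. As a rough illustration of the REST API step, the following Java sketch builds the request that registers a connector with a Connect worker. It assumes the worker listens on the default REST port 8083 on localhost; the connector name, the class `com.example.DemoSourceConnector`, and the `topic` setting are hypothetical placeholders, not taken from the talk. The request is only built and printed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Sketch of registering a connector through the Kafka Connect REST API. */
class ConnectRestSketch {

    /**
     * Renders {"name": ..., "config": {...}}, the body shape accepted by POST /connectors.
     * Assumes names, keys, and values contain no characters that need JSON escaping.
     */
    static String connectorJson(String name, Map<String, String> config) {
        StringJoiner entries = new StringJoiner(", ", "{", "}");
        // A TreeMap copy gives a stable, sorted key order in the output.
        for (Map.Entry<String, String> e : new TreeMap<>(config).entrySet()) {
            entries.add("\"" + e.getKey() + "\": \"" + e.getValue() + "\"");
        }
        return "{\"name\": \"" + name + "\", \"config\": " + entries + "}";
    }

    /** Builds (but does not send) the POST request that creates the connector. */
    static HttpRequest createConnectorRequest(String baseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String json = connectorJson("demo-source", Map.of(
                "connector.class", "com.example.DemoSourceConnector", // hypothetical class
                "tasks.max", "2",
                "topic", "demo-topic")); // hypothetical connector-specific setting
        HttpRequest request = createConnectorRequest("http://localhost:8083", json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
        // To send it against a running worker:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

`connector.class` and `tasks.max` are standard connector settings; any other keys are defined by the connector implementation itself.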
