Building Kafka Connectors with Kotlin
A Step-by-Step Guide to Creation and Deployment
Sami Alashabi, Solutions Architect, Accenture/Essent
Ramzi Alashabi, Senior Data Engineer, ABN Amro
Sami Alashabi
12+ Year Journey in Data
Various Roles and Segments
Architecture, Big Data, Real-Time
Low Latency Distributed Systems,
AWS
Love to solve problems
Love spending time with family
when I’m not coding/architecting
Kafka Enthusiast
https://www.linkedin.com/in/sami-alashabi/
Ramzi Alashabi
10+ Years Data Specialist
Micro-services, ETLs, and Cloud
Engineering
Transform ideas to Production
Love learning new Languages &
hanging out with the fam.
Yes, I'm a Dog Person
https://www.linkedin.com/in/ramzialashabi/
Agenda
01 Kafka Connect: Overview, Architecture, Types & Concepts
02 Kotlin: Introduction, Background, Features & Advantages
03 Implementation & Code: Building a Source Connector, Test & Deployment Strategies
04 Key Learnings: Summary & Takeaways
05 Q&A: Questions & Follow Up
Kafka
Kafka Connector
Connect: Start Kafka Connector
[animated diagram sequence illustrating how a Connect worker starts a connector and its tasks]
Connect: Standalone vs Distributed
Standalone
● Ideal for development & testing
● Tasks executed in a single process
● Configuration in a properties file
● No fault tolerance: if the process fails, all tasks stop
● No automatic scalability: to scale up, you need to manually start more standalone processes
Distributed
● Ideal for large production deployments
● Tasks are distributed across multiple worker nodes
● Configuration stored in Kafka, allowing dynamic updates
● Fault tolerance: tasks are automatically redistributed
● Automatic scalability: more worker processes can be added to scale up (elastic)
curl --location 'http://kafkaConnect:8083/connectors' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic *******************' \
--data '{
"name": "GitlabSourceConnector-merge-requests",
"config": {
"name": "GitlabSourceConnector-merge-requests",
"connector.class":
"com.sami12rom.kafka.gitlab.GitlabSourceConnector",
"gitlab.repositories":
"kafka/confluent_kafka_connect_aws_terraform",
"gitlab.service.url": "https://gitlab.compny.nl/api/v4/",
"gitlab.resources": "merge_requests",
"gitlab.since": "2023-12-10T20:12:59.300Z",
"gitlab.access.token": "*****************",
"max.poll.interval.ms": "40000",
"topic.name.pattern": "gitlab-merge-requests",
"tasks.max": 1,
...
}
}'
Distributed Mode: offsets, config & status are stored in Kafka
Single Message Transform
A Single Message Transform (SMT) modifies individual messages as they flow through the Kafka Connect pipeline, e.g.
● ReplaceField:
org.apache.kafka.connect.transforms.ReplaceField$Key
● MaskField:
org.apache.kafka.connect.transforms.MaskField$Value
● InsertField:
org.apache.kafka.connect.transforms.InsertField$Value
"config": {
...
"transforms":"flatten,createKey",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": "_",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields":"id,iid,project_id"
}
Converters & Data Formats
Data formats can be chosen depending on the specific requirements of your application:
● ProtobufConverter: when you need to optimize for speed and size - io.confluent.connect.protobuf.ProtobufConverter
● JsonSchemaConverter: when you want a human-readable format and are working with RESTful APIs - io.confluent.connect.json.JsonSchemaConverter
● AvroConverter: the easiest choice for schema evolution - io.confluent.connect.avro.AvroConverter
● JsonConverter: when you want a human-readable format and don't need a schema - org.apache.kafka.connect.json.JsonConverter
"config": {
...
"key.converter":"io.confluent.connect.json.JsonSchemaConverter",
"key.converter.schema.registry.url":"http://schema-registry:8081",
"value.converter":"io.confluent.connect.json.JsonSchemaConverter",
"value.converter.schema.registry.url":"http://schema-registry:8081"
}
Kotlin
Background, Features and Advantages
Introduction
Kotlin is a modern, statically typed programming language that mainly targets the Java Virtual Machine (JVM).
● It was first introduced by JetBrains in 2011.
● In 2019, Google announced Kotlin as an
official language for Android development.
● Growing Community of Developers.
Features & Advantages
val message = "Hello, World!" // Type inference
if (message is String) { // Smart cast
println(message.length)} // Allows accessing String-specific funcs
// Using default arguments
fun greet(name: String = "John Doe", message: String = "Hello") {
println("$message, $name!")}
greet()
// Safe Calls (?.): Execute only when the value is not null
val name: String? = null
val length: Int? = name?.length
// Elvis Operator (?:): Use value if not null, otherwise use default
val name: String? = null
val length = name?.length ?: -1
// Not-null assertion (!!): Use when sure the value is not null
val name: String? = null
val length = name!!.length
// Higher-order function that takes a function as a parameter
fun calculate(x: Int, y: Int, operation: (Int, Int) -> Int): Int {
return operation(x, y)}
// Using lambda expression
val result = calculate(5, 3) { a, b -> a + b }
Concise Syntax
Reduces boilerplate which
allows writing clean, compact &
more readable code e.g.
● Type inference
● Smart casts
● Default arguments
Safe & Reliable
Built-in null safety features,
eliminating the infamous
NullPointerException errors
using
● safe calls (?.)
● the Elvis operator (?:)
● non-null assertion (!!)
Interoperability
It is fully compatible with Java,
which means you can seamlessly
use Kotlin code in Java projects
and vice versa.
Functional
Programming support
It embraces functional
programming and offers features
like higher-order & first-class
functions, lambda expressions,
functional utilities such as map,
filter, and reduce.
Implementation & Code
Building a Source Connector
build.gradle.kts
● Plugins: e.g. Java library plugin, the
Kotlin JVM plugin, the Git version plugin,
and the Maven Publish plugin.
● Repositories: specifies where to fetch
dependencies from.
● Dependencies: libraries the project
depends on, including both
implementation and test dependencies
● Tasks: Test, Build, Jar
● Publishing: publish to a Maven
repository
plugins {
`java-library`
kotlin("jvm") version "1.9.22"
id("com.palantir.git-version") version "1.0.0"
`maven-publish`
}
dependencies {
implementation("org.apache.kafka:connect-api:3.4.0”)
implementation("commons-validator:commons-validator:1.7")
testImplementation("org.testcontainers:kafka:1.19.6")
}
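Since the connector ships as a single plugin artifact, the Jar task typically bundles runtime dependencies. A minimal fat-jar sketch in the same Kotlin DSL (an assumption, not the talk's actual build file):

tasks.jar {
    manifest { attributes["Implementation-Version"] = project.version }
    // Bundle runtime dependencies so the jar can be dropped into the
    // Connect worker's plugin.path as-is
    duplicatesStrategy = DuplicatesStrategy.EXCLUDE
    from(configurations.runtimeClasspath.get().map { if (it.isDirectory) it else zipTree(it) })
}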
Source Connector Interface
● GitlabSourceConnector extends SourceConnector.
● SourceConnector: part of the Kafka
Connect framework to stream data from
external data systems to Kafka.
● Version: Returns the version of the
connector and is often used for logging
and debugging purposes.
class GitlabSourceConnector: SourceConnector() {
override fun version(): String {
return ConnectorVersionDetails::class.java.`package`.implementationVersion ?:
"1.0.0" }
override fun start(props: Map<String, String>) {}
override fun config(): ConfigDef {}
override fun taskClass(): Class<out Task> {}
override fun taskConfigs(maxTasks: Int):
List<Map<String, String>> {}
override fun stop() {}
}
class GitlabSourceConnector: SourceConnector() {
override fun version(): String {}
override fun start(props: Map<String, String>) {
logger.info("Starting GitlabSourceConnector”)
this.props = props
}
override fun config(): ConfigDef {}
override fun taskClass(): Class<out Task> {}
override fun taskConfigs(maxTasks: Int): List<Map<String, String>> {}
override fun stop() {
logger.info("Requested connector to stop at ${Instant.now()}")
}
}
Source Connector Lifecycle
● The start and stop methods are part of the lifecycle of a Source Connector in Kafka Connect.
● start(props) is called on initialization and allows setting up any resources the connector needs to run. The props argument is a map of configuration settings.
● stop() is called when the connector is being shut down and is where it cleans up any resources that were opened in the start method.
Source Connector Task
Configuration
● taskConfigs method is used to divide
the work of the connector into smaller,
independent tasks that can be distributed
across multiple workers in a Kafka
Connect cluster, with benefits such as:
○ Parallelism
○ Scalability
○ Fault Isolation
○ Flexibility
override fun taskConfigs(maxTasks: Int): List<Map<String, String>> {
val taskConfigs = mutableListOf<Map<String, String>>()
val repositories = props!![REPOSITORIES]!!.split(", ")
val groups = repositories.size.coerceAtMost(maxTasks)
val reposGrouped = ConnectorUtils.groupPartitions(repositories, groups)
for (group in reposGrouped) {
val taskProps = mutableMapOf<String, String>()
taskProps.putAll(props!!.toMap())
taskProps.replace(REPOSITORIES, group.joinToString(";"))
taskConfigs.add(taskProps)
}
return taskConfigs
}
Input config:
{"gitlab.repositories": "Repo#1, Repo#2, Repo#3", "tasks.max": 2}
Output config:
[ {gitlab.repositories=Repo#1;Repo#2}, {gitlab.repositories=Repo#3} ]
override fun config(): ConfigDef {}
const val GITLAB_ENDPOINT_CONFIG = "gitlab.service.url"
val CONFIG: ConfigDef = ConfigDef()
.define(
/* name = */ GITLAB_ENDPOINT_CONFIG,
/* type = */ ConfigDef.Type.STRING,
/* defaultValue = */ "https://gitlab.company.nl/api/v4",
/* validator = */ EndpointValidator(),
/* importance = */ ConfigDef.Importance.HIGH,
/* documentation = */ "GitLab API Root Endpoint Ex.
https://gitlab.example.com/api/v4/",
/* group = */ "Settings",
/* orderInGroup = */ -1,
/* width = */ ConfigDef.Width.MEDIUM,
/* displayName = */ "GitLab Endpoint",
/* recommender = */ EndpointRecommender()
)
Source Connector Configuration
● ConfigDef class is used to define the
configuration options the Kafka connector
accepts.
override fun config(): ConfigDef {}
class EndpointValidator : ConfigDef.Validator {
override fun ensureValid(name: String?, value: Any?) {
val url = value as String
val validator = UrlValidator()
if (!validator.isValid(url)) {
throw ConfigException("$url must be a valid URL, use
examples https://gitlab.example.com/api/v4/")
}
}
}
class EndpointRecommender : ConfigDef.Recommender {
override fun validValues(name: String, parsedConfig:
Map<String, Any>): List<String> {
return ListOf("https://gitlab.company.nl/api/v4/")
}
override fun visible(name: String?, parsedConfig:
Map<String, Any>?): Boolean {
return true
}
}
Source Connector Configuration
● Enhancing usability and reducing the
likelihood of configuration errors.
● Recommender: Is an instance of
ConfigDef.Recommender that can
suggest values for the configuration
option and make it easier for users to
configure options correctly.
● Validator: Is an instance of
ConfigDef.Validator that is used to
validate the configuration values which
can help catch configuration errors early,
before they cause problems at runtime.
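A quick sketch of how this definition is exercised: ConfigDef.parse() applies defaults and runs the validators, so a bad value fails fast (the sample values below are hypothetical):

// Valid value: parse() fills in defaults and returns typed values
val parsed = CONFIG.parse(mapOf(GITLAB_ENDPOINT_CONFIG to "https://gitlab.example.com/api/v4/"))

// Invalid value: EndpointValidator.ensureValid() throws ConfigException during parse()
CONFIG.parse(mapOf(GITLAB_ENDPOINT_CONFIG to "not-a-url"))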
val mergedRequest: Schema = SchemaBuilder.struct()
.name("com.sami12rom.mergedRequest")
.version(1).doc("Merged Request Value Schema")
.field("id", SchemaBuilder.int64())
.field("project_id", SchemaBuilder.int64())
.field("title", SchemaBuilder.string()
.optional().defaultValue(null))
.field("description", SchemaBuilder.string()
.optional().defaultValue(null))
.build()
val struct = Struct(Schemas.mergedRequest)
struct.put("id", mergedRequest.id)
struct.put("project_id", mergedRequest.project_id)
struct.put("title", mergedRequest.title)
struct.put("description", mergedRequest.description)
Data Schemas: SchemaBuilder
● Schemas define the structure of the
data in Kafka Connect and specify the
type of each field, whether it's required
or optional, and other properties.
○ Data types e.g. struct, map, array
○ Helps ensure data consistency
● A Struct is used to hold the actual data and ensure that the data conforms to the schema.
○ Needed for SourceRecord or SinkRecord.
class GitlabSourceTask : SourceTask() {
override fun start(props: Map<String, String>?) {
initializeSource()
}
override fun poll(): MutableList<SourceRecord> {
val records = mutableListOf<SourceRecord>()
sleepForInterval()
val response = ApiCalls.GitLabCall(props!!)
val record = generateSourceRecord(response as
MergedRequest)
records.add(record)
return records
}
override fun stop() {}
}
Source Task Class
● poll() is called repeatedly to pull data from the external source into Kafka. It should return a list of SourceRecord objects, or null if there's no data available.
Source Record - Part 1
● topic: Name of the topic to write to.
● partition: Partition where the record will be
written, can be null to let Kafka assign it.
● keySchema & key: The schema & key for
this record.
● valueSchema & value: The schema & value
for this record. Value is the actual data that
will be written to the Kafka topic.
● timestamp: The timestamp for this record
and can be null to let Kafka assign the
current time.
● headers: Headers for this record.
val record = SourceRecord(
/* sourcePartition = */ Map (Connector),
/* sourceOffset = */ Map (Connector),
/* topic = */ String,
/* partition = */ Integer (Optional),
/* keySchema = */ Schema (Optional),
/* key = */ Object (Optional),
/* valueSchema = */ Schema (Optional),
/* value = */ Object (Optional),
/* timestamp = */ Long (Optional),
/* headers = */ generateHeaders() (Optional)
)
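Putting the pieces together, a minimal sketch of how the GitLab task might construct a record (the partition/offset keys and the response field are assumptions):

val record = SourceRecord(
    /* sourcePartition = */ mapOf("repository" to "kafka/confluent_kafka_connect_aws_terraform"),
    /* sourceOffset = */ mapOf("updated_at" to response.updated_at), // hypothetical response field
    /* topic = */ "gitlab-merge-requests",
    /* partition = */ null, // let Kafka assign the partition
    /* keySchema = */ null,
    /* key = */ null,
    /* valueSchema = */ Schemas.mergedRequest,
    /* value = */ struct, // the Struct built earlier
    /* timestamp = */ Instant.now().toEpochMilli()
)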
val record = SourceRecord(
/* sourcePartition = */ Map,
/* sourceOffset = */ Map,
...
)
Connector Restart Offset:
override fun start(props: Map<String, String>?) {
initializeSource()
}
fun initializeSource(): Map<String, Any>? {
return context.offsetStorageReader()
.offset(sourcePartition())
}
Source Record - Part 2
(Restartability)
● sourcePartition: It defines the partition
of the source system that this record
came from, e.g. a table name for a
database connector.
● sourceOffset: It defines the position in
the source partition that this record came
from, e.g. an ID of the row for a database
connector.
● offsetStorageReader: retrieves the last stored offset for a specific partition so data ingestion resumes where it last left off.
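A minimal sketch (the map keys are assumptions) of the partition/offset helpers such a task could use, plus the restart read in start():

// One source partition per repository; the offset tracks the last seen update
fun sourcePartition(): Map<String, String> = mapOf("repository" to repository)

fun sourceOffset(lastUpdatedAt: String): Map<String, String> = mapOf("updated_at" to lastUpdatedAt)

// On start: resume from the stored offset, falling back to the configured gitlab.since
val stored = context.offsetStorageReader().offset(sourcePartition())
val since = stored?.get("updated_at") as? String ?: props?.get("gitlab.since")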
Testing Strategies
Unit Tests
Isolated tests for individual functions or methods using a testing framework like JUnit and a mocking library like Mockito.
Integration Tests
Test the interaction between all components using tools like Testcontainers to set up realistic testing environments.
End to End & Performance Tests
Validate the entire flow from producing a record to the source system to consuming it from Kafka. Measure the throughput and latency of your connector under different loads.
Soak & Error Handling Tests
Run your connector for an extended period under typical load to identify long-term issues. Write tests to ensure your connector handles errors gracefully and recovers from failures.
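As a small illustration, a unit test for the EndpointValidator shown earlier (a sketch, assuming JUnit 5 on the test classpath):

import org.apache.kafka.common.config.ConfigException
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.assertThrows

class EndpointValidatorTest {
    private val validator = EndpointValidator()

    @Test
    fun `accepts a well-formed URL`() {
        validator.ensureValid("gitlab.service.url", "https://gitlab.example.com/api/v4/")
    }

    @Test
    fun `rejects a malformed URL`() {
        assertThrows<ConfigException> {
            validator.ensureValid("gitlab.service.url", "not-a-url")
        }
    }
}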
Deployment Strategies
[Diagram: CI/CD flow from GitHub/GitLab through AWS CodeBuild, CodeDeploy, CodePipeline, and CodeArtifact, deploying the connector plugin to AWS ECR/ECS, alongside a GitLab-managed infrastructure variant]
Monitoring Strategies
[Diagram: CloudWatch with CloudWatch Alarms, plus Prometheus and Grafana]
Error Handling
Retries
For transient errors, e.g.
temporary network issues
Custom Error Handling
In your SourceTask or SinkTask,
custom error handling logic can be
added e.g. catch exceptions,
log them, and decide whether to
fail the task or attempt to recover
and continue
Monitoring Metrics
Actively monitor and alert
on error message rates of the
connector e.g. Task Error
Metrics, Records
Produced/Consumed, Task
Status, Lag/Throughput
Metrics
Error Tolerance
errors.tolerance = none
● fail fast (default)
errors.tolerance = all
● silently ignore
errors.deadletterqueue.topic.name
● dead letter queues
Log Errors
Errors can be logged for
troubleshooting and can be
controlled by:
● errors.log.enable = true
● errors.log.include.messages
Avoid excessive use of Error or
Warn levels in your logging
Dead Letter Queue
Automatically send error records to a DLQ topic for later inspection, along with headers.
PS: DLQ is currently only supported for Sink Connectors, not for Source Connectors
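For the custom error handling route, a minimal sketch (the exception split and helper names are assumptions) of a poll() that tolerates transient failures but fails fast on unexpected ones:

override fun poll(): MutableList<SourceRecord>? {
    return try {
        val records = mutableListOf<SourceRecord>()
        val response = ApiCalls.GitLabCall(props!!)
        records.add(generateSourceRecord(response as MergedRequest))
        records
    } catch (e: java.io.IOException) {
        // Transient (e.g. network) failure: log and return null so the
        // framework calls poll() again instead of failing the task
        logger.warn("GitLab call failed, will retry on next poll", e)
        null
    } catch (e: Exception) {
        // Unexpected failure: fail the task fast
        throw org.apache.kafka.connect.errors.ConnectException("Unrecoverable error", e)
    }
}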
Key Learnings
1. Planning and Design
● Understand your data source
● Decide on the type (source or sink)
● Plan config inputs, defaults, validators, and recommenders
● Consider the volume of data your connector will need to handle (parallel processing)
2. Connector Development
● Add the required dependencies
● Define the actions for the start and stop methods
● Determine the number of tasks based on your parallelism requirements
● Implement the poll method, and decide on the frequency of polling
3. Data Management
● Develop a function to fetch data from your source system
● Define the Schema and Struct
● Define the contents of the source record
● Choose the right Converter for your data format (operations)
● Consider the usage of Single Message Transforms (operations)
4. Resilience and Error Handling
● Design your connector with restartability and fault tolerance in mind
● Implement error handling
● Consider how the connector will handle network failures, API rate limits, etc.
5. Testing, Deployment, and Monitoring
● Test, test & test under different scenarios
● Set up a monitoring mechanism
● Implement proper logging
● Track performance (JMX)
Q & A
Thank you for your attention
and participation
Please rate the session in the Kafka Summit App
Code
https://github.com/sami12rom/kafka-connect-gitlab