Cloud Native Data Pipelines
MARCH 3, 2022, DATA ON KUBERNETES

Contents
‣ Why Cloud Native Data Pipelines
Bringing cloud-native to ETL
‣ Excursion: Kafka and Kafka Connect
How Kafka Connect works and considerations for deploying on K8s
‣ Quarkus and Jib
Java snippets of a pipeline, built with the Jib container builder and deployed via API

Hakan Lofcali
CTO
DataCater GmbH

PROBLEM DESCRIPTION
ETL needs evolving, too
▸ The software industry has evolved from Dev & Ops -> DevOps -> GitOps
▸ The ETL space has not caught up
▸ Runtimes of ETL tooling diverge significantly from dev -> test -> prod
▸ Scalability has to be taken care of alongside business logic
▸ The infrastructure description and the computations to be executed diverge

CLOUD NATIVE PRINCIPLES
▸ Auto Scale
▸ Image Immutability
▸ Declarative Description

CLOUD-NATIVE DATA PIPELINES
Apply Cloud-Native Principles to ETL
▸ Start with streaming & event-sourcing for continuity, predictable resource consumption, and ease of horizontally scaling workers
▸ Reduce state in the pipeline pod by externalising state to Apache Kafka and event-sourcing from various systems with Kafka Connect
▸ Declare computations [filters, transformations] and build an image containing all needed computations

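To make "declare computations [filters, transformations]" concrete, here is a minimal sketch in plain Java. It is not DataCater's format or implementation: the `Step` record, the step kinds `filter` and `transform`, and the `uppercase` argument are all hypothetical names chosen for illustration. The point is only that the pipeline is described as data and interpreted per record, so the same description could be baked into an immutable image.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: a pipeline declared as data, then interpreted per record. */
final class DeclarativePipelineSketch {

    /** One declared step: a kind ("filter" or "transform"), a field, and an argument. */
    record Step(String kind, String field, String arg) {}

    /** Applies the declared steps in order; an empty result means the record was filtered out. */
    static Optional<Map<String, String>> apply(List<Step> steps, Map<String, String> record) {
        Map<String, String> current = new HashMap<>(record);
        for (Step step : steps) {
            String value = current.get(step.field());
            switch (step.kind()) {
                case "filter" -> {
                    // Keep the record only if the field equals the declared argument.
                    if (!step.arg().equals(value)) {
                        return Optional.empty();
                    }
                }
                case "transform" -> {
                    // The only transformation known to this sketch is "uppercase".
                    if ("uppercase".equals(step.arg()) && value != null) {
                        current.put(step.field(), value.toUpperCase(Locale.ROOT));
                    }
                }
                default -> throw new IllegalArgumentException("Unknown step kind: " + step.kind());
            }
        }
        return Optional.of(Map.copyOf(current));
    }

    public static void main(String[] args) {
        List<Step> steps = List.of(
                new Step("filter", "country", "DE"),
                new Step("transform", "name", "uppercase"));
        System.out.println(apply(steps, Map.of("country", "DE", "name", "ada")));
        System.out.println(apply(steps, Map.of("country", "FR", "name", "bob")));
    }
}
```

Because the steps are plain values, they could equally be parsed from YAML or received through an API before being interpreted.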
TARGET ARCHITECTURE
‣ Multiple frontends for defining Data Pipelines
YAML, API, and UI need to produce the same pipeline
‣ DataCater allows no-code and Python transformations
Filters and Transformations are packaged into containers
‣ Pipeline -> Kafka Streams app
All the goodness of cloud-native in Java

PIPELINE PRACTICAL EXAMPLE
Excursion: Strimzi & Kafka
KAFKA AND KAFKA CONNECT
Short intro to Kafka and Kafka Connect
▸ Kafka is the de-facto industry standard for messaging. Kafka’s API has been adopted by many other technologies in the realm, e.g. Google PubSub, Redpanda
▸ Kafka brokers distribute messages to consumers and expect acknowledgements of retrieval.
▸ Messages are stored in topics as append-only logs, which are partitioned.
▸ Kafka Connect can be thought of as a translation layer between Kafka and other systems.
▸ It is a framework for creating messages from / for events of external systems such as databases, cloud events, data warehouses etc.

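The idea of a topic as a set of partitioned, append-only logs can be illustrated with a small in-memory model. This is a sketch for intuition only, not Kafka's implementation: the class name is made up, and `String.hashCode` is used as a stand-in for Kafka's real key partitioner.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a topic: a fixed number of append-only partitions. */
final class PartitionedLogSketch {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLogSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Chooses a partition from the key. String.hashCode is a stand-in, not Kafka's partitioner. */
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    /** Appends to the end of the key's partition and returns the offset of the new message. */
    long append(String key, String value) {
        List<String> log = partitions.get(partitionFor(key));
        log.add(value);
        return log.size() - 1;
    }

    /** Reads a partition from the given offset to its end; reading never removes messages. */
    List<String> read(int partition, long fromOffset) {
        List<String> log = partitions.get(partition);
        int start = (int) Math.min(fromOffset, log.size());
        return List.copyOf(log.subList(start, log.size()));
    }

    public static void main(String[] args) {
        PartitionedLogSketch topic = new PartitionedLogSketch(3);
        topic.append("order-1", "created");
        topic.append("order-1", "paid");
        // Messages with the same key share a partition and keep their order.
        System.out.println(topic.read(topic.partitionFor("order-1"), 0)); // prints [created, paid]
    }
}
```

Since messages stay in the log after being read, a consumer only has to remember an offset per partition, which is what makes adding or restarting workers cheap.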
STRIMZI DEPLOYS KAFKA
* Violates Self-containment principle
** Hopefully obsolete soon, cluster coordination within Kubernetes :D

SO, WHAT’S A PIPELINE?
KAFKA CONNECT - THE UGLY BITS
Kafka Connect and Self Containment
▸ Source / Sink Connectors are deployed into the same cluster
▸ There is no resource descriptor for a single process
▸ Kafka Connect Pods will probably run more than a single connector
▸ Combined with the missing per-process resource descriptor, this can get incredibly painful, as a rogue connector will impact all connectors in that pod

KAFKA CONNECT - SOLUTION
Kafka Connect: Self Containment and Scaling
* Connectors also connect to the Kafka cluster; these lines are omitted from the diagram to keep it readable

Let’s go down the deep end
YAML IN PIPELINE OUT
CREATING A PIPELINE - JAVA APPLICATION

ACKNOWLEDGEMENTS
public Multi<String> basicPipe(double number)
▸ In messaging / streaming, we as consumers need to acknowledge messages
▸ Quarkus / SmallRye's Multi class handles this automatically for us
▸ If we do not want anything to be written, we return an empty Multi and still acknowledge the incoming message as received
▸ In Kafka terms, these are offsets, and by acknowledging / committing an offset we avoid duplication of data in sinks [in SmallRye: the throttled commit strategy provides at-least-once delivery]

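The acknowledgement behaviour described above can be simulated without Kafka or Quarkus. The sketch below is an assumption-laden illustration: `OffsetCommitSketch` is a made-up class, a plain `List` replaces the `Multi` return type of `basicPipe`, and the "restart" is simulated by calling `poll` a second time on the same object (in Kafka the committed offset is stored by the cluster, not in the consumer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical simulation of consume, process, then acknowledge by committing the offset. */
final class OffsetCommitSketch {
    private final List<Double> log;      // stands in for one topic partition
    private long committedOffset = 0;    // next offset to read after a restart
    final List<String> sink = new ArrayList<>();

    OffsetCommitSketch(List<Double> log) {
        this.log = log;
    }

    /** Processes up to maxMessages starting at the committed offset, committing after each one. */
    void poll(int maxMessages, Function<Double, List<String>> pipe) {
        long end = Math.min(log.size(), committedOffset + maxMessages);
        for (long offset = committedOffset; offset < end; offset++) {
            // An empty result writes nothing, but the message still counts as handled.
            sink.addAll(pipe.apply(log.get((int) offset)));
            committedOffset = offset + 1;
        }
    }

    long committedOffset() {
        return committedOffset;
    }

    /** Mirrors the basicPipe signature from the slide, with a List instead of a Multi. */
    static List<String> basicPipe(double number) {
        return number >= 0 ? List.of("value=" + number) : List.of();
    }

    public static void main(String[] args) {
        OffsetCommitSketch consumer = new OffsetCommitSketch(List.of(1.0, -2.0, 3.0));
        consumer.poll(2, OffsetCommitSketch::basicPipe); // pretend the process stops here
        consumer.poll(2, OffsetCommitSketch::basicPipe); // resumes at offset 2, no reprocessing
        System.out.println(consumer.sink + " committed=" + consumer.committedOffset());
        // prints [value=1.0, value=3.0] committed=3
    }
}
```

The filtered message (-2.0) produces no output, yet its offset is committed, so it is not read again after the simulated restart.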
PERKS OF QUARKUS
Quarkus and libraries pack a bunch
▸ Quarkus uses, e.g., SmallRye for streaming, which automatically comes with a metrics endpoint for each pipeline.
▸ In dev mode, Quarkus starts Vectorized/Redpanda (shout out to redpanda.com), so there is no need to have a Kafka cluster running.
▸ Dev tools are impeccable, from method profiling to test coverage, all in one interface.

CREATING A PIPELINE - JIB CONTAINER
JIB CONTAINER BUILDER LEARNINGS
Jib considerations and perks
▸ Utilise caching; an initial build might make sense so that the base image is already available when the application starts up on a new node.
▸ The Jib container builder implements default credential retrievers for registry credentials. It takes dockerconfig files, basic auth, and OAuth2.
▸ Kubernetes secrets are not included as a credential retriever, only via mounts -> key rotation could be problematic here
▸ Detailed log messages are really helpful in debugging

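One way to read the point about Kubernetes secrets and key rotation is sketched below. It does not use Jib at all: `MountedSecretCredentials`, its `Credential` record, and the file names `username` and `password` are assumptions for illustration (they match what a basic-auth style secret would typically expose when mounted as a volume). The sketch re-reads the mounted files on every call, so a rotated secret is picked up on the next build instead of being cached at start-up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical retriever that reads registry credentials from a mounted secret directory. */
final class MountedSecretCredentials {

    record Credential(String username, String password) {}

    private final Path secretDir;

    MountedSecretCredentials(Path secretDir) {
        this.secretDir = secretDir;
    }

    /** Reads the files on every call, so a rotated secret is used by the next build. */
    Optional<Credential> retrieve() {
        Path user = secretDir.resolve("username"); // assumed key name in the mounted secret
        Path pass = secretDir.resolve("password"); // assumed key name in the mounted secret
        if (!Files.isReadable(user) || !Files.isReadable(pass)) {
            return Optional.empty();
        }
        try {
            return Optional.of(new Credential(
                    Files.readString(user).strip(),
                    Files.readString(pass).strip()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("registry-secret");
        Files.writeString(dir.resolve("username"), "ci-bot\n");
        Files.writeString(dir.resolve("password"), "old-token\n");
        MountedSecretCredentials retriever = new MountedSecretCredentials(dir);
        System.out.println(retriever.retrieve().orElseThrow().password());
        Files.writeString(dir.resolve("password"), "new-token\n"); // simulated key rotation
        System.out.println(retriever.retrieve().orElseThrow().password());
    }
}
```

How such a retriever would be wired into a container build is left open here; the design choice shown is only "read late, never cache", which trades a file read per build for not holding stale credentials.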
SUMMARY
Still a way to go, but the ecosystem is getting better
▸ Making ETL more cloud-native still has open issues
  ▸ Stronger self-containment needed
  ▸ New and evolving tools [most < v1.0.0], rough edges encountered
  ▸ No specification for declarative data pipeline descriptions; we at DataCater are trying to take first steps here
▸ We can already reap the benefits of it, thanks to
  ▸ Strong messaging technology such as Apache Kafka
  ▸ Great dev ecosystem around Java Quarkus
  ▸ Strimzi providing means to operate Apache Kafka & co. easily

THANK YOU DOK
Big thanks for tuning in and big thanks to the teams behind …






Dok Talks #119 - Cloud-Native Data Pipelines
