Kafka Connectivity with AWS S3 Storage
Sno  Date        Modification      Author   Verified By
1    2019/08/06  Initial Document  Nishtha  Sumit Goyal
Table of Contents
Introduction
Objective
Solutions
Step 1: Get Kafka
Step 2: Create an S3 bucket
Step 3: Connect with S3 bucket
Step 4: Go into docker bash
Step 5: Creating Kafka topic
Step 6: Start Kafka console producer
Step 7: Start Kafka console consumer
Step 8: Type in something in the producer console
Introduction
Kafka is a distributed streaming platform used to publish and subscribe to streams of records. Kafka is also used for fault-tolerant storage: it replicates topic log partitions across multiple servers. Kafka is designed to let your applications process records as they occur. It is fast, and it uses I/O efficiently by batching and compressing records. Kafka is commonly used to decouple data streams and to stream data into data lakes, applications, and real-time stream analytics systems.
An Amazon S3 bucket is a public cloud storage resource available in Amazon Web Services' (AWS)
Simple Storage Service (S3), an object storage offering. Amazon S3 buckets, which are similar to file
folders, store objects, which consist of data and its descriptive metadata.
Objective
The main objective of this project is to type some text into the Kafka producer console and have it automatically appear in our S3 bucket as a JSON file.
Solutions
Note: In this document we explain, step by step, the integration between Kafka and S3 storage (to store messages).
Step 1: Get Kafka
Run the following command:
docker run -p 2181:2181 \
  -p 3030:3030 \
  -p 8081-8083:8081-8083 \
  -p 9581-9585:9581-9585 \
  -p 9092:9092 \
  -e AWS_ACCESS_KEY_ID=your_aws_access_key_without_quotes \
  -e AWS_SECRET_ACCESS_KEY=your_aws_secret_key_without_quotes \
  -e ADV_HOST=127.0.0.1 \
  landoop/fast-data-dev:latest
In addition, you should see it load quite a lot of services, including ZooKeeper, the broker, the schema registry, the REST proxy, and Kafka Connect. That is why we open the various ports above.
After about two minutes you should be able to go to http://127.0.0.1:3030 to see the Kafka Connect UI.
Step 2: Create an S3 bucket
The next step is to create the S3 bucket, since we will be uploading our files to it. Log in to your AWS account and create your bucket.
Step 3: Connect with S3 bucket
From the User interface, click enter at Kafka connect UI . Once you are there, click New connector.
After you click new connector, you will see a lot of connector that you can connect to. Since we want to connect to
S3, click the Amazon S3 icon. And you can see that you are presented with some settings with lots of errors.
In order to get rid of the error, we need to change the following settings. From the following list, you need to change
your s3.region as your bucket may not be in Sydney and your s3.bucket.name to the bucket you have created.
name=S3SinkConnector
connector.class=io.confluent.connect.s3.S3SinkConnector
s3.region=ap-southeast-2
format.class=io.confluent.connect.s3.format.json.JsonFormat
topics.dir=topics
flush.size=1
topics=<your-topic-name>
tasks.max=1
value.converter=org.apache.kafka.connect.storage.StringConverter
storage.class=io.confluent.connect.s3.storage.S3Storage
key.converter=org.apache.kafka.connect.storage.StringConverter
s3.bucket.name=<your-s3-bucket-name>
Once you fill in all the details and click Create, you should see it look similar to what I have.
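If you prefer the command line, the same connector can also be registered through the Kafka Connect REST API, which the container exposes on port 8083. A minimal sketch, assuming a topic named my_topic and a bucket named my-kafka-bucket (substitute your own values):
curl -X POST http://127.0.0.1:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
        "name": "S3SinkConnector",
        "config": {
          "connector.class": "io.confluent.connect.s3.S3SinkConnector",
          "storage.class": "io.confluent.connect.s3.storage.S3Storage",
          "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
          "key.converter": "org.apache.kafka.connect.storage.StringConverter",
          "value.converter": "org.apache.kafka.connect.storage.StringConverter",
          "s3.region": "ap-southeast-2",
          "s3.bucket.name": "my-kafka-bucket",
          "topics": "my_topic",
          "topics.dir": "topics",
          "flush.size": "1",
          "tasks.max": "1"
        }
      }'
Either way, the connector picks up the AWS credentials we passed to the container as environment variables.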
Step 4: Go into docker bash
To open a bash shell inside the Docker container, we need the container ID. To get it, we can do a docker ps:
docker ps
This will give you a list of the Docker containers that are running.
If you notice from the image, I have only one container running, with the ID '66bf4d3ffa46'. To open a bash shell inside it, we can now type the following command:
docker exec -it 66bf4d3ffa46 /bin/bash
This drops us into the bash shell, where we can now create our topic, producer, and consumer:
root@fast-data-dev / $
Step 5: Creating Kafka topic
The Kafka cluster stores streams of records in categories called topics. So if we don't have a topic, we can't stream our records, or, in our case, type a message and send it through. We create one as shown below.
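A typical command to create the topic from inside the container, assuming we call it my_topic (use the same name you set in the connector's topics property):
kafka-topics --create --bootstrap-server 127.0.0.1:9092 --replication-factor 1 --partitions 1 --topic my_topic
(On older Kafka versions, kafka-topics takes --zookeeper 127.0.0.1:2181 instead of --bootstrap-server.)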
Step 6: Start Kafka console producer
To create messages, we need to start our Kafka console producer, using the following command. Once you press Enter, you should see a > appear on the screen, expecting you to type something.
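A sketch of that command, assuming the my_topic topic created above:
kafka-console-producer --broker-list 127.0.0.1:9092 --topic my_topic
(Newer Kafka versions use --bootstrap-server 127.0.0.1:9092 in place of --broker-list.)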
Step 7: Start Kafka console consumer
To check whether the messages you are typing are actually going through, let's open a consumer console. To do that, we will open a new terminal window and enter the same Docker container by doing the following.
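A sketch of those commands, reusing the container ID and topic name from the earlier steps:
docker exec -it 66bf4d3ffa46 /bin/bash
kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --topic my_topic --from-beginning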
Step 8: Type in something in the producer console
If we now type anything into the kafka-console-producer, it will appear in the console consumer, and the connector will create a JSON file in S3. Download the file and you will see what you typed!