London PostgreSQL Meetup
January 24th 2019
Migrating with Debezium
Mike Fowler - Senior Site Reliability Engineer, Public Cloud Practice
About Me
Assess
Refactor
Migrate
How do we get from A to B?
Option: CSV Extract, Transform & Load
Option: Initial Load + Change Data Capture
Option: Initial Load + Change Data Capture
[Diagram: MySQL → Debezium (Kafka Connect) → Kafka → Kafka Connect JDBC → PostgreSQL]
Ingredients
• Kafka
• Confluent
- Schema Registry
- Kafka Connect JDBC
• Debezium
• MySQL Database (Sakila)
• PostgreSQL
Setting up MySQL
mysql -e "CREATE DATABASE sakila;"
mysql -e "CREATE USER sakila IDENTIFIED BY 'sakila';"
mysql -e "GRANT ALL PRIVILEGES ON *.* TO sakila;"
mysql -u sakila -psakila sakila < sakila-schema.sql
mysql -u sakila -psakila sakila < sakila-data.sql
https://dev.mysql.com/doc/sakila/en/
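A quick sanity check that the sample data loaded, using the credentials above (a verification step assumed here, not in the original deck):

mysql -u sakila -psakila sakila -e "SELECT COUNT(*) FROM actor;"

The stock Sakila data set should report 200 rows.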
Setting up MySQL
server-id = 2019
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = row
binlog_row_image = full
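Having restarted MySQL to pick up these settings, it is worth confirming that binary logging took effect; a minimal check, assumed here rather than shown on the slide:

mysql -u sakila -psakila -e "SHOW VARIABLES LIKE 'log_bin';"
mysql -u sakila -psakila -e "SHOW MASTER STATUS;"

The first should report ON; the second shows the current binary log file and position from which Debezium will read.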
Setting up PostgreSQL
psql -c "CREATE USER sakila WITH PASSWORD 'sakila';"
psql -c "CREATE DATABASE sakila OWNER sakila;"
Setting up Debezium
{ "name": "mysql-connector",
"config": {
"Connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "192.168.100.8",
"database.port": "3306",
"database.user": "sakila",
"database.password": "sakila",
"database.server.id": "184054",
"database.server.name": "mysql",
"database.whitelist": "sakila",
"database.history.kafka.bootstrap.servers": "127.0.0.1:9092",
"database.history.kafka.topic": "dbhistory.sakila",
"include.schema.changes": "true" }
}
Adding the Debezium Kafka Connector
curl -X POST \
     -d @mysql.json \
     http://localhost:8083/connectors \
     --header "Content-Type: application/json"
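Once the POST returns, the connector's health can be checked through the same Kafka Connect REST API (an assumed follow-up step):

curl -s http://localhost:8083/connectors/mysql-connector/status

A RUNNING state for both the connector and its task indicates the initial snapshot and binlog reading are under way.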
Verifying What’s Happened
# ./kafka-topics.sh --list --zookeeper localhost:2181
...
dbhistory.sakila
mysql
mysql.sakila.actor
mysql.sakila.address
mysql.sakila.category
mysql.sakila.city
...
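To peek at the change events themselves, a console consumer can replay a table topic from the beginning (the path assumes the same Kafka bin directory as above; with the Confluent Avro converters you would use kafka-avro-console-consumer instead):

# ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mysql.sakila.actor --from-beginning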
Interlude: The Story so Far
Setting up Kafka Connect JDBC
{ "name": "psql-sakila-sink-actor",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "mysql.sakila.actor",
"connection.url":
"jdbc:postgresql://192.168.100.9:5432/sakila?user=sakila&password=sakila",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"auto.create": "true",
"insert.mode": "upsert",
"table.name.format": "actor",
"pk.fields": "actor_id",
"pk.mode": "record_value" } }
Adding the Kafka JDBC Connector
curl -X POST \
     -d @jdbc.json \
     http://localhost:8083/connectors \
     --header "Content-Type: application/json"
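If the sink is healthy, the auto-created actor table fills almost immediately; a quick check on the PostgreSQL side (an assumed verification step):

psql -U sakila -d sakila -c "SELECT COUNT(*) FROM actor;"

The count should match the 200 rows of the stock Sakila actor table.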
Tada!
[Diagram: MySQL → Debezium (Kafka Connect) → Kafka → Kafka Connect JDBC → PostgreSQL]
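A minimal end-to-end test of the live path (hypothetical values): insert a row on the MySQL side and watch it appear in PostgreSQL moments later.

mysql -u sakila -psakila sakila -e "INSERT INTO actor (first_name, last_name) VALUES ('LIVE', 'TEST');"
psql -U sakila -d sakila -c "SELECT actor_id, first_name, last_name FROM actor ORDER BY actor_id DESC LIMIT 1;"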
No such thing as a ...
Mike Fowler
www.mlfowler.com
Questions?
mlfowler
mlfowler_
github.com/gh-mlfowler/debezium-demo
mike.fowler@claranet.uk
Editor's Notes

  • #3: Senior Site Reliability Engineer in the Public Cloud Practice
    - Background in Software & Systems Engineering, System & Database Administration
    - Using PostgreSQL since version 7.4
    - Contributed to PostgreSQL, Terraform & YAWL
    - Involved with flossUK (formerly UKUUG) for many years, first reviewing books and then as a speaker
    - Works exclusively on public clouds, hence the collection of public cloud certifications
  • #4: Image Source: Author’s personal collection. “As you can see from my background, I like to think that I know Data”
  • #5: Image Source: https://wall.alphacoders.com/big.php?i=789452
    - Identify what requires migration: database schemas and client applications
    - Analyse for PostgreSQL compatibility
    - In each schema we need to convert:
      - Storage objects: tables, views, data types, etc.
      - Code objects: packages, procedures & functions
    - Most projects are qualified out at this stage
  • #6: Image Source: https://www.coventrytelegraph.net/news/uk-world-news/big-ben-scaffolding-14826151
    - Determine what needs to be migrated:
      - Whole schemas may no longer be required
      - Some tables could be archived out of the database: extract them to a data interchange format and store in Google Cloud Storage or S3
      - Some packages/procedures/functions could be removed
    - Rewrite the remaining stored procedures and functions in PL/pgSQL or equivalent
    - Test-driven compatibility development:
      - Existing bugs may be identified; preserving them may be necessary, and hard
      - Fixes should be retrofitted so that there are no switchover-day surprises
  • #7: Image Source: https://www.amazon.com/Battlestar-Galactica-Fleet-Poster-2ftx3ft/dp/B005E9JWTW
    - Migration of the test database will commence with the refactoring
    - Final migration of the production database occurs after:
      - All refactorings are complete
      - The migration has been performed multiple times
      - The applications have been tested against a pre-migrated production database
  • #8: Two primary approaches:
    - Extract, Transform & Load (also called Dump & Restore)
    - Change Data Capture & Replay
    - Picking on MySQL here, but MongoDB and PostgreSQL itself can also be sources; Oracle & SQL Server connectors are in development
  • #9: Simplest strategy
    - Perceived as low risk for data loss: fewer “moving parts”, easy to test repeatedly and “prove” the migration
    - Downtime is a function of database size, but can be minimised:
      - Move historic data ahead of time
      - Take the opportunity to clear out unused data
      - Consider introducing partitions
    - Over-provision resources for higher throughput during the data load; downscale once operational
    - A minimal per-table sketch of this approach follows this note
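As referenced in note #9, a minimal per-table sketch of the CSV approach (hypothetical paths; assumes MySQL's secure_file_priv permits writing to /var/lib/mysql-files and that the target table already exists in PostgreSQL):

mysql -u sakila -psakila sakila -e "SELECT * FROM actor INTO OUTFILE '/var/lib/mysql-files/actor.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';"
psql -U sakila -d sakila -c "\copy actor FROM '/var/lib/mysql-files/actor.csv' WITH (FORMAT csv)"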
  • #10: Usually a complicated option that requires connecting two unrelated technologies
    - The same ETL as the CSV approach needs to be performed, but continually and reliably
    - Failures often require a complete restart of the process to ensure no data loss
    - Numerous commercial offerings but few open source
    - Enter Kafka:
      - Append-only log journaling system with messaging semantics
      - The entire history can be maintained if you’ve the disk space
      - Consumers maintain their position, allowing for data replay in the event of failure
  • #11: Kafka Connect is an extensible framework for producers and consumers
    - Debezium is a Kafka Connect source, configured to read from MySQL and write to Kafka
    - Each table becomes its own topic, with a dedicated topic for schema changes
    - The data and schema can be recorded in Kafka in Avro format using open source tools from Confluent
    - Kafka Connect JDBC is both a Kafka Connect source and sink; we’ll use it to sink data to PostgreSQL
    - We need to create one connector for each table topic we want written to PostgreSQL
    - It is able to read the schema data and create a corresponding table to host the incoming data
  • #12: All of this software is open source, although configuring and building the Confluent tools is involved
    - A working demonstration using Vagrant is available on GitHub (github.com/gh-mlfowler/debezium-demo)
    - We’ll walk through the interesting pieces of configuration to give an idea of what is involved and how easy it is to get data moving
  • #13: The MySQL installation is the basic default available in Ubuntu
    - A database and user are created, all using the ‘sakila’ identifier
    - The Sakila (https://dev.mysql.com/doc/sakila/en/) sample database is loaded
    - Binary logging is not yet enabled, so no change data capture is available for this data
  • #14: Now the data is loaded, we enable binary logging
    - We need a unique server-id to identify this server; we’ll need another one later
    - MySQL is restarted and any new changes are now recorded in the binary log
  • #15: Again, vanilla PostgreSQL as found in Ubuntu
    - Just a database and a user which owns the database
    - No schema is loaded; we leave the empty database to be populated by Kafka Connect JDBC later
  • #16: Each Kafka connector is defined by a JSON document passed to a web service
    - Items highlighted in red are what we likely need to change for each new connector
    - We need a unique database.server.id so Debezium can identify itself to the MySQL instance
    - A whitelist allows us to pick only the database or databases of interest
  • #17: A POST with the JSON document as the body will create the connector; we can remove it later with a DELETE (shown after this note)
    - Once the connector is created, it will interrogate MySQL to extract the schema
    - This requires a brief database-wide lock, but the timing is short and logged - check this for your planning!
    - All data older than the current binary log is read and transferred to Kafka
    - A topic is created per table, plus a topic recording schema changes
    - A second history topic is created, which is really for internal use
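The matching tear-down call referenced in note #17, using the same REST endpoint:

curl -X DELETE http://localhost:8083/connectors/mysql-connector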
  • #18: We can verify this by browsing our Kafka box and listing the topics
    - The topic names follow the pattern “server-name.database.table”
    - The server name is “mysql” in this example; that topic holds the schema changes
  • #19: At this stage all existing data is resting in topics in Kafka
    - New changes to MySQL will be appended to Kafka shortly after they occur: DDL changes to the server-name topic, data changes to the relevant table topic
    - Although PostgreSQL is running, no data has been sent there yet
  • #20: Configuring the Kafka Connect JDBC sink is similar to Debezium
    - Create one connector for each table we wish to sink, or read multiple topics in a single connector
    - In this example we’ll consume only the actor table
    - If the table name isn’t specified, the topic name is used, which PostgreSQL will reject (it contains dots)
  • #21: Loading the JDBC connector is identical: just a POST to the Kafka Connect web service
  • #22: All finished: end-to-end replication of data in near real time from MySQL to PostgreSQL
    - We can create many replicas simply by reading from the topics
    - Only a one-off read of the source database is needed, provided Kafka is large enough to store all the data
  • #23: Image Source: https://eguidemagazine.com/theres-no-such-thing-as-a-free-lunch/
    - With the simplicity of set-up we lose some things:
      - Constraints are not copied across; they can be added at the time of the final switchover, but you’ll need to time this to measure the impact
      - Data types are converted to the lowest common denominator; you can pre-create the schema to hand-nominate the conversion (see the sketch after this note)
      - Stored procedures are not dealt with; there are some tools for this, but this isn’t the tool for it
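The pre-created schema mentioned in note #23 might look like the following for the actor table, hand-nominating types so the sink finds an existing table instead of auto-creating one (a sketch mirroring the stock Sakila definition):

psql -U sakila -d sakila -c "CREATE TABLE actor (actor_id smallint PRIMARY KEY, first_name varchar(45) NOT NULL, last_name varchar(45) NOT NULL, last_update timestamp NOT NULL);"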