Julien Testut
Senior Principal Product Manager, Oracle Development
with thanks to Jagdev Dhillon & Tianshu Li
Productizing AsyncAPI for
Data Replication / CDC
Copyright © 2023, Oracle and/or its affiliates
Safe harbor statement
The following is intended to outline our general product
direction. It is intended for information purposes only, and may
not be incorporated into any contract. It is not a commitment to
deliver any material, code, or functionality, and should not be
relied upon in making purchasing decisions. The development,
release, timing, and pricing of any features or functionality
described for Oracle’s products may change and remains at the
sole discretion of Oracle Corporation.
The materials in this presentation pertain to Oracle Health, Oracle, Oracle Cerner, and Cerner Enviza which are all wholly owned
subsidiaries of Oracle Corporation. Nothing in this presentation should be taken as indicating that any decisions regarding the
integration of any EMEA Cerner and/or Enviza entities have been made where an integration has not already occurred.
Agenda
1. Brief background on CDC & GoldenGate
2. Why AsyncAPI?
3. AsyncAPI with GoldenGate
4. Roadmap
Databases, Change Data Capture (CDC) & Data Replication
• In databases, the most important system events are Transactions (Tx’s)
• DML (data manipulation language) – inserts, updates, deletes
• DDL (data definition language) – schema changes, alter table, etc.
• All OLTP databases, and most databases overall, have centralized logging
• Users/applications can open short/long-running Tx’s, affecting single rows or billions
• A transaction’s events achieve “durability” when the transaction is committed
• Change Data Capture (CDC) is fundamentally about “capturing” the Tx’s from the source
• Typically, at the moment of “commit” in the logs, when the Tx’s become durable
• Replication is about transmitting the “captured” Tx’s to other places (e.g., Targets)
• Some databases have their own CDC & Replication layers (proprietary to that DB only)
• CDC/Replication tools are built to work with many different databases
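The idea of capturing at the commit point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log record layout (`kind`, `tx`, `op`) and the function name `capture_committed` are hypothetical and do not reflect any real database log format or GoldenGate internals.

```python
from collections import defaultdict


def capture_committed(log):
    """Yield (tx_id, changes) only once a transaction's commit record is seen."""
    pending = defaultdict(list)  # tx_id -> changes buffered until commit
    for record in log:
        kind, tx_id = record["kind"], record["tx"]
        if kind == "change":
            pending[tx_id].append(record["op"])
        elif kind == "commit":
            # The Tx is now durable, so it can be handed to replication.
            yield tx_id, pending.pop(tx_id, [])
        elif kind == "rollback":
            # Rolled-back work never became durable and is never captured.
            pending.pop(tx_id, None)


# Hypothetical log: Tx 1 commits, Tx 2 rolls back.
log = [
    {"kind": "change", "tx": 1, "op": ("INSERT", "orders", {"id": 10})},
    {"kind": "change", "tx": 2, "op": ("UPDATE", "orders", {"id": 7})},
    {"kind": "rollback", "tx": 2},
    {"kind": "change", "tx": 1, "op": ("DELETE", "orders", {"id": 3})},
    {"kind": "commit", "tx": 1},
]
captured = list(capture_committed(log))
```

Only Tx 1 appears in `captured`; the changes of Tx 2 are discarded because it never committed.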
CDC Tools and Oracle GoldenGate (GG)
• CDC tools are a long-time part of the enterprise software domain
• GoldenGate was one of the first tools in this area, dating back to the mid-1990s
• Open-source tools like Debezium have gained popularity since ~2020
• GoldenGate CDC/Replication technology was acquired by Oracle in 2009
• To provide a solution for “logical replication” supporting High Availability in distributed DBs
• To replace older technologies: Oracle Streams, Oracle CDC
• To become the foundation of Oracle’s data integration portfolio
• Today, GoldenGate is a ~$1B global ecosystem for mission-critical systems
• ~10,000 customers, 180+ countries, historically mostly large multinationals and governments
• GG runs in most of your banks, payments systems, ecommerce retailers, telcos, airlines, etc.
• GG supports hundreds of different databases, clouds, warehouses, lakehouses, messaging, etc.
• GG is more than CDC/Replication: it also includes Data Integration, Streaming Data, Cloud
Pipelines, Data Governance, and Real-time Observability
History of GoldenGate and “Why AsyncAPI Now?”
• 1995 – 2005 – Decade of database replication
• Use cases focus on DML/DDL replication between databases
• 2005 – 2015 – Emergence of MPP and Big Data
• Massive expansions into Data Warehousing and eventually Hadoop-based Big Data tech
• 2015 – today – Shift to Microservices, Cloud and distributed data architecture
• GG is refactored into a microservices architecture, with massive growth in cloud delivery
• Adoption of AsyncAPI is a natural part of the evolution for GoldenGate
• CDC/Replication is inherently an asynchronous activity
• More and more use cases feature “Event Sourcing” designs (Tx Outbox, Saga Patterns)
• Event streams are becoming a valued part of a “Data Product” architecture
• Kafka is non-transactional (e.g., for DMLs), which makes it difficult to maintain the “C” (consistency) in ACID
• Kafka is sometimes “overkill” / too much overhead for many use cases
AsyncAPI with GoldenGate
Automated, machine-generated client applications to a stream of exactly-once
transaction events – JSON formatted, via REST pub/sub AsyncAPI
[Diagram: CDC/Replication of Transactions → Standardized Pub/Sub APIs (YAML descriptor) → Real-time data events (Message) to any data consumers]
Data Tx’s
• Inserts
• Updates
• Deletes
• GET/PUT
• Schema Changes
Benefits
• Real-time data events to any data consumers
• Automated code generators
• Bypass need for Kafka for data consumers
• Simple event sourcing & transaction outbox patterns
• AsyncAPI is the future of event-driven architecture
Two important “big picture” use cases for CDC/Replication + AsyncAPI
[Diagram: Data Product Producers (Apps, Tx’s) → Data Product Consumers]
• Application Microservice (JSON etc.): Consumers are App Microservices, for CQRS/Outbox type design patterns
• Streaming data products (Parquet etc.): Analytic or data science consumers, or for bespoke clients
Why bind from data tier? (a) commit point for durable data, (b) lowest latency
transmission, and (c) very high levels of automation
DB Event Streams with CDC and AsyncAPI
[Diagram: Data Product Producers (Apps) → DB Transactions/Commits → Base Tables (Application Schema) → GoldenGate Microservices → DML events (JSON) → Data Product Consumers (Apps: transform data, react to changes)]
JSON as Tx’s: Data and Schema changes are in stream
Pros:
• Easy for Producer (highly automated, very low effort to publish)
• No application changes are required
• Schema metadata in payloads (i.e., the Consumer can decide how to handle schema changes)
Cons:
• Consumer binding is to Base Tables, exposing some implementation details such as Structure
Transaction Outbox (without CDC or AsyncAPI tooling)
[Diagram: Data Product Producers (Apps (code), dev code) → DB Transactions/Commits (commit) → Base Tables + Outbox Table (JSON) → Message Relay (Read: polling for changes; Publish: distribute) → Broker → JSON → Data Product Consumers (Apps)]
JSON as Biz Objects (*JSON may be in consumer or producer formats)
Pros:
• Outbox pattern ensures data consistency at commit-point
• JSON schema may be defined by either Producer or Consumer
Cons:
• Latency & load – when using a polling-based relay service
• Burden of change lifecycle is on the Producer
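A minimal sketch of this polling-based outbox, using Python's built-in `sqlite3` module, may make the trade-off concrete. The table names, column names, event shape, and the `place_order` / `relay_once` helpers are all hypothetical, chosen for illustration only; a list stands in for the broker.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
    "payload TEXT, published INTEGER DEFAULT 0)"
)
conn.commit()


def place_order(order_id):
    """Write the base table and the outbox row in one transaction (one commit)."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders VALUES (?, 'NEW')", (order_id,))
        event = {"type": "OrderPlaced", "order_id": order_id}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_once(publish):
    """One polling pass of the message relay: read unpublished rows, publish, mark."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


broker = []  # stand-in for a real message broker
place_order(1)
place_order(2)
first_pass = relay_once(broker.append)
second_pass = relay_once(broker.append)
```

Each call to `relay_once` queries the source database even when nothing has changed, which is the latency and load cost listed under Cons.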
Transaction Outbox with CDC & AsyncAPI
[Diagram: Data Product Producers (Apps (code), dev code) → DB Transactions/Commits (commit) → Base Tables + Outbox Table (JSON) → GoldenGate Microservices → DML events (JSON) → Data Product Consumers (Apps)]
JSON as Biz Objects in CDC payload (*JSON may be in consumer or producer formats)
Pros:
• CDC + AsyncAPI provides very low latency and less impact on source DB
• Easy for Consumer (e.g., in many cases, Consumer may define format of the JSON)
• Outbox pattern may be favored by Producer application developers (for Tx consistency)
Cons:
• Burden of change lifecycle is on the Producer
Using AsyncAPI with GoldenGate
Data Product Producers
1. Decide which Databases, Tables & Columns to publish
2. Use GoldenGate Admin Microservice to set up the “capture” trail (GG’s ledger)
3. Use GoldenGate Distribution Microservice to define the AsyncAPI Channel (and associate it to GG Trail)
Data Product Consumers
4. Use REST/GUI to browse AsyncAPI Channels
5. Authenticate using GG user/role and download YAML document
6. Build or generate client to receive and parse the Tx payload from GG Data Streams
7. Consume transactions
[Diagram: Apps → Tx’s → JSON]
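On the consumer side, the downloaded AsyncAPI document is what tells a client where to connect. The sketch below assumes the YAML has already been parsed into a Python dict and follows the general AsyncAPI 2.x layout (`servers` with `url` and `protocol`, `channels` keyed by path). The host, channel path, and the `stream_endpoints` helper are hypothetical; a document generated by GoldenGate may be structured differently.

```python
def stream_endpoints(doc):
    """Combine every server with every channel into a connectable URL."""
    endpoints = []
    for server in doc.get("servers", {}).values():
        base = f'{server["protocol"]}://{server["url"].rstrip("/")}'
        for channel in doc.get("channels", {}):
            endpoints.append(f'{base}/{channel.lstrip("/")}')
    return endpoints


# Hypothetical, already-parsed AsyncAPI 2.x document (not real GoldenGate output).
doc = {
    "asyncapi": "2.0.0",
    "info": {"title": "orders-stream", "version": "1.0.0"},
    "servers": {"gg": {"url": "gg.example.com:443", "protocol": "wss"}},
    "channels": {
        "/services/data/orders": {
            "subscribe": {"message": {"contentType": "application/json"}}
        }
    },
}
endpoints = stream_endpoints(doc)
```

A generated client would open a WebSocket to one of these endpoints and then parse the Tx payloads it receives.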
“Data Producer” using GoldenGate to create AsyncAPI Channels
Data Product Producers
• Part of GG microservice called “Distribution Service”
• Create “Data Streams”
• Associated to a “GG Trail”
“Data Producer” can filter payloads, per Channel
Data Product Producers
Filtering can happen at Object level: Tables, Columns, Data Values (e.g., sensitive data, or JSON payloads etc.)
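Conceptually, per-channel filtering amounts to dropping records for unpublished tables and stripping unpublished columns before a record leaves the producer. The sketch below illustrates that idea only: the record shape (`table`, `before`, `after`), the table and column names, and the `filter_record` helper are assumptions, and this is not how filters are configured in GoldenGate itself.

```python
def filter_record(record, tables, hidden_columns):
    """Return a filtered copy of the record, or None if its table is not published."""
    if record["table"] not in tables:
        return None
    filtered = dict(record)
    for image in ("before", "after"):
        if record.get(image) is not None:
            # Strip sensitive columns from both row images.
            filtered[image] = {
                col: val for col, val in record[image].items()
                if col not in hidden_columns
            }
    return filtered


# Hypothetical record carrying a sensitive column.
record = {
    "table": "HR.EMPLOYEES",
    "op_type": "I",
    "after": {"ID": 1, "NAME": "Ada", "SSN": "000-00-0000"},
}
published = filter_record(record, tables={"HR.EMPLOYEES"}, hidden_columns={"SSN"})
skipped = filter_record(record, tables={"SALES.ORDERS"}, hidden_columns=set())
```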
GoldenGate (typically) publishes via WebSocket Secure (WSS)
Oracle’s objective is to contribute a WSS client template back to
AsyncAPI for all to use
“Data Producer” using GoldenGate to create AsyncAPI Channels
Data Product Producers
Individual Data Streams consist of a GoldenGate payload, in any of 6 possible schema types
GoldenGate generated client code
Data Product Consumers
[Screenshot: example JavaScript to define and initialize the WebSocket for streaming]
GoldenGate Data Streams Payload
Data Product Consumers
Each record consists of before/after images and op_type information (type of transaction)
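To show how a consumer might use the before/after images and `op_type`, here is a minimal sketch that applies records to an in-memory replica. The `op_type` codes (`I`, `U`, `D`), the key column `id`, the assumption that the before image is a full row, and the `apply_record` helper are all assumptions for illustration; check them against the actual payload schema of your Data Stream.

```python
def apply_record(replica, record, key="id"):
    """Apply one change record to a dict that maps key value -> row."""
    op = record["op_type"]
    if op == "I":
        row = dict(record["after"])
        replica[row[key]] = row
    elif op == "U":
        # Start from the before image, then overlay the changed columns.
        row = {**record["before"], **record["after"]}
        replica.pop(record["before"][key], None)  # handles a changed key value
        replica[row[key]] = row
    elif op == "D":
        replica.pop(record["before"][key], None)
    else:
        raise ValueError(f"unsupported op_type: {op!r}")


# Hypothetical records, in the order they would arrive on the stream.
records = [
    {"op_type": "I", "after": {"id": 1, "status": "NEW"}},
    {"op_type": "U", "before": {"id": 1, "status": "NEW"}, "after": {"status": "PAID"}},
    {"op_type": "I", "after": {"id": 2, "status": "NEW"}},
    {"op_type": "D", "before": {"id": 2, "status": "NEW"}},
]
replica = {}
for rec in records:
    apply_record(replica, rec)
```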
Example JSON records
Data Product
Consumers
Options for the service
Data Product
Consumers
• Connection protocol (set by the Producer)
• ws – WebSocket or wss – WebSocket Secure
• Payload service levels (set by the Producer)
• Exactly-once – GoldenGate will handle all deduplication of records to guarantee that DML/DDL events are sent exactly one time
• At-most-once – Will tolerate gaps in streaming data records, e.g., gaps in data that may have been purged by the Producer
• At-least-once – The Producer may from time to time reprocess source DML/DDL, so this service level may send duplicates
• Start position (set by the Consumer)
• Current – will begin streaming Tx’s from the current position
• Earliest – will fetch Tx’s starting from the earliest available in the GoldenGate Trail (retention is defined by Data Producers)
Data Product
Producers
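With an at-least-once service level, the consumer is the one that has to cope with duplicates. One common approach is to remember the last position already processed and skip anything at or before it. The sketch below assumes each record carries a monotonically increasing `pos` field; that field name and the `DedupingConsumer` class are hypothetical and not part of any documented GoldenGate client.

```python
class DedupingConsumer:
    """Skips records whose position was already processed."""

    def __init__(self, handler):
        self.handler = handler
        self.last_pos = None  # highest position handled so far

    def on_record(self, record):
        pos = record["pos"]
        if self.last_pos is not None and pos <= self.last_pos:
            return False  # duplicate from a re-send, ignore it
        self.handler(record)
        self.last_pos = pos
        return True


handled = []
consumer = DedupingConsumer(lambda rec: handled.append(rec["pos"]))
# The producer re-sends positions 2 and 3 after reprocessing its source.
for pos in [1, 2, 3, 2, 3, 4]:
    consumer.on_record({"pos": pos})
```

In a real client `last_pos` would need to be persisted together with the consumer's own state, otherwise a consumer restart reintroduces duplicates.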
Roadmap – what’s on the horizon
• Formatters
• For App Consumers: JSON (default), Avro, XML, Protobuf, etc.
• For Analytic Consumers: Parquet, Iceberg, Delta, etc.
• CloudEvents payload format option
• Adds more overhead, latency, etc.
• May help simplify how some clients can parse the transactions
• Business object semantics
• When integrated with Oracle JSON-Relational duality
(producers may choose to share Business Object structure,
rather than the physical tables)
• Stream processing sink
• AsyncAPI channels as output of streaming data pipelines
(pipelines enable data integration/prep or analytic actions)
Apidays Paris 2023 - Productizing AsyncAPI for Data Replication and Changed Data Capture, Julien Testut, Oracle
Our mission is to help people see
data in new ways, discover insights,
unlock endless possibilities.
