Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
MySQL Cluster
Asynchronous Parallel Database Design for High Performance
Bernd Ocklin, MySQL Cluster Development
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended
for information purposes only, and may not be incorporated into any contract. It
is not a commitment to deliver any material, code, or functionality, and should
not be relied upon in making purchasing decisions. The development, release,
and timing of any features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
Bernd Ocklin
Snr Director
MySQL Cluster Engineering
Today
MySQL Cluster Overview
MySQL Cluster 8.0
Data Distribution
Asynchronous Model
Cluster Virtual Machine
MySQL Cluster
Proven in serving billions of people every day when making phone calls,
playing online games or handling financial transactions.
Massive linear scale
Always-On 99.9999% Availability
Distributed In-Memory Datasets
Always Consistent
Parallel Real-Time Performance.
Auto-partitioning, data distribution
and replication built-in.
Read- and Write Scale-Out
to many TB on commodity hardware.
Designed for mission critical
systems. Masterless, shared-nothing
with no single point of failure.
Transactional consistency across
distributed and partitioned dataset.
Out of the box straightforward
application programming.
Ease of use
Open Source
Written in C++. Can be used standalone
or with MySQL as a SQL front-end.
MySQL Cluster 8.0
MySQL Cluster 7.4 / 7.5
- MySQL Server 5.7
- 5x faster restarts
- JSON support
- 50% faster reads
- 40% faster read/write
MySQL Cluster 7.6
- MySQL Server 5.7
- Redesigned for terabyte clusters
- Redesigned for modern hardware
- 50% faster joins
- 100% faster scans
- 1000% faster restarts
MySQL Cluster 8.0
- Adopting the data dictionary
- Hundreds of nodes
- 3 to 4 replicas
- Larger rows
Download and test the 8.0.13 DMR now: https://dev.mysql.com/downloads/cluster/
MySQL Cluster 7.6.7 - 2x faster scans
• Example: Sysbench read-only (RO)
[Chart: Sysbench RO results for 7.6.8, 7.6.7, 7.5.11 and 7.4.21; x-axis from 1 to 512, y-axis from 0 to 9000]
MySQL Cluster 7.6.7 - 10x faster restarts
• Example: restart with 600 DBT2 warehouses (280M rows)
[Chart: restart times for 7.5.11, 7.6.6 and 7.6.7]
MySQL Cluster Architecture
• Multiple data nodes form a cluster
• Shared nothing
• Data distributed to data nodes
• Auto-partitioning and distribution
• No name-node or central master
• Each dataset is split into fragments and distributed across data nodes
• Within a cluster, data is always transactionally consistent
Data distribution
User-id (PK)   Service (PK)   Data
1773467253     chat           xxx
6257346892     chat           xxx
1773467253     photos         xxx
7234782739     photos         xxx
8235602099     reminders      xxx
8437829249     location       xxx
[Diagram: rows distributed by partition key across the MySQL Cluster data nodes]
• Dataset distributed to virtual partitions
• Hash on the partition key by default
• No re-hashing for data reorganisation
[Diagram: the example rows hashed on the partition key into virtual partitions]
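A minimal Python sketch of the idea on this slide, not NDB's implementation: it assumes a fixed number of virtual partitions (16, chosen only for illustration) and uses zlib.crc32 as a stand-in hash, since the slides do not name NDB's hash function. The key-to-partition mapping never changes; adding a node only moves whole virtual partitions, so no row is re-hashed.

```python
import zlib

NUM_VIRTUAL_PARTITIONS = 16  # assumption: illustrative fixed count


def virtual_partition(partition_key: str) -> int:
    """Hash the partition key into one of a fixed set of virtual partitions."""
    return zlib.crc32(partition_key.encode()) % NUM_VIRTUAL_PARTITIONS


def assign_partitions(nodes: list[str]) -> dict[int, str]:
    """Spread the virtual partitions round-robin over the data nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_VIRTUAL_PARTITIONS)}


def reorganise(assignment: dict[int, str], new_node: str) -> dict[int, str]:
    """Add a node by moving whole virtual partitions to it; keys are not re-hashed."""
    nodes = sorted(set(assignment.values())) + [new_node]
    share = NUM_VIRTUAL_PARTITIONS // len(nodes)  # partitions the new node takes over
    moved = dict(assignment)
    for p in sorted(assignment)[:share]:
        moved[p] = new_node
    return moved


rows = [("1773467253", "chat"), ("6257346892", "chat"), ("1773467253", "photos"),
        ("7234782739", "photos"), ("8235602099", "reminders"), ("8437829249", "location")]
before = assign_partitions(["node1", "node2"])
after = reorganise(before, "node3")
for user_id, service in rows:
    p = virtual_partition(user_id)  # same partition before and after reorganisation
    print(user_id, service, "partition", p, before[p], "->", after[p])
```

Both rows for user 1773467253 hash to the same virtual partition, so they always live together; reorganisation only changes which node owns that partition.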
Data distribution
• Cluster uses thousands of virtual partitions
• Distributed to data nodes and within data nodes
Data distribution
• These virtual partitions are distributed to data nodes
Global consistent view of distributed data
• Across one or many parallel MySQL Servers and API nodes
Data distribution awareness
• Key-value access with a hash on the primary key
• Complemented by ordered, in-memory-optimised T-Tree indexes for fast searches
• The cluster always knows where its data is, without a name node
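To show how the two index types complement each other, here is a toy partition in Python. It is an assumption-laden sketch: a dict stands in for the primary-key hash index and a sorted list searched with bisect stands in for the ordered T-Tree index; neither reflects how the data nodes actually store them.

```python
import bisect


class Partition:
    """Toy partition with a hash index (dict) and an ordered index (sorted list)."""

    def __init__(self):
        self.by_pk = {}         # hash index: primary key -> row
        self.ordered_keys = []  # ordered index: primary keys kept in sorted order

    def insert(self, pk, row):
        if pk not in self.by_pk:
            bisect.insort(self.ordered_keys, pk)
        self.by_pk[pk] = row

    def get(self, pk):
        """Point lookup through the hash index."""
        return self.by_pk.get(pk)

    def range(self, low, high):
        """Range search low <= pk < high through the ordered index."""
        start = bisect.bisect_left(self.ordered_keys, low)
        end = bisect.bisect_left(self.ordered_keys, high)
        return [self.by_pk[k] for k in self.ordered_keys[start:end]]


part = Partition()
part.insert(("1773467253", "chat"), "xxx-chat")
part.insert(("1773467253", "photos"), "xxx-photos")
part.insert(("6257346892", "chat"), "xxx")
print(part.get(("1773467253", "photos")))
print(part.range(("1773467253", ""), ("1773467254", "")))  # all services of one user
```

The composite key (User-id, Service) sorts lexicographically, so a range over one user id returns all of that user's rows in order.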
Parallel cross-partition queries
• Parallel execution on the data nodes and within data nodes
• Utilizes up to 64 cores efficiently
• Parallelizes all work, even for single queries
• Batches automatically
• Event-driven and asynchronous
• Always consistent access to the entire distributed and partitioned dataset
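One way to picture "batches automatically" is grouping many primary-key lookups by the data node that owns each key, so every node receives one batched request instead of one message per key. The sketch below is hypothetical: the partition count, node names and crc32 stand-in hash are assumptions, not NDB's actual routing.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8          # hypothetical partition count
NODES = ["node1", "node2"]  # hypothetical data nodes


def node_for_key(key: str) -> str:
    """Locate the owning node from the key alone; no lookup service is needed."""
    partition = zlib.crc32(key.encode()) % NUM_PARTITIONS
    return NODES[partition % len(NODES)]


def batch_by_node(keys):
    """Group primary-key lookups so each data node gets a single batched request."""
    batches = defaultdict(list)
    for key in keys:
        batches[node_for_key(key)].append(key)
    return dict(batches)


keys = ["1773467253", "6257346892", "7234782739", "8235602099", "8437829249"]
for node, batch in batch_by_node(keys).items():
    print(node, batch)
```

Five lookups turn into at most two messages here, one per node, and each node can then work through its batch in parallel with the other.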
Cluster and High Availability (just mentioned here; not the topic of these slides)
• Multiple copies of data are maintained for availability
• A group of data nodes shares the same data
• 1 to 4 replicas (copies) of the data can be configured
Replication locally and between physical locations
• With awareness of data locality and availability domains for cloud
[Diagram: replication between clusters of data nodes]
Reading and writing data
Asynchronous design for reads and writes
• Tasks organized in working blocks
• Signals for communication between blocks
• Job buffers for each working block
• Executed in the cluster VM
• Cluster virtual machine kernel
[Diagram: working "blocks", each with a job queue, exchanging signals (events)]
• Transaction Coordination
• Data Manager
• …
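A single-threaded Python toy of the model on this slide: each block owns a job buffer, blocks interact only by sending signals, and a kernel loop delivers queued signals one at a time. The block names TC and DM follow the slides; the signal names READ_REQ, READ_ROW and READ_CONF and the round-robin scheduling are hypothetical, not NDB's real signal set or scheduler.

```python
from collections import deque


class Block:
    """A working block: owns a job buffer and reacts to signals."""

    def __init__(self, name, kernel):
        self.name = name
        self.kernel = kernel
        self.job_buffer = deque()

    def send(self, target, signal, data):
        # Asynchronous: the signal is queued in the target's job buffer, not run inline.
        self.kernel.blocks[target].job_buffer.append((self.name, signal, data))

    def execute(self, sender, signal, data):
        raise NotImplementedError


class Kernel:
    """Toy VM kernel: runs blocks round-robin until every job buffer is empty."""

    def __init__(self):
        self.blocks = {}
        self.trace = []     # (sender, receiver, signal) in delivery order
        self.result = None

    def add(self, block):
        self.blocks[block.name] = block

    def run(self):
        progress = True
        while progress:
            progress = False
            for block in self.blocks.values():
                if block.job_buffer:
                    sender, signal, data = block.job_buffer.popleft()
                    self.trace.append((sender, block.name, signal))
                    block.execute(sender, signal, data)
                    progress = True


class TransactionCoordinator(Block):
    def execute(self, sender, signal, data):
        if signal == "READ_REQ":     # request from the application
            self.send("DM", "READ_ROW", data)
        elif signal == "READ_CONF":  # reply from the data manager
            self.kernel.result = data


class DataManager(Block):
    def __init__(self, name, kernel, rows):
        super().__init__(name, kernel)
        self.rows = rows

    def execute(self, sender, signal, data):
        if signal == "READ_ROW":
            self.send(sender, "READ_CONF", self.rows.get(data))


kernel = Kernel()
kernel.add(TransactionCoordinator("TC", kernel))
kernel.add(DataManager("DM", kernel, {"1773467253": "xxx"}))
kernel.blocks["TC"].job_buffer.append(("API", "READ_REQ", "1773467253"))
kernel.run()
print(kernel.result)  # xxx
print(kernel.trace)
```

Adding a feature means adding a block or a signal handler; the kernel loop does not change, which matches the extensibility point made in the conclusions.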
Single asynchronous simple read
• A single transaction is coordinated by one Transaction Coordinator
• One Data Manager per partition
[Diagram: the Transaction Coordinator (TC) exchanging async signals with the Data Managers (DM)]
Single asynchronous simple read - distributed
• Signals are sent between blocks both inside and between data nodes
[Diagram: Transaction Coordinator (TC) and Data Manager (DM) on two data nodes]
Multiple asynchronous simple reads
[Diagram: Transaction Coordinator (TC) and Data Managers (DM)]
Single asynchronous table scan
• Scanning multiple partitions in parallel
[Diagram: Transaction Coordinator (TC) and Data Managers (DM)]
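A parallel table scan can be sketched with asyncio: the coordinator starts a scan on every partition at once and merges the partial results when they arrive. The partition contents are hypothetical and asyncio.sleep(0) merely stands in for the asynchronous signal round trip to a data manager.

```python
import asyncio

# Hypothetical partitions: partition id -> rows (user_id, service)
PARTITIONS = {
    0: [("1773467253", "chat"), ("1773467253", "photos")],
    1: [("6257346892", "chat"), ("8437829249", "location")],
    2: [("7234782739", "photos"), ("8235602099", "reminders")],
}


async def scan_partition(pid, predicate):
    """Data manager side: scan one partition and return the matching rows."""
    await asyncio.sleep(0)  # stand-in for the asynchronous signal round trip
    return [row for row in PARTITIONS[pid] if predicate(row)]


async def table_scan(predicate):
    """Coordinator side: start all partition scans at once, then merge the results."""
    per_partition = await asyncio.gather(
        *(scan_partition(pid, predicate) for pid in PARTITIONS)
    )
    return [row for rows in per_partition for row in rows]


if __name__ == "__main__":
    print(asyncio.run(table_scan(lambda row: row[1] == "chat")))
```

asyncio.gather returns results in the order the scans were started, so the merged list is ordered by partition even if the scans finish in a different order.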
In a bit more detail
[Diagram: scan signal flow between the LQH, ACC, TUP, TUX, API and TC blocks. Signals shown: scan request, stored procedure request, next scan request, prepareTupleKeyRequest, next scan confirmation, tuple key request, TransAI]
Cluster virtual machine
Cost of CPU context switches: around 3 to 5 µs
[Diagram: context switch]
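To put the 3 to 5 µs figure in perspective, a back-of-the-envelope calculation in Python. The rate of 100,000 context switches per second on one core is a hypothetical load chosen for illustration, not a measured NDB number.

```python
def cpu_fraction_lost(switches_per_second, cost_us):
    """Fraction of one core's time spent on context switches."""
    return switches_per_second * cost_us / 1_000_000


for cost in (3, 5):
    print(f"{cost} us per switch: {cpu_fraction_lost(100_000, cost):.0%} of a core")
```

At that rate, 30% to 50% of a core would go to switching alone, which is why the virtual machine model aims to minimize context switches.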
Virtual machine model
• Multiple threads per data node
• Each thread handles some blocks
• Parallel execution of multiple queries from multiple users on multiple MySQL Servers
• Communication with signals
• Goal: minimize context switching
[Diagram: data node threads (main thread, data manager, receive, send, disk/SSD IO) and the NIC]
Lock-free and multi-core VM
• Data is partitioned inside the data nodes
• Communication is asynchronous and event-driven on the cluster's VM
• No group communication; instead:
  • Distributed row locks
  • Non-blocking two-phase commit
[Diagram: data node threads (main thread, data manager, receive, send, disk/SSD IO) and the NIC]
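The sketch below shows a coordinator running two-phase commit over participants that take row locks during the prepare phase. It is plain two-phase commit under simplifying assumptions (one process, no failures); the non-blocking property named on the slide depends on how NDB handles coordinator failure, which this sketch does not model.

```python
class Participant:
    """A data manager holding rows and per-row locks."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.locks = {}    # row key -> transaction id holding the lock
        self.pending = {}  # transaction id -> {key: new value}

    def prepare(self, txn, writes):
        """Phase 1: lock every row to be written; vote no on a lock conflict."""
        for key in writes:
            holder = self.locks.get(key)
            if holder is not None and holder != txn:
                self.abort(txn)
                return False
            self.locks[key] = txn
        self.pending[txn] = dict(writes)
        return True

    def commit(self, txn):
        """Phase 2: apply the prepared writes and release the locks."""
        self.rows.update(self.pending.pop(txn, {}))
        self._release(txn)

    def abort(self, txn):
        self.pending.pop(txn, None)
        self._release(txn)

    def _release(self, txn):
        for key in [k for k, holder in self.locks.items() if holder == txn]:
            del self.locks[key]


def two_phase_commit(txn, writes_by_participant):
    """Coordinator: commit only if every participant votes yes in phase 1."""
    prepared = []
    for participant, writes in writes_by_participant.items():
        if not participant.prepare(txn, writes):
            for p in prepared:  # roll back the participants that already voted yes
                p.abort(txn)
            return False
        prepared.append(participant)
    for participant in prepared:
        participant.commit(txn)
    return True


dm1, dm2 = Participant("dm1"), Participant("dm2")
print(two_phase_commit("t1", {dm1: {"1773467253": "a"}, dm2: {"6257346892": "b"}}))  # True
dm1.locks["1773467253"] = "t2"  # simulate another transaction holding the row lock
print(two_phase_commit("t3", {dm2: {"6257346892": "c"}, dm1: {"1773467253": "d"}}))  # False
```

Locks are held only on the rows a transaction touches, so unrelated transactions proceed in parallel without any cluster-wide group communication.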
Concurrent requests in multithreaded Virtual Machine
• Out-of-order execution is possible
[Diagram: Transaction Coordinator (TC) and Data Managers (DM)]
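Out-of-order completion can be sketched with asyncio: the coordinator issues several reads without waiting on any of them, replies arrive in whatever order the data managers finish (simulated with hypothetical delays), and each reply is matched to its request by an id rather than by arrival order.

```python
import asyncio


async def data_manager(request_id, key, delay, replies):
    """Simulated data manager: answers after a hypothetical delay."""
    await asyncio.sleep(delay)
    await replies.put((request_id, f"row-for-{key}"))


async def coordinator(keys_with_delays):
    replies = asyncio.Queue()
    # Send every request at once; nothing waits for an individual reply.
    tasks = [
        asyncio.create_task(data_manager(rid, key, delay, replies))
        for rid, (key, delay) in enumerate(keys_with_delays)
    ]
    results, arrival_order = {}, []
    for _ in tasks:
        rid, row = await replies.get()
        arrival_order.append(rid)
        results[rid] = row  # matched by request id, not by position
    await asyncio.gather(*tasks)
    return [results[rid] for rid in range(len(tasks))], arrival_order


if __name__ == "__main__":
    rows, order = asyncio.run(coordinator([("a", 0.03), ("b", 0.01), ("c", 0.02)]))
    print(rows)   # always in request order
    print(order)  # arrival order, typically [1, 2, 0]
```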
Multithreaded Virtual Machine
• Each partition can be managed in its own thread via a Data Manager
• There can also be separate threads for disk IO and for sending and receiving over the network
• Threads can be distributed across and locked to CPU cores for real-time performance
[Diagram: data manager threads; data nodes with data partitions]
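The ownership idea can be sketched with Python threads, under stated assumptions: the thread and partition counts are hypothetical, each data manager thread owns a fixed subset of partitions plus its own job queue, and a routing step forwards each request to the owning thread. Because only one thread ever touches a partition, the partition data needs no lock. CPU pinning is not shown, and Python threads illustrate the ownership model, not real-time performance.

```python
import queue
import threading

NUM_THREADS = 2     # hypothetical number of data manager threads
NUM_PARTITIONS = 4  # hypothetical number of partitions on this data node


def owner_thread(partition: int) -> int:
    """Each partition is owned by exactly one data manager thread."""
    return partition % NUM_THREADS


class DataManagerThread(threading.Thread):
    def __init__(self, index):
        super().__init__(daemon=True)
        self.index = index
        self.jobs = queue.Queue()  # this thread's job buffer
        # Only this thread reads or writes these partitions, so no locks are needed.
        self.partitions = {p: {} for p in range(NUM_PARTITIONS) if owner_thread(p) == index}

    def run(self):
        while True:
            job = self.jobs.get()
            if job is None:  # shutdown request
                break
            partition, key, value, done = job
            self.partitions[partition][key] = value
            done.put((self.index, partition))


def route(threads, partition, key, value, done):
    """Receive side: forward the request to the owning thread's job buffer."""
    threads[owner_thread(partition)].jobs.put((partition, key, value, done))


threads = [DataManagerThread(i) for i in range(NUM_THREADS)]
for t in threads:
    t.start()
done = queue.Queue()
for partition in range(NUM_PARTITIONS):
    route(threads, partition, f"key{partition}", "xxx", done)
acks = sorted(done.get() for _ in range(NUM_PARTITIONS))
for t in threads:
    t.jobs.put(None)
    t.join()
print(acks)  # [(0, 0), (0, 2), (1, 1), (1, 3)]
```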
Cross-partition joins
• Parallel execution across data nodes and within data nodes
• Cluster queries distributed data as if it were a single consolidated database
• Joins are pushed down to the data nodes
• Result consolidation in MySQL Server
• Parallel joins with the MySQL Server query engine
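A scatter-gather sketch of a pushed-down join, assuming (hypothetically) that both tables are partitioned on user id so matching rows sit in the same partition: each partition joins its local rows in parallel, standing in for the data nodes, and the caller, standing in for MySQL Server, only consolidates the partial results. The tables and names are invented for illustration, and the sketch covers only this co-located case.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tables, both partitioned on user_id (co-located).
USERS = {
    0: [("1773467253", "Alice")],
    1: [("6257346892", "Bob"), ("7234782739", "Carol")],
}
SERVICES = {
    0: [("1773467253", "chat"), ("1773467253", "photos")],
    1: [("6257346892", "chat"), ("7234782739", "photos")],
}


def local_join(partition):
    """Runs 'on the data node': join the rows of one partition only."""
    names = dict(USERS[partition])
    return [(uid, names[uid], svc) for uid, svc in SERVICES[partition] if uid in names]


def pushed_down_join():
    """'SQL node': run all partition joins in parallel, then consolidate."""
    with ThreadPoolExecutor() as pool:
        partial_results = pool.map(local_join, USERS)
        return sorted(row for rows in partial_results for row in rows)


print(pushed_down_join())
```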
Massively parallel system executing parallel queries
[Diagram: two data nodes, each with receive, send, transaction and data manager threads]
Some conclusions
• Asynchronous event-driven model
• Very hard to establish (distributed, out-of-order execution, …)
• Once established, it is easy to extend (create new blocks, signals)
• Very simple to split, scale and multithread, as blocks already communicate asynchronously
• Makes it possible to parallelize even single queries over multiple machines and cores
• Network is the bottleneck
• Virtual machine model helps avoid CPU overhead:
  • Context switches
  • Memory
