Distributed RDBMS 
Data Distribution Policy: Part 1 
What is a data distribution policy? 
October 2014
Data Distribution Policy: Part 1 
Distributed RDBMSs provide many scalability, availability 
and performance advantages. 
But how do you “distribute” data? This presentation
gives you a practical understanding of the key issues behind a
successful distributed RDBMS.
The presentation explores: 
• What a data distribution policy is 
• The challenges faced when data is distributed via sharding 
• What defines a good data distribution policy 
• The best way to distribute data for your application and 
workload
Why is a Distributed Relational Database Good? 
Distributed relational databases are a perfect match for
cloud computing models and distributed cloud
infrastructure.
They are the way forward for delivering web-scale
applications while keeping ACID properties, particularly for:
• Social apps 
• Games 
• Many concurrent users 
• High transaction throughput 
• Very large data volumes
What Is a Data Distribution Policy? 
A data distribution policy describes the rules under 
which data is distributed. 
A policy that matches your application’s unique workflow 
will give you critical web scale benefits, including: 
• Endless scalability 
• High availability 
• Geo-location of data near user populations 
• Multi-tenancy 
• Archiving capabilities 
• Data tiering
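To make “rules under which data is distributed” concrete, here is a minimal sketch, not any product’s actual mechanism. It assumes a hypothetical policy that names one distribution key per table and hashes that key to pick a cluster; the table names, the `customer_id` key, and the four-cluster count are illustrative assumptions.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributionRule:
    """Hypothetical rule: which column decides where a table's rows live."""
    table: str
    key_column: str


class DistributionPolicy:
    """Toy policy: one rule per table, hash of the key picks the cluster."""

    def __init__(self, rules, cluster_count):
        self.rules = {rule.table: rule for rule in rules}
        self.cluster_count = cluster_count

    def cluster_for(self, table, row):
        rule = self.rules[table]
        key = str(row[rule.key_column]).encode("utf-8")
        # crc32 is stable across runs, unlike Python's built-in hash().
        return zlib.crc32(key) % self.cluster_count


# Assumed example: customers and their orders share one distribution key.
policy = DistributionPolicy(
    [
        DistributionRule("customers", "customer_id"),
        DistributionRule("orders", "customer_id"),
    ],
    cluster_count=4,
)

customer = {"customer_id": 42, "name": "Ada"}
order = {"order_id": 9001, "customer_id": 42}

customer_cluster = policy.cluster_for("customers", customer)
order_cluster = policy.cluster_for("orders", order)
```

Because both hypothetical tables are distributed by the same key, a customer and that customer’s orders land on the same cluster, so a query joining them can be answered there.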
Data Distribution Must Match App Workflow
NOTE: A poorly conceived data distribution policy will:
• Degrade system performance
• Use more system resources
• Cause maintenance problems
This presentation outlines the attributes of good data
distribution policies.
3 Key Questions about a Distributed RDBMS 
1. How is data distributed in a distributed RDBMS? 
2. What is the best way to distribute data for “my unique
application”?
3. How do I retune my distributed database for optimal 
performance as my application evolves and usage 
patterns change? 
Answer: This is all managed through your data distribution 
policy.
What about Sharding? 
Sharding is the old way to create a distributed database. 
In the past, developers needed to program data distribution 
logic into their actual applications in order to distribute data 
across an array of linked databases. 
Consequently, sharding was born, which entailed: 
• Splitting up databases into slices of data 
• Running every read or write through new custom-built
application code in order to place and locate bits of data
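The DIY approach described above can be sketched as follows. This is a toy illustration under stated assumptions: four in-memory SQLite databases stand in for the linked databases, and `shard_for`, `save_user`, `load_user`, and `count_users` are hypothetical helper names, not part of any library.

```python
import sqlite3
import zlib

SHARD_COUNT = 4  # assumed number of linked databases

# Each in-memory SQLite database stands in for one shard.
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for db in shards:
    db.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")


def shard_for(user_id):
    """Custom routing logic that the application itself must own."""
    index = zlib.crc32(str(user_id).encode("utf-8")) % SHARD_COUNT
    return shards[index]


def save_user(user_id, name):
    """Every write goes through the routing code to place the row."""
    db = shard_for(user_id)
    db.execute("INSERT INTO users (user_id, name) VALUES (?, ?)", (user_id, name))
    db.commit()


def load_user(user_id):
    """Every read goes through the same routing code to locate the row."""
    row = shard_for(user_id).execute(
        "SELECT name FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def count_users():
    """A cross-shard question: ask every shard, then merge in the app."""
    return sum(db.execute("SELECT COUNT(*) FROM users").fetchone()[0] for db in shards)


save_user(1, "Ada")
save_user(2, "Grace")
save_user(3, "Edsger")
```

Note how even a simple count has to visit every shard and merge the results in application code.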
Sharding Challenges 
Some great work was accomplished using sharding, but it’s 
slow and detailed work, and it creates major challenges, 
including: 
1. Increasingly difficult operational issues, such as backup, 
adding indexes, and changing schemas 
2. Checking that query paths actually yield accurate results 
Explore more details on sharding challenges: 
• “Top 10 DIY MySQL Sharding Challenges” 
• “Database Scalability: The Sharding Conflict”
So, What Makes a Good Data Distribution Policy? 
1. Even and predictable workload
distribution across the clusters in your
distributed database
2. Immense scalability and availability 
3. The ability to handle more concurrent 
users, higher transaction throughput, 
and bigger volumes of data 
All of these benefits are lost with a poorly
conceived data distribution policy that
does not align with your application’s
unique usage and workloads.
Problem: When a Single Instance Database 
Reaches Its Limit 
Imagine we have a single database that is starting to 
exhibit signs of reaching its capacity limits. 
Its throughput becomes unpredictable and users become 
frustrated waiting for queries to be processed. 
Solution: Evolving to a Distributed Database 
The best way to improve the situation is to evolve to a 
distributed RDBMS, which would result in: 
• Dividing the total workload evenly across an array of
database clusters
• Decreasing the number of queries that any particular
database cluster (or shard) receives
• Minimizing cross-database chatter (from cluster to
cluster, or shard to shard), so that each transaction can
be completed within a single cluster in a single fetch/trip
Recommended reading for more information:
• “Challenges in Querying a Distributed Relational Database”
Example of a Good Distribution Policy 
With 1,000,000 transactions spread equally across four
database clusters, we want to:
• Minimize cross-database chatter (cluster to cluster), and
• Ensure that a specific transaction or query can complete
within a specific database and in a single fetch/trip.
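The arithmetic behind this example can be checked with a short sketch. It assumes, purely for illustration, that each transaction carries a numeric key (here the transaction number itself) and that the policy assigns it to a cluster with a modulo rule; neither detail comes from the slides.

```python
from collections import Counter

TOTAL_TRANSACTIONS = 1_000_000
CLUSTER_COUNT = 4


def cluster_for(transaction_key):
    """Hypothetical rule: the key alone decides the single target cluster."""
    return transaction_key % CLUSTER_COUNT


# Count how many transactions each cluster receives.
load = Counter(cluster_for(key) for key in range(TOTAL_TRANSACTIONS))

for cluster_id in sorted(load):
    print(f"cluster {cluster_id}: {load[cluster_id]:,} transactions")
```

Under these assumptions each of the four clusters handles 250,000 transactions, and every transaction is routed to exactly one cluster, so none of them needs a second trip.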
Example of a Bad Distribution Policy 
A bad data distribution policy does not respect how the 
data is actually used, and can make matters worse. 
Each transaction or query has to access or collect data 
from multiple clusters, therefore increasing the overall 
workload. 
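A rough way to see why the workload grows is to count the requests that reach each cluster. The sketch below assumes requests spread evenly and reuses the 1,000,000 transactions and four clusters from the earlier example; `per_cluster_load` and its parameters are hypothetical names.

```python
def per_cluster_load(transactions, cluster_count, clusters_touched):
    """Requests each cluster handles, assuming an even spread."""
    total_requests = transactions * clusters_touched
    return total_requests // cluster_count


# Good policy: every transaction completes on exactly one cluster.
good = per_cluster_load(1_000_000, cluster_count=4, clusters_touched=1)

# Bad policy (worst case): every transaction touches all four clusters.
bad = per_cluster_load(1_000_000, cluster_count=4, clusters_touched=4)

print(f"good policy: {good:,} requests per cluster")
print(f"bad policy:  {bad:,} requests per cluster")
```

When every transaction touches all four clusters, each cluster handles as many requests as the original single database did, so adding clusters brings no relief.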
Data Distribution Policy Summary
Bad data distribution policy versus good data distribution policy:
• Bad: The load isn’t distributed – it’s multiplied!
  Good: Distributes the workload evenly across available resources.
• Bad: Doesn’t scale.
  Good: Distributes the sessions.
• Bad: Adding an additional DB does NOT reduce the overall workload.
  Good: Delivers linear scalability.
• Bad: The limitation of a single DB becomes the limitation of the entire array.
  Good: Adding another database increases the overall scale potential of the
  distributed database.
• Bad: When queries need data from multiple DBs, transactions must commit on
  multiple separate DBs (2PC) before completing. This adds a lot of overhead
  to each commit.
  Good: Queries complete using data from a single, smaller database. This
  removes much of the overhead from each commit.
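The two-phase commit (2PC) overhead mentioned above can be illustrated with a toy coordinator. This is a simplified sketch (no timeouts, logging, or recovery), and the `Participant` class and the message counting are assumptions made for illustration, not a description of any specific database.

```python
class Participant:
    """Toy stand-in for one database taking part in a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote on whether this database is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def local_commit(participant):
    """Single-database transaction: one commit message, no coordination."""
    participant.commit()
    return True, 1


def two_phase_commit(participants):
    """Returns (committed?, number of coordinator messages sent)."""
    messages = 0
    votes = []
    for participant in participants:  # phase 1: prepare and collect votes
        messages += 1
        votes.append(participant.prepare())
    decision = all(votes)
    for participant in participants:  # phase 2: commit or roll back everywhere
        messages += 1
        if decision:
            participant.commit()
        else:
            participant.rollback()
    return decision, messages


single = local_commit(Participant("db1"))
spread = two_phase_commit([Participant("db1"), Participant("db2"), Participant("db3")])
failed_group = [Participant("db1"), Participant("db2", can_commit=False)]
failed = two_phase_commit(failed_group)
```

In this sketch a transaction confined to one database needs a single commit message, while the same transaction spread over three databases needs six, and one failed vote rolls back every participant.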
What Is the Best Way to Distribute Data for Your 
Applications and Workloads? 
Unless we distribute data intelligently and in line with
application requirements, we will not achieve any benefit.
In fact, things can become worse than before.
Data must be distributed across a cluster of smaller
databases in a way that maintains relational integrity,
two-phase commit and rollback.
The natural question we are led to ask is:
“OK, so what is the best way to distribute data for my
applications and my workloads?”
This is answered in PART 2 of this Distributed RDBMS 
Data Distribution Policy slide presentation.
Additional Distributed RDBMS Resources 
To develop a custom-made data distribution policy for your
RDBMS and application, look for Part 2 of this slide
presentation.
We also recommend the following resources:
• Four Table Types You Need To Know To Scale Your
Relational Database
• Distributed Databases and Cascading Tables 
• Discover your Application Scalability Score with 
ScaleBase Analysis Genie 
• Optimizing Sharding Policies to Scale Out MySQL – 
Choosing the Best Data Distribution Policy (whitepaper)
ScaleBase Software 
• ScaleBase is a distributed database built on MySQL and
optimized for the cloud. It deploys in minutes so your
database can handle an unlimited number of users,
humongous volumes of data, and faster transactions.
• It dynamically optimizes workloads and availability by 
logically distributing data across public, private, and geo-distributed 
clouds.
ScaleBase Software 
“What differentiates ScaleBase is its ability 
to add scalability without the need to migrate 
to new database architecture or make any 
changes to existing applications” 
- Matt Aslett, The 451 Group 
“ScaleBase allows us to effectively scale, 
without downtime, and without having to 
rewrite our application.” 
- Sheeri Cabral, Mozilla
Try ScaleBase Today 
ScaleBase software is available for free: 
• ScaleBase Website 
• Amazon Marketplace 
• Rackspace Marketplace 
• IBM Cloud marketplace 
• ScaleBase’s free online Analysis Genie service 
An AWS Marketplace Guide and an AWS Getting Started
Tutorial are available from the documentation section of the
ScaleBase website.
Contact ScaleBase 
sales@scalebase.com
Data Distribution Policy: Part 2 and 3 
Data Distribution Policy Part 2: 
• The different approaches to data distribution 
• How to create your own data distribution policy, whether you
are scaling an existing application or creating a new app.
• How ScaleBase can help you create your policy 
Data Distribution Policy Part 3: 
• Three stages of your data distribution policy’s lifecycle. 
• Adapting the distributed RDBMS to match application changes. 
• Ensuring that your distributed relational database is flexible and 
elastic enough to accommodate endless growth and change.
