Demystifying the
Distributed Database
Landscape
A survey of technologies
Getting Started with ScyllaDB
Join our next ScyllaDB Virtual Workshop!
scylladb.com/webinars
April 28, 2022 | 10AM PT | 1PM ET | 6PM GMT
Poll
Where are you in your NoSQL adoption?
About ScyllaDB
+ For distributed, data-intensive apps that require high performance and low latency
+ 400+ users worldwide
+ Results
  + Comcast: Reduced P99 latencies by 95%
  + FireEye: 1500% improvement in throughput
  + Discord: Reduced C* nodes from ~140 to 6
  + iFood: 9X cost reduction vs. DynamoDB
+ Open Source, Enterprise and Cloud options
+ Fully compatible with Apache Cassandra and Amazon DynamoDB
ScyllaDB Universe of 400+ Users
400+ Companies Use ScyllaDB
+ Seamless experiences across content + devices
+ Fast computation of flight pricing
+ Corporate fleet management
+ Real-time analytics
+ 2,000,000 SKU e-commerce management
+ Real-time location tracking for friends/family
+ Video recommendation management
+ IoT for industrial machines
+ Synchronize browser properties for millions
+ Threat intelligence service using JanusGraph
+ Real-time fraud detection across 6M transactions/day
+ Uber scale, mission critical chat & messaging app
+ Network security threat detection
+ Power ~50M X1 DVRs with billions of reqs/day
+ Precision healthcare via Edison AI
+ Inventory hub for retail operations
+ Property listings and updates
+ Unified ML feature store across the business
+ Cryptocurrency exchange app
+ Geography-based recommendations
+ Distributed storage for distributed ledger tech
+ Global operations: Avon, Body Shop + more
+ Predictable performance for on-sale surges
+ GPS-based exercise tracking
Peter Corless
Director of Technical Advocacy @ ScyllaDB
+ Listen to & share user stories
+ Write blogs & case studies
+ Play (and design) strategy & roleplaying games
+ @PeterCorless on Twitter
This Next Tech Cycle
The wave of innovation we’re currently riding.
Hardware, software, and methodologies are all co-evolving to create this next tech cycle.
This Next Tech Cycle
Timeline: 2000, 2010, 2020, 2025+
Transistor Count and Core Count
+ Pentium 4 (2000): 42M transistors, 1 core
+ Pentium D (2005): 228M transistors, 2 cores
+ Xeon Nehalem-EX (2010): 2.3B transistors, 8 cores
+ SPARC M7 (2015): 10B transistors, 32 cores
+ Epyc Rome (2019): 39B transistors, 64 cores
+ Epyc Genoa (2022): ~60B? transistors, 96 cores
+ Epyc Bergamo (2023): ~80B? transistors, 128 cores
Zettabyte Era (10^21)
+ 2 ZB data stored (2010)
+ 1.2 ZB IP traffic (2016)
+ 64 ZB data stored (2020)
+ ~180 ZB data stored (2025)
Broadband Speeds
+ 1.5 Mbps (2002)
+ 16 Mbps (2008)
+ 105 Mbps (2014)
+ 1 Gbps (2018)
+ 3 Gbps (2021)
Wireless Services
+ 3G (2002)
+ 4G (2014)
+ 5G (2018)
Public Cloud to Multicloud
+ AWS (2006)
+ GCP (2008)
+ Azure (2010)
+ Azure Arc
Hardware Still Vertically Scaling
+ Compute
  + From 100+ cores → 1,000+ cores per server
  + From multicore CPUs → full System on a Chip (SoC) designs (CPU, GPU, Cache, Memory)
+ Memory
  + Terabyte-scale RAM per server
  + DDR5: 4600 MHz in 2020, 8000 MHz by 2024
  + DDR6: 9600 MHz by 2025
  + Persistent memory: memory mode
+ Storage
  + Petabyte-scale storage per server
  + NVMe 2.0 [2021]: separation of base and transport
  + Persistent memory: app direct (storage) mode
Hybrid & Multi-cloud is Now-ish
Azure Arc
Methodologies Still Evolving
+ Agile [c. 2000]
+ Microservices Architecture [2005]
+ CI/CD = CI [1991] + CD [2009]
+ DevOps [2009]
+ Chaos Monkey [2011]
+ Kubernetes [2014]
+ GitOps [2017]
+ DevSecOps [2018]
Panels: How It Started, How It’s Going, How It Evolved
Poll Question
How much data do you have under management in your own transactional database systems?
+ <1 terabyte
+ 1 to 50 terabytes
+ 50-100 terabytes
+ >100 terabytes
The Distributed Database Landscape
Here there be monstrous databases!
Distributed Database Landscape 2021
SQL
+ Distributed SQL
+ NewSQL
NoSQL
+ Key-value
+ Document
+ Wide-column
+ Graph
Multi-model
+ SQL + NoSQL
+ Multiple NoSQL
Production Environments
+ On-premises
+ Co-location
+ Public cloud
+ Private cloud
+ Hybrid cloud
+ Multicloud
+ Edge
+ IoT / Embedded
Business / Use Models
+ Open Source License
+ Enterprise License
+ OEM License
+ Service Agreements
Use Cases
+ OLTP
+ OLAP
+ HTAP
+ Time Series
DB-Engines.com Top 100 Databases
Top 100 Databases (and Database-like systems) on DB-Engines.com ranking [as of April 2022]
+ 47 SQL
+ 25 NoSQL
+ 11 Multi-model (multiple NoSQL models and/or SQL + NoSQL)
+ 7 Search Engines
+ 5 Time Series
+ 5 Other
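As a quick arithmetic check on the breakdown above, the category counts can be tallied and turned into shares. This is only an illustrative sketch; the dictionary and variable names are assumptions and are not part of the original slides.

```python
# Category counts as listed on the slide (DB-Engines.com Top 100, April 2022).
top_100_by_category = {
    "SQL": 47,
    "NoSQL": 25,
    "Multi-model": 11,
    "Search engines": 7,
    "Time series": 5,
    "Other": 5,
}

# The six categories should account for all 100 ranked systems.
total = sum(top_100_by_category.values())

# Share of each category as a fraction of the listed systems.
shares = {name: count / total for name, count in top_100_by_category.items()}

print(total)          # 100
print(shares["SQL"])  # 0.47
```

Read this way, systems labelled SQL are just under half of the list, while NoSQL and multi-model systems together make up a bit more than a third.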
Are all of the Top 100 “Distributed Databases?”
“Well…”
What do you mean by a “Distributed Database?”
+ Clustering & Distribution Strategies
  + Local clustering: multiple nodes in the same datacenter share updates
  + Cross-cluster updates: multiple clusters can share data between them
  + Multi-datacenter clustering: geographically, even globally dispersed, but same logical cluster
+ Node Roles, High Availability & Failover Strategies
  + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes)
  + Peer-to-peer, leaderless (Active-Active, multi primaries; can write to any replica; no SPOF)
  + Load balancing (client side or service in front of database)
+ Data Replication & Sharding Strategies
  + Replication Factors & Consistency Levels
  + Horizontal Scalability: Manual vs. Auto-sharding
  + Topology Awareness: Rack-awareness, Datacenter-awareness
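To make "Replication Factors & Consistency Levels" concrete, here is a minimal Python sketch of the quorum arithmetic commonly used in Dynamo-style systems. The function names are hypothetical, and the rule shown (a read is guaranteed to overlap the latest acknowledged write when W + R > RF) is a general rule of thumb, not a description of any specific database's implementation.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when any set of read replicas must share at least one node
    with any set of replicas that acknowledged the write (W + R > RF)."""
    return write_acks + read_acks > replication_factor


rf = 3
print(quorum(rf))                                        # 2
print(reads_overlap_writes(rf, quorum(rf), quorum(rf)))  # True: quorum writes + quorum reads
print(reads_overlap_writes(rf, 1, 1))                    # False: one ack each can miss the latest write
```

Lower acknowledgement counts trade consistency for latency and availability, which is why these systems usually let the consistency level be chosen per operation.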
The Short List: Systems of Interest
SQL + NewSQL    NoSQL
PostgreSQL      MongoDB
CockroachDB     Redis
                ScyllaDB (Cassandra)
Just in case: SQL vs. NoSQL
Clustering & Locality: Topology Awareness
Local Clustering
+ Non-Rack Aware: can have multiple nodes in the same rack
+ Rack Aware: distributes nodes evenly across all available racks in a datacenter
Cross-Cluster Updates or Multi-Datacenter Clustering
+ Zone and Datacenter Aware
  + Provides survivability across geography
  + Reduces local latencies
  + Considerations for data localization
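The difference between non-rack-aware and rack-aware placement can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function `place_replicas`, the node names, and the rack labels are all hypothetical, and real databases use their own placement strategies.

```python
from collections import defaultdict


def place_replicas(nodes, replication_factor):
    """Pick replica nodes round-robin across racks, so that no rack
    receives a second replica before every rack has received one.

    nodes: list of (node_name, rack_name) tuples.
    """
    by_rack = defaultdict(list)
    for name, rack in nodes:
        by_rack[rack].append(name)

    chosen = []
    racks = sorted(by_rack)
    while len(chosen) < replication_factor:
        progressed = False
        for rack in racks:
            if by_rack[rack] and len(chosen) < replication_factor:
                chosen.append(by_rack[rack].pop(0))
                progressed = True
        if not progressed:
            raise ValueError("not enough nodes for the requested replication factor")
    return chosen


cluster = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2"), ("n4", "rack3")]
print(place_replicas(cluster, 3))  # ['n1', 'n3', 'n4'] - one replica per rack
```

A non-rack-aware picker could just as easily choose n1 and n2, both in rack1, so losing that single rack would take out two of the three replicas at once.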
Support Multi-Datacenter Clustering?
SQL + NewSQL    NoSQL
PostgreSQL      MongoDB
CockroachDB     Redis
                ScyllaDB (Cassandra)
Legend: Designed for multi-datacenter | Designed for single-datacenter; capable of multi-datacenter
Clustering & Replication: Primary-Replica Set vs. Peer-to-Peer (Leaderless)
Primary-Replica (Multiple Replicas), e.g. MongoDB
+ Only primary accepts writes; secondaries are read-only
+ Replication is “fan out” from primary to replicas
+ Write-heavy workloads can tax the primary
Peer-to-Peer Active-Active (Multi-Datacenter), e.g. ScyllaDB
+ Each node accepts reads and writes
+ Inherently better load balancing
+ Deals better with write-heavy or mixed read-write workloads
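The claim that write-heavy workloads tax the primary can be illustrated with a toy model that only counts which node coordinates each write. The class names and the round-robin choice of coordinator are assumptions made for the sketch; replication traffic, failover, and real client drivers are deliberately left out.

```python
from itertools import cycle


class PrimaryReplicaCluster:
    """Toy model: one primary takes every write; replicas only serve reads."""

    def __init__(self, node_names):
        self.primary = node_names[0]
        self.writes_coordinated = {name: 0 for name in node_names}

    def write(self):
        # Every write is routed to the single primary.
        self.writes_coordinated[self.primary] += 1
        return self.primary


class LeaderlessCluster:
    """Toy model: any node may coordinate a write (round-robin here)."""

    def __init__(self, node_names):
        self._next_node = cycle(node_names)
        self.writes_coordinated = {name: 0 for name in node_names}

    def write(self):
        # The next node in the rotation coordinates this write.
        node = next(self._next_node)
        self.writes_coordinated[node] += 1
        return node


nodes = ["node-a", "node-b", "node-c"]
primary_replica = PrimaryReplicaCluster(nodes)
leaderless = LeaderlessCluster(nodes)
for _ in range(300):
    primary_replica.write()
    leaderless.write()

print(primary_replica.writes_coordinated)  # {'node-a': 300, 'node-b': 0, 'node-c': 0}
print(leaderless.writes_coordinated)       # {'node-a': 100, 'node-b': 100, 'node-c': 100}
```

In the primary-replica model, adding replicas adds read capacity but leaves the write path on one node; in the leaderless model the coordination work spreads as nodes are added.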
Support Active-Active vs. Primary-Replica
SQL + NewSQL    NoSQL
PostgreSQL      MongoDB
CockroachDB     Redis
                ScyllaDB (Cassandra)
Legend: Active-Active | Primary-Replica; active-active only through optional solutions
Clustering: Cross-Cluster Updates vs. Multi-Datacenter Replication
Primary-Replica (Multiple Replicas)
+ Only primary accepts writes; secondaries are read-only
+ Source: MongoDB
Peer-to-Peer Active-Active (Multi-Datacenter)
+ Each node can accept reads and writes; leaderless
+ Diagram labels: RF=3, RF=2
+ Source: ScyllaDB
Topology Awareness
SQL + NewSQL    NoSQL
PostgreSQL      MongoDB
CockroachDB     Redis
                ScyllaDB (Cassandra)
Legend: Topology Aware | Not built-in, but can be added
PostgreSQL: distributed SQL
+ Clustering & Distribution Strategies
  + Local clustering: multiple nodes in the same datacenter share updates
  + Cross-cluster updates: multiple clusters can share data between them
  + Multi-datacenter clustering: geographically, even globally dispersed, but same logical cluster
+ Node Roles, High Availability & Failover Strategies
  + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes)
  + Peer-to-peer, leaderless (Active-Active, multi primaries; can write to any replica; no SPOF)
  + Load balancing (client side or service in front of database)
+ Data Replication & Sharding Strategies
  + Replication Factors & Consistency Levels
  + Horizontal Scalability: Manual Sharding vs. Auto-sharding
  + Topology Awareness: Rack-awareness, Datacenter-awareness
Legend: Part of base offering | Can be added, but not part of base
CockroachDB: NewSQL
+ Clustering & Distribution Strategies
  + Local clustering: multiple nodes in the same datacenter share updates
  + Cross-cluster updates: multiple clusters can share data between them
  + Multi-datacenter clustering: geographically, even globally dispersed, but same logical cluster
+ Node Roles, High Availability & Failover Strategies
  + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes)
  + Peer-to-peer, leaderless (Active-Active, multi primaries; can write to any replica; no SPOF)
  + Load balancing (client side or service in front of database)
+ Data Replication & Sharding Strategies
  + Replication Factors & Consistency Levels
  + Horizontal Scalability: Manual vs. Auto-sharding
  + Topology Awareness: Rack-awareness*, Datacenter-awareness
* Can be manually configured using localities
Legend: Part of base offering | Can be added, but not part of base
MongoDB: the leading document store
+ Clustering & Distribution Strategies
  + Local clustering: multiple nodes in the same datacenter share updates
  + Cross-cluster updates: multiple clusters can share data between them
  + Multi-datacenter clustering: geographically, even globally dispersed, but same logical cluster
+ Node Roles, High Availability & Failover Strategies
  + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes)
  + Peer-to-peer, leaderless (Active-Active, multi primaries; can write to any replica; no SPOF)
  + Load balancing (client side or service in front of database)
+ Data Replication & Sharding Strategies
  + Replication Factors & Consistency Levels
  + Horizontal Scalability: Manual vs. Auto-sharding
  + Topology Awareness: Rack-awareness, Datacenter-awareness
Legend: Part of base offering | Can be added, but not part of base
Redis: key-value in-memory DB/cache
+ Clustering & Distribution Strategies
  + Local clustering: multiple nodes in the same datacenter share updates
  + Cross-cluster updates: multiple clusters can share data between them
  + Multi-datacenter clustering: geographically, even globally dispersed, but same logical cluster*
+ Node Roles, High Availability & Failover Strategies
  + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes)
  + Peer-to-peer, leaderless (Active-Active, multi primaries; can write to any replica; no SPOF)*
  + Load balancing (client side or service in front of database)
+ Data Replication & Sharding Strategies
  + Replication Factors & Consistency Levels (e.g., strong locally; causal consistency in active-active*)
  + Horizontal Scalability: Manual vs. Auto-sharding
  + Topology Awareness: Rack-awareness, Datacenter-awareness
* Redis Enterprise feature
Legend: Part of base offering | Can be added, but not part of base
ScyllaDB
+ Clustering & Distribution Strategies
  + Local clustering: multiple nodes in the same datacenter share updates
  + Cross-cluster updates: multiple clusters can share data between them
  + Multi-datacenter clustering: geographically, even globally dispersed, but same logical cluster
+ Node Roles, High Availability & Failover Strategies
  + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes)
  + Peer-to-peer, leaderless (Active-Active, multi primaries; can write to any replica; no SPOF)
  + Load balancing (client side or service in front of database*)
+ Data Replication & Sharding Strategies
  + Replication Factors & Consistency Levels
  + Horizontal Scalability: Manual vs. Auto-sharding
  + Topology Awareness: Rack-awareness, Datacenter-awareness
* For DynamoDB-compatible API
Legend: Part of base offering
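The "Manual vs. Auto-sharding" item in these checklists is easier to picture with an example. One common way to shard automatically is a consistent-hash ring with virtual nodes. The sketch below is a generic illustration of that technique, not ScyllaDB's (or any other product's) actual implementation; the class `HashRing`, the node names, and the choice of SHA-256 tokens are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._owners = {}   # token -> node name
        self._tokens = []   # sorted list of tokens on the ring
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _token(key: str) -> int:
        # Derive a 64-bit token from a stable hash of the key.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each node claims several positions on the ring.
        for i in range(self._vnodes):
            self._owners[self._token(f"{node}#{i}")] = node
        self._tokens = sorted(self._owners)

    def owner(self, key: str) -> str:
        # A key belongs to the first token at or after its own, wrapping around.
        idx = bisect.bisect_left(self._tokens, self._token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.owner(key) for key in keys}

ring.add_node("node-d")  # scale out: no manual re-partitioning step
after = {key: ring.owner(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
# Only keys now owned by the new node change hands; the rest stay put.
print(all(after[key] == "node-d" for key in moved))
```

With manual sharding, the application or operator would have to decide which keys to relocate when a node is added; here the ring makes that decision, and only a fraction of the keys move.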
But for now, let’s move on...
Where are Distributed Databases Headed Next?
Time to read the tea leaves
The Trend for SQL
+ Google Trends interest in “SQL” is at 22% of its 2004 level
+ Book citations for “SQL” peaked in 2008 and were down to 28% of that rate by 2019
+ Back to 1995 levels of interest, basically
+ Still dwarfs other database terms like “NoSQL” or “NewSQL” or “RDBMS”
+ No single term or technology sums up the distributed database market anymore
Where are Distributed Databases Going?
+ Cambrian Explosion will continue: “What is a database anyway?”
  + Distributed Databases of all kinds
  + Distributed Streaming: “Kafka as a database?” (kSQL says “Yes!”)
  + Distributed Ledgers: “Blockchains/DAGs as a database?”
  + Further fragmentation of the market
+ NoSQL and SQL are increasingly blending
  + Evolution of NoSQL back to SQL assumptions
  + Adding back Strong Consistency, Schema Constraints, Strict Typing
Further Trends in Distributed Databases
+ “Cloud Native”: What does that mean to you?
  + Elasticity: faster provisioning/decommissioning, autoscaling
  + Serverless: “I don’t want to manage hardware; just give me an API.”
  + Uncoupling Compute from Storage: Tiered Storage, Plug-in Storage
+ Data over Time
  + Built for Event Streaming, Time Series
+ Data over Space
  + Geospatial queries, Geoindexing
  + Geographic / political boundaries: GDPR, data localization regulatory compliance
Database as a Development Platform
+ Increasing Focus on Developer Enablement and Developer Experience (DX)
  + APIs for extensibility: extensions, plugins, modules, add-ons, integration layers
  + Database Specific: PostgreSQL extensions, Redis modules
  + Cross-industry: GraphQL, OpenAPI (Swagger), etc.
+ AI/ML integration and incorporation into databases
  + “Building models where your data resides” (Martin Heller, Apr 2021)
  + Amazon Redshift ML
  + BigQuery ML
  + Oracle, Db2, Microsoft SQL Server
Data Teaming
+ Tighter Coupling of Data Engineering + Data Sciences + Operations
  + Repairing rifts of the past decade
  + Bridging huge divides between people and systems
+ From “Data Pipelining” (production-oriented) to...
+ “Data Supply Chains” (consumption-oriented)
  + Like “Software Supply Chain,” but for data and data products.
We Need Different Kinds of “Easy”
+ Specializing databases to run in the cloud (and cloud-only)
  + Providing “concierge” services
  + Ecosystem: can integrate into cloud vendor’s (or partners’) offerings
  + Managed for you, at a price
+ Making Open Source databases easier to run at the infrastructure level
  + Making self-managed operations simpler
  + Flexibility: can run on premises or in the cloud
  + Self-service model, so long as you have the skillz
Hope You Enjoyed Your Trip!
http://slack.scylladb.com/
Thanks
People who helped educate me
+ Kostja Osipov
+ Serge Leontiev
Disclaimer
Any errors, omissions, misinterpretations, misrepresentations or misunderstandings are purely my own.
Please send suggestions and corrections to peter@scylladb.com
Poll
How much data do you have under management in your own transactional database?
United States
2445 Faber St, Suite #200
Palo Alto, CA USA 94303
Israel
Maskit 4
Herzliya, Israel 4673304
www.scylladb.com
@scylladb
@PeterCorless
Q&A
Join our Next Virtual Workshop!
scylladb.com/webinars
Thank You! Stay in touch!

Demystifying the Distributed Database Landscape (DevOps) (1).pdf

  • 2. Getting Started with ScyllaDB Join our next ScyllaDB Virtual Workshop! scylladb.com/webinars 2 April 28, 2022 | 10AM PT | 1PM ET | 6PM GMT
  • 3. Poll Where are you in your NoSQL Adoption? 3
  • 5. 5 + For distributed, data-intensive apps that require high performance and low latency + 400+ users worldwide + Results + Comcast: Reduced P99 latencies by 95% + FireEye: 1500% improvement in throughput + Discord: Reduced C* nodes from ~140 to 6 + iFood: 9X cost reduction vs. DynamoDB + Open Source, Enterprise and Cloud options + Fully compatible with Apache Cassandra and Amazon DynamoDB About ScyllaDB 1ms <1ms 10ms 1M 10M ScyllaDB Universe of 400+ Users
  • 6. 400+ Companies Use ScyllaDB Seamless experiences across content + devices Fast computation of flight pricing Corporate fleet management Real-time analytics 2,000,000 SKU -commerce management Real-time location tracking for friends/family Video recommendation management IoT for industrial machines Synchronize browser properties for millions Threat intelligence service using JanusGraph Real time fraud detection across 6M transactions/day Uber scale, mission critical chat & messaging app 6 Network security threat detection Power ~50M X1 DVRs with billions of reqs/day Precision healthcare via Edison AI Inventory hub for retail operations Property listings and updates Unified ML feature store across the business Cryptocurrency exchange app Geography-based recommendations Distributed storage for distributed ledger tech Global operations- Avon, Body Shop + more Predictable performance for on sale surges GPS-based exercise tracking
  • 7. Peter Corless 7 Director of Technical Advocacy @ ScyllaDB + Listen to & share user stories + Write blogs & case studies + Play (and design) strategy & roleplaying games + @PeterCorless on Twitter
  • 8. This Next Tech Cycle The wave of innovation we’re currently riding. 8
  • 9. Hardware, software, and methodologies are all co-evolving to create this next tech cycle. 9
  • 10. This Next Tech Cycle 2000 2010 2020 2025+ Transistor Count 42M Pentium 4 (2000) 228M Pentium D (2005) 2.3B Xeon Nahalem-EX (2010) 10B SPARC M7 (2015) 39B Epyc Rome (2019) Core Count 1 2 8 32 64 ~60B? Epyc Genoa (2022) 96 ~80B? Epyc Bergamo (2023) 128 1.2 ZB IP traffic (2016) 2 ZB Data stored (2010) 64 ZB Data stored (2020) Broadband Speeds 3G (2002) 105mbps (2014) 1.5 mbps (2002) 16 mbps (2008) Wireless Services 3Gbps (2021) 1Gbps (2018) 4G (2014) 5G (2018) Zettabyte Era ~180 ZB Data stored (2025) Public Cloud to Multicloud AWS (2006) GCP (2008) Azure (2010) 1021 10 Azure Arc
  • 11. 11 + Compute + From 100+ cores → 1,000+ cores per server + From multicore CPUs → full System on a Chip (SoC) designs (CPU, GPU, Cache, Memory) + Memory + Terabyte-scale RAM per server + DDR5 — 4600 MHz in 2020, 8000 MHz by 2024 + DDR6 — 9600 MHz by 2025 + Persistent memory — memory mode + Storage + Petabyte-scale storage per server + NVMe 2.0 [2021] — separation of base and transport + Persistent memory — app direct (storage) mode Hardware Still Vertically Scaling
  • 12. Hybrid & Multi-cloud is Now-ish 12 Azure Arc
  • 13. 13 + Agile [c. 2000] + Microservices Architecture [2005] + CI/CD = CI [1991] + CD [2009] + DevOps [2009] + Chaos Monkey [2011] + Kubernetes [2014] + GitOps [2017] + DevSecOps [2018] Methodologies Still Evolving How It Started How It’s Going How It Evolved
  • 14. 14
  • 15. + <1 terabyte + 1 to 50 terabytes + 50-100 terabytes + >100 terabytes How much data do you have under management in your own transactional database systems? Poll Question 15
  • 16. The Distributed Database Landscape Here there be monstrous databases! 16
  • 17. Distributed Database Landscape 2021 SQL + Distributed SQL + NewSQL NoSQL + Key-value + Document + Wide-column + Graph Multi-model + SQL + NoSQL + Multiple NoSQL Production Environments + On-premises + Co-location + Public cloud + Private cloud + Hybrid cloud + Multicloud + Edge + IoT / Embedded Business / Use Models + Open Source License + Enterprise License + OEM License + Service Agreements Use Cases + OLTP + OLAP + HTAP + Time Series 17
  • 18. Top 100 Databases (and Database-like systems) on DB-Engines.com ranking [as of April 2022] + 47 SQL + 25 NoSQL + 11 Multimodel (multiple NoSQL models and/or SQL + NoSQL) + 7 Search Engines + 5 Time Series + 5 Other 18 DB-Engines.com Top 100 Databases
  • 19. 19 “Well…” Are all of the Top 100 “Distributed Databases?”
  • 20. 20 + Clustering & Distribution Strategies + Local clustering — multiple nodes in the same datacenter share updates + Cross-cluster updates — multiple clusters can share data between them + Multi-datacenter clustering — geographically, even globally disbursed. but same logical cluster + Node Roles, High Availability & Failover Strategies + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes) + Peer-to-peer, leaderless (Active-Active, multi primaries; can write to any replica; no SPOF) + Load balancing (client side or service in front of database) + Data Replication & Sharding Strategies + Replication Factors & Consistency Levels + Horizontal Scalability: Manual vs. Auto-sharding + Topology Awareness: Rack-awareness, Datacenter-awareness What do you mean by a “Distributed Database?”
  • 21. 21 The Short List: Systems of Interest SQL + NewSQL NoSQL PostgreSQL MongoDB CockroachDB Redis ScyllaDB (Cassandra) Just in case: SQL vs. NoSQL
  • 22. Clustering & Locality: Topology Awareness 22 1 2 3 1 2 3 1 2 3 Cross-Cluster Updates or Multi-Datacenter Clustering Non-Rack Aware Can have multiple nodes in same rack Rack Aware Distribute nodes evenly across all available racks in a datacenter Zone and Datacenter Aware Provides survivability across geography Reduces local latencies Considerations for data localization Local Clustering
  • 23. Support Multi-Datacenter Clustering? SQL + NewSQL: PostgreSQL, CockroachDB. NoSQL: MongoDB, Redis, ScyllaDB (Cassandra). Legend: Designed for multi-datacenter; Designed for single-datacenter, capable of multi-datacenter
  • 24. Clustering & Replication: Primary-Replica Set vs. Peer-to-Peer (Leaderless). Primary-Replica (Multiple Replicas), as in MongoDB + Only primary accepts writes (R+W); secondaries are read-only + Replication is “fan out” from primary to replicas + Write-heavy workloads can tax the primary. Peer-to-Peer Active-Active (Multi-Datacenter), as in ScyllaDB + Each node accepts reads and writes (R+W) + Inherently better load balancing + Deals better with write-heavy or mixed read-write workloads
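Why write-heavy workloads tax a primary can be seen by counting which node accepts each client write. The sketch below is a simplified model under stated assumptions: `route_writes` and the node names are hypothetical, round-robin stands in for whatever coordinator selection a real peer-to-peer system uses, and replication traffic between nodes is not counted.

```python
def route_writes(num_writes, nodes, mode):
    """Count how many client writes each node accepts.

    mode="primary-replica": only nodes[0] (the primary) accepts writes.
    mode="peer-to-peer": any node accepts writes; round-robin is used here
    as a stand-in for real coordinator selection.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    load = {node: 0 for node in nodes}
    for i in range(num_writes):
        if mode == "primary-replica":
            target = nodes[0]
        elif mode == "peer-to-peer":
            target = nodes[i % len(nodes)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        load[target] += 1
    return load


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    print(route_writes(9, nodes, "primary-replica"))
    print(route_writes(9, nodes, "peer-to-peer"))
```

In the primary-replica run every write lands on the first node, while the peer-to-peer run spreads the same writes evenly across all three.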
  • 25. Support Active-Active vs. Primary-Replica. SQL + NewSQL: PostgreSQL, CockroachDB. NoSQL: MongoDB, Redis, ScyllaDB (Cassandra). Legend: Active-Active; Primary-Replica, active-active only through optional solutions
  • 26. Clustering: Cross-Cluster Updates vs. Multi-Datacenter Replication. Primary-Replica (Multiple Replicas): only primary accepts writes; secondaries are read-only (Source: MongoDB). Peer-to-Peer Active-Active (Multi-Datacenter): each node can accept reads and writes; leaderless; RF=3, RF=2 (Source: ScyllaDB)
  • 27. Topology Awareness. SQL + NewSQL: PostgreSQL, CockroachDB. NoSQL: MongoDB, Redis, ScyllaDB (Cassandra). Legend: Topology Aware; Not built-in, but can be added
  • 28. PostgreSQL: distributed SQL + Clustering & Distribution Strategies + Local clustering: multiple nodes in the same datacenter share updates + Cross-cluster updates: multiple clusters can share data between them + Multi-datacenter clustering: geographically, even globally, dispersed, but the same logical cluster + Node Roles, High Availability & Failover Strategies + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes) + Peer-to-peer, leaderless (Active-Active, multiple primaries; can write to any replica; no SPOF) + Load balancing (client side or service in front of database) + Data Replication & Sharding Strategies + Replication Factors & Consistency Levels + Horizontal Scalability: Manual Sharding vs. Auto-sharding + Topology Awareness: Rack-awareness, Datacenter-awareness. Legend: Part of base offering; Can be added, but not part of base
  • 29. CockroachDB: NewSQL + Clustering & Distribution Strategies + Local clustering: multiple nodes in the same datacenter share updates + Cross-cluster updates: multiple clusters can share data between them + Multi-datacenter clustering: geographically, even globally, dispersed, but the same logical cluster + Node Roles, High Availability & Failover Strategies + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes) + Peer-to-peer, leaderless (Active-Active, multiple primaries; can write to any replica; no SPOF) + Load balancing (client side or service in front of database) + Data Replication & Sharding Strategies + Replication Factors & Consistency Levels + Horizontal Scalability: Manual vs. Auto-sharding + Topology Awareness: Rack-awareness*, Datacenter-awareness. * Can be manually configured using localities. Legend: Part of base offering; Can be added, but not part of base
  • 30. MongoDB: the leading document store + Clustering & Distribution Strategies + Local clustering: multiple nodes in the same datacenter share updates + Cross-cluster updates: multiple clusters can share data between them + Multi-datacenter clustering: geographically, even globally, dispersed, but the same logical cluster + Node Roles, High Availability & Failover Strategies + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes) + Peer-to-peer, leaderless (Active-Active, multiple primaries; can write to any replica; no SPOF) + Load balancing (client side or service in front of database) + Data Replication & Sharding Strategies + Replication Factors & Consistency Levels + Horizontal Scalability: Manual vs. Auto-sharding + Topology Awareness: Rack-awareness, Datacenter-awareness. Legend: Part of base offering; Can be added, but not part of base
  • 31. Redis: key-value in-memory DB/cache + Clustering & Distribution Strategies + Local clustering: multiple nodes in the same datacenter share updates + Cross-cluster updates: multiple clusters can share data between them + Multi-datacenter clustering: geographically, even globally, dispersed, but the same logical cluster* + Node Roles, High Availability & Failover Strategies + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes) + Peer-to-peer, leaderless (Active-Active, multiple primaries; can write to any replica; no SPOF)* + Load balancing (client side or service in front of database) + Data Replication & Sharding Strategies + Replication Factors & Consistency Levels (e.g., strong locally; causal consistency in active-active*) + Horizontal Scalability: Manual vs. Auto-sharding + Topology Awareness: Rack-awareness, Datacenter-awareness. * Redis Enterprise feature. Legend: Part of base offering; Can be added, but not part of base
  • 32. ScyllaDB + Clustering & Distribution Strategies + Local clustering: multiple nodes in the same datacenter share updates + Cross-cluster updates: multiple clusters can share data between them + Multi-datacenter clustering: geographically, even globally, dispersed, but the same logical cluster + Node Roles, High Availability & Failover Strategies + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes) + Peer-to-peer, leaderless (Active-Active, multiple primaries; can write to any replica; no SPOF) + Load balancing (client side or service in front of database*) + Data Replication & Sharding Strategies + Replication Factors & Consistency Levels + Horizontal Scalability: Manual vs. Auto-sharding + Topology Awareness: Rack-awareness, Datacenter-awareness. * For DynamoDB-compatible API. Legend: Part of base offering
  • 33. But for now, let’s move on...
  • 34. Where are Distributed Databases Headed Next? Time to read the tea leaves
  • 35. The Trend for SQL + Google Trends interest in “SQL” is at 22% of its 2004 rate + Book citations for “SQL” peaked in 2008 and were down to 28% of that rate by 2019 + Back to 1995 levels of interest, basically + Still dwarfs other database terms like “NoSQL” or “NewSQL” or “RDBMS” + No single term or technology sums up the distributed database market anymore
  • 36. Where are Distributed Databases Going? + Cambrian Explosion will continue: “What is a database anyway?” + Distributed Databases of all kinds + Distributed Streaming: “Kafka as a database?” (kSQL says “Yes!”) + Distributed Ledgers: “Blockchains/DAGs as a database?” + Further fragmentation of the market + NoSQL and SQL are increasingly blending + Evolution of NoSQL back to SQL assumptions + Adding back Strong Consistency, Schema Constraints, Strict Typing
  • 37. Further Trends in Distributed Databases + “Cloud Native”: What does that mean to you? + Elasticity: faster provisioning/decommissioning, autoscaling + Serverless: “I don’t want to manage hardware; just give me an API.” + Uncoupling Compute from Storage: Tiered Storage, Plug-in Storage + Data over Time + Built for Event Streaming, Time Series + Data over Space + Geospatial queries, Geoindexing + Geographic / political boundaries: GDPR, data localization regulatory compliance
  • 38. Database as a Development Platform + Increasing Focus on Developer Enablement and Developer Experience (DX) + APIs for extensibility: extensions, plugins, modules, add-ons, integration layers + Database Specific: PostgreSQL extensions, Redis modules + Cross-industry: GraphQL, OpenAPI (Swagger), etc. + AI/ML integration and incorporation into databases + “Building models where your data resides” (Martin Heller, Apr 2021) + Amazon Redshift ML + BigQuery ML + Oracle, Db2, Microsoft SQL Server
  • 39. Data Teaming + Tighter Coupling of Data Engineering + Data Sciences + Operations + Repairing rifts of the past decade + Bridging huge divides between people and systems + From “Data Pipelining” (production-oriented) to “Data Supply Chains” (consumption-oriented) + Like “Software Supply Chain,” but for data and data products
  • 40. We Need Different Kinds of “Easy” + Specializing databases to run in the cloud (and cloud-only) + Providing “concierge” services + Ecosystem: can integrate into cloud vendor’s (or partners’) offerings + Managed for you, at a price + Making Open Source databases easier to run at the infrastructure level + Making self-managed operations simpler + Flexibility: can run on premises or in the cloud + Self-service model, so long as you have the skillz
  • 41. Hope You Enjoyed Your Trip! http://slack.scylladb.com/
  • 42. Thanks: People who helped educate me + Kostja Osipov + Serge Leontiev. Disclaimer: Any errors, omissions, misinterpretations, misrepresentations or misunderstandings are purely my own. Please send suggestions and corrections to [email protected]
  • 43. Poll: How much data do you have under management in your own transactional database?
  • 44. Thank You! Stay in touch! Q&A. Join our Next Virtual Workshop! scylladb.com/webinars. United States: 2445 Faber St, Suite #200, Palo Alto, CA USA 94303. Israel: Maskit 4, Herzliya, Israel 4673304. www.scylladb.com @scylladb @PeterCorless