Zing Database – Distributed Key-Value Database
Nguyễn Quang Nam, Zing Web-Technical Team
Content
1. Introduction
2. Why
3. Overview architecture
4. Single Server/Storage
5. Distribution
Introduction
Some statistics:
- Feeds: 1.6 B, 700 GB of disk across 4 DB instances, 8 caching servers, 136 GB of memory cache in use.
- User profiles: 44.5 M registered accounts, 2 DB instances, 30 GB of memory cache.
- Comments: 350 M, 50 GB of disk across 2 DB instances, 20 GB of memory cache.
Why
Access time (by Jeff Dean, http://labs.google.com/people/jeff):
  L1 cache reference                           0.5 ns
  Branch mispredict                              5 ns
  L2 cache reference                             7 ns
  Mutex lock/unlock                            100 ns
  Main memory reference                        100 ns
  Compress 1K bytes with Zippy              10,000 ns
  Send 2K bytes over 1 Gbps network         20,000 ns
  Read 1 MB sequentially from memory       250,000 ns
  Round trip within same datacenter        500,000 ns
  Disk seek                             10,000,000 ns
  Read 1 MB sequentially from network   10,000,000 ns
  Read 1 MB sequentially from disk      30,000,000 ns
  Send packet CA->Netherlands->CA      150,000,000 ns
Standard & real requirements:
- Page load time < 200 ms
- Read rate ~12K ops/sec
- Write rate ~8K ops/sec
- Caching service/database recovery time < 5 minutes
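To connect the access-time table to the 200 ms page budget, the arithmetic below counts how many sequential operations of each kind fit into one page load. It is only a back-of-the-envelope sketch: the constants are copied from the table, while the function name and the assumption that operations run strictly one after another are illustrative.

```python
# Latency figures copied from the access-time table (nanoseconds).
DISK_SEEK_NS = 10_000_000
READ_1MB_MEMORY_NS = 250_000
DATACENTER_ROUND_TRIP_NS = 500_000

# The 200 ms page-load requirement, expressed in nanoseconds.
PAGE_BUDGET_NS = 200 * 1_000_000


def operations_within_budget(cost_ns, budget_ns=PAGE_BUDGET_NS):
    """How many strictly sequential operations of this cost fit in the budget."""
    return budget_ns // cost_ns


disk_seeks = operations_within_budget(DISK_SEEK_NS)                # 20
memory_reads = operations_within_budget(READ_1MB_MEMORY_NS)        # 800
round_trips = operations_within_budget(DATACENTER_ROUND_TRIP_NS)   # 400
```

Only about 20 disk seeks fit into a single page load, which is why the design keeps hot data in memory and limits each read to one seek.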
Existing solutions:
- RDBMS (MySQL, MSSQL): writes are too slow; reads are mediocre with a small DB and very slow with a huge DB.
- Cassandra (by Facebook): difficult to operate and maintain, and performance is not good enough.
- HBase/Hadoop: we use this for the log system.
- MongoDB, Membase, Tokyo Tyrant, ...: OK; we use these in several cases, but they are not suitable for all.
Overview architecture
 
Server/Storage
ZNonblockingServer
- Based on TNonblockingServer (Apache Thrift)
- 185K reqs/sec (the original TNonblockingServer reaches only 45K reqs/sec)
- Serializes/deserializes data
- Prevents server overload
- Data is not secured in transit
- Protects the service from invalid requests
ICache
- Least Recently Used (LRU) and time-based expiration strategies
- zlru_table<key_type, value_type>: hash table data structure
- Custom malloc/free functions replace the standard glibc malloc/free to reduce memory fragmentation
- Supports dirty-item marking for lazy DB flush
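The combination of LRU eviction, time-based expiration and dirty marking can be sketched as follows. This is not the ICache or zlru_table implementation, which the slides do not show; the class name, the `flush` callback and the injectable `clock` are all assumptions made for illustration.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """LRU cache with per-item expiry and dirty marking (illustrative only)."""

    def __init__(self, capacity, ttl_seconds, flush, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.flush = flush          # hypothetical callback(key, value) that persists one item
        self.clock = clock
        self.items = OrderedDict()  # key -> [value, expires_at, dirty]

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        if entry[1] <= self.clock():   # expired: write back if dirty, then drop
            self._evict(key)
            return None
        self.items.move_to_end(key)    # mark as most recently used
        return entry[0]

    def set(self, key, value):
        # New and updated items are dirty until the next flush.
        self.items[key] = [value, self.clock() + self.ttl, True]
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            self._evict(next(iter(self.items)))   # least recently used first

    def flush_dirty(self):
        """Lazy flush: persist every dirty item and clear its dirty flag."""
        for key, entry in self.items.items():
            if entry[2]:
                self.flush(key, entry[0])
                entry[2] = False

    def _evict(self, key):
        value, _, dirty = self.items.pop(key)
        if dirty:
            self.flush(key, value)     # an unflushed write is persisted before it leaves memory


# Usage: a plain dict stands in for the database.
db = {}
cache = LruTtlCache(capacity=2, ttl_seconds=60, flush=db.__setitem__)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # "a" becomes most recently used
cache.set("c", 3)    # evicts "b", which is written to db because it was dirty
```

Writes touch only memory until `flush_dirty` runs or an item is evicted, which is the point of marking items dirty.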
ZiDB
- Split into a DataFile and an IndexFile
- 1 seek for a read, 1-2 seeks for a write
- The IndexFile (hash structure) is loaded into memory as a memory-mapped file (shared memory) to reduce system calls
- Write-ahead log to avoid data loss
- Data magic-padding
- Checksums and checkpoints for data repair
- DB partitioning for easier maintenance
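A minimal sketch of the data-file-plus-hash-index idea, showing why a read needs a single seek. It is a simplification under stated assumptions: the index is a plain in-memory dict rather than a memory-mapped IndexFile, there is no write-ahead log, magic-padding or checkpointing, and the record layout (length, CRC32, payload) and the name `TinyStore` are invented for the example.

```python
import os
import struct
import tempfile
import zlib

# Hypothetical record header: payload length and CRC32 of the payload.
HEADER = struct.Struct("<II")


class TinyStore:
    """Append-only data file plus an in-memory hash index (illustrative only)."""

    def __init__(self, path):
        self.data = open(path, "a+b")   # appends on write, allows seeking on read
        self.index = {}                 # key -> offset of the record header

    def put(self, key, value):
        self.data.seek(0, os.SEEK_END)
        offset = self.data.tell()
        self.data.write(HEADER.pack(len(value), zlib.crc32(value)) + value)
        self.data.flush()
        self.index[key] = offset        # the index update happens in memory

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None                 # a miss costs no disk access at all
        self.data.seek(offset)          # the single seek for a read
        length, checksum = HEADER.unpack(self.data.read(HEADER.size))
        value = self.data.read(length)
        if zlib.crc32(value) != checksum:
            raise IOError(f"checksum mismatch for key {key!r}")
        return value

    def close(self):
        self.data.close()


# Usage with a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
store = TinyStore(path)
store.put("user:1", b"alice")
store.put("user:1", b"alice v2")   # the newer record shadows the older one
```

Because the index lives in memory, a lookup resolves the offset without touching the disk, and the checksum lets a corrupted record be detected on read.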
Distribution
Key requirements:
- Scalability
- Load balancing
- Availability
- Consistency
2 models:
- Centralized: 1 addressing server and multiple storage servers => bottleneck and single point of failure
- Peer-to-peer: each server includes both an addressing module and storage
2 types of routing:
- Client routing: each client does the addressing itself and then queries the data
- Server routing: the addressing is done on the server
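Operation flows
Components in the diagram: Business Logic Server, Addressing Server (DHT), and a Storage Layer of Storage Nodes 1..N, each containing ICache, ZiDB and a Storage Module.
(1) The business logic server requests key locations from the addressing server.
(2) The addressing server returns the key locations.
(3) The business logic server sends Get & Set operations to the storage nodes.
(4) The storage nodes return the operation results.
* In the peer-to-peer model, the addressing module is moved into each storage node.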
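The four steps of the centralized flow can be traced in a few lines. Everything here is a stand-in: the class names mirror the diagram labels, a dict replaces ICache and ZiDB, and the placement rule in `locate` is a deliberately trivial placeholder rather than the DHT the slides describe.

```python
class StorageNode:
    """Stands in for one storage node (ICache and ZiDB collapsed into a dict)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)


class AddressingServer:
    """Answers 'which node holds this key?'."""

    def __init__(self, nodes):
        self.nodes = nodes

    def locate(self, key):
        # Placeholder rule (byte sum modulo node count), NOT consistent hashing.
        return self.nodes[sum(key.encode("utf-8")) % len(self.nodes)]


class BusinessLogicServer:
    def __init__(self, addressing):
        self.addressing = addressing

    def set(self, key, value):
        node = self.addressing.locate(key)   # steps (1) and (2): ask for and receive the location
        return node.set(key, value)          # steps (3) and (4): operate and get the result

    def get(self, key):
        node = self.addressing.locate(key)
        return node.get(key)


nodes = [StorageNode("node-1"), StorageNode("node-2")]
app = BusinessLogicServer(AddressingServer(nodes))
app.set("feed:42", "hello")
```

In the peer-to-peer variant each storage node would carry its own copy of the `locate` logic, removing the separate addressing hop.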
Addressing:
- Provides the key locations of resources
- Basically a Distributed Hash Table (DHT) using consistent hashing
- Hashing: Jenkins, Murmur, or any algorithm that satisfies two conditions:
  - Uniform distribution of generated keys in the key space
  - Consistency
  (MD5 and SHA are poor choices because of their performance cost)
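As an example of a hash that fits these conditions, here is Bob Jenkins's one-at-a-time function with a helper for eyeballing how evenly keys spread over the key space. The slides say only "Jenkins" without naming a variant, so choosing one-at-a-time is an assumption, and `bucket_counts` is a helper invented for this sketch.

```python
from collections import Counter

MASK32 = 0xFFFFFFFF


def jenkins_one_at_a_time(key):
    """Bob Jenkins's one-at-a-time hash over a bytes object, as a 32-bit integer."""
    h = 0
    for byte in key:
        h = (h + byte) & MASK32
        h = (h + (h << 10)) & MASK32
        h ^= h >> 6
    h = (h + (h << 3)) & MASK32
    h ^= h >> 11
    h = (h + (h << 15)) & MASK32
    return h


def bucket_counts(keys, buckets):
    """Count how many keys land in each of `buckets` equal slices of the 32-bit key space."""
    counts = Counter((jenkins_one_at_a_time(k) * buckets) >> 32 for k in keys)
    return [counts[i] for i in range(buckets)]


sample_keys = [f"user:{i}".encode("utf-8") for i in range(10_000)]
spread = bucket_counts(sample_keys, 8)   # one count per eighth of the key space
```

The same key always produces the same value (consistency), and comparing the entries of `spread` gives a quick feel for uniformity on a given key pattern.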
Addressing - Node location: each node is assigned a contiguous range of IDs (hashed keys)
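With contiguous ranges, finding the owner of a hashed ID is a binary search over the range boundaries. The key-space size, the boundaries and the node names below are hypothetical values chosen for the sketch.

```python
import bisect

KEY_SPACE = 2**32

# Exclusive upper bound of each node's range, in ascending order (hypothetical values).
RANGE_ENDS = [2**30, 2**31, 3 * 2**30, KEY_SPACE]
RANGE_NODES = ["node-1", "node-2", "node-3", "node-4"]


def node_for_id(hashed_id):
    """Return the node whose contiguous ID range contains hashed_id."""
    if not 0 <= hashed_id < KEY_SPACE:
        raise ValueError("hashed id outside the key space")
    # bisect_right yields the first range whose upper bound is greater than the id.
    return RANGE_NODES[bisect.bisect_right(RANGE_ENDS, hashed_id)]
```

For instance, `node_for_id(2**30 - 1)` returns "node-1" and `node_for_id(2**30)` returns "node-2", since each upper bound is exclusive.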
Addressing - Node location: Golden ratio principle ((a+b)/a = a/b)
- Initial ratio = 1.618
- Max ratio ~ 2.6
- Easy to implement
- Easy for routing from the client
Addressing - Node location: Virtual nodes
- Each real server has multiple virtual nodes on the ring
- The more virtual nodes, the better the load balance
- The table of nodes is hard to maintain
Example from the figure: Server 1 holds virtual nodes 1, 2, 3; Server 2 holds 4, 5, 6, 7; Server 3 holds 8, 9.
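A ring with virtual nodes can be sketched with a sorted list and binary search. The class and method names are invented for the example, and `zlib.crc32` is used only as a convenient stand-in hash; the slides recommend Jenkins or Murmur instead.

```python
import bisect
import zlib


def ring_hash(text):
    # Stand-in hash for the sketch; the slides suggest Jenkins or Murmur.
    return zlib.crc32(text.encode("utf-8"))


class VirtualNodeRing:
    """Consistent-hash ring where each server owns many points (illustrative only)."""

    def __init__(self, servers, vnodes_per_server=100):
        self.vnodes_per_server = vnodes_per_server
        self.ring = []   # sorted list of (point, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes_per_server):
            bisect.insort(self.ring, (ring_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self.ring = [entry for entry in self.ring if entry[1] != server]

    def lookup(self, key):
        if not self.ring:
            raise LookupError("ring is empty")
        # First virtual node at or after the key's position, wrapping around the ring.
        idx = bisect.bisect_left(self.ring, (ring_hash(key),))
        return self.ring[idx % len(self.ring)][1]


ring = VirtualNodeRing(["server-1", "server-2", "server-3"], vnodes_per_server=50)
owner = ring.lookup("user:42")
```

Removing a server only reassigns the keys that server owned; every other key keeps its owner, which is the property that makes the ring practical for rebalancing. The sorted list of points is the "table of nodes" that grows with the number of virtual nodes.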
Addressing – Multi-layer rings
- Stores the change history of the system
- Provides availability/reconfigurability
- Allows a node to be placed on the ring manually
* Write: data is located on the highest ring
* Read: data is located on the highest ring, then on lower rings if not found
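The read and write rules for layered rings can be illustrated as below. This is a guess at the mechanics based only on the two rules in the slide: each layer is reduced to a function from key to node name, a dict stands in for each node's storage, and the class and method names are hypothetical.

```python
class LayeredRings:
    """Stack of placement layers, newest last (illustrative only)."""

    def __init__(self):
        self.layers = []   # each layer is a function: key -> node name
        self.stores = {}   # node name -> dict standing in for that node's storage

    def push_layer(self, locate):
        """Record a reconfiguration by stacking a new ring on top of the old ones."""
        self.layers.append(locate)

    def write(self, key, value):
        node = self.layers[-1](key)                    # writes use the highest ring only
        self.stores.setdefault(node, {})[key] = value

    def read(self, key):
        for locate in reversed(self.layers):           # highest ring first, then lower rings
            value = self.stores.get(locate(key), {}).get(key)
            if value is not None:
                return value
        return None


rings = LayeredRings()
rings.push_layer(lambda key: "A")    # original layout: everything on node A
rings.write("k1", "v1")
rings.push_layer(lambda key: "B")    # reconfiguration: new ring sends keys to node B
old_value = rings.read("k1")         # not on B yet, so the read falls back to A
rings.write("k1", "v2")              # new writes land on B
new_value = rings.read("k1")
```

Old data stays readable without an immediate migration, at the cost of extra lookups on a miss in the highest ring.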
Replication & Backup
- Each node has one primary range of IDs and some secondary ranges of IDs
- Each real node needs a backup instance to replace it in case it goes down
* Data is queried from the primary node first, then from the secondary nodes
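The query order (primary first, then secondaries) can be sketched as a simple fallback loop. The slides do not describe the write path, so the synchronous copy in `replicated_set` is an assumption, as are the `Node` class and the use of `ConnectionError` to model a node that is down.

```python
class Node:
    """In-memory stand-in for a storage node that can be switched off."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data.get(key)

    def set(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


def replicated_set(key, value, primary, secondaries):
    """Write to the primary, then copy to each secondary (synchronously, by assumption)."""
    primary.set(key, value)
    for node in secondaries:
        node.set(key, value)


def replicated_get(key, primary, secondaries):
    """Query the primary first, then fall back to the secondaries in order."""
    for node in [primary, *secondaries]:
        try:
            value = node.get(key)
        except ConnectionError:
            continue              # node is down: try the next replica
        if value is not None:
            return value
    return None


primary, secondary = Node("primary"), Node("secondary")
replicated_set("profile:7", "nam", primary, [secondary])
primary.up = False                                    # simulate a primary failure
fallback_value = replicated_get("profile:7", primary, [secondary])
```

Reads keep working while the primary is down because the secondary holds a copy of the same ID range.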
Configuration: questions for finding the best parameters to configure the DB, or for choosing a suitable DB type.
- How many reads/writes per second?
- Length deviation of the data: are records roughly the same length, or do they vary widely?
- Is data updated or deleted?
- How important is the data: is loss acceptable or not?
- Can old data be recycled?
Q & A
Contact: Nguyễn Quang Nam, [email_address], http://me.zing.vn/nam.nq
