PalDB
Introduction to PalDB
Mathieu Bastian - October 2015
Summary
❖ PalDB is an embeddable write-once key-value store
❖ Written in Java, with no dependencies and a JAR of only 110K
❖ Very fast read performance, 2M+ reads/second
❖ Simple, works like an immutable un-typed HashMap
❖ Compact, stored in a single binary file
❖ Open-sourced at LinkedIn in 2015
Why PalDB?
❖ Need for an efficient solution to package side-data
❖ Existing solutions are a poor fit
‣ Raw data files (CSV, JSON, Avro, Thrift) require complex
parsing code and in-memory data structures
‣ Embeddable key-value stores (LevelDB, RocksDB) have large
overhead due to read/write capabilities
‣ Traditional in-memory data structures (List, HashSet, HashMap)
take too much memory and require load time
Features
✓ Supports all primitives and arrays, no schema needed
✓ Random read & iteration (unsorted)
✓ No load time, and uses off-heap memory
✓ Custom serializers can be defined
✓ Read from store file, stream or resources within JAR
✓ Stored in a single binary file
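The "no load time, off-heap memory" point can be illustrated with a memory-mapped file, which is one common way to get this behaviour on the JVM. The slides do not say how PalDB does it internally, so the sketch below is only an assumption-laden illustration: the class `MappedReadSketch`, its methods, and the fixed-width int layout are all hypothetical and use nothing but the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; this is not PalDB code or its file format.
final class MappedReadSketch {

    // Writes the values as consecutive 4-byte big-endian ints.
    static void writeInts(Path file, int[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        Files.write(file, buf.array());
    }

    // Maps the file read-only and reads one value by position.
    // Nothing is parsed or copied into heap collections up front;
    // the OS pages the data in when it is touched.
    static int readInt(Path file, int index) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return mapped.getInt(index * Integer.BYTES);
        }
    }

    // Convenience wrapper: write to a temp file, then read one entry back.
    static int roundTrip(int[] values, int index) {
        try {
            Path file = Files.createTempFile("mapped-sketch", ".bin");
            file.toFile().deleteOnExit();
            writeInts(file, values);
            return readInt(file, index);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new int[] {10, 20, 30}, 1)); // prints 20
    }
}
```

The read goes straight to the requested offset, which is why such a layout needs no load step, in contrast to filling a HashMap before the first lookup.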
Write-once
❖ Write-once, read many
❖ Once a store has been written and closed, it can’t be
modified
❖ Typical use-case is to transport pre-created datasets
❖ Principal benefit is a more compact store size
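The write-once contract described above can be sketched in a few lines of plain Java. This is a hypothetical, in-memory illustration of the semantics only (the class `WriteOnceStore` is invented here and says nothing about PalDB's internals or file format); unlike PalDB, it merges writer and reader into one object for brevity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of write-once semantics; not PalDB's implementation.
final class WriteOnceStore {
    private Map<Object, Object> entries = new HashMap<>();
    private boolean closed = false;

    // Accepts entries only while the store is still open for writing.
    void put(Object key, Object value) {
        if (closed) {
            throw new IllegalStateException("store is closed and can no longer be modified");
        }
        entries.put(key, value);
    }

    // Freezes the store; after this call only reads are allowed.
    void close() {
        entries = Collections.unmodifiableMap(entries);
        closed = true;
    }

    // Un-typed read: the caller chooses the expected value type.
    @SuppressWarnings("unchecked")
    <V> V get(Object key) {
        return (V) entries.get(key);
    }

    public static void main(String[] args) {
        WriteOnceStore store = new WriteOnceStore();
        store.put("foo", "bar");
        store.put(1213, new int[] {1, 2, 3});
        store.close();

        String val1 = store.get("foo");
        int[] val2 = store.get(1213);
        System.out.println(val1 + " " + val2.length);

        try {
            store.put("late", "write");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because nothing can change after `close()`, a real store is free to pick a dense, read-optimised layout at that point, which is where the compactness benefit comes from.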
Code: Write store
Java
StoreWriter writer = PalDB.createWriter(new File("store.paldb"));
writer.put("foo", "bar");
writer.put(1213, new int[] {1, 2, 3});
writer.close();
Scala
val writer: StoreWriter = PalDB.createWriter(new File("store.paldb"))
writer.put("foo", "bar")
writer.put(1213, Array(1, 2, 3))
writer.close()
Code: Read store
Java
StoreReader reader = PalDB.createReader(new File("store.paldb"));
String val1 = reader.get("foo");
int[] val2 = reader.get(1213);
reader.close();
Scala
val reader: StoreReader = PalDB.createReader(new File("store.paldb"))
val val1: String = reader.get("foo")
val val2: Array[Int] = reader.get(1213)
reader.close()
Benchmark summary
❖ When compared to embeddable key-value stores
(LevelDB, RocksDB)
‣ PalDB has 5X to 15X higher throughput on datasets
fitting in memory*
❖ When compared to in-memory Java HashSet/HashMap
‣ PalDB has 2X to 5X lower throughput
‣ Uses 6X less memory
* PalDB is not intended to scale to very large on-disk indices the way RocksDB or LevelDB do
Throughput
❖ Throughput benchmark between PalDB, LevelDB and
RocksDB (higher is better)
Memory
❖ Memory usage benchmark between PalDB and a Java
HashSet (lower is better)
PalDB © 2015 LinkedIn Corp. Licensed under the terms of the Apache License, Version 2.0.
Code & documentation available on GitHub

https://github.com/linkedin/PalDB