PG-Strom
Query Acceleration Engine of PostgreSQL
Powered by GPGPU
NEC OSS Promotion Center
The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Self Introduction
▌Name: KaiGai Kohei
▌Company: NEC
▌Mission: Software architect & Intrapreneur
▌Background:
 Linux kernel development (2003~?)
 PostgreSQL development (2006~)
 SAP alliance (2011~2013)
 PG-Strom development & productization (2012~)
▌PG-Strom Project:
 In-company startup of NEC
 Also, an open source software project
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -P.2
What is PG-Strom
▌An Extension of PostgreSQL
▌Off-loads CPU intensive SQL workloads to GPU processors
▌Major Features
① Automatic and just-in-time GPU code generation from SQL
② Asynchronous and concurrent query executor
[Figure: an SQL command enters the Query Frontend and then the Query Planner and Query Executor. PG-Strom adds a Custom Planner and a Custom Executor, which generate GPU code on the fly and run it asynchronously against the database.]
Concept
▌No Pain
 To applications it looks like a traditional PostgreSQL database, so existing tools, drivers, and applications can be used as they are.
▌No Tuning
 The massive computing capability of GPGPU removes the need for manual database tuning. It lets engineers focus on the tasks only humans can do.
▌No Complexity
 No need to export large data sets from the RDBMS to external tools, because its computing performance is sufficient to run the workloads near the data.
RDBMS and bottleneck (1/2)
DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQLPage. 5
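[Figure: three configurations of processor, RAM, and storage. "Data Size > RAM": the data lives in storage behind RAM. "Data Size < RAM": the data fits in RAM. "In the future?": the processor works on data held in wide-band RAM and non-volatile RAM.]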
RDBMS and bottleneck (2/2)
World of current CPU/memory bottleneck (processor ⇔ RAM, bandwidth: several hundred GB/s)
Join, Aggregation, Sort, Projection, ...
[strategy]
• burstable access pattern
• parallel algorithm
World of traditional disk I/O bottleneck (RAM ⇔ storage, bandwidth: several GB/s)
SeqScan, IndexScan, ...
[strategy]
• reduction of I/O (size, count)
• distribution of disk (RAID)
Background (1/4) – Semiconductor Trend
▌Movement to CPU/GPU integrated architecture rather than multicore CPU
▌Free lunch for SW by HW evolution will finish soon
 Unless software is designed to utilize GPU capability, it cannot draw out the full capability of the hardware.
SOURCE: THE HEART OF AMD INNOVATION, Lisa Su, at AMD Developer Summit 2013
Background (2/4) – Features of GPU
▌Characteristics
 Larger percentage of ALUs on chip
 Relatively smaller percentage of cache and control logic
→ Advantageous for simple calculations in parallel, but not for complicated logic
 Much higher number of cores per price
• GTX750Ti (640 cores) at $150

                               GPU                  CPU
Model                          Nvidia Tesla K20X    Intel Xeon E5-2670 v3
Architecture                   Kepler               Haswell
Launch                         Nov-2012             Sep-2014
# of transistors               7.1 billion          3.84 billion
# of cores                     2688 (simple)        12 (functional)
Core clock                     732MHz               2.6GHz, up to 3.5GHz
Peak Flops (single precision)  3.95TFLOPS           998.4GFLOPS (with AVX2)
DRAM size                      6GB, GDDR5           768GB/socket, DDR4
Memory band                    250GB/s              68GB/s
Power consumption              235W                 135W
Price                          $3,000               $2,094
SOURCE: CUDA C Programming Guide (v6.5)
Background (3/4) – How GPU works
Computing the sum of an array, $\sum_{i=0}^{N-1} item[i]$, with N cores of the GPU
[Figure: tree reduction over item[0] ... item[15]. In each of step.1 to step.4, pairs of partial sums are added in parallel, so the number of remaining partial sums halves at every step.]
Total sum of items[] in log2(N) steps
Inter-core synchronization by HW support
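The reduction on this slide can be modeled on the CPU as follows. This is a minimal sketch, not PG-Strom code: the function name `tree_reduce_sum` is made up for illustration, and the inner loop stands in for work that a GPU would spread over its cores within a single synchronized step.

```python
def tree_reduce_sum(items):
    """Sum a sequence by pairwise (tree) reduction; returns (total, steps)."""
    values = list(items)
    n = len(values)
    stride = 1
    steps = 0
    # Each pass of the while loop models one synchronized step on the GPU.
    while stride < n:
        # On a GPU every iteration of this inner loop would run on its own core.
        for i in range(0, n - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        steps += 1
    return (values[0] if values else 0), steps


total, steps = tree_reduce_sum(range(16))
print(total, steps)
```

With 16 items the loop finishes in 4 steps, matching the log2(N) figure on the slide; for lengths that are not a power of two the step count rounds up.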
Background (4/4) – Custom-Plan Interface
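SELECT cat, avg(x) FROM t1, t2
 WHERE t1.id = t2.id AND y > 100
 GROUP BY cat;
[Figure: plan tree for the query above. Aggregate (key: cat) sits on top of Join (t1, t2), which sits on top of Scan on t1 and Scan on t2.]
Candidates for the Join node:
• Hash Join
• Merge Join
• Nested Loop
• Custom Join
Candidates for the Scan nodes:
• Seq Scan
• Index Scan
• Index-Only Scan
• Tid Scan
• Custom Scan
Example nodes shown in the figure: IndexScan on t1 (y > 100), "BulkLoad" on t1, "GpuHashJoin" (t1.id = t2.id)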
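The idea behind the slide is that a custom node competes with the built-in ones as just another candidate. A minimal sketch of that selection, assuming a planner that simply keeps the cheapest candidate, could look like this. The `Path` class, the `cheapest` helper, and all cost numbers are hypothetical; the real PostgreSQL planner is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    cost: float  # estimated total cost; the values below are invented


def cheapest(paths):
    """Return the candidate with the lowest estimated cost."""
    return min(paths, key=lambda p: p.cost)


# Built-in join strategies listed on the slide, with made-up costs.
join_paths = [
    Path("Hash Join", 1200.0),
    Path("Merge Join", 1500.0),
    Path("Nested Loop", 9000.0),
]
# Hypothetical: an extension contributes one more candidate.
join_paths.append(Path("Custom Join (GpuHashJoin)", 400.0))

chosen = cheapest(join_paths)
print(chosen.name)
```

If the extension's estimate is higher than a built-in one, the built-in path wins and the query runs as it would without the extension.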
PG-Strom Features
▌Logic
 GpuScan ... Parallel evaluation of scan qualifiers
 GpuHashJoin ... Parallel multi-relational join
 GpuPreAgg ... Two-phase aggregation
 GpuSort ... GPU + CPU hybrid sorting
 GpuNestedLoop (in development)
▌Data Types
 Integer, Float, Date/Time, Numeric, Text
▌Functions and Operators
 Equality and comparison operators
 Arithmetic operators and mathematical functions
 Aggregates: count, min/max, sum, avg, std, var, corr, regr
Automatic GPU code generation
postgres=# SET pg_strom.show_device_kernel = on;
postgres=# EXPLAIN VERBOSE SELECT * FROM t0 WHERE sqrt(x+y) < 10;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Custom Scan (GpuScan) on public.t0  (cost=500.00..357569.35 rows=6666683 width=77)
   Output: id, cat, aid, bid, cid, did, eid, x, y, z
   Device Filter: (sqrt((t0.x + t0.y)) < 10::double precision)
   Features: likely-tuple-slot
   Kernel Source: #include "opencl_common.h"
       :
   static pg_bool_t
   gpuscan_qual_eval(__private cl_int *errcode,
                     __global kern_parambuf *kparams,
                     __global kern_data_store *kds,
                     __global kern_data_store *ktoast,
                     size_t kds_index)
   {
     pg_float8_t KPARAM_0 = pg_float8_param(kparams,errcode,0);
     pg_float8_t KVAR_8 = pg_float8_vref(kds,ktoast,errcode,7,kds_index);
     pg_float8_t KVAR_9 = pg_float8_vref(kds,ktoast,errcode,8,kds_index);
     return pgfn_float8lt(errcode,
              pgfn_dsqrt(errcode, pgfn_float8pl(errcode, KVAR_8, KVAR_9)), KPARAM_0);
   }
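To illustrate how an SQL expression can turn into the nested call in the kernel's return statement, here is a minimal sketch of a code generator. The tuple-based expression tree and the `emit` function are assumptions made for this example and are not PG-Strom's actual generator; only the device function names and the `KVAR_8`, `KVAR_9`, `KPARAM_0` identifiers are taken from the output above.

```python
# SQL operator or function -> device function name, as seen in the kernel above.
DEVICE_FUNCS = {"<": "pgfn_float8lt", "+": "pgfn_float8pl", "sqrt": "pgfn_dsqrt"}


def emit(node):
    """Turn an expression tree into a nested device-function call string."""
    kind = node[0]
    if kind in ("var", "param"):
        # Leaves are already-declared kernel variables or parameters.
        return node[1]
    args = ", ".join(emit(child) for child in node[1:])
    return f"{DEVICE_FUNCS[kind]}(errcode, {args})"


# sqrt(x + y) < 10, with x, y mapped to KVAR_8, KVAR_9 and the constant to KPARAM_0.
expr = ("<",
        ("sqrt", ("+", ("var", "KVAR_8"), ("var", "KVAR_9"))),
        ("param", "KPARAM_0"))
source = emit(expr)
print(source)
```

The printed string has the same shape as the return expression of `gpuscan_qual_eval`; a real generator would also have to emit the variable declarations and handle types and NULLs.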
Implementation (1/3) – GpuScan
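[Figure: the table is read in chunks (16~64MB). Each chunk goes through DMA Send (input stream), execution of the auto-generated GPU code, and DMA Recv (output stream) to produce the result. Three such send/execute/receive sequences are drawn for successive chunks.]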
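The chunk-at-a-time flow can be sketched on the CPU as below. This is only an illustration under stated assumptions: a thread pool stands in for asynchronous DMA and kernel execution, `device_filter` stands in for the generated GPU code, and the tiny chunk size replaces the 16~64MB chunks of the slide. None of the names come from PG-Strom.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, chunk_size):
    """Cut the table into fixed-size chunks."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def device_filter(chunk):
    # Stand-in for the GPU kernel: evaluates sqrt(x + y) < 10 on one chunk.
    return [(x, y) for (x, y) in chunk if math.sqrt(x + y) < 10]


def gpuscan_like(rows, chunk_size=4, workers=3):
    chunks = split_into_chunks(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Several chunks are in flight at once; map() still yields results
        # in chunk order, so the output order matches the input order.
        results = pool.map(device_filter, chunks)
        return [row for part in results for row in part]


rows = [(float(i), float(i)) for i in range(100)]
out = gpuscan_like(rows)
print(len(out))
```

Keeping several chunks in flight is what lets transfer and computation overlap; with a single worker the same code degenerates to a strictly sequential scan.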
Software Architecture (1/2) – current version
[Figure: PostgreSQL side: an SQL query passes through the Query Parser (breaks down the query to a parse tree), the Query Optimizer (makes the query execution plan), and the Query Executor (runs the query), on top of the Storage Manager, Shared Buffer, and Storage. PG-Strom side: GpuScan, GpuHashJoin, GpuPreAgg, and GpuSort plug in through the Custom-Plan APIs, alongside the GPU Code Generator and the GPU Program Manager. They communicate with the PG-Strom OpenCL Server through a Message Queue.]
Software Architecture (2/2) – upcoming version
[Figure: same PostgreSQL side as in the current version (Query Parser, Query Optimizer, Query Executor, Storage Manager, Shared Buffer, Storage). PG-Strom side: GpuScan, GpuHashJoin, GpuPreAgg, and GpuSort plug in through the Custom-Plan APIs, alongside the GPU Code Generator. A CUDA controller handles just-in-time compile with NVRTC, copy into the DMA Buffer, DMA, and kernel launch.]
Implementation (2/3) – GpuHashJoin
PG-Strom Preview Feb-2015Page. 16
vanilla Hash-Join:
• Inner relation → Hash Table
• Outer relation → Hash-Table Search by CPU → Sequential Materialization by CPU → Next stage
GpuHashJoin:
• Inner relation → Hash Table
• Outer relation → Parallel Hash-Table Search → Parallel Materialization → Next stage
• CPU just references materialized results
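The build and probe phases compared on this slide can be sketched as follows. This is a plain CPU model under assumed names (`build_hash_table`, `probe_one`, `hash_join`), not PG-Strom's implementation. The point it shows is that each outer row is probed independently of the others, which is what makes the search and the materialization parallelizable.

```python
from collections import defaultdict


def build_hash_table(inner, key):
    """Build phase: index the (small) inner relation by join key."""
    table = defaultdict(list)
    for row in inner:
        table[row[key]].append(row)
    return table


def probe_one(table, outer_row, key):
    # Independent of every other outer row, so on a GPU each call
    # could be handled by its own thread.
    return [{**outer_row, **match} for match in table.get(outer_row[key], [])]


def hash_join(outer, inner, key):
    table = build_hash_table(inner, key)
    joined = []
    for outer_row in outer:  # sequential here; parallel in the GPU variant
        joined.extend(probe_one(table, outer_row, key))
    return joined


t0 = [{"id": 1, "x": 10}, {"id": 2, "x": 20}, {"id": 3, "x": 30}]
t1 = [{"id": 1, "cat": "a"}, {"id": 3, "cat": "b"}]
result = hash_join(t0, t1, "id")
print(result)
```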
Benchmark result (1/2) – simple tables join
▌Benchmark Query:
SELECT * FROM t0 NATURAL JOIN t1 [NATURAL JOIN ....];
▌Environment:
 t0 has 100 million rows (13GB); t1-t9 have 40,000 rows each; all data pre-loaded
 CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1
Simple Tables Join Benchmark: query response time [sec]

number of tables joined   PG-Strom   PostgreSQL
 2                          18.19       64.27
 3                          19.45       87.73
 4                          21.04      109.73
 5                          23.66      132.21
 6                          26.69      155.10
 7                          37.64      179.62
 8                          43.22      207.85
 9                          49.57      233.31
10                          56.38      263.51
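The ratio between the two series can be computed directly from the chart values. The sketch below assumes that the first series of nine values belongs to PG-Strom and the second to PostgreSQL, following the legend order; that assignment is an assumption of this example, not something the extracted chart states explicitly.

```python
# Query response times in seconds, keyed by number of tables joined.
pg_strom = {2: 18.19, 3: 19.45, 4: 21.04, 5: 23.66, 6: 26.69,
            7: 37.64, 8: 43.22, 9: 49.57, 10: 56.38}
postgres = {2: 64.27, 3: 87.73, 4: 109.73, 5: 132.21, 6: 155.10,
            7: 179.62, 8: 207.85, 9: 233.31, 10: 263.51}

# Speedup = PostgreSQL time / PG-Strom time for the same query.
speedup = {n: postgres[n] / pg_strom[n] for n in pg_strom}

for n in sorted(speedup):
    print(f"{n:2d} tables: {speedup[n]:.2f}x")
```

Under that assumption the ratio ranges from roughly 3.5x with two tables to roughly 5.8x with six tables.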
Implementation (3/3) – GpuPreAgg
[Figure: the table is read in chunks (16~64MB); each chunk goes through a 1st Stage Reduction followed by a 2nd Stage Reduction.]
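One plausible reading of the two stages is "reduce each chunk to partial states, then merge the partial states". The sketch below models that reading for `avg(x) GROUP BY cat`; the function names and the (count, sum) state layout are assumptions for illustration, and the real GpuPreAgg may divide the work differently.

```python
from collections import defaultdict


def partial_agg(chunk):
    """1st stage: reduce one chunk to per-group [count, sum] states."""
    acc = defaultdict(lambda: [0, 0.0])
    for cat, x in chunk:
        acc[cat][0] += 1
        acc[cat][1] += x
    return dict(acc)


def final_agg(partials):
    """2nd stage: merge the partial states and finish avg(x) per group."""
    merged = defaultdict(lambda: [0, 0.0])
    for part in partials:
        for cat, (count, total) in part.items():
            merged[cat][0] += count
            merged[cat][1] += total
    return {cat: total / count for cat, (count, total) in merged.items()}


chunks = [
    [("a", 1.0), ("b", 2.0)],
    [("a", 3.0), ("a", 5.0)],
    [("b", 4.0)],
]
result = final_agg(partial_agg(chunk) for chunk in chunks)
print(result)
```

Carrying (count, sum) instead of a per-chunk average is what keeps the final result exact when chunks contain different numbers of rows per group.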
Benchmark result (2/2) – Star Schema Model
▌40 typical reporting queries
▌100GB of retail / star-schema data, all pre-loaded
▌Environment
 CPU: Xeon E5-2670v3(12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1
[Figure: bar chart "Typical Reporting Queries on Retail / Star-Schema Data". Y axis: query response time [sec], 0 to 2000. X axis: queries Q.01 to Q.40. Series: PG-Strom and PostgreSQL.]
Expected Scenario – Reduction of ETL
▌ETL – its design is a human-centric task
▌Replication – a largely automated task
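[Figure: ERP, CRM, and SCM feed an OLTP database holding the master / fact tables, optimized for transaction workloads. In the conventional setup, ETL loads an OLAP database holding OLAP cubes, optimized for analytic workloads, which serves BI. In the PG-Strom setup, replication produces a replica of the master / fact tables that is also sufficient for analytic workloads and serves BI.]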
Direction of PG-Strom
Development Plan
Current version: PG-Strom β + PostgreSQL v9.5devel
Short term target: PostgreSQL v9.5 timeline (2015)
 Migration from OpenCL to CUDA
 Add support for GpuNestedLoop
 Add support for multi-functional kernel
 Standardization of custom-join interface
 ...and more...?
Middle term target: PostgreSQL v9.6 timeline (2016)
 Integration with funnel executor
 Investigation of SSD/NvRAM utilization
 Custom-sort/aggregate interface
 Add support for spatial data types (?)
Enhancement Idea (1/3) – GpuNestedLoop
[Figure: 2-dimensional grid of GPU threads. The outer relation (Nx items, relatively large) runs along blockDim.x and the inner relation (Ny items, relatively small) runs along blockDim.y. One thread is highlighted at (X=2, Y=3).]
Each GPU thread evaluates the non-equality join condition in parallel
2-dimensional GPU kernel launch (Nx×Ny threads at once)
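The 2-dimensional launch can be modeled on the CPU with two nested loops, where each (x, y) pair plays the role of one GPU thread. This is a sketch with made-up names and data, not PG-Strom code; the range condition is just one example of a join condition that is not a simple equality.

```python
def gpu_nested_loop_like(outer, inner, cond):
    """Evaluate cond for every (outer, inner) pair, one 'thread' per pair."""
    nx, ny = len(outer), len(inner)
    matches = []
    # On a GPU the Nx * Ny iterations below would run as a single
    # 2-dimensional kernel launch instead of two nested loops.
    for x in range(nx):
        for y in range(ny):
            if cond(outer[x], inner[y]):
                matches.append((outer[x], inner[y]))
    return matches


# Hypothetical data: outer values joined to inner [low, high) ranges.
outer = [5, 12, 20, 33]
inner = [(0, 10), (10, 25)]
pairs = gpu_nested_loop_like(outer, inner, lambda v, r: r[0] <= v < r[1])
print(pairs)
```

A hash table cannot help with a condition like this, which is why the slide falls back to evaluating every pair and relies on the inner relation being small.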
Enhancement Idea (2/3) – Multi-functional GPU kernel
Current query execution:
[Figure: magnetic storage → on-memory cache → load of a data chunk → GpuScan kernel, GpuHashJoin kernel, and GpuPreAgg kernel, each executed separately on the GPU device → final result.]
Interaction between CPU and GPU happens multiple times.
Execution in newer version:
[Figure: magnetic storage → on-memory cache → data chunk → a single GpuMultiOps kernel that covers Scan, HashJoin, and PreAgg → final result.]
Interaction between CPU and GPU happens only once.
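The difference between the two execution styles can be sketched by counting how often the host hands work to the device. Everything here is a CPU model built for illustration: the `Device` class, the stage functions, and the sample rows are assumptions, and `run_fused` only mimics the idea of a combined kernel such as the GpuMultiOps named on the slide.

```python
def scan(rows):
    """Scan stage: keep rows with y > 100."""
    return [r for r in rows if r["y"] > 100]


def make_join(dim):
    """Join stage: attach 'cat' from a small lookup table keyed by id."""
    def join(rows):
        return [{**r, "cat": dim[r["id"]]} for r in rows if r["id"] in dim]
    return join


def pre_agg(rows):
    """Pre-aggregation stage: per-category (count, sum of x)."""
    sums = {}
    for r in rows:
        count, total = sums.get(r["cat"], (0, 0.0))
        sums[r["cat"]] = (count + 1, total + r["x"])
    return sums


class Device:
    """Counts how often the host hands work to the simulated device."""
    def __init__(self):
        self.launches = 0

    def launch(self, kernel, data):
        self.launches += 1
        return kernel(data)


def run_separate(dev, chunk, dim):
    data = chunk
    for kernel in (scan, make_join(dim), pre_agg):
        data = dev.launch(kernel, data)  # one round trip per stage
    return data


def run_fused(dev, chunk, dim):
    join = make_join(dim)
    # All three stages are composed into a single kernel, so one round trip.
    return dev.launch(lambda rows: pre_agg(join(scan(rows))), chunk)


chunk = [{"id": 1, "x": 2.0, "y": 150}, {"id": 2, "x": 4.0, "y": 50},
         {"id": 1, "x": 6.0, "y": 200}, {"id": 3, "x": 1.0, "y": 300}]
dim = {1: "a", 2: "b"}
d1, d2 = Device(), Device()
r1, r2 = run_separate(d1, chunk, dim), run_fused(d2, chunk, dim)
print(r1, d1.launches, d2.launches)
```

Both variants return the same result; the fused one avoids handing intermediate results back to the host between stages.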
Enhancement Idea (3/3) – Funnel executor integration
Current query execution:
[Figure: a single query execution plan (Scan → Join → Aggregate → final result) reads from magnetic storage through the on-memory cache.]
PG-Strom may replace these nodes with their GPU versions, but the host system runs with a single thread.
Execution in PostgreSQL v9.6:
[Figure: several worker processes each run Partial Scan → Join → Partial Aggregate over magnetic storage, an SSD device, and the on-memory cache. The Funnel Executor collects their output and a Combined Aggregate produces the final result.]
The funnel executor assigns parts of the query execution task to worker processes.
The combined aggregate merges multiple partial aggregates to generate the final result.
Let’s try – Deployment on AWS
Search by “strom” !
AWS GPU Instance (g2.2xlarge)
CPU      Xeon E5-2670 (8 vCPU)
RAM      15GB
GPU      NVIDIA GRID K2 (1536 cores)
Storage  60GB of SSD
Price    $0.898/hour, $646.56/mon (*)
(*) Price for an on-demand instance in the Tokyo region as of Nov-2014
The PostgreSQL Conference 2014, Tokyo - GPGPU Accelerates PostgreSQL
Your involvement is welcome
▌How to be involved?
 as a user
 as a developer
 as a business partner
▌Source code
 https://github.com/pg-strom/devel
▌Contact us
 e-mail: kaigai@ak.jp.nec.com
 twitter: @kkaigai
check it out!
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom

More Related Content

PDF
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
Kohei KaiGai
 
PDF
SQL+GPU+SSD=∞ (English)
Kohei KaiGai
 
PDF
PG-Strom
Kohei KaiGai
 
PDF
PG-Strom - GPU Accelerated Asyncr
Kohei KaiGai
 
PDF
20160407_GTC2016_PgSQL_In_Place
Kohei KaiGai
 
PDF
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
Kohei KaiGai
 
PDF
pgconfasia2016 plcuda en
Kohei KaiGai
 
PDF
GPGPU Accelerates PostgreSQL (English)
Kohei KaiGai
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
Kohei KaiGai
 
SQL+GPU+SSD=∞ (English)
Kohei KaiGai
 
PG-Strom
Kohei KaiGai
 
PG-Strom - GPU Accelerated Asyncr
Kohei KaiGai
 
20160407_GTC2016_PgSQL_In_Place
Kohei KaiGai
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
Kohei KaiGai
 
pgconfasia2016 plcuda en
Kohei KaiGai
 
GPGPU Accelerates PostgreSQL (English)
Kohei KaiGai
 

What's hot (20)

PDF
Let's turn your PostgreSQL into columnar store with cstore_fdw
Jan Holčapek
 
PDF
20170602_OSSummit_an_intelligent_storage
Kohei KaiGai
 
PDF
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
Kohei KaiGai
 
PDF
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
Kohei KaiGai
 
PDF
PostgreSQL with OpenCL
Muhaza Liebenlito
 
PDF
PG-Strom - A FDW module utilizing GPU device
Kohei KaiGai
 
PDF
20181212 - PGconfASIA - LT - English
Kohei KaiGai
 
PDF
20201006_PGconf_Online_Large_Data_Processing
Kohei KaiGai
 
PDF
20171206 PGconf.ASIA LT gstore_fdw
Kohei KaiGai
 
PDF
20201128_OSC_Fukuoka_Online_GPUPostGIS
Kohei KaiGai
 
PDF
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
Kohei KaiGai
 
PDF
PG-Strom v2.0 Technical Brief (17-Apr-2018)
Kohei KaiGai
 
PDF
20181016_pgconfeu_ssd2gpu_multi
Kohei KaiGai
 
PPTX
GPGPU programming with CUDA
Savith Satheesh
 
PDF
20181025_pgconfeu_lt_gstorefdw
Kohei KaiGai
 
PPTX
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Ural-PDC
 
PDF
20181116 Massive Log Processing using I/O optimized PostgreSQL
Kohei KaiGai
 
PDF
Easy and High Performance GPU Programming for Java Programmers
Kazuaki Ishizaki
 
PDF
20180920_DBTS_PGStrom_EN
Kohei KaiGai
 
PDF
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Kohei KaiGai
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Jan Holčapek
 
20170602_OSSummit_an_intelligent_storage
Kohei KaiGai
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
Kohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
Kohei KaiGai
 
PostgreSQL with OpenCL
Muhaza Liebenlito
 
PG-Strom - A FDW module utilizing GPU device
Kohei KaiGai
 
20181212 - PGconfASIA - LT - English
Kohei KaiGai
 
20201006_PGconf_Online_Large_Data_Processing
Kohei KaiGai
 
20171206 PGconf.ASIA LT gstore_fdw
Kohei KaiGai
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
Kohei KaiGai
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
Kohei KaiGai
 
PG-Strom v2.0 Technical Brief (17-Apr-2018)
Kohei KaiGai
 
20181016_pgconfeu_ssd2gpu_multi
Kohei KaiGai
 
GPGPU programming with CUDA
Savith Satheesh
 
20181025_pgconfeu_lt_gstorefdw
Kohei KaiGai
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Ural-PDC
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
Kohei KaiGai
 
Easy and High Performance GPU Programming for Java Programmers
Kazuaki Ishizaki
 
20180920_DBTS_PGStrom_EN
Kohei KaiGai
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Kohei KaiGai
 
Ad

Viewers also liked (8)

PDF
Writable Foreign Data Wrapper (JPUG Unconference 16-Feb-2013)
Kohei KaiGai
 
PDF
Row-level Security
Kohei KaiGai
 
PDF
Custom Scan API - PostgreSQL Unconference #3 (18-Jan-2014)
Kohei KaiGai
 
PDF
SQL+GPU+SSD=∞ (Japanese)
Kohei KaiGai
 
PDF
20170127 JAWS HPC-UG#8
Kohei KaiGai
 
PDF
An Intelligent Storage?
Kohei KaiGai
 
PDF
pgconfasia2016 lt ssd2gpu
Kohei KaiGai
 
PDF
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
Kohei KaiGai
 
Writable Foreign Data Wrapper (JPUG Unconference 16-Feb-2013)
Kohei KaiGai
 
Row-level Security
Kohei KaiGai
 
Custom Scan API - PostgreSQL Unconference #3 (18-Jan-2014)
Kohei KaiGai
 
SQL+GPU+SSD=∞ (Japanese)
Kohei KaiGai
 
20170127 JAWS HPC-UG#8
Kohei KaiGai
 
An Intelligent Storage?
Kohei KaiGai
 
pgconfasia2016 lt ssd2gpu
Kohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
Kohei KaiGai
 
Ad

Similar to 20150318-SFPUG-Meetup-PGStrom (20)

PDF
20190909_PGconf.ASIA_KaiGai
Kohei KaiGai
 
PDF
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
Equnix Business Solutions
 
PDF
20181210 - PGconf.ASIA Unconference
Kohei KaiGai
 
PDF
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
Arnon Shimoni
 
PPTX
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
mohamedragabslideshare
 
PDF
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Jignesh Shah
 
PDF
PostgreSQL Prologue
Md. Golam Hossain
 
PDF
SQL CUDA
Muhaza Liebenlito
 
PDF
Pgopencl
Tim Child
 
PDF
PostgreSQL Extension APIs are Changing the Face of Relational Databases | PGC...
Teresa Giacomini
 
PDF
What's New in PostgreSQL 9.3
EDB
 
PDF
An evening with Postgresql
Joshua Drake
 
PDF
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Matej Misik
 
PPTX
Getting started with postgresql
botsplash.com
 
PDF
PostgreSQL performance archaeology
Tomas Vondra
 
PDF
Postgres NoSQL - Delivering Apps Faster
EDB
 
PPTX
PostgreSQL - Object Relational Database
Mubashar Iqbal
 
PDF
Don't panic! - Postgres introduction
Federico Campoli
 
PDF
Explain this!
Fabio Telles Rodriguez
 
PDF
[pgday.Seoul 2022] PostgreSQL with Google Cloud
PgDay.Seoul
 
20190909_PGconf.ASIA_KaiGai
Kohei KaiGai
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
Equnix Business Solutions
 
20181210 - PGconf.ASIA Unconference
Kohei KaiGai
 
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
Arnon Shimoni
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
mohamedragabslideshare
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Jignesh Shah
 
PostgreSQL Prologue
Md. Golam Hossain
 
Pgopencl
Tim Child
 
PostgreSQL Extension APIs are Changing the Face of Relational Databases | PGC...
Teresa Giacomini
 
What's New in PostgreSQL 9.3
EDB
 
An evening with Postgresql
Joshua Drake
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Matej Misik
 
Getting started with postgresql
botsplash.com
 
PostgreSQL performance archaeology
Tomas Vondra
 
Postgres NoSQL - Delivering Apps Faster
EDB
 
PostgreSQL - Object Relational Database
Mubashar Iqbal
 
Don't panic! - Postgres introduction
Federico Campoli
 
Explain this!
Fabio Telles Rodriguez
 
[pgday.Seoul 2022] PostgreSQL with Google Cloud
PgDay.Seoul
 

More from Kohei KaiGai (20)

PDF
20221116_DBTS_PGStrom_History
Kohei KaiGai
 
PDF
20221111_JPUG_CustomScan_API
Kohei KaiGai
 
PDF
20211112_jpugcon_gpu_and_arrow
Kohei KaiGai
 
PDF
20210928_pgunconf_hll_count
Kohei KaiGai
 
PDF
20210731_OSC_Kyoto_PGStrom3.0
Kohei KaiGai
 
PDF
20210511_PGStrom_GpuCache
Kohei KaiGai
 
PDF
20201113_PGconf_Japan_GPU_PostGIS
Kohei KaiGai
 
PDF
20200828_OSCKyoto_Online
Kohei KaiGai
 
PDF
20200806_PGStrom_PostGIS_GstoreFdw
Kohei KaiGai
 
PDF
20200424_Writable_Arrow_Fdw
Kohei KaiGai
 
PDF
20191211_Apache_Arrow_Meetup_Tokyo
Kohei KaiGai
 
PDF
20191115-PGconf.Japan
Kohei KaiGai
 
PDF
20190926_Try_RHEL8_NVMEoF_Beta
Kohei KaiGai
 
PDF
20190925_DBTS_PGStrom
Kohei KaiGai
 
PDF
20190516_DLC10_PGStrom
Kohei KaiGai
 
PDF
20190418_PGStrom_on_ArrowFdw
Kohei KaiGai
 
PDF
20190314 PGStrom Arrow_Fdw
Kohei KaiGai
 
PDF
20181212 - PGconf.ASIA - LT
Kohei KaiGai
 
PDF
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
Kohei KaiGai
 
PDF
20180920_DBTS_PGStrom_JP
Kohei KaiGai
 
20221116_DBTS_PGStrom_History
Kohei KaiGai
 
20221111_JPUG_CustomScan_API
Kohei KaiGai
 
20211112_jpugcon_gpu_and_arrow
Kohei KaiGai
 
20210928_pgunconf_hll_count
Kohei KaiGai
 
20210731_OSC_Kyoto_PGStrom3.0
Kohei KaiGai
 
20210511_PGStrom_GpuCache
Kohei KaiGai
 
20201113_PGconf_Japan_GPU_PostGIS
Kohei KaiGai
 
20200828_OSCKyoto_Online
Kohei KaiGai
 
20200806_PGStrom_PostGIS_GstoreFdw
Kohei KaiGai
 
20200424_Writable_Arrow_Fdw
Kohei KaiGai
 
20191211_Apache_Arrow_Meetup_Tokyo
Kohei KaiGai
 
20191115-PGconf.Japan
Kohei KaiGai
 
20190926_Try_RHEL8_NVMEoF_Beta
Kohei KaiGai
 
20190925_DBTS_PGStrom
Kohei KaiGai
 
20190516_DLC10_PGStrom
Kohei KaiGai
 
20190418_PGStrom_on_ArrowFdw
Kohei KaiGai
 
20190314 PGStrom Arrow_Fdw
Kohei KaiGai
 
20181212 - PGconf.ASIA - LT
Kohei KaiGai
 
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
Kohei KaiGai
 
20180920_DBTS_PGStrom_JP
Kohei KaiGai
 

Recently uploaded (20)

PDF
REPORT: Heating appliances market in Poland 2024
SPIUG
 
PDF
BLW VOCATIONAL TRAINING SUMMER INTERNSHIP REPORT
codernjn73
 
PDF
The Evolution of KM Roles (Presented at Knowledge Summit Dublin 2025)
Enterprise Knowledge
 
PDF
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
PDF
Software Development Methodologies in 2025
KodekX
 
PDF
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
PDF
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
PDF
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
PDF
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
PDF
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
PDF
A Day in the Life of Location Data - Turning Where into How.pdf
Precisely
 
PDF
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PDF
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
PPTX
Applied-Statistics-Mastering-Data-Driven-Decisions.pptx
parmaryashparmaryash
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
PDF
Event Presentation Google Cloud Next Extended 2025
minhtrietgect
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PDF
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
PDF
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
REPORT: Heating appliances market in Poland 2024
SPIUG
 
BLW VOCATIONAL TRAINING SUMMER INTERNSHIP REPORT
codernjn73
 
The Evolution of KM Roles (Presented at Knowledge Summit Dublin 2025)
Enterprise Knowledge
 
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
Software Development Methodologies in 2025
KodekX
 
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
A Day in the Life of Location Data - Turning Where into How.pdf
Precisely
 
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
Applied-Statistics-Mastering-Data-Driven-Decisions.pptx
parmaryashparmaryash
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
Event Presentation Google Cloud Next Extended 2025
minhtrietgect
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 

20150318-SFPUG-Meetup-PGStrom

  • 1. PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU NEC OSS Promotion Center The PG-Strom Project KaiGai Kohei <[email protected]>
  • 2. Self Introduction ▌Name: KaiGai Kohei ▌Company: NEC ▌Mission: Software architect & Intrepreneur ▌Background:  Linux kernel development (2003~?)  PostgreSQL development (2006~)  SAP alliance (2011~2013)  PG-Strom development & productization (2012~) ▌PG-Strom Project:  In-company startup of NEC  Also, an open source software project PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -P.2
  • 3. What is PG-Strom ▌An Extension of PostgreSQL ▌Off-loads CPU intensive SQL workloads to GPU processors ▌Major Features ① Automatic and just-in-time GPU code generation from SQL ② Asynchronous and concurrent query executor PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -P.3 database Query Executor Query Planner Custom Executor Custom Planner GPU code on the flySQL command Async- Execution PG-Strom Query Frontend
  • 4. Concept ▌No Pain  Looks like a traditional PostgreSQL database from standpoint of applications, thus, we can utilize existing tools, drivers, applications. ▌No Tuning  Massive computing capability by GPGPU kills necessity of database tuning by human. It allows engineering folks to focus on the task only human can do. ▌No Complexity  No need to export large data to external tools from RDBMS, because its computing performance is sufficient to run the workloads nearby data. PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -P.4
  • 5. RDBMS and bottleneck (1/2) DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQLPage. 5 Storage Processor Data RAM Data Size > RAM Data Size < RAM Storage Processor Data RAM In the future? Processor Wide Band RAM Non- volatile RAM Data
  • 6. World of current cpu/memory bottleneck Join, Aggregation, Sort, Projection, ... [strategy] • burstable access pattern • parallel algorithm World of traditional disk-i/o bottleneck SeqScan, IndexScan, ... [strategy] • reduction of i/o (size, count) • distribution of disk (RAID) RDBMS and bottleneck (2/2) DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQLPage. 6 Processor RAM Storage bandwidth: multiple hundreds GB/s bandwidth: multiple GB/s
  • 7. Background (1/4) – Semiconductor Trend ▌Movement to CPU/GPU integrated architecture rather than multicore CPU ▌Free lunch for SW by HW evolution will finish soon  Unless software is not designed to utilize GPU capability, unable to pull-out the full hardware capability. DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQLPage. 7 SOURCE: THE HEART OF AMD INNOVATION, Lisa Su, at AMD Developer Summit 2013
  • 8. Background (2/4) – Features of GPU ▌Characteristics  Larger percentage of ALUs on chip  Relatively smaller percentage of cache and control logic Advantages to simple calculation in parallel, but not complicated logic  Much higher number of cores per price • GTX750Ti (640core) with $150 GPU CPU Model Nvidia Tesla K20X Intel Xeon E5-2670 v3 Architecture Kepler Haswell Launch Nov-2012 Sep-2014 # of transistors 7.1billion 3.84billion # of cores 2688 (simple) 12 (functional) Core clock 732MHz 2.6GHz, up to 3.5GHz Peak Flops (single precision) 3.95TFLOPS 998.4GFLOPS (with AVX2) DRAM size 6GB, GDDR5 768GB/socket, DDR4 Memory band 250GB/s 68GB/s Power consumption 235W 135W Price $3,000 $2,094 DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQLPage. 8 SOURCE: CUDA C Programming Guide (v6.5)
  • 9. Background (3/4) – How GPU works ●item[0] step.1 step.2 step.4step.3 Computing the sum of array: 𝑖𝑡𝑒𝑚[𝑖] 𝑖=0…𝑁−1 with N-cores of GPU ◆ ● ▲ ■ ★ ● ◆ ● ● ◆ ▲ ● ● ◆ ● ● ◆ ▲ ■ ● ● ◆ ● ● ◆ ▲ ● ● ◆ ● item[1] item[2] item[3] item[4] item[5] item[6] item[7] item[8] item[9] item[10] item[11] item[12] item[13] item[14] item[15] Total sum of items[] with log2N steps Inter core synchronization by HW support
  • 10. Background (4/4) – Custom-Plan Interface PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -P.10 Aggregate SELECT cat, avg(x) FROM t1, t2 WHERE t1.id = t2.id AND y > 100 GROUP BY cat; Scan on t1 Scan on t2 Join t1 t2 key: cat • Hash Join • Merge Join • Nested Loop • Custom Join • Seq Scan • Index Scan • Index-Only Scan • Tid Scan • Custom Scan IndexScan on t1 y > 100 “BulkLoad” on t1 “GpuHashJoin” t1.id = t2.id
  • 11. PG-Strom Features ▌Logics  GpuScan ... Parallel evaluation of scan qualifiers  GpuHashJoin ... Parallel multi-relational join  GpuPreAgg ... Two phase aggregation  GpuSort ... GPU + CPU Hybrid Sorting  GpuNestedLoop (in develop) ▌Data Types  Integer, Float, Date/Time, Numeric, Text ▌Function and Operators  Equality and comparison operators  Arithmetic operators and mathematical functions  Aggregates: count, min/max, sum, avg, std, var, corr, regr PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -P.11
  • 12. Automatic GPU code generation PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -P.12 postgres=# SET pg_strom.show_device_kernel = on; postgres=# EXPLAIN VERBOSE SELECT * FROM t0 WHERE sqrt(x+y) < 10; QUERY PLAN -------------------------------------------------------------------------------- Custom Scan (GpuScan) on public.t0 (cost=500.00..357569.35 rows=6666683 width=77) Output: id, cat, aid, bid, cid, did, eid, x, y, z Device Filter: (sqrt((t0.x + t0.y)) < 10::double precision) Features: likely-tuple-slot Kernel Source: #include "opencl_common.h“ : static pg_bool_t gpuscan_qual_eval(__private cl_int *errcode, __global kern_parambuf *kparams, __global kern_data_store *kds, __global kern_data_store *ktoast, size_t kds_index) { pg_float8_t KPARAM_0 = pg_float8_param(kparams,errcode,0); pg_float8_t KVAR_8 = pg_float8_vref(kds,ktoast,errcode,7,kds_index); pg_float8_t KVAR_9 = pg_float8_vref(kds,ktoast,errcode,8,kds_index); return pgfn_float8lt(errcode, pgfn_dsqrt(errcode, pgfn_float8pl(errcode, KVAR_8, KVAR_9)), KPARAM_0); }
  • 13. Implementation (1/3) – GpuScan PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -P.13 Table DMA Send DMA Recv DMA Send DMA Recv DMA Send DMA Recv Execution of auto-generated GPU code Result Output Stream Input Stream Chunk (16~64MB)
  • 14. PostgreSQL PG-Strom Software Architecture (1/2) – current version DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQLPage. 14 GPU Code Generator Storage Storage Manager Shared Buffer Query Parser Query Optimizer Query Executor SQL Query Breaks down the query to parse tree Makes query execution plan Run the query Custom-PlanAPIs GpuScan GpuHashJoin GpuPreAgg GPU Program Manager PG-Strom OpenCL Server Message Queue GpuSort
  • 15. PostgreSQL PG-Strom Software Architecture (2/2) – upcoming version DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQLPage. 15 GPU Code Generator Storage Storage Manager Shared Buffer Query Parser Query Optimizer Query Executor SQL Query Breaks down the query to parse tree Makes query execution plan Run the query Custom-PlanAPIs GpuScan GpuHashJoin GpuPreAgg CUDA controller GpuSort DMA Buffer Just-in-time compilewith NVRTC Copy DMA kernel launch
  • 16. Implementation (2/3) – GpuHashJoin PG-Strom Preview Feb-2015Page. 16 Inner relation Outer relation Inner relation Outer relation Hash Table Hash Table Next stage Next stage CPU just references materialized results Hash-Table Search by CPU Sequential Materialization by CPU Parallel Materialization Parallel Hash-Table Search vanilla Hash-Join GpuHashJoin
  • 17. Benchmark result (1/2) – simple tables join ▌Benchmark Query: SELECT * FROM t0 NATURAL JOIN t1 [NATURAL JOIN ....]; ▌Environment:  t0 has 100million rows (13GB), t1-t9 has 40,000 rows for each, all-data pre-loaded  CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -P.17 18.19 19.45 21.04 23.66 26.69 37.64 43.22 49.57 56.38 64.27 87.73 109.73 132.21 155.10 179.62 207.85 233.31 263.51 0.00 50.00 100.00 150.00 200.00 250.00 300.00 2 3 4 5 6 7 8 9 10 QueryResponseTime[sec] number of tables joined Simple Tables Join Benchmark PG-Strom PostgreSQL
  • 18. Implementation (3/3) – GpuPreAgg. Diagram: Table → chunks (16~64MB each) → 1st stage reduction → 2nd stage reduction.
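A two-stage reduction of this shape can be sketched as a grouped sum. This is an assumption-laden illustration rather than the GpuPreAgg implementation: the slide does not say which aggregates are reduced at each stage, and `first_stage`, `second_stage`, `pre_aggregate`, and the small `chunk_rows` default are hypothetical.

```python
from collections import Counter


def first_stage(chunk):
    """1st stage: reduce one chunk to a partial sum per group."""
    partial = Counter()
    for group, value in chunk:
        partial[group] += value
    return partial


def second_stage(partials):
    """2nd stage: merge the per-chunk partial sums into final sums."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values key by key
    return dict(total)


def pre_aggregate(rows, chunk_rows=3):
    """Split (group, value) rows into chunks and run both stages."""
    chunks = (rows[i:i + chunk_rows] for i in range(0, len(rows), chunk_rows))
    return second_stage(first_stage(chunk) for chunk in chunks)
```

The first stage shrinks each chunk to at most one row per group, so the second stage only has to merge a small number of partial results.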
  • 19. Benchmark result (2/2) – Star Schema Model ▌40 typical reporting queries ▌100GB of retail / star-schema data, all pre-loaded ▌Environment: CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1 ▌Chart (Typical Reporting Queries on Retail / Star-Schema Data): query response time [sec], axis from 0 to 2000, for queries Q.01 to Q.40; series: PG-Strom, PostgreSQL.
  • 20. Expected Scenario – Reduction of ETL ▌ETL: its design is a human-centric task ▌Replication: a largely automated task. Diagram: ERP, CRM, SCM → OLTP database (master / fact tables; optimized for transaction workloads). Conventional path: ETL → OLAP database (OLAP cubes; optimized for analytic workloads) → BI. With PG-Strom: replication → replica of master / fact tables (also sufficient for analytic workloads) → BI.
  • 21. Direction of PG-Strom
  • 22. Development Plan ▌Current version: PG-Strom β + PostgreSQL v9.5devel ▌Short term target, PostgreSQL v9.5 timeline (2015): migration from OpenCL to CUDA; add support for GpuNestedLoop; add support for multi-functional kernel; standardization of custom-join interface; ...and more...? ▌Middle term target, PostgreSQL v9.6 timeline (2016): integration with funnel executor; investigation of SSD/NvRAM utilization; custom-sort/aggregate interface; add support for spatial data types (?)
  • 23. Enhancement Idea (1/3) – GpuNestedLoop. 2-dimensional GPU kernel launch (Nx×Ny threads at once): the outer relation (Nx items, relatively large) runs along the x dimension (blockDim.x) and the inner relation (Ny items, relatively small) along the y dimension (blockDim.y). Each GPU thread, e.g. thread (X=2, Y=3), evaluates the non-equality join condition in parallel.
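The 2-dimensional launch can be sketched by enumerating every (x, y) pair that a GPU thread would handle. This is a sequential CPU illustration, assuming each thread tests exactly one outer/inner pair; `nested_loop_join` and `condition` are hypothetical names.

```python
from itertools import product


def nested_loop_join(outer, inner, condition):
    """Evaluate an arbitrary join condition on all Nx * Ny row pairs."""
    nx, ny = len(outer), len(inner)
    matches = []
    # Each (x, y) iteration corresponds to one thread of the 2-D launch;
    # on a GPU these evaluations would run concurrently.
    for x, y in product(range(nx), range(ny)):
        if condition(outer[x], inner[y]):
            matches.append((outer[x], inner[y]))
    return matches
```

For example, `nested_loop_join([1, 5, 9], [4, 8], lambda o, i: o < i)` pairs each outer value with every strictly larger inner value, a condition that a hash join cannot serve because it is not an equality.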
  • 24. Enhancement Idea (2/3) – Multi-functional GPU kernel. Current query execution: magnetic storage → on-memory cache → load data chunk → GpuScan kernel → GpuHashJoin kernel → GpuPreAgg kernel (execution on GPU device) → final result; the CPU and GPU interact multiple times. Execution in newer version: magnetic storage → on-memory cache → data chunk → GpuMultiOps kernel (Scan, HashJoin, PreAgg) → final result; the CPU and GPU interact only once.
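The difference between running three kernels in sequence and running one fused kernel can be sketched as follows. This is a CPU-only illustration with hypothetical names (`run_separately`, `run_fused`, `keep`, `dim`); it assumes the join is a lookup into a small dimension mapping whose labels are never `None`, and that the pre-aggregation is a grouped sum.

```python
def run_separately(chunk, keep, dim, key, value):
    """Three passes, each materializing an intermediate result."""
    scanned = [row for row in chunk if keep(row)]                  # scan
    joined = [(dim[row[key]], row[value])                           # join
              for row in scanned if row[key] in dim]
    totals = {}
    for label, amount in joined:                                    # pre-agg
        totals[label] = totals.get(label, 0) + amount
    return totals


def run_fused(chunk, keep, dim, key, value):
    """One pass per row: scan, join, and pre-aggregate without intermediates."""
    totals = {}
    for row in chunk:
        if not keep(row):
            continue
        label = dim.get(row[key])
        if label is None:  # no join partner for this row
            continue
        totals[label] = totals.get(label, 0) + row[value]
    return totals
```

Both functions return the same totals; the fused form avoids the two intermediate lists, which in the GPU setting corresponds to avoiding the extra round trips between CPU and GPU.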
  • 25. Enhancement Idea (3/3) – Funnel executor integration. Current query execution (query execution plan): magnetic storage → on-memory cache → Scan → Join → Aggregate → final result; PG-Strom may replace plan nodes with their GPU versions, but the host system runs with a single thread. Execution in PostgreSQL v9.6: magnetic storage / SSD device → on-memory cache → several workers, each running Partial Scan → Join → Partial Aggregate → Funnel Executor → Combined Aggregate → final result. The funnel executor assigns a part of the query execution task to worker processes; the combined aggregate merges multiple partial aggregates to generate the final result.
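Why a partial aggregate has to carry a state rather than a finished value can be sketched with an average. This is an illustration only: `partial_aggregate`, `combine_aggregate`, and `funnel_avg` are hypothetical names, the workers run one after another here instead of as separate processes, and the (sum, count) state is an assumption chosen for the example.

```python
def partial_aggregate(rows):
    """Worker side: reduce one slice of the scan to a (sum, count) state."""
    return (sum(rows), len(rows))


def combine_aggregate(states):
    """Leader side: merge the partial states and finalize the average."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count if count else None


def funnel_avg(rows, workers=3):
    """Split the input across `workers` slices and combine their states."""
    slices = [rows[i::workers] for i in range(workers)]
    return combine_aggregate([partial_aggregate(s) for s in slices])
```

Averaging the per-worker averages would give a wrong answer when slices differ in size, which is why each worker returns its sum and count and only the combining step divides.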
  • 26. Let’s try – Deployment on AWS. Search by “strom”! AWS GPU instance (g2.2xlarge): CPU Xeon E5-2670 (8 vCPU); RAM 15GB; GPU NVIDIA GRID K2 (1536 cores); storage 60GB of SSD; price $0.898/hour, $646.56/month (*). (*) Price for an on-demand instance in the Tokyo region as of Nov-2014. (The PostgreSQL Conference 2014, Tokyo - GPGPU Accelerates PostgreSQL)
  • 27. Welcome to your involvement ▌How to be involved? as a user; as a developer; as a business partner ▌Source code: https://github.com/pg-strom/devel (check it out!) ▌Contact us: e-mail: [email protected]; twitter: @kkaigai