Introducing PgOpenCL
A New PostgreSQL Procedural Language
Unlocking the Power of the GPU!
By Tim Child
Bio

Tim Child
• 35 years of experience in software development
• Formerly
  •   VP Oracle Corporation
  •   VP BEA Systems Inc.
  •   VP Informix
  •   Leader at Illustra, Autodesk, Navteq, Intuit, …
• 30+ years of experience in 3D, CAD, GIS and DBMS
Terminology
Term                  Description
Procedural Language   Language for SQL procedures (e.g. PL/pgSQL, Perl, Tcl, Java, …)
GPU                   Graphics Processing Unit (highly specialized processor for graphics)
GPGPU                 General Purpose GPU (non-graphics programming on a GPU)
CUDA                  Nvidia’s GPU programming environment
APU                   Accelerated Processing Unit (AMD’s hybrid CPU & GPU chip)
ISO C99               Modern standard version of the C language
OpenCL                Open Computing Language
OpenMP                Open Multi-Processing (parallelizing compilers)
SIMD                  Single Instruction Multiple Data (vector instructions)
SSE                   Streaming SIMD Extensions for x86 and x64 (Intel, AMD)
xPU                   Any Processing Unit device (CPU, GPU, APU)
Kernel                Function that executes on an OpenCL device
Work Item             Instance of a Kernel
Workgroup             A group of Work Items
FLOP                  Floating Point Operation (single precision = SQL real type)
MIC                   Many Integrated Core (Intel’s 50+ x86 core chip architecture)
Some Technology Trends Impacting DBMS
• Solid State Storage
    – Reduced access time, lower power, increasing capacity
• Virtualization
    – Server consolidation, specialized VMs, lower direct costs
• Cloud Computing
    – EC2, Azure, …; lowers capital requirements
• Multi-Core
    – 2, 4, 6, 8, 12, … cores; lots of benefits for multi-threaded applications
• xPU (GPU/APU)
    – GPUs with > 1000 cores
    – > 1 TFLOP/s at €2500
    – APU = CPU + GPU hybrid chips, due in mid 2011
    – 2 TFLOP/s for $2.10 per hour (AWS EC2)
    – Intel MIC “Knights Corner” with > 50 x86 cores
Compute Intensive xPU Database Applications
• Bioinformatics
• Signal/Audio/Image Processing/Video
• Data Mining & Analytics
• Searching
• Sorting
• Spatial Selections and Joins
• Map/Reduce
• Scientific Computing
• Many Others …
GPU vs CPU
Vendor                       NVidia          ATI Radeon      Intel
Architecture                 Fermi           Evergreen       Nehalem
Cores                        448 (simple)    1600 (simple)   4 (complex)
Transistors                  3.1 B           2.15 B          731 M
Clock                        1.5 GHz         851 MHz         3 GHz
Peak Float Performance       1500 GFLOP/s    2720 GFLOP/s    96 GFLOP/s
Peak Double Performance      750 GFLOP/s     544 GFLOP/s     48 GFLOP/s
Memory Bandwidth             ~190 GB/s       ~153 GB/s       ~30 GB/s
Power Consumption            250 W           > 250 W         80 W
SIMD / Vector Instructions   Many            Many            SSE4+
Multi-Core Performance
(Chart not preserved. Source: NVidia)
Future (Mid 2011): APU Based PC
APU (Accelerated Processing Unit): APU’s add an embedded GPU to the CPU chip.

Block diagram labels (source: AMD):
• APU Chip: CPUs, Embedded GPU and North Bridge
• APU Chip to System RAM: ~20 GB/s
• North Bridge: ~20 GB/s
• PCIe to the Discrete GPU: ~12 GB/s
• Discrete GPU to Graphic RAM: 150 GB/s
Scalar vs. SIMD
Scalar Instruction
      C = A + B
      1 + 2 = 3

SIMD Instruction
      Vector C = Vector A + Vector B
        1    3    5    7
      + 2    4    6    8
      = 3    7   11   15

OpenCL
      Vector lengths 2, 4, 8, 16 for char, short, int, float, double
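
The difference between the scalar and SIMD forms can be sketched in plain C. This is only an illustration under stated assumptions: the function names add_scalar and add_vec4 are hypothetical, and the inner four-lane loop stands in for what a SIMD unit (or an OpenCL float4 addition) would do in a single instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar form: one addition per step, C = A + B. */
void add_scalar(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

/* Vector form: each outer step handles one 4-wide vector (the shape of an
   OpenCL float4). The inner loop spells out the four lanes that a SIMD unit
   would add with a single instruction. */
void add_vec4(const float *a, const float *b, float *c, size_t n) {
    assert(n % 4 == 0); /* assumption: the caller pads to a multiple of 4 */
    for (size_t v = 0; v < n; v += 4) {
        for (size_t lane = 0; lane < 4; ++lane) {
            c[v + lane] = a[v + lane] + b[v + lane];
        }
    }
}
```

Called with the slide's values, A = {1, 3, 5, 7} and B = {2, 4, 6, 8}, both functions fill C with {3, 7, 11, 15}; only the number of steps differs.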
Summarizing xPU Trends
• Many more xPU Cores in our Future
• Compute Environment becoming Hybrid
  – CPU and GPU’s
  – Need CPU to give access to GPU power
• GPU Capabilities
  – Lots of cores
  – Vector/SIMD Instructions
  – Fast Memory
• GPU Futures
  – Virtual Memory
  – Multi-tasking / Pre-emption
Scaling PostgreSQL Queries on xPU’s
Diagram: a Postgres process with PgOpenCL threads running on two kinds of device.
• Multi-Core CPU: a few PgOpenCL threads (four shown)
• Many Core GPU: many PgOpenCL threads (a grid of fifteen shown)
Using More Transistors
Parallel Programming Systems
Category             CUDA     OpenMP       OpenCL
Language               C      C, Fortran     C
Cross Platform         X          √           √
Standard             Vendor   OpenMP       Khronos
CPU                    X          √           √
GPU                    √          X           √
Clusters               X          √           X
Compilation / Link   Static     Static     Dynamic
What is OpenCL?
• OpenCL - Open Computing Language
  –   Based on a subset of C99
  –   Open Specification
  –   Proposed by Apple
  –   Many Companies Collaborated on the Specification
  –   Portable, Device Agnostic
  –   Specification maintained by Khronos Group
• PgOpenCL
  – OpenCL as a PostgreSQL Procedural Language
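System Overview
Diagram components and links:
• Web Browser to Web Server: HTTP
• Web Server to App Server
• App Server to PostgreSQL: TCP/IP
• PostgreSQL Client to PostgreSQL: TCP/IP
• DBMS Server: PostgreSQL, SQL Statements and PgOpenCL SQL Procedures
• PostgreSQL to Tables: Disk I/O
• PostgreSQL to GPGPU: PCIe X2 Bus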
OpenCL Language
• Based on a subset of ISO C99
   – Omits some C99 features, such as the standard C99 headers, function pointers, recursion, variable length arrays, and bit fields
• Extends ISO C99 with additions for:
   – Work-items and Workgroups
   – Vector types
   – Synchronization
   – Address space qualifiers
• Also includes a large set of built-in functions
   – Image manipulation
   – Work-item manipulation
   – Specialized math routines, etc.
PgOpenCL Components
• New PostgreSQL Procedural Language
  – Language handler
     • Maps arguments
     • Calls function
     • Returns results
  – Language validator
     • Creates Function with parameter & syntax checking
     • Compiles Function to a Binary format
• New data types
  – cl_double4, cl_double8, ….
• System Admin Pseudo-Tables
  – Platform, Device, Run-Time, …
PgOpenCL Admin
(Slide image not preserved.)
PgOpenCL Function Declaration
CREATE OR REPLACE FUNCTION VectorAdd(IN a float[], IN b float[], OUT c float[])
AS $BODY$

/* Pragmas naming the OpenCL platform and device for this function. */
#pragma PGOPENCL Platform : ATI Stream
#pragma PGOPENCL Device : CPU

/* Each work-item adds one pair of elements; work-groups hold 64 work-items. */
__kernel __attribute__((reqd_work_group_size(64, 1, 1)))
void VectorAdd(__global const float *a, __global const float *b, __global float *c)
{
    int i = get_global_id(0);   /* global index of this work-item */

    c[i] = a[i] + b[i];
}

$BODY$
LANGUAGE PgOpenCL;
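
To make the work-item model in the declaration above concrete, here is a minimal host-side sketch in plain C. It is not PgOpenCL or OpenCL runtime code: the names vector_add_work_item and vector_add_emulated are hypothetical, and the sequential loop only stands in for a device that would run the work-items concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Body of the VectorAdd kernel for a single work-item. In OpenCL the index
   comes from get_global_id(0); here it is passed in as gid. */
void vector_add_work_item(size_t gid, const float *a, const float *b, float *c) {
    c[gid] = a[gid] + b[gid];
}

/* Hypothetical sequential stand-in for a kernel launch: one work-item per
   element. A real device would run these concurrently, grouped into
   work-groups. */
void vector_add_emulated(const float *a, const float *b, float *c,
                         size_t global_size) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < global_size; ++gid) {
        vector_add_work_item(gid, a, b, c);
    }
}
```

Because no work-item reads what another writes, the result does not depend on the order in which the work-items run, which is what lets a device execute them in parallel.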
PgOpenCL Execution Model
Data flow:
• Table with columns A and B
• Select table to array
• Copy arrays A and B to the xPU
• xPU runs VectorAdd(A, B) as 100’s - 1000’s of threads (kernels): A + B = C, returns C
• Copy C back
• Unnest array to table
• Table with column C
Using Re-Shaped Tables
Data flow:
• Table of arrays: each row holds an array A and an array B
• Copy to the xPU
• xPU runs VectorAdd(A, B) as 100’s - 1000’s of threads (kernels): A + B = C, returns C
• Copy back
• Table of arrays: each row holds an array C
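
One way to picture the re-shaping is to cut a long column into fixed-length array rows. The sketch below is an assumption-laden illustration, not PgOpenCL code: chunk_count and fill_chunk are hypothetical helpers, the chunk length is a free parameter (64 would match the reqd_work_group_size in the VectorAdd declaration), and zero-padding the last chunk is a choice that happens to be harmless for addition.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fixed-length array rows needed to hold n_rows scalar rows. */
size_t chunk_count(size_t n_rows, size_t chunk_len) {
    assert(chunk_len > 0);
    return (n_rows + chunk_len - 1) / chunk_len;
}

/* Copy chunk number k of a column into out[0..chunk_len), zero-padding the
   tail of the last chunk so every array row has the same length. */
void fill_chunk(const float *column, size_t n_rows, size_t chunk_len,
                size_t k, float *out) {
    assert(k < chunk_count(n_rows, chunk_len));
    for (size_t j = 0; j < chunk_len; ++j) {
        size_t src = k * chunk_len + j;
        out[j] = (src < n_rows) ? column[src] : 0.0f;
    }
}
```

For example, 1000 rows with a chunk length of 64 need 16 array rows, the last of which carries 24 padding elements.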
Today’s GPGPU Challenges
• No Pre-emptive Multi-Tasking
• No Virtual Memory
• Limited Bandwidth to discrete GPGPU
   – 1 – 8 GB/s over PCIe Bus
• Hard to Program
   – New Parallel Algorithms and constructs
   – “New” C language dialect
• Immature Tools
   – Compilers, IDE, Debuggers, Profilers - early years
• Data organization really matters
   – Types, Structure, and Alignment
   – SQL needs to Shape the Data
• Profiling and Debugging are not easy

Works Well for Problem Sets with the Right Shape!
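
The bandwidth limit can be put into rough numbers. The following C sketch is a back-of-the-envelope model only, assuming a sustained bus rate and a peak compute rate and ignoring latency, kernel launch cost and memory bandwidth on the device; all function names are hypothetical.

```c
#include <assert.h>

/* Seconds needed to move `bytes` over a bus sustaining `bus_gb_per_s` GB/s. */
double transfer_seconds(double bytes, double bus_gb_per_s) {
    assert(bus_gb_per_s > 0.0);
    return bytes / (bus_gb_per_s * 1e9);
}

/* Seconds needed for `flops` operations at a peak of `peak_gflops` GFLOP/s. */
double compute_seconds(double flops, double peak_gflops) {
    assert(peak_gflops > 0.0);
    return flops / (peak_gflops * 1e9);
}

/* Ratio of bus time to compute time for adding two float vectors of n
   elements: two inputs and one output cross the bus (3 * 4 bytes per
   element), and each element costs one floating point addition. */
double vector_add_transfer_ratio(double n, double bus_gb_per_s,
                                 double peak_gflops) {
    assert(n > 0.0);
    double bytes = 3.0 * 4.0 * n;
    return transfer_seconds(bytes, bus_gb_per_s) /
           compute_seconds(n, peak_gflops);
}
```

With the figures quoted in this deck (8 GB/s over PCIe, 1500 GFLOP/s peak single precision), the ratio for a plain vector addition is about 2250: the bus transfer dominates by three orders of magnitude. A kernel needs far more arithmetic per byte moved before a discrete GPGPU pays off under this model.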
Making a Problem Work for You
• Determine % Parallelism Possible
      for ( i = 0; i < ∞; i++ )
          for ( j = 0; j < ∞; j++ )
              for ( k = 0; k < ∞; k++ )
• Arrange data to fit available GPU RAM
• Ensure calculation time >> I/O transfer overhead
• Learn about Parallel Algorithms and the OpenCL language
• Learn new tools
• Carefully choose Data Types, Organization and Alignments
• Profile and Measure at Every Stage
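
The first point, determining how much of a problem can run in parallel, is commonly reasoned about with Amdahl's law. The slides do not name it, so treat the sketch below as a standard model brought in for illustration; amdahl_speedup is a hypothetical helper name.

```c
#include <assert.h>

/* Amdahl's law: upper bound on speedup when a fraction of the work
   (parallel_fraction, between 0 and 1) is spread over `cores` processing
   units and the rest stays serial. */
double amdahl_speedup(double parallel_fraction, double cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1.0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}
```

Under this model, a workload that is 90% parallel gains a speedup of just under 10 on 1000 cores, because the serial 10% sets the ceiling. That is why the parallel fraction is worth estimating before moving data to an xPU.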
PgOpenCL System Requirements
• PostgreSQL 9.x
• For GPU’s
   – AMD ATI OpenCL Stream SDK 2.x
   – NVidia CUDA 3.x SDK
   – Recent Macs with Mac OS X 10.6
• For CPU’s (Pentium M or more recent)
   – AMD ATI OpenCL Stream SDK 2.x
   – Intel OpenCL SDK Alpha Release (x86)
   – Recent Macs with Mac OS X 10.6
PgOpenCL Status
• Today (2010): Prototype
• 1Q 2011: Beta

• Wish List
   • Beta Testers
      – Existing OpenCL App?
      – Have a GPU App?
   • Contributors
      – Code server side functions?
   • Sponsors & Supporters
      – AMD Fusion Fund?
      – Khronos?
PgOpenCL Future Plans
• Increase Platform Support
• Scatter/Gather Functions
• Additional Type Support
   – Image Types
   – Sparse Matrices
• Run-Time
   –   Asynchronous
   –   Events
   –   Profiling
   –   Debugging
Using the Whole Brain
Diagram: an APU chip with CPUs, an Embedded GPU and a North Bridge (~20 GB/s). Postgres runs on a CPU; PgOpenCL threads run on the CPUs and on the Embedded GPU.

You can’t be in a parallel universe with a single brain!

• Heterogeneous Compute Environments
   • CPU’s, GPU’s, APU’s
   • Expect 100’s – 1000’s of cores

The Future Is Parallel: What's a Programmer to Do?
Summarizing PgOpenCL
• Supports Heterogeneous Parallel Compute Environments
    • CPU’s, GPU’s, APU’s
• OpenCL
    • Portable and high-performance framework
        – Ideal for computationally intensive algorithms
        – Access to all compute resources (CPU, APU, GPU)
        – Well-defined computation/memory model
    • Efficient parallel programming language
        – C99 with extensions for task and data parallelism
        – Rich set of built-in functions
    • Open standard for heterogeneous parallel computing
• PgOpenCL
    • Integrates PostgreSQL with OpenCL
    • Provides Easy SQL Access to xPU’s
        • APU, CPU, GPGPU
    • Integrates OpenCL
        • SQL + Web Apps (PHP, Ruby, …)
More Information
• PgOpenCL
    • Twitter @3DMashUp
• OpenCL
    • www.khronos.org/opencl/
    • www.amd.com/us/products/technologies/stream-technology/opencl/
    • http://software.intel.com/en-us/articles/intel-opencl-sdk
    • http://www.nvidia.com/object/cuda_opencl_new.html
    • http://developer.apple.com/technologies/mac/snowleopard/opencl.html
Q&A

• Using Parallel Applications?
• Benefits of OpenCL / PgOpenCL?
• Want to Collaborate on PgOpenCL?
