FastROCS: What does it mean to be “fast”?
OpenEye Scientific Software
Brian Cole

March 26, 2013

© 2013 OpenEye Scientific Software
FastROCS and the “Chasm”

ROCS: Rapid Overlay of Chemical Structures

LeadHopper

And then you wait…

What is FastROCS?

[Figure: bar chart of shape overlays per second, CPU vs. GPU; higher is better.]
  
What is FastROCS?

[Figure: shape overlays per second, CPU vs. GPU, with axis ticks in powers of ten from 1 to 1,000,000; higher is better.]
  
What is FastROCS?

[Figure: shape overlays per second, CPU vs. GPU, with axis ticks from 0 to 600,000 in steps of 100,000; higher is better.]
  
But I want it now!

[Figure: log (elapsed time in seconds), 1 to 100,000, against log (cores/GPUs), 1 to 100, for ROCS and FastROCS; lower is better.]
  
Riding Moore’s Law

[Figure: shape overlays per second (0 to 2,000,000) on the C1060, C2050, C2075, C2090, K10, and K20; higher is better.]
  
ROCS user base
•  Every Pharma R&D
•  Many BioTechs
•  Many Universities
•  National Labs and Research Centers
•  Other software companies

Licenses by Year

[Figure: ROCS and FastROCS licenses by year, 2009 to 2012; higher is better.]
  
Licenses by Year (Linear Scale)
[Figure: ROCS and FastROCS licenses by year on a linear scale, 2009 to 2012, annotated “Pharmageddon” and “15%”.]
  
All ROCS users (linear scale)

[Figure: all ROCS and FastROCS users, including academics, by year on a linear scale, 2009 to 2012, annotated “3%”.]
  

Technology Adoption Lifecycle

[Figure: technology adoption lifecycle curve with segments of 2.5%, 13.5%, 34%, 34%, and 16%; FastROCS is marked on the curve.]
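The segment widths on the curve add up to cumulative adoption shares. A minimal Python sketch, assuming the conventional segment names for this curve (innovators, early adopters, early majority, late majority, laggards), which do not appear on the slide, and assuming the “chasm” sits between the early adopters and the early majority at a cumulative 16%:

```python
# Segment widths (percent of the market) as shown on the slide.
# The segment names are the conventional ones and are an assumption here.
SEGMENTS = [
    ("innovators", 2.5),
    ("early adopters", 13.5),
    ("early majority", 34.0),
    ("late majority", 34.0),
    ("laggards", 16.0),
]


def adoption_segment(percent_adopted):
    """Return the lifecycle segment that a cumulative adoption share falls in."""
    if not 0.0 <= percent_adopted <= 100.0:
        raise ValueError("percent_adopted must be between 0 and 100")
    upper = 0.0
    for name, width in SEGMENTS:
        upper += width
        if percent_adopted <= upper:
            return name
    return SEGMENTS[-1][0]


def has_crossed_chasm(percent_adopted):
    """True once adoption exceeds innovators plus early adopters (16%)."""
    chasm = SEGMENTS[0][1] + SEGMENTS[1][1]
    return percent_adopted > chasm
```

If the 15% and 3% annotations on the license slides are read as adoption shares (an interpretation, not something the slides state), both values fall in the early adopters segment, short of the chasm.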
  
What’s in the “chasm”?
•  “ROCS is already fast enough”
•  “The results aren’t bitwise comparable”
•  “There’s nothing else to run on the GPU”
•  “GPUs are different”
[Slide annotations: “Some other time…” and “GTC!”]
  
FastROCS Quick Start
•  ctrl-alt-F1 (to switch to a non-X-server terminal)
•  log in as root
•  /sbin/init 3 (to turn off the X server)
•  ./NVIDIA-Linux-x86_64-285.05.09.run
•  reboot
•  ./cuda.sh (to give /dev/nvidia* the correct permissions)

•  tar -xzf fastrocs-1.3.1-RHEL5-x64-OpenCL-1.1-CUDA-4.1.tar.gz
•  openeye/bin/ShapeDatabaseServer.py database.oeb.gz
•  openeye/bin/ShapeDatabaseClient.py localhost:8080 query.sdf out.sdf

ROCS Quick Start
Still a barrier to entry to work around!

•  tar -xzf ROCS-3.1.1-RHEL5-x64.tar.gz
•  openeye/bin/rocs query.sdf database.oeb.gz

This is even worse!

fastrocs-1.3.1-RHEL5-x64-OpenCL-1.1-CUDA-4.1.tar.gz
NVidia OpenCL binaries are tightly locked to a particular driver version

Worthwhile to upgrade
[Figure: conformers per second (0 to 800,000) on a C2050 with the 260 driver vs. the 295 driver, annotated “11%”; higher is better.]
  
Needed for new hardware
[Figure: conformers per second (0 to 1,200,000) on a C2050 vs. an M2090, both with the 295 driver; higher is better.]
  
Scalability between drivers (4x C2050)

[Figure: speedup (single-GPU time / multi-GPU time) against number of GPUs (1 to 4) for the 260 driver, the 295 driver, and the ideal line; higher is better.]
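The axis label defines speedup as single-GPU time divided by multi-GPU time, and the ideal line corresponds to a speedup equal to the number of GPUs. A minimal Python sketch of that calculation follows. The parallel efficiency helper is an added derived quantity that is not on the slide, and the timings are hypothetical placeholders rather than measurements from the talk:

```python
def speedup(single_gpu_seconds, multi_gpu_seconds):
    """Speedup as defined on the slide: single-GPU time / multi-GPU time."""
    if single_gpu_seconds <= 0 or multi_gpu_seconds <= 0:
        raise ValueError("timings must be positive")
    return single_gpu_seconds / multi_gpu_seconds


def parallel_efficiency(single_gpu_seconds, multi_gpu_seconds, num_gpus):
    """Fraction of the ideal (linear) speedup achieved; 1.0 is ideal."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return speedup(single_gpu_seconds, multi_gpu_seconds) / num_gpus


# Hypothetical timings in seconds, keyed by GPU count (not measured data).
timings = {1: 120.0, 2: 64.0, 4: 40.0}
for gpus, seconds in sorted(timings.items()):
    s = speedup(timings[1], seconds)
    e = parallel_efficiency(timings[1], seconds, gpus)
    print(f"{gpus} GPU(s): speedup {s:.2f}, efficiency {e:.2f}")
```

A curve that falls away from the ideal line shows up here as an efficiency well below 1.0 as the GPU count grows.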
  
Really bad for 8x M2090

[Figure: speedup (single-GPU time / multi-GPU time, axis 0 to 8) against number of GPUs (1 to 8); higher is better.]
  
Ways to transfer to device
•  CL_MEM_USE_HOST_PTR
–  kernelBuf = clCreateBuffer(CL_MEM_USE_HOST_PTR)
•  CL_MEM_ALLOC_HOST_PTR|CL_MEM_COPY_HOST_PTR
–  kernelBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR|CL_MEM_COPY_HOST_PTR)
•  clEnqueueWriteBuffer
–  kernelBuf = clCreateBuffer() - cacheable
–  clEnqueueWriteBuffer(kernelBuf, data)
•  clEnqueueMapBuffer
–  kernelBuf = clCreateBuffer() - cacheable
–  ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_WRITE)
–  memcpy(ptr, data)
–  clEnqueueUnmapMemObject(ptr)
•  CL_MEM_ALLOC_HOST_PTR
–  kernelBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR) - cacheable
–  ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_WRITE)
–  memcpy(ptr, data)
–  clEnqueueUnmapMemObject(ptr)
•  oclCopyCompute
–  pinnedBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR|CL_MEM_READ_WRITE) - cacheable
–  pinnedPtr = clEnqueueMapBuffer(pinnedBuf, CL_MAP_WRITE) - cacheable
–  memcpy(pinnedPtr, data)
–  kernelBuf = clCreateBuffer() - cacheable
–  clEnqueueWriteBuffer(kernelBuf, pinnedPtr)

Ways to transfer from device
•  CL_MEM_ALLOC_HOST_PTR
–  kernelBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR) - cacheable
–  ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_READ)
–  memcpy(data, ptr)
–  clEnqueueUnmapMemObject(ptr)
•  clEnqueueMapBuffer
–  kernelBuf = clCreateBuffer() - cacheable
–  ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_READ)
–  memcpy(data, ptr)
–  clEnqueueUnmapMemObject(ptr)
•  clEnqueueReadBuffer
–  kernelBuf = clCreateBuffer() - cacheable
–  clEnqueueReadBuffer(kernelBuf, data)
•  oclCopyCompute
–  pinnedBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR|CL_MEM_READ_WRITE) - cacheable
–  pinnedPtr = clEnqueueMapBuffer(pinnedBuf, CL_MAP_READ) - cacheable
–  kernelBuf = clCreateBuffer() - cacheable
–  clEnqueueReadBuffer(kernelBuf, pinnedPtr)
–  memcpy(data, pinnedPtr)

FastROCS scalability across 8x M2070

[Figure: speedup (sequential time / parallel time, axis 0 to 9) against number of GPUs utilized (1 to 8), with five bars per GPU count.]
  
Lessons from the mess
•  clEnqueueWriteBuffer > clEnqueueMapBuffer

•  clEnqueueMapBuffer >> clEnqueueReadBuffer

•  CL_MEM_* constants aren’t worth the effort

CUDA?
•  Serious customers will only use NVidia cards
•  Pinned memory
•  Better support for binaries and compatibility
•  CUDA support >> OpenCL support

FastROCS CUDA port

[Figure: conformers per second (0 to 3,000,000) for the OpenCL, CUDA, and CUDA-pinned implementations on 2xC2075, 2xC2090, and 2xK20; higher is better.]
  
CUDA Scaling?

[Figure: conformers per second (0 to 8,000,000) against number of individual K10 GPUs (1 to 8) for CUDA, OpenCL, and the ideal line; higher is better. Note: each K10 has 2 physical GPUs on the board.]
  
CUDA vs OpenCL: Ding Ding!
•  Portability vs Innovation
•  NVidia vs Intel and AMD
•  Open vs Proprietary

•  Customers don’t care…

ROCS Implementations
•  We only care a little…
•  Fortran code (1995)
•  C code (1999)
•  C++ wrapper code (2003)
•  OpenCL code (2009)
•  CUDA code (2012)
•  C++ thread-safe code (2013)

OpenEye Software
•  Lots of Software

–  14 products
–  13 software libraries

•  C++ (no SIMD)
–  2.5 million lines

•  Python
–  416 thousand lines

•  Java
–  63 thousand lines

•  C#
–  38 thousand lines
  
The People

[Figure: staff breakdown into Programmers, Hardcore Scripter, and Other stuff, with values 10, 20, and 12.]
  
•  GPGPU = ½ of a developer
–  Only 2.5% of development effort

Technology Adoption Lifecycle

[Figure: technology adoption lifecycle curve with segments of 2.5%, 13.5%, 34%, 34%, and 16%; OpenEye GPGPU development is marked on the curve.]
  
LinkedIn skills

[Figure: LinkedIn skills chart, annotated “2.2%”.]
  

Technology Adoption Lifecycle

[Figure: technology adoption lifecycle curve with segments of 2.5%, 13.5%, 34%, 34%, and 16%; GPGPU development is marked on the curve.]
  
I Believe…
•  GPGPU computing can become ubiquitous…

•  By expressing parallelism everywhere…

•  We can make it easy for our customers…
–  Pre-installed in every operating system
–  Integrated seamlessly into every language
–  Then eventually becoming the CPU

Acknowledgements
•  Nikolai Sakharnykh (NVidia)
•  Dave Mullaly (HP)
•  Exxact Computing

Father of “ROCS”
Andrew Grant
April 28th 1963
December 29th 2012

Dude, where’s my color?

[Figure: DUD average AUC (0 to 0.9) for ROCS and FastROCS, shape only vs. with color.]
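The chart reports an AUC averaged over the DUD (Directory of Useful Decoys) targets. As a rough illustration of what such a number measures, the Python sketch below computes AUC as the fraction of (active, decoy) pairs in which the active scores higher, then averages over targets. This is a generic formulation, not OpenEye's evaluation code, and the scores are hypothetical placeholders rather than DUD results:

```python
def roc_auc(active_scores, decoy_scores):
    """AUC as the fraction of (active, decoy) pairs ranked correctly.

    Higher scores are assumed to mean better overlays; ties count as half.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def average_auc(per_target_scores):
    """Mean AUC over targets; each entry is (active_scores, decoy_scores)."""
    aucs = [roc_auc(actives, decoys) for actives, decoys in per_target_scores]
    return sum(aucs) / len(aucs)


# Hypothetical similarity scores for two targets (not real DUD results).
targets = [
    ([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]),
    ([0.6, 0.5], [0.6, 0.4]),
]
print(average_auc(targets))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every active ranked above every decoy.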
  
ROCS vs FastROCS Histogram

[Figure: histogram of number of targets (0 to 12) by Kendall tau correlation coefficient (0.10 to 1.00) between ROCS and FastROCS.]
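The histogram bins targets by the Kendall tau correlation between ROCS and FastROCS results. A minimal Python sketch of the tau-a variant (no tie correction) is shown below. Which variant the slide used is not stated, so this choice is an assumption, and the score lists are hypothetical:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Pairs tied in either list count as neither concordant nor discordant.
    """
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores for the same five molecules from two implementations.
rocs_scores = [0.91, 0.85, 0.60, 0.42, 0.30]
fastrocs_scores = [0.90, 0.86, 0.41, 0.59, 0.31]
print(kendall_tau_a(rocs_scores, fastrocs_scores))
```

A rank correlation like this compares orderings rather than raw scores, which is one way to compare two implementations whose results are not bitwise identical.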
  

Molecular Shape Searching on GPUs: A Brave New World
