(ATS03-PLAT08) Optimizing Protocol Performance
Andrew LeBeau, Advisory Product Manager, andrew.lebeau@accelrys.com
Eddy Vande Water, Director, EMEA Field Application, eddy.vandewater@accelrys.com
The information on the roadmap and future software development efforts are
intended to outline general product direction and should not be relied on in making
a purchasing decision.
Agenda

•   Profiling and Refactoring
•   Data Access
•   Data Computing
•   Other key tips and tricks (T&T)
•   Server optimization
•   Summary
Protocol Refactoring

• Consider the first version (V1) of a “complete” protocol…perhaps
  ~30 components
• Protocol building is typically an incremental process with much
  iterative design. Therefore, completion of V1 represents the
  documentation of an intellectual process
• However, very significant optimizations can be achieved by
  reviewing V1 and considering major (perhaps complete)
  refactoring of the protocol, using the knowledge developed from
  building V1
Component Profiling

• Identify protocol bottlenecks
• Press Ctrl+T to toggle between display modes:
   – Absolute compute time (sec)
   – Compute time as percentage of total execution time
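Pipeline Pilot reports these timings itself. The sketch below only illustrates the arithmetic behind the two display modes, using a hypothetical list of named Python stages (`profile_pipeline` and the stage names are illustrative, not part of any Pipeline Pilot API). Each stage's percentage is its own time divided by the sum of all stage times.

```python
import time
from typing import Callable, Iterable, List, Tuple

# Hypothetical stage type: a name plus a function that maps a list of records to a new list.
Stage = Tuple[str, Callable[[list], list]]


def profile_pipeline(records: list, stages: Iterable[Stage]) -> Tuple[list, List[Tuple[str, float, float]]]:
    """Run stages in order; return the final records and (name, seconds, percent_of_total) per stage."""
    timings = []
    for name, fn in stages:
        start = time.perf_counter()
        records = fn(records)
        timings.append((name, time.perf_counter() - start))
    total = sum(sec for _, sec in timings) or 1.0  # avoid division by zero on trivial runs
    report = [(name, sec, 100.0 * sec / total) for name, sec in timings]
    return records, report


if __name__ == "__main__":
    data = list(range(200_000))
    stages = [
        ("square", lambda rs: [r * r for r in rs]),
        ("keep_even", lambda rs: [r for r in rs if r % 2 == 0]),
        ("sort_desc", lambda rs: sorted(rs, reverse=True)),
    ]
    _, report = profile_pipeline(data, stages)
    for name, sec, pct in report:
        # Absolute seconds and share of total, mirroring the two views described above.
        print(f"{name:10s} {sec:8.4f} s {pct:6.1f} %")
```

The stage with the largest share is the first candidate for refactoring.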
Demo: Protocol version 01
Protocol development flow
•   Read a big file with activity data on several
    targets and many other properties
•   Pivot the data
•   Keep only the one target of interest
•   Add structures
•   Join the activity data
•   Compute a new property
•   Retrieve additional data from the database
•   Keep only the range of data of interest
•   Create a report
Demo: Protocol version 02
Why?
Because I applied some simple principles!
28 seconds instead of 6 minutes!
Data Access

• Keep records as small as possible for what you need
  to do. Don’t read in things just because they are in
  the file; only read in what you will use! Don’t pass
  anything further down the pipeline than is needed.

• If writing to disk to pass information between
  pipelines, caches are faster than delimited text (or any
  other file format).
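As a rough Python analogy (not Pipeline Pilot's own reader or cache format), the sketch below keeps only the columns later steps need while parsing delimited text, and uses a `pickle` file as a binary cache so values come back with their types intact instead of being re-parsed from strings. `read_needed`, `write_cache`, `read_cache` and the column names are hypothetical.

```python
import csv
import io
import os
import pickle
import tempfile
from typing import Iterable, Iterator, List, Sequence


def read_needed(lines: Iterable[str], needed: Sequence[str]) -> Iterator[dict]:
    """Parse delimited text but keep only the columns downstream steps use."""
    for row in csv.DictReader(lines):
        # Drop every other column immediately so it never travels down the pipeline.
        yield {key: row[key] for key in needed}


def write_cache(records: List[dict], path: str) -> None:
    """Binary cache: values keep their Python types, so nothing is re-parsed on read."""
    with open(path, "wb") as fh:
        pickle.dump(records, fh, protocol=pickle.HIGHEST_PROTOCOL)


def read_cache(path: str) -> List[dict]:
    # Only unpickle files you wrote yourself; pickle is not safe for untrusted input.
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    raw = io.StringIO("id,target,ic50,notes\n1,EGFR,12.5,long free text\n2,ABL1,3.2,more text\n")
    # Keep two of the four columns and convert types once, up front.
    rows = [{"id": int(r["id"]), "ic50": float(r["ic50"])} for r in read_needed(raw, ["id", "ic50"])]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "activity.pkl")
        write_cache(rows, path)
        print(read_cache(path))  # [{'id': 1, 'ic50': 12.5}, {'id': 2, 'ic50': 3.2}]
```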
Merge / Join / Group / Sort / Cluster / etc.

•   All create implicit caches
•   Filter before merging/caching
•   Reduce the number of properties
•   Merge on a sub-stream then join back
•   Sort before join – on the primary key
•   Cache Writer: Use Pre-Index options if the cache will later
    be joined on
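To illustrate why filtering first and sorting on the primary key help, here is a minimal sort-merge join in Python. It assumes both inputs are lists of dicts already sorted on `key` and that `key` is unique on the right-hand side. It walks both inputs in a single pass instead of building an in-memory hash table. The data and field names are made up for the example; this is a generic technique, not Pipeline Pilot's internal join.

```python
from typing import Iterator, List


def merge_join(left: List[dict], right: List[dict], key: str) -> Iterator[dict]:
    """Inner join of two lists already sorted on `key` (unique on the right side)."""
    j = 0
    for row in left:
        k = row[key]
        # Advance the right-hand cursor past keys smaller than the current left key.
        while j < len(right) and right[j][key] < k:
            j += 1
        # Emit a combined record only when the keys match.
        if j < len(right) and right[j][key] == k:
            yield {**row, **right[j]}


if __name__ == "__main__":
    activity = [
        {"id": 3, "target": "EGFR", "ic50": 1.0},
        {"id": 1, "target": "ABL1", "ic50": 2.0},
        {"id": 2, "target": "EGFR", "ic50": 5.0},
    ]
    structures = [
        {"id": 2, "smiles": "CCO"},
        {"id": 1, "smiles": "CCN"},
        {"id": 3, "smiles": "c1ccccc1"},
    ]
    # Filter first, then sort both sides on the primary key, then join.
    egfr = sorted((r for r in activity if r["target"] == "EGFR"), key=lambda r: r["id"])
    structs = sorted(structures, key=lambda r: r["id"])
    for rec in merge_join(egfr, structs, "id"):
        print(rec)
```

Filtering to one target before sorting and joining leaves fewer records to sort and compare, which is the point of the "filter before merging" bullet.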
Database

• Database access should be tuned:
   – See
        • (ATS3-PLAT04) Database Connectivity for Application Development
        • (ATS2-23) Managing Data Source Connections
   –   The Pipeline Pilot (PP) server should be located close to the database server
   –   Join in the database if possible
   –   Use batch inserts, etc.
   –   Use batching with the SQL Select for Each Data component
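The batching idea can be sketched with Python's built-in `sqlite3` module standing in for the real database. The `compound` table, the helper names and the batch size of 500 are assumptions for illustration. Inserts go through `executemany` once per batch, and lookups use one `SELECT ... WHERE id IN (...)` per batch instead of one query per record, which reduces the number of round trips to the server.

```python
import sqlite3
from typing import Dict, Iterator, Sequence, Tuple

BATCH_SIZE = 500  # illustrative; keep below the database's bound-parameter limit


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_insert(conn: sqlite3.Connection, rows: Sequence[Tuple[int, str]], size: int = BATCH_SIZE) -> None:
    # One executemany call per batch instead of one statement per row.
    for chunk in batches(rows, size):
        conn.executemany("INSERT INTO compound (id, smiles) VALUES (?, ?)", chunk)
    conn.commit()


def lookup_in_batches(conn: sqlite3.Connection, ids: Sequence[int], size: int = BATCH_SIZE) -> Dict[int, str]:
    # One SELECT ... IN (...) per batch instead of one SELECT per record.
    found: Dict[int, str] = {}
    for chunk in batches(ids, size):
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT id, smiles FROM compound WHERE id IN ({placeholders})"
        for cid, smiles in conn.execute(sql, list(chunk)):
            found[cid] = smiles
    return found


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, smiles TEXT)")
    bulk_insert(conn, [(i, "C" * (i % 5 + 1)) for i in range(1, 1201)])
    result = lookup_in_batches(conn, [1, 2, 1200, 5000])
    print(sorted(result.items()))  # [(1, 'CC'), (2, 'CCC'), (1200, 'C')]
```

Older SQLite builds allow at most 999 bound parameters per statement, so the batch size has to respect the target database's limit; the best value depends on your network latency and should be measured.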
When and Where to Calculate Properties

• Think about the order in which you do things

• Compared with…
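The slide's comparison figure is not reproduced here, so the following is only an assumed illustration of the ordering principle: when a filter does not depend on a computed property, filtering before an expensive calculation means the calculation runs only on the records you keep. `CountingCalculator` is a hypothetical stand-in that counts how many times it is invoked.

```python
from typing import Callable, List


class CountingCalculator:
    """Hypothetical stand-in for a costly property calculation; counts its calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, record: dict) -> dict:
        self.calls += 1
        return {**record, "score": record["mw"] / 100.0}


Keep = Callable[[dict], bool]


def compute_then_filter(records: List[dict], calc: CountingCalculator, keep: Keep) -> List[dict]:
    # The calculation runs on every record, including those discarded afterwards.
    return [r for r in map(calc, records) if keep(r)]


def filter_then_compute(records: List[dict], calc: CountingCalculator, keep: Keep) -> List[dict]:
    # The calculation runs only on the records that survive the filter.
    return [calc(r) for r in records if keep(r)]


if __name__ == "__main__":
    records = [{"id": i, "target": "EGFR" if i % 10 == 0 else "ABL1", "mw": 300 + i} for i in range(100)]

    def is_egfr(r: dict) -> bool:
        return r["target"] == "EGFR"

    slow, fast = CountingCalculator(), CountingCalculator()
    assert compute_then_filter(records, slow, is_egfr) == filter_then_compute(records, fast, is_egfr)
    print(slow.calls, fast.calls)  # 100 10
```

If the filter needs the computed value, the calculation must come first; the saving applies only to filters on properties you already have.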
Parallel Processing in Subprotocols

• Allows parallelization of
  computationally intensive tasks
• Pay attention to batch size
  – Don’t make it too small
  – Performance can scale almost linearly with
    the number of cores (our numbers and
    customers’)
• Can be problematic for subprotocols
  that use R or other external applications
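A Python sketch of the batch-size trade-off, using `concurrent.futures.ProcessPoolExecutor` as a stand-in for Pipeline Pilot's parallel subprotocols (the helper names and batch sizes are illustrative assumptions). Each task carries a whole batch, so the fixed cost of dispatching work to another process is paid once per batch rather than once per record; batches that are too small let that overhead dominate.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from typing import List, Sequence


def make_batches(items: Sequence[int], batch_size: int) -> List[List[int]]:
    """Split the input into consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def process_batch(batch: List[int]) -> List[float]:
    # Stand-in for a CPU-heavy per-record calculation.
    return [math.sqrt(x) * math.log1p(x) for x in batch]


def run_parallel(items: Sequence[int], batch_size: int, workers: int = 4) -> List[float]:
    results: List[float] = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in batch order; each task ships a whole batch,
        # so pickling and scheduling overhead is paid once per batch.
        for batch_result in pool.map(process_batch, make_batches(items, batch_size)):
            results.extend(batch_result)
    return results


if __name__ == "__main__":
    data = list(range(20_000))
    parallel = run_parallel(data, batch_size=2_000)
    assert parallel == process_batch(data)  # same results as a serial run
    print(len(parallel))  # 20000
```

The `if __name__ == "__main__":` guard is required on platforms that start worker processes by re-importing the main module.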
Demo: Protocol version 2.0
Other key T&T

• Prefer linear pipelines
   – Most efficient memory usage
• Avoid excessive branching
   – Branching pipes cause data cloning. This can be expensive for large data
     records
• Avoid hash tables as caches
   – Use a file cache
• Reduce usage of caches and caching components
   – Merge, Group, Sort and Cluster create unseen caches
   – Be mindful of child nodes
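In Pipeline Pilot the remedy is a file cache component. As a generic Python analogy, the standard-library `shelve` module gives a dictionary-like interface backed by a file on disk, so a large lookup table does not have to live entirely in memory. The function names, key format and record layout below are hypothetical.

```python
import os
import shelve
import tempfile
from typing import Dict, Iterable, Tuple


def build_file_cache(path: str, records: Iterable[Tuple[str, dict]]) -> None:
    # flag="n": always create a new, empty cache. Records are stored in the
    # on-disk dbm file rather than accumulating in an in-memory dict.
    with shelve.open(path, flag="n") as db:
        for key, value in records:
            db[key] = value


def lookup_many(path: str, keys: Iterable[str]) -> Dict[str, dict]:
    # Open the cache once, read-only, and fetch only the requested keys.
    with shelve.open(path, flag="r") as db:
        return {k: db[k] for k in keys if k in db}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache_path = os.path.join(tmp, "activity_cache")
        build_file_cache(cache_path, ((f"CPD{i}", {"ic50": float(i)}) for i in range(10_000)))
        print(lookup_many(cache_path, ["CPD1", "CPD9999", "missing"]))
        # {'CPD1': {'ic50': 1.0}, 'CPD9999': {'ic50': 9999.0}}
```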
Other key T&T (cont.)
• General relative speed of implementations:
   – Components >= Pilot Script >= Java
• Protocol Function
   – Use AJAX to call a protocol from within a page
   – Can improve performance when only part of a report needs updating

• Be careful!
   – Run To Completion (RTC) subprotocols can slow down protocol execution:
     use sparingly…
   – Checkpoints are very useful for debugging but should be removed once the
     protocol is finished.
__PoolID
• PP Server uses daemons and job pooling
  to speed up job execution
• Setting __PoolID selects the job pool your protocol is executed in
• You CANNOT put the __PoolID parameter on the protocol itself

=> Admin discussion in (ATS3-PLAT11) Advanced Planning for AEP Deployments_Migrations
Job Pooling Illustration
Using __PoolID

• Built-in job pools (some are configured to run out of the box):
    – Warm-up pool
    – Keep-warm pool
    – Default pool
• Job pools and impersonation
Server optimization
   See (ATS3-PLAT11) Advanced Planning for AEP Deployments_Migrations
   And also (ATS2-07) Solving Large Computing Challenges with Pipeline Pilot

• Cluster
   – Built into Pipeline Pilot
• Grid
   – Leverages existing grid engines
      • Sun Grid Engine
      • PBS Pro
      • LSF
      • Custom scripts
Summary

• Protocol refactoring is a critical step.
• Applying basic principles can dramatically improve
  performance
• Fine-tuning requires good knowledge of the context
• Use a dedicated job pool for your applications
• Accelrys Enterprise Platform is very scalable.
For more information on the Accelrys Tech Summits and other IT & Developer
information, please visit:
https://community.accelrys.com/groups/it-dev