Adjusting primitives for graph
Graph algorithms, like PageRank ...
Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is ...
Multiply with different modes (map)
Modes: Sequential, OpenMP, CUDA
1. Performance of sequential vs OpenMP-based vector multiply.
2. Comparison of launch configurations for CUDA-based vector multiply.
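The sequential and OpenMP modes of the multiply (map) primitive can be sketched as below. This is a minimal illustration, not the code used for the report: the function names `multiplySeq` and `multiplyOmp` and the static schedule are assumptions. Without an OpenMP flag (for example `-fopenmp`) the pragma is ignored and both functions run sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential element-wise multiply: a[i] = x[i] * y[i].
// (hypothetical name, chosen for this sketch)
void multiplySeq(std::vector<float>& a, const std::vector<float>& x,
                 const std::vector<float>& y) {
  assert(a.size() == x.size() && x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    a[i] = x[i] * y[i];
}

// OpenMP element-wise multiply: iterations are independent, so the loop
// can be split across threads with no synchronization.
// (hypothetical name; schedule(static) is an assumed choice)
void multiplyOmp(std::vector<float>& a, const std::vector<float>& x,
                 const std::vector<float>& y) {
  assert(a.size() == x.size() && x.size() == y.size());
  const long long N = static_cast<long long>(x.size());
  #pragma omp parallel for schedule(static)
  for (long long i = 0; i < N; ++i)
    a[i] = x[i] * y[i];
}
```

Both functions write into a preallocated output vector, so timing either one measures only the multiply and not memory allocation.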
Sum with different storage types (reduce)
Storage types: float, bfloat16
1. Performance of vector element sum using float vs bfloat16 as the storage type.
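One way to picture the storage-type comparison: bfloat16 keeps only the upper 16 bits of a 32-bit float, which halves the memory read per element at the cost of mantissa precision. The sketch below assumes conversion by truncation and accumulation in float. The `BFloat16` struct and the helper names are hypothetical, and the report may have used a different conversion (for example, rounding).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical storage type: the upper 16 bits of an IEEE-754 float.
struct BFloat16 { std::uint16_t bits; };

// Convert by truncation (assumption: no rounding of the dropped bits).
BFloat16 toBFloat16(float v) {
  std::uint32_t u;
  std::memcpy(&u, &v, sizeof u);
  return BFloat16{static_cast<std::uint16_t>(u >> 16)};
}

// Widen back to float by zero-filling the lower 16 bits.
float toFloat(BFloat16 b) {
  std::uint32_t u = static_cast<std::uint32_t>(b.bits) << 16;
  float v;
  std::memcpy(&v, &u, sizeof v);
  return v;
}

// Sum with float storage.
float sumFloat(const std::vector<float>& x) {
  float a = 0.0f;
  for (float v : x) a += v;
  return a;
}

// Sum with bfloat16 storage; the accumulator stays in float.
float sumBFloat16(const std::vector<BFloat16>& x) {
  float a = 0.0f;
  for (BFloat16 v : x) a += toFloat(v);
  return a;
}
```

Only the storage is narrowed here; each element is widened to float before it is added, so the precision loss comes from the stored values rather than from the accumulation.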
Sum with different modes (reduce)
Modes: Sequential, OpenMP, CUDA (memcpy, in-place)
1. Performance of sequential vs OpenMP-based vector element sum.
2. Performance of memcpy vs in-place CUDA-based vector element sum.
3. Comparison of launch configurations for CUDA-based vector element sum (memcpy).
4. Comparison of launch configurations for CUDA-based vector element sum (in-place).
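The sequential and OpenMP modes of the sum (reduce) primitive might look like the following sketch. The names `sumSeq` and `sumOmp` are assumptions; the OpenMP version relies on the standard `reduction(+:a)` clause, which gives each thread a private accumulator and combines them at the end. The CUDA modes are not shown here.

```cpp
#include <cassert>
#include <vector>

// Sequential sum of all elements.
// (hypothetical name, chosen for this sketch)
float sumSeq(const std::vector<float>& x) {
  float a = 0.0f;
  for (float v : x) a += v;
  return a;
}

// OpenMP sum: each thread accumulates privately, then the partial
// results are combined by the reduction clause.
// (hypothetical name; compiles and runs sequentially without -fopenmp)
float sumOmp(const std::vector<float>& x) {
  float a = 0.0f;
  const long long N = static_cast<long long>(x.size());
  #pragma omp parallel for reduction(+:a)
  for (long long i = 0; i < N; ++i)
    a += x[i];
  return a;
}
```

Because floating-point addition is not associative, the two functions can differ in the last bits for general inputs, which is worth keeping in mind when comparing their results.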
Sum with in-place strategies of CUDA mode (reduce)
sum-loop                               | sum-reduce
one-loop                               | atomic-add
block-loop template, next-pow2 launch  | one-reduce, next-pow2 launch
block-loop template, prev-pow2 launch  | one-reduce, prev-pow2 launch
grid-loop                              |
1. Comparison of launch configurations for CUDA-based vector element sum (in-place).
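The notes name these strategies without defining them, so the following is only a CPU-side sketch under stated assumptions: that "next-pow2" and "prev-pow2" refer to rounding a launch size up or down to a power of two, and that the second pass combines per-block partial sums with a pairwise (tree) reduction. All function names are hypothetical, and the strided outer loop merely imitates how a grid of blocks could divide the input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Largest power of two <= n (n >= 1). Assumed meaning of "prev-pow2".
std::size_t prevPow2(std::size_t n) {
  assert(n >= 1);
  std::size_t p = 1;
  while (p <= n / 2) p *= 2;
  return p;
}

// Smallest power of two >= n (n >= 1). Assumed meaning of "next-pow2".
// Not guarded against overflow for very large n.
std::size_t nextPow2(std::size_t n) {
  assert(n >= 1);
  std::size_t p = 1;
  while (p < n) p *= 2;
  return p;
}

// Pairwise (tree) reduction over a power-of-two sized buffer.
// The buffer is overwritten and the total ends up in a[0].
float treeReduce(std::vector<float>& a) {
  const std::size_t n = a.size();
  assert(n >= 1 && (n & (n - 1)) == 0);
  for (std::size_t s = n / 2; s > 0; s /= 2)
    for (std::size_t i = 0; i < s; ++i)
      a[i] += a[i + s];
  return a[0];
}

// Two-pass sum: pass 1 builds `blocks` partial sums with a strided loop,
// pass 2 reduces the partial sums. `blocks` must be a power of two.
float sumTwoPass(const std::vector<float>& x, std::size_t blocks) {
  std::vector<float> partial(blocks, 0.0f);
  for (std::size_t b = 0; b < blocks; ++b)
    for (std::size_t i = b; i < x.size(); i += blocks)
      partial[b] += x[i];
  return treeReduce(partial);
}
```

A power-of-two partial-sum count keeps the halving loop in `treeReduce` free of odd-length edge cases, which is one plausible reason for comparing next-pow2 and prev-pow2 launch sizes.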
Adjusting primitives for graph : SHORT REPORT / NOTES
