A Case for Using Value Prediction to Improve Performance of Transactional Memory
Salil Pant
Advisor: Gregory Byrd
Transactional Memory
  • “Transaction” abstraction for programmers
  • Optimistic speculative execution of critical sections
  • TM can make parallel programming easier
  • Problem: serializing structures in programs
      - Queues/lists for memory and task management
      - Linked-list traversals for search
  • TM fails to expose concurrency
Conflicts in TM lead to overhead
  • Stalls, restarts
  • Longer transactions suffer most
  • Different conflict management schemes perform differently
  • Alternative approaches need programmer intervention
      - Distributed, hierarchical queue approaches
  • Scalability bottleneck
      - Amdahl’s law
  • Our solution: data speculation on conflicting accesses
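The Amdahl's law point above can be made concrete with the standard formula. In this sketch, p (the fraction of execution time that can run in parallel) and N (the processor count) are symbols introduced here for illustration, and the numbers are an example, not results from the talk:

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
\qquad\text{e.g. } p = 0.9,\ N = 16:\quad
S(16) = \frac{1}{0.1 + 0.05625} = 6.4
```

As N grows, the speedup approaches 1/(1 - p), which is 10 in this example, so even a small serialized portion such as a shared queue bounds scalability.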
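Motivation for prediction
  • Serializing data is updated in a predictable manner
      - Pointers: head, tail, etc.
      - Sizes: constant increments/decrements
  • Most conflicts come from a few data structures in the program
  • Unlock parallelism with value prediction
      - No change in the program
      - Like a new conflict management system

    #include <stdlib.h>                        /* free(), NULL */

    /* Declarations assumed here; the slide shows only the two functions. */
    typedef struct elem { struct elem *next; } elem;
    static elem *head = NULL, *tail = NULL;
    static int queue_size = 0;

    void enqueue(elem *newE) {
        newE->next = NULL;                     /* terminate the list at the new tail */
        if (tail != NULL) tail->next = newE;
        else head = newE;
        tail = newE;
        queue_size++;                          /* constant increment: predictable */
    }

    void dequeue(void) {
        if (!head) return;
        elem *temp = head;
        head = head->next;
        free(temp);
        queue_size--;                          /* constant decrement: predictable */
        if (queue_size == 0) { head = NULL; tail = NULL; }
    }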
Design
  • Base LogTM model
      - In-memory logging of old values during Xn stores
      - Eager conflict detection; timestamps for conflict resolution and deadlock detection
      - Commits are easy; aborts need memory + processor rollback
  • Identification of predictable addresses
      - Add address to predictor on a Nack
      - Trap stores for added addresses and create stride
      - Predict on future conflicts
  • Stride value predictor
      - Predictor indexed by load address
      - Predictor gets values from multiple processors
      - Predicts 32-bit loads
      - Memory-level and global
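The three steps listed under "Identification of predictable addresses" can be sketched in software as follows. This is a minimal illustration, not the hardware design from the talk: the names `vp_table`, `vp_on_nack`, `vp_on_store`, and `vp_predict` are hypothetical, and the rule that the k-th outstanding prediction equals the last stored value plus k strides is an assumption made here to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VP_ENTRIES 4 /* assumed table size; the slides say 4 or 5 entries sufficed */

typedef struct {
    bool     valid;
    uint32_t addr;        /* load address that indexes the predictor */
    uint32_t last[2];     /* two most recent trapped store values */
    int      n_stores;    /* trapped stores seen so far (saturates at 2) */
    int      outstanding; /* predictions issued since the last trapped store */
} vp_entry;

typedef struct { vp_entry e[VP_ENTRIES]; } vp_table;

static vp_entry *vp_find(vp_table *t, uint32_t addr) {
    for (size_t i = 0; i < VP_ENTRIES; i++)
        if (t->e[i].valid && t->e[i].addr == addr) return &t->e[i];
    return NULL;
}

/* Step 1: a Nack adds the address to the predictor. Returns false if the table is full. */
bool vp_on_nack(vp_table *t, uint32_t addr) {
    assert(t != NULL);
    if (vp_find(t, addr)) return true;
    for (size_t i = 0; i < VP_ENTRIES; i++) {
        if (!t->e[i].valid) {
            t->e[i] = (vp_entry){ .valid = true, .addr = addr };
            return true;
        }
    }
    return false;
}

/* Step 2: stores to tracked addresses are trapped and recorded. */
void vp_on_store(vp_table *t, uint32_t addr, uint32_t value) {
    vp_entry *e = vp_find(t, addr);
    if (!e) return;               /* untracked address: nothing to record */
    e->last[0] = e->last[1];
    e->last[1] = value;
    if (e->n_stores < 2) e->n_stores++;
    e->outstanding = 0;           /* assumption: a new store resets the prediction base */
}

/* Step 3: on a later conflict, predict from the stride of the last two stores. */
bool vp_predict(vp_table *t, uint32_t addr, uint32_t *out) {
    assert(out != NULL);
    vp_entry *e = vp_find(t, addr);
    if (!e || e->n_stores < 2) return false;     /* no stride available yet */
    uint32_t stride = e->last[1] - e->last[0];   /* unsigned wraparound covers decrements */
    e->outstanding++;
    *out = e->last[1] + stride * (uint32_t)e->outstanding;
    return true;
}
```

With stores of 10 and then 11 recorded for an address, two successive conflicts would receive 12 and 13 under this rule; a counter that goes 5, 4 would yield 3.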
Structure of Predictor
  • Conceptual design
  • VP entry, 1 per VP address
      - List of store values, 2 in our design to create stride (RV1-3)
      - List of predicted values, CPUs (SV1-5, P1-5)
  • Fortunately, max 4 or 5 VP entries needed so far
  • Simplifications
      - VP: address and timing
Value Prediction
  • Allow transactions to run in parallel with predicted values
  • Need an extra buffer to hold predicted data
      - Cannot log predicted load value in memory
  • Needs changes in the coherence protocol
      - More deadlock scenarios
      - Need messages to indicate prediction success/failure
  • Validate predictions when owner commits
      - Check predictions at commit time
      - Do not commit until prediction verified
  • Successful predictions increase concurrency/speedup
Coherence Protocol Actions: LogTM model
  [Diagram: Directory (label M-1), CPU 1, CPU 2, CPU 3; messages: GetX, FGetX, Data, Nack]
Generating predictions
  [Diagram: Directory (labels M-1, S-2, S-2-3), Value Predictor, CPU 1, CPU 2, CPU 3; messages: GetX, FGetX, Nack, Pred, Retry]
Successful predictions
  [Diagram: Directory (labels M-1, S-2-3), Value Predictor, CPU 1, CPU 2, CPU 3; messages: Retry, FGetX, Unblock, Nack; result: M-2, S-3]
Failed predictions
  [Diagram: Directory (labels M-1, S-2-3), Value Predictor, CPU 1, CPU 2, CPU 3; messages: Retry, FGetX, Unblock; result: NP, S-3]
Concurrency vs. Aborts
  • Value prediction creates dependent transactions
  • A transaction cannot commit until its prediction is verified
  • Multiple predictions per processor can lead to deadlocks
Implementation Issues
  • How to know all dependent transactions?
      - Global predictor in our work
      - Pass dependents along with Nacks
  • Limiting the number of predictions per address
  • Abort flag
  • Programmer-controlled value prediction
      - VP can predict for all conflicting accesses
Evaluation
  • Microbenchmarks
      - Loop-based, 2^10 Xn
      - Queue-based: insert only; random inserts and deletes
  • Simulation platform
      - SIMICS in-order processors (1, 2, 4, 8, 16)
      - GEMS (RUBY) memory system
      - Controlling number of predictions per address
  • Radiosity & Raytrace benchmarks
      - Both contain a linked list for memory management
  • STAMP suite of benchmarks
      - Labyrinth and Intruder benchmarks
Results
Splash benchmarks
  [Figure: 16-processor results for Splash and STAMP benchmarks]
Results table
  [Table: 2 predictions per address for VP-TM]
Observations
  • Value predictor increases concurrency for all benchmarks
  • Factors affecting speedup
      - Nacks/stalls
      - Restarts
  • 1 or 2 predictions per address provide the best performance in most cases
Rationale & Complexity
  • VP adds complexity
      - Speedup enough to justify cost?
      - Does not degrade performance if not used
  • Guaranteed speedup for all benchmarks?
      - Tuning for performance
      - Controlling predictions, abort flag
  • Will help TM adoption for multicore architectures
Conclusion
  • Value prediction with TM shown to improve performance
      - Reduced conflicts
      - Increased concurrency
  • Performance improvement comes with modest hardware increase
Questions?
Related work
  • TLS
      - Easier to predict values in TLS than TM
      - Similar idea can be used
  • Value forwarding
      - Broadcast system
      - Forward vector of dependents along with value
      - Needs extensive changes in the coherence protocol
Overall, TCC’s FPGA implementation adds 14% overhead in the control logic and 29% in on-chip memory, as compared to a non-speculative incarnation of our cache.






Smpant Transact09
