Centralized Shared-Memory Architectures
The use of large multilevel caches can substantially reduce the
memory bandwidth demands of a processor.
 
This has made it possible for several (micro)processors to
share the same memory through a shared bus.
 
Caching supports both private and shared data.
For private data, once cached, its treatment is
identical to that of a uniprocessor.
For shared data, the shared value may be replicated
in many caches.
 
Replication has several advantages:
Reduced latency and memory bandwidth requirements.
Reduced contention for data items that are read by multiple
processors simultaneously.
 
However, it also introduces a problem: cache coherence.
Cache Coherence
With multiple caches, one CPU can modify memory at
locations that other CPUs have cached.
 
For example:
CPU A reads location X, getting the value N.
Later, CPU B reads the same location, getting the value N.
Next, CPU A writes location X with the value N - 1.
At this point, any reads from CPU B will get the value N, while
reads from CPU A will get the value N - 1.
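The effect is easy to reproduce in a toy software model. The Python sketch below (illustrative names only, no real cache hardware modeled) gives each CPU a private cache over one shared memory and omits any coherence mechanism, so CPU B keeps returning the stale value after CPU A's write.

```python
# Toy model of the scenario above: two private caches over one shared memory,
# with no coherence mechanism at all, so CPU B keeps reading a stale value.

class PrivateCache:
    def __init__(self, memory):
        self.memory = memory     # shared memory, modeled as a dict
        self.lines = {}          # address -> locally cached value

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]               # hit: possibly stale copy

    def write(self, addr, value):
        self.lines[addr] = value              # update the local copy
        self.memory[addr] = value             # write through to memory,
                                              # but nobody tells the other cache

memory = {"x": 10}                            # N = 10
cpu_a, cpu_b = PrivateCache(memory), PrivateCache(memory)
print(cpu_a.read("x"))   # CPU A reads x -> 10 (N)
print(cpu_b.read("x"))   # CPU B reads x -> 10 (N)
cpu_a.write("x", 9)      # CPU A writes x = N - 1
print(cpu_a.read("x"))   # CPU A now sees 9 (N - 1)
print(cpu_b.read("x"))   # CPU B still sees 10 (N): the stale cached copy
```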
 
This problem occurs both with write-through caches and (more
seriously) with write-back caches.
 
Cache coherence (informal definition):
A memory system is coherent if any read of a data
item returns the most recently written value of that
data item.
 
Upon closer inspection, there are several aspects that need to
be addressed.
Cache Coherence
Coherence defines what values can be returned by a read.
 
A memory system is coherent if:
Read after write works for a single processor.
If CPU A writes N to location X, all future reads of
location X will return N if no other processor writes
location X after CPU A.
 
Other processors' writes eventually propagate.
If CPU A writes value N to location X, CPU B will
eventually be able to read value N from location X.
 
Once it does so, it will continue to read value N until
location X is written again.
 
This is our intuitive notion of a coherent view of
memory.
Cache Coherence
Writes to a single location are serialized.
If CPUs A and B both write to location X, all
processors see the same order of the writes.
 
This does not mean that all reads must return the
same value.
If value N1 is written "first" to location X,
followed closely by reads of X and a write of
X with value N2, some reads may return N1
and some N2.
 
However, a processor that reads N2 will
return N2 for all future reads.
 
Consistency:
This indicates when a modification to memory is seen
by other processors (i.e., will be returned by a read).
 
Clearly, this can NOT be "instantaneous" since it may
be that the new value has not even left the processor
when a read occurs.
Cache Coherence
Consistency:
The issue of when a written value MUST be seen by a
reader is defined by a memory consistency model.
 
For now, let's assume that a write is not complete
until all processors have "seen" the effect of the
write.
 
Also, assume that a processor may not reorder
memory accesses to move reads before an
outstanding write.
Reads can be reordered, but reads and writes
can not be interchanged.
 
Coherent caches provide both:
Replication of shared data items (reduces latency and contention).
Here, the purpose is to provide multiple copies of data
so that several processors can access a single piece
of memory without serialization.
 
Migration of data items (reduces latency).
Data items are moved from one processor to another
as needed.
Cache-Coherence Protocols
Small-scale multiprocessors use hardware mechanisms to track
the state of shared data blocks.
 
Two classes of protocols:
Directory based.
The sharing status of a block of physical memory is
kept in one location (the directory).
 
Snooping.
The sharing status is distributed and kept with the
block in each cache.
 
The caches are usually on a shared memory bus.
The cache controllers snoop the bus to watch
for transactions that occur on data blocks
that they hold.
Bus Snooping Protocols
Write invalidate.
It is the most common protocol, both for snooping and
for directory schemes.
 
The basic idea behind this protocol is that writes to a
location invalidate other caches' copies of the block.
Reads by other processors on invalidated data
cause cache misses.
 
If two processors write at the same time, one
wins and obtains exclusive access.
Processor activity    Bus activity   CPU A's cache   CPU B's cache   Memory at X
CPU A reads X         Cache miss     0                               0
CPU B reads X         Cache miss     0               0               0
CPU A writes 1 to X   Invalidate     1                               0
CPU B reads X         Cache miss     1               1               1

This example assumes a write-back cache.
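The sequence in the table can be replayed with a small simulation. The following Python sketch is only a rough model of write-invalidate snooping over a write-back cache (one location, invented class names), not a full protocol.

```python
# Toy model of write-invalidate snooping with a write-back cache, tracking a
# single location X. Class and variable names are made up for illustration.

class SnoopingCache:
    def __init__(self, bus, memory):
        self.bus, self.memory = bus, memory
        self.value = None          # cached copy of X (None = not cached / invalid)
        self.dirty = False
        bus.append(self)           # the "bus" is just the list of caches

    def read(self):
        if self.value is None:                     # read miss goes on the bus
            for other in self.bus:
                if other is not self and other.dirty and other.value is not None:
                    self.memory["X"] = other.value  # dirty owner supplies the block
                    other.dirty = False
            self.value = self.memory["X"]
        return self.value

    def write(self, value):
        for other in self.bus:                     # invalidate all other copies
            if other is not self:
                other.value = None
        self.value, self.dirty = value, True       # write-back: memory not updated yet

bus, memory = [], {"X": 0}
A, B = SnoopingCache(bus, memory), SnoopingCache(bus, memory)
print(A.read())   # A: cache miss -> 0, memory 0
print(B.read())   # B: cache miss -> 0
A.write(1)        # invalidate B's copy; A holds 1 (dirty), memory still 0
print(B.read())   # B: cache miss; A supplies the block -> 1, memory now 1
```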
Bus Snooping Protocols
Write broadcast (write update).
An alternative is to update all cached copies of the
data item when it is written.
 
To reduce bandwidth requirements, this protocol keeps
track of whether or not a word in the cache is shared.
If not, no broadcast is necessary.
Processor activity    Bus activity   CPU A's cache   CPU B's cache   Memory at X
CPU A reads X         Cache miss     0                               0
CPU B reads X         Cache miss     0               0               0
CPU A writes 1 to X   Broadcast      1               1               1
CPU B reads X                        1               1               1

This example also assumes a write-back cache.
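For comparison, here is an equally rough write-update sketch of the same sequence; the only change is that a write pushes the new value to the other cache (and to memory, matching the table) instead of invalidating.

```python
# Toy write-broadcast (write-update) variant of the same setup: a write pushes
# the new value to every cache holding the block and, as in the table, to memory.

class UpdatingCache:
    def __init__(self, bus, memory):
        self.bus, self.memory = bus, memory
        self.value = None           # None = block not cached
        bus.append(self)

    def read(self):
        if self.value is None:      # read miss: fetch from memory
            self.value = self.memory["X"]
        return self.value

    def write(self, value):
        self.value = value
        self.memory["X"] = value    # memory updated on the broadcast, as in the table
        for other in self.bus:      # update every other cache that holds the block
            if other is not self and other.value is not None:
                other.value = value

bus, memory = [], {"X": 0}
A, B = UpdatingCache(bus, memory), UpdatingCache(bus, memory)
print(A.read())   # miss -> 0
print(B.read())   # miss -> 0
A.write(1)        # broadcast: B's copy and memory are updated to 1
print(B.read())   # hit -> 1 (no miss here, unlike write invalidate)
```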
Performance Differences between Bus Snooping Protocols
Write invalidate is much more popular.
 
This is due primarily to the performance differences.
Multiple writes to the same word with no intervening reads require
multiple broadcasts.
With multiword cache blocks, each word written requires a
broadcast.
For write invalidate, only the first write to the
block causes an invalidation.
Also, write invalidate works on blocks, while write
broadcast must work on individual words or bytes.
The delay between writing by one processor and reading by another
is lower in the write broadcast scheme.
For write invalidate, the read causes a miss.
 
Since bus and memory bandwidth are at a premium in a bus-based
multiprocessor, write invalidate performs better.
 
Therefore, we focus on the implementation of the write invalidate
protocol.
Implementation of Write Invalidate Protocols
Write invalidate is simple in bus-based schemes.
Acquire the bus and broadcast the address to be
invalidated.
 
Since all processors snoop the bus, they can check the address
against items in their cache.
 
Bus acquisition also serializes write operations to the same
memory location.
Writes to a shared data item cannot complete until the
bus is acquired.
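One way to picture this serialization in software terms: if a lock stands in for bus acquisition, concurrent writers to the same location are forced into a single order that everyone agrees on. The sketch below is only an analogy, not a model of the hardware.

```python
# Sketch only: a threading.Lock stands in for bus acquisition, so the
# invalidations (and the writes they precede) are serialized in one order.
import threading

bus = threading.Lock()        # only one processor may own the bus at a time
write_order = []              # the order in which writes "won" the bus

def write_shared(cpu, value):
    with bus:                 # acquire the bus before broadcasting the invalidate
        write_order.append((cpu, value))
        # ... broadcast invalidate, then perform the write ...

threads = [threading.Thread(target=write_shared, args=(cpu, v))
           for cpu, v in [("A", 1), ("B", 2)]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(write_order)            # both writes occurred, in a single agreed order
```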
 
What about locating a data item when a cache miss occurs?
For write-through, it's in memory.
For write-back, snooping can be used.
If a processor finds that it has a dirty copy of
the requested cache block, it provides the
block instead of memory.
 
Note that write-back caches are greatly preferred in a
multiprocessor environment since they reduce memory
bandwidth demands.
Implementation of the Write Invalidate Protocol on Write-Back Caches
Writes are the issue here.
 
We would like to know if any other caches contain the block to
be written by a processor.
If there are none, then the write need not be placed on
the bus.
This reduces the time to complete the write and
reduces memory bandwidth.
 
This can be tracked by adding an extra state bit (in addition to
the valid and dirty bits) that indicates if the block is shared.
 
If the bit is set (the block is shared), the cache
generates an invalidation on the bus and marks the
block as private.
 
If another processor later requests the block, the miss
is snooped and the "owner" sets the state bit to
shared.
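A rough sketch of how that extra bit might be used (the Bus class and function names here are invented for illustration):

```python
from dataclasses import dataclass

class Bus:
    """Stand-in for the shared bus; only the invalidate broadcast is modeled."""
    def broadcast_invalidate(self):
        print("invalidate placed on bus")

@dataclass
class CacheLine:
    valid: bool = False
    dirty: bool = False
    shared: bool = False   # the extra state bit discussed above
    data: int = 0

def write_hit(line, value, bus):
    # Only a write to a *shared* block needs a bus transaction.
    if line.shared:
        bus.broadcast_invalidate()
        line.shared = False          # block becomes private to this cache
    line.data = value
    line.dirty = True                # write-back: memory is updated later

def snoop_read_miss(line):
    # Another processor's miss was snooped: the owner marks the block shared.
    if line.valid:
        line.shared = True

bus = Bus()
line = CacheLine(valid=True, shared=True, data=5)
write_hit(line, 7, bus)   # shared -> one invalidate on the bus, then private
write_hit(line, 8, bus)   # now private -> no bus traffic at all
snoop_read_miss(line)     # a later remote miss makes the block shared again
```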
Implementation of the Write Invalidate Protocol on Write-Back Caches
Note that every bus transaction checks cache-address tags.
This could potentially interfere with CPU cache
access.
 
This interference can be reduced by:
Duplicating the tags.
Bus access can proceed in parallel with CPU access.
 
On misses, the processor must arbitrate for and
update both sets of tags.
The same is true for the snoop (to perform an
invalidate or to update the shared bit).
 
However, a snoop may require fetching a block.
This is the only instance that may cause a
stall.
Implementation of the Write Invalidate Protocol on Write-Back Caches
Employing a multilevel cache with inclusion.
Every entry in L1 is in L2.
Therefore, snooping can be directed to L2,
where there are fewer processor accesses.
 
If a snoop gets a hit, then it must arbitrate for L1 to
update state and possibly retrieve data.
This usually stalls the processor.
 
Since it is popular to use multilevel caches in
multiprocessors (to reduce memory bandwidth), this
solution is usually adopted.
 
It is also possible to duplicate the tags in L2 to further
reduce contention.
An Example Centralized Shared-Memory Snooping Protocol
Implemented by incorporating a finite state controller in each
node.
 
The controller responds to requests from the processor and
the bus:
To simplify the controller, write hits and write misses to shared
blocks are treated as write misses.
Request      Source      Function
Read hit     Processor   Read data in cache.
Write hit    Processor   Write data in cache.
Read miss    Bus         Request data from cache or memory.
Write miss   Bus         Request data from cache or memory (perform any needed invalidates).
This causes processors with copies to invalidate them.
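A compressed sketch of such a controller for a single cache block is shown below (Python, with illustrative state and action names; only the events in the table are handled, and write hits to shared blocks are folded into write misses as described above).

```python
# Sketch of a per-block finite state controller (invalid / shared / modified).
# This mirrors the protocol only in outline; real controllers track more events.

INVALID, SHARED, MODIFIED = "invalid", "shared", "modified"

def next_state(state, request, source):
    """Return (new_state, bus_action) for one cache block."""
    if source == "processor":
        if request == "read miss":
            return SHARED, "place read miss on bus"
        if request == "write miss" or (request == "write hit" and state == SHARED):
            # Per the simplification above, a write to a shared block
            # is treated as a write miss.
            return MODIFIED, "place write miss on bus"
        return state, None                    # read hit, or write hit on modified
    else:                                     # request snooped from the bus
        if request == "write miss":
            if state == MODIFIED:
                return INVALID, "write back block; invalidate copy"
            return INVALID, None              # just invalidate the local copy
        if request == "read miss" and state == MODIFIED:
            return SHARED, "write back block; supply data"
        return state, None

print(next_state(INVALID, "read miss", "processor"))   # -> shared
print(next_state(SHARED, "write hit", "processor"))    # -> modified, write miss on bus
print(next_state(MODIFIED, "read miss", "bus"))        # -> shared, write back block
```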
An Example Centralized Shared-Memory Snooping Protocol
Write invalidation and a write-back cache are assumed.
An Example Centralized Shared-Memory Snooping Protocol
These state transitions have no analog in a uniprocessor cache
controller.
An Example Centralized Shared-Memory Snooping Protocol
Complications we have ignored:
The protocol assumes that operations are atomic.
In reality, a write miss is not atomic; there is simply
too much work to do in one step.
Also, read misses on a split-transaction bus are not
atomic.
Nonatomic actions introduce the possibility that the protocol can
deadlock.
See Appendix E for a fix.
 
Two major simplifications:
Real protocols distinguish between write hits and write misses.
From the shared state, a write miss would require the
action shown previously.
However, a write hit does not require that the data be
fetched since it is up-to-date.
All that is needed is an invalidate operation.
Real protocols distinguish between shared and clean data in exactly
one cache.
11/9/2016 Centralized Shared­Memory Architectures
http://ece­research.unm.edu/jimp/611/slides/chap8_2.html 12/12
A "clean and private" state eliminates the need to
generate a bus transaction on a write to a "clean and
private" block.
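A tiny sketch of how the extra "clean and private" (exclusive) state changes write-hit handling, under the same illustrative conventions as the earlier controller sketch:

```python
# Sketch of the "clean and private" optimization: a write hit on an exclusive
# block upgrades it to modified without any bus transaction.

EXCLUSIVE, SHARED, MODIFIED = "exclusive", "shared", "modified"

def write_hit(state):
    """Return (new_state, bus_action) for a write hit under the extended protocol."""
    if state == EXCLUSIVE:
        return MODIFIED, None                  # clean and private: no bus transaction
    if state == SHARED:
        return MODIFIED, "invalidate on bus"   # other copies must be discarded
    return state, None                         # already modified: nothing to do

print(write_hit(EXCLUSIVE))   # ('modified', None)
print(write_hit(SHARED))      # ('modified', 'invalidate on bus')
```

This "clean and private" state corresponds to the exclusive state found in MESI-style protocols.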