Multi-Core
Computing
Osama Awwad
Department of Computer Science
Western Michigan University
Thursday, March 2, 2023
Multi-Core Computer
• A multi-core microprocessor combines two or more independent processors (cores) into a single package, often a single integrated circuit (IC).
• A dual-core device contains two independent microprocessors.
• In general, multi-core microprocessors allow a computing device to exhibit some form of thread-level parallelism (TLP) without including multiple microprocessors in separate physical packages.
Major Technology Providers
• The latest versions of many architectures are multi-core, including PA-RISC (PA-8800), IBM POWER (POWER7), SPARC (UltraSPARC IV), and various processors from Intel and AMD.
• There is some controversy as to whether multiple cores on a chip are the same thing as multiple processors. Major technology providers are divided on this issue.
• IBM considers its dual-core POWER4 and POWER5 to be two processors, just packaged together.
• Sun Microsystems, in contrast, considers its UltraSPARC IV to be a multi-threaded rather than multi-processor chip.
• Intel considers its multi-core designs to be a single processor.
• This is not an idle debate, because software is often more expensive when licensed for more processors.
  ◦ Microsoft, Red Hat Linux, and SUSE Linux license their operating systems per chip, not per core.
Single-core computer
Multi-core architectures
• Replicate multiple processor cores on a single die.
[Figure: a multi-core CPU chip containing Core 1, Core 2, Core 3, and Core 4]
Multi-core CPU chip
• The cores fit on a single processor socket
• Also called CMP (Chip Multi-Processor)
[Figure: a chip containing core 1, core 2, core 3, and core 4]
The cores run in parallel
[Figure: thread 1, thread 2, thread 3, and thread 4 running on core 1, core 2, core 3, and core 4]
Within each core, threads are time-sliced (just like on a uniprocessor)
[Figure: core 1 to core 4, each running several threads]
Interaction with OS
• The OS perceives each core as a separate processor
• The OS scheduler maps threads/processes to different cores
• Most major operating systems support multi-core today
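As a minimal illustration of the OS view described above, the Python sketch below asks the operating system how many processors it reports. It assumes Python 3 and only the standard library. The helper names `logical_cpu_count` and `usable_cpus` are made up for this example, and `os.sched_getaffinity` is only available on some platforms (for example Linux), so the code falls back when it is missing.

```python
import os


def logical_cpu_count() -> int:
    """Number of logical processors the OS reports."""
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


def usable_cpus() -> set:
    """Ids of the processors this process may be scheduled on.

    os.sched_getaffinity is not available everywhere; when it is missing,
    assume (for illustration only) that every logical processor is usable.
    """
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))  # 0 means "the calling process"
    return set(range(logical_cpu_count()))


if __name__ == "__main__":
    print("logical processors reported by the OS:", logical_cpu_count())
    print("processors usable by this process:", sorted(usable_cpus()))
```

On an SMT-enabled chip the reported count typically includes every hardware thread, not just every physical core.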
Why multi-core?
• It is difficult to push single-core clock frequencies even higher
• Many new applications are multithreaded
• General trend in computer architecture (shift towards more parallelism)
Instruction-level parallelism
• Parallelism at the machine-instruction level
• The processor can re-order and pipeline instructions, split them into microinstructions, do aggressive branch prediction, etc.
• Instruction-level parallelism enabled rapid increases in processor speeds over the last 15 years
Thread-level parallelism (TLP)
• This is parallelism on a coarser scale
• A server can serve each client in a separate thread (Web server, database server)
• A computer game can do AI, graphics, and physics in three separate threads
• Single-core superscalar processors cannot fully exploit TLP
• Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP
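The "one client, one thread" pattern from the server example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real server: `handle_client` and `serve` are hypothetical names, the "request" is just an integer id, and a standard-library thread pool stands in for the server's threads.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_client(client_id: int) -> str:
    """Hypothetical request handler; each call runs in a pool thread."""
    return f"client {client_id} served"


def serve(clients, max_workers: int = 4) -> list:
    """Hand every client to a worker thread and collect the replies."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order the clients were submitted.
        return list(pool.map(handle_client, clients))


if __name__ == "__main__":
    for reply in serve(range(3)):
        print(reply)
```

The threads are independent, so the OS is free to place them on different cores. In CPython the global interpreter lock limits how much CPU-bound Python code actually runs in parallel, so this shows the structure of thread-level parallelism rather than a measured speedup.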
General context: Multiprocessors
• A multiprocessor is any computer with several processors
• SIMD
  ◦ Single instruction, multiple data
  ◦ Modern graphics cards
• MIMD
  ◦ Multiple instructions, multiple data
  ◦ Lemieux cluster, Pittsburgh Supercomputing Center
Multiprocessor memory types
• Shared memory: in this model, there is one (large) common shared memory for all processors
• Distributed memory: in this model, each processor has its own (small) local memory, and its content is not replicated anywhere else
A multi-core processor is a special kind of multiprocessor: all processors are on the same chip
• Multi-core processors are MIMD: different cores execute different threads (Multiple Instructions), operating on different parts of memory (Multiple Data).
• Multi-core is a shared memory multiprocessor: all cores share the same memory
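Because all cores share the same memory, threads on different cores can read and write the same variable, and they need to coordinate when they do. The sketch below is a minimal illustration in Python using the standard `threading` module; `count_in_parallel` is a made-up name and the lock-protected counter is only one of several possible ways to coordinate.

```python
import threading


def count_in_parallel(num_threads: int = 4, increments: int = 10_000) -> int:
    """Let several threads update one counter that lives in shared memory."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            # The lock serializes the read-modify-write on the shared value.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(count_in_parallel())
```

Every thread sees the same `total`, which is the shared-memory model in miniature. In a distributed-memory design each worker would hold its own copy and the partial counts would have to be sent and combined explicitly.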
What applications benefit from multi-core?
• Database servers
• Web servers (Web commerce)
• Telecommunication markets: 6WINDGate (datapath and control plane)
• Multimedia applications
• Scientific applications, CAD/CAM
• In general, applications with thread-level parallelism (as opposed to instruction-level parallelism)
Each can run on its own core
More examples
• Editing a photo while recording a TV show through a digital video recorder
• Downloading software while running an anti-virus program
• “Anything that can be threaded today will map efficiently to multi-core”
• BUT: some applications are difficult to parallelize
Simultaneous multithreading (SMT)
• Permits multiple independent threads to execute SIMULTANEOUSLY on the SAME core
• Weaving together multiple “threads” on the same core
• Example: if one thread is waiting for a floating-point operation to complete, another thread can use the integer units
Without SMT, only a single thread can run at any given time
[Figure: block diagram of one core (BTB and I-TLB, Decoder, Trace Cache, Rename/Alloc, Uop queues, Schedulers, Integer and Floating Point units, L1 D-Cache and D-TLB, uCode ROM, BTB, L2 Cache and Control, Bus), with Thread 1 performing a floating-point operation]
Without SMT, only a single thread can run at any given time
[Figure: core block diagram with Thread 2 performing an integer operation]
SMT processor: both threads can run concurrently
[Figure: core block diagram with Thread 1 (floating point) and Thread 2 (integer operation) executing at the same time]
But: can’t simultaneously use the same functional unit
[Figure: core block diagram with Thread 1 and Thread 2 both trying to use the integer unit, marked IMPOSSIBLE]
This scenario is impossible with SMT on a single core (assuming a single integer unit)
SMT is not a “true” parallel processor
• Enables better threading (e.g. up to 30% better performance)
• OS and applications perceive each simultaneous thread as a separate “virtual processor”
• The chip has only a single copy of each resource
• Compare to multi-core: each core has its own copy of resources
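The functional-unit constraint can be illustrated with a toy cycle count. The sketch below is a deliberately simplified model and not a description of any real core: every operation takes one cycle, each thread issues in order, and the core is assumed to have one integer unit and one floating-point unit. The function names are hypothetical.

```python
from collections import deque


def cycles_with_smt(threads, units=None) -> int:
    """Cycles needed when every thread may issue one operation per cycle,
    provided a functional unit of the required kind is still free."""
    units = units or {"int": 1, "fp": 1}  # assumed unit counts
    queues = [deque(ops) for ops in threads]
    cycles = 0
    while any(queues):
        free = dict(units)  # all units become available again each cycle
        n = len(queues)
        start = cycles % n  # rotate priority so no thread is starved
        for i in list(range(start, n)) + list(range(start)):
            q = queues[i]
            if q and free[q[0]] > 0:
                free[q[0]] -= 1
                q.popleft()
        cycles += 1
    return cycles


def cycles_without_smt(threads) -> int:
    """Only one thread runs at a time, so operations never overlap."""
    return sum(len(ops) for ops in threads)


if __name__ == "__main__":
    mixed = [["fp"] * 4, ["int"] * 4]
    same = [["int"] * 4, ["int"] * 4]
    print(cycles_without_smt(mixed), cycles_with_smt(mixed))
    print(cycles_without_smt(same), cycles_with_smt(same))
```

In this model a floating-point thread and an integer thread overlap completely, while two integer threads gain nothing because they compete for the single integer unit. On a multi-core chip the second thread would have an integer unit of its own.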
Multi-core: threads can run on separate cores
[Figure: two copies of the core block diagram side by side, with Thread 1 and Thread 3 running on separate cores]
Multi-core: threads can run on separate cores
[Figure: two copies of the core block diagram side by side, with Thread 2 and Thread 4 running on separate cores]
Combining Multi-core and SMT
• Cores can be SMT-enabled (or not)
• The different combinations:
  ◦ Single-core, non-SMT: standard uniprocessor
  ◦ Single-core, with SMT
  ◦ Multi-core, non-SMT
  ◦ Multi-core, with SMT:
• The number of SMT threads: 2, 4, or sometimes 8 simultaneous threads
• Intel calls them “hyper-threads”
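The combinations above differ in how many hardware threads the OS ends up seeing as separate processors. A small Python sketch of that arithmetic follows; the function name and the example core and thread counts are assumptions chosen for illustration.

```python
def logical_processors(cores: int, smt_threads_per_core: int = 1) -> int:
    """Hardware threads that the OS would see as separate processors."""
    if cores < 1 or smt_threads_per_core < 1:
        raise ValueError("both counts must be at least 1")
    return cores * smt_threads_per_core


# The four combinations, with example counts (2 cores, 2 SMT threads).
COMBINATIONS = {
    "single-core, non-SMT": logical_processors(1, 1),
    "single-core, with SMT": logical_processors(1, 2),
    "multi-core, non-SMT": logical_processors(2, 1),
    "multi-core, with SMT": logical_processors(2, 2),
}

if __name__ == "__main__":
    for name, count in COMBINATIONS.items():
        print(f"{name}: {count} logical processor(s)")
```

With two cores and two SMT threads per core the result is four, which matches the SMT dual-core case where all four threads can run concurrently.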
SMT Dual-core: all four threads can run concurrently
[Figure: two copies of the core block diagram side by side, with Thread 1, Thread 2, Thread 3, and Thread 4 all running across the two cores]
Comparison: multi-core vs SMT
• Multi-core:
  ◦ Since there are several cores, each is smaller and not as powerful (but also easier to design and manufacture)
  ◦ However, great with thread-level parallelism
• SMT:
  ◦ Can have one large and fast superscalar core
  ◦ Great performance on a single thread
  ◦ Mostly still only exploits instruction-level parallelism
The memory hierarchy
• If simultaneous multithreading only: all caches shared
• Multi-core chips:
  ◦ L1 caches private
  ◦ L2 caches private in some architectures and shared in others
• Memory is always shared
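The sharing rules above can be written down as a small model. The sketch below is an illustration only: the `Chip` class, its field names, and the numbering of hardware threads (consecutive ids belong to the same core) are assumptions made for this example, and real chips may have more cache levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    cores: int
    smt_per_core: int
    l2_shared: bool  # True: one L2 for all cores; False: one L2 per core

    def core_of(self, hw_thread: int) -> int:
        """Core that a hardware thread belongs to (assumed numbering)."""
        if not 0 <= hw_thread < self.cores * self.smt_per_core:
            raise ValueError("no such hardware thread")
        return hw_thread // self.smt_per_core

    def shared(self, a: int, b: int) -> list:
        """Levels of the hierarchy that hardware threads a and b share."""
        same_core = self.core_of(a) == self.core_of(b)
        levels = []
        if same_core:
            levels.append("L1")  # SMT threads on one core share its caches
        if same_core or self.l2_shared:
            levels.append("L2")
        levels.append("memory")  # memory is always shared
        return levels


if __name__ == "__main__":
    shared_l2 = Chip(cores=2, smt_per_core=2, l2_shared=True)
    private_l2 = Chip(cores=2, smt_per_core=1, l2_shared=False)
    print(shared_l2.shared(0, 1))   # two SMT threads on the same core
    print(shared_l2.shared(0, 2))   # threads on different cores
    print(private_l2.shared(0, 1))  # different cores, private L2
```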
• Dual-core Intel Xeon processors
• Each core is hyper-threaded
• Private L1 caches
• Shared L2 caches
[Figure: CORE 0 and CORE 1, each running hyper-threads and each with its own L1 cache, above a shared L2 cache and memory]
Designs with private L2 caches
[Figure: CORE 0 and CORE 1, each with its own L1 cache and L2 cache, above memory. Both L1 and L2 are private. Examples: AMD Opteron, AMD Athlon, Intel Pentium D]
[Figure: CORE 0 and CORE 1, each with its own L1 cache, L2 cache, and L3 cache, above memory. A design with L3 caches. Example: Intel Itanium 2]
Windows Task Manager
[Screenshot: Windows Task Manager, with the graphs for core 1 and core 2 labeled]
Advantages / Disadvantages
Advantages
• Cache coherency circuitry can operate at a much higher clock rate than is possible if the signals have to travel off-chip
• Signals between different CPUs travel shorter distances, so those signals degrade less
• These higher-quality signals allow more data to be sent in a given time period, since individual signals can be shorter and do not need to be repeated as often
• A dual-core processor uses slightly less power than two coupled single-core processors
Disadvantages
• The ability of multi-core processors to increase application performance depends on the use of multiple threads within applications.
• Most current video games will run faster on a 3 GHz single-core processor than on a 2 GHz dual-core processor (of the same core architecture).
• Two processing cores share the same system bus and memory bandwidth, which limits the real-world performance advantage.
• If a single core is close to being memory-bandwidth limited, going to dual-core might only give a 30% to 70% improvement
• If memory bandwidth is not a problem, a 90% improvement can be expected
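The 3 GHz single-core versus 2 GHz dual-core comparison can be made concrete with a rough model. The sketch below is an assumption, not something the slides state: it applies Amdahl's law, assumes performance scales linearly with clock frequency, and ignores memory bandwidth entirely. The function names are made up for this example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work
    can be spread across the available cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def relative_performance(clock_ghz: float, cores: int,
                         parallel_fraction: float,
                         baseline_ghz: float = 3.0) -> float:
    """Performance relative to a single core running at baseline_ghz,
    assuming performance is proportional to clock frequency."""
    return (clock_ghz / baseline_ghz) * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    for p in (0.0, 0.5, 2 / 3, 0.9, 1.0):
        ratio = relative_performance(2.0, 2, p)
        print(f"parallel fraction {p:.2f}: 2 GHz dual-core is "
              f"{ratio:.2f}x a 3 GHz single core")
```

Under these assumptions the 2 GHz dual-core chip only pulls ahead once more than about two thirds of the work can run in parallel, which is consistent with mostly single-threaded games running faster on the 3 GHz single core.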
Conclusion
• Multi-core chips are an important new trend in computer architecture
• Several new multi-core chips are in design phases
• Parallel programming techniques are likely to gain importance
References
• http://en.wikipedia.org/wiki/Multi-core_(computing)
• www.princeton.edu/~jdonald/research/hyperthreading/garg_report.pdf
• www.cs.cmu.edu/~barbic/multi-core.ppt

