William Stallings
Computer Organization
and Architecture
8th Edition


Chapter 18
Multicore Computers
Hardware Performance Issues
• Microprocessors have seen an exponential
  increase in performance
  —Improved organization
  —Increased clock frequency
• Increase in Parallelism
  —Pipelining
  —Superscalar
  —Simultaneous multithreading (SMT)
• Diminishing returns
  —More complexity requires more logic
  —Increasing chip area for coordinating and
   signal transfer logic
     – Harder to design, make and debug
Alternative Chip
Organizations
Intel Hardware
Trends
Increased Complexity
• Power requirements grow exponentially with chip
  density and clock frequency
   — Can use more chip area for cache
      – Smaller (denser) than logic
      – Order of magnitude lower power requirements
• By 2015
   — 100 billion transistors on 300mm2 die
      – Cache of 100MB
      – 1 billion transistors for logic
• Pollack’s rule:
   — Performance is roughly proportional to square root of
     increase in complexity
      – Double complexity gives 40% more performance
• Multicore has potential for near-linear
  improvement
• Unlikely that one core can use all cache
  effectively
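The trade-off behind these bullets can be checked with a quick calculation. The sketch below (plain Python, illustrative numbers) shows why, under Pollack's rule, doubling one core's complexity buys only about 40% more performance, while spending the same transistor budget on two simpler cores can in principle double throughput on parallel work:

```python
import math

def pollack_performance(complexity):
    """Pollack's rule: single-core performance scales roughly with
    the square root of its complexity (transistor budget)."""
    return math.sqrt(complexity)

# Doubling the complexity of one core: ~41% faster.
single_big = pollack_performance(2.0) / pollack_performance(1.0)

# Spending the same transistors on two simple cores instead:
# near-linear improvement on parallel workloads.
dual_simple = 2 * pollack_performance(1.0)

print(f"one 2x-complexity core: {single_big:.2f}x")  # 1.41x
print(f"two 1x cores:           {dual_simple:.2f}x")  # 2.00x
```

This is the argument for multicore: the second row scales linearly with core count, the first only as the square root of added complexity.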
Power and Memory Considerations
Chip Utilization of Transistors
Software Performance Issues
• Performance benefits dependent on
  effective exploitation of parallel resources
• Even small amounts of serial code impact
  performance
  —10% inherently serial on 8 processor system
   gives only 4.7 times performance
• Communication, distribution of work and
  cache coherence overheads
• Some applications effectively exploit
  multicore processors
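The 4.7x figure above follows directly from Amdahl's law; a one-line sketch reproduces it:

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Amdahl's law: overall speedup on n processors when a fixed
    fraction of the work is inherently serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# 10% inherently serial code on an 8-processor system:
# only ~4.7x speedup, not 8x.
print(f"{amdahl_speedup(0.10, 8):.1f}x")  # 4.7x
```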
Effective Applications for Multicore Processors
• Database
• Servers handling independent transactions
• Multi-threaded native applications
   — Lotus Domino, Siebel CRM
• Multi-process applications
   — Oracle, SAP, PeopleSoft
• Java applications
   — Java VM is multi-thread with scheduling and memory
     management
   — Sun’s Java Application Server, BEA’s Weblogic, IBM
     Websphere, Tomcat
• Multi-instance applications
   — One application running multiple times
• E.g. Valve game software
Multicore Organization
•   Number of core processors on chip
•   Number of levels of cache on chip
•   Amount of shared cache
•   Next slide shows examples of each organization:
•   (a) ARM11 MPCore
•   (b) AMD Opteron
•   (c) Intel Core Duo
•   (d) Intel Core i7
Multicore Organization Alternatives
Advantages of shared L2 Cache
• Constructive interference reduces overall miss
  rate
• Data shared by multiple cores not replicated at
  cache level
• With proper line replacement algorithms, the mean
  amount of shared cache dedicated to each core is
  dynamic
  — Threads with less locality can have more cache
• Easy inter-process communication through
  shared memory
• Cache coherency confined to L1
• Dedicated L2 cache gives each core more rapid
  access
  — Good for threads with strong locality
• Shared L3 cache may also improve performance
Individual Core Architecture
• Intel Core Duo uses superscalar cores
• Intel Core i7 uses simultaneous multi-
  threading (SMT)
  —Scales up number of threads supported
     – 4 SMT cores, each supporting 4 threads, appear as
       16 cores
Intel x86 Multicore Organization -
Core Duo (1)
• 2006
• Two x86 superscalar, shared L2 cache
• Dedicated L1 cache per core
   —32KB instruction and 32KB data
• Thermal control unit per core
   —Manages chip heat dissipation
   —Maximize performance within constraints
   —Improved ergonomics
• Advanced Programmable Interrupt
  Controller (APIC)
   —Inter-processor interrupts between cores
   —Routes interrupts to appropriate core
   —Includes timer so OS can interrupt core
Intel x86 Multicore Organization -
Core Duo (2)
• Power Management Logic
   —Monitors thermal conditions and CPU activity
   —Adjusts voltage and power consumption
   —Can switch individual logic subsystems
• 2MB shared L2 cache
   —Dynamic allocation
   —MESI support for L1 caches
   —Extended to support multiple Core Duo in SMP
     – L2 data shared between local cores or external
• Bus interface
Intel x86 Multicore Organization -
Core i7
•   November 2008
•   Four x86 SMT processors
•   Dedicated L2, shared L3 cache
•   Speculative pre-fetch for caches
•   On chip DDR3 memory controller
    — Three 8 byte channels (192 bits) giving 32GB/s
    — No front side bus
• QuickPath Interconnection
    — Cache coherent point-to-point link
    — High speed communications between processor chips
    — 6.4G transfers per second, 16 bits per transfer
    — Dedicated bi-directional pairs
    — Total bandwidth 25.6GB/s
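The bandwidth figures quoted above can be reproduced from the slide's own numbers. A back-of-envelope sketch (decimal GB, 8 bits per byte; the per-channel DDR3 rate is inferred, not stated on the slide):

```python
# QuickPath Interconnect: 6.4 GT/s, 16 bits (2 bytes) per transfer,
# with a dedicated link pair in each direction.
qpi_transfers_per_s = 6.4e9
qpi_bytes_per_transfer = 16 / 8
qpi_per_direction = qpi_transfers_per_s * qpi_bytes_per_transfer  # 12.8 GB/s
qpi_total = 2 * qpi_per_direction                                 # 25.6 GB/s

# DDR3 controller: three 8-byte channels (192 bits wide) at 32 GB/s
# aggregate implies roughly 1.33 GT/s per channel, i.e. DDR3-1333
# (an inference from the quoted totals).
ddr3_width_bytes = 3 * 8
ddr3_rate = 32e9 / ddr3_width_bytes  # ~1.33e9 transfers/s
```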
ARM11 MPCore
• Up to 4 processors each with own L1 instruction and data
  cache
• Distributed interrupt controller
• Timer per CPU
• Watchdog
   — Warning alerts for software failures
   — Counts down from predetermined values
   — Issues warning at zero
• CPU interface
   — Interrupt acknowledgement, masking and completion
     acknowledgement
• CPU
   — Single ARM11 called MP11
• Vector floating-point unit
   — FP co-processor
• L1 cache
• Snoop control unit
   — L1 cache coherency
ARM11
MPCore
Block
Diagram
ARM11 MPCore Interrupt Handling
• Distributed Interrupt Controller (DIC) collates
  from many sources
• Masking
• Prioritization
• Distribution to target MP11 CPUs
• Status tracking
• Software interrupt generation
• Number of interrupts independent of MP11 CPU
  design
• Memory mapped
• Accessed by CPUs via private interface through
  SCU
• Can route interrupts to single or multiple CPUs
• Provides inter-processor communication
  — Thread on one CPU can cause activity by thread on
    another CPU
DIC Routing
•   Direct to specific CPU
•   To defined group of CPUs
•   To all CPUs
•   OS can generate interrupt to:
    —All but self
    —Self
    —Other specific CPU
• Typically combined with shared memory
  for inter-process communication
• 16 interrupt ids available for inter-process
  communication
Interrupt States
• Inactive
  —Non-asserted
  —Completed by that CPU but pending or active
   in others
• Pending
  —Asserted
  —Processing not started on that CPU
• Active
  —Started on that CPU but not complete
  —Can be pre-empted by higher priority interrupt
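The per-CPU life cycle above can be sketched as a small state machine (an illustrative model of the three states, not ARM's actual register interface):

```python
# Simplified per-CPU view of a DIC interrupt's life cycle:
# inactive -> pending (line asserted) -> active (CPU starts handling)
# -> inactive (CPU signals completion).
TRANSITIONS = {
    ("inactive", "assert"):   "pending",
    ("pending",  "start"):    "active",
    ("active",   "complete"): "inactive",
}

def step(state, event):
    """Advance one interrupt through its life cycle; an event that
    does not apply in the current state leaves it unchanged."""
    return TRANSITIONS.get((state, event), state)

state = "inactive"
for event in ("assert", "start", "complete"):
    state = step(state, event)
# Back to inactive once the CPU completes the handler.
```

Note that the same interrupt can simultaneously be in different states on different CPUs, which is why the slide defines "inactive" to include "completed by that CPU but pending or active in others".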
Interrupt Sources
• Inter-processor Interrupts (IPI)
   — Private to CPU
   — ID0-ID15
   — Software triggered
   — Priority depends on target CPU not source
• Private timer and/or watchdog interrupt
   — ID29 and ID30
• Legacy FIQ line
   — Legacy FIQ pin, per CPU, bypasses interrupt distributor
   — Directly drives interrupts to CPU
• Hardware
   — Triggered by programmable events on associated
     interrupt lines
   — Up to 224 lines
   — Start at ID32
ARM11 MPCore Interrupt Distributor
Cache Coherency
• Snoop Control Unit (SCU) resolves most shared
  data bottleneck issues
• L1 cache coherency based on MESI
• Direct data Intervention
  — Copying clean entries between L1 caches without
    accessing external memory
  — Reduces read after write from L1 to L2
  — Can resolve local L1 miss from remote L1 rather than L2
• Duplicated tag RAMs
  — Cache tags implemented as separate block of RAM
  — Same length as number of lines in cache
  — Duplicates used by SCU to check data availability before
    sending coherency commands
  — Only send to CPUs that must update coherent data
    cache
• Migratory lines
  — Allows moving dirty data between CPUs without writing
    to L2 and reading back from external memory
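A minimal sketch of what direct data intervention means in MESI terms (illustrative only; the real SCU implements this in hardware). When another CPU misses on a line a peer holds, the SCU can supply the data cache-to-cache instead of going to L2 or memory:

```python
# MESI: each L1 line is Modified, Exclusive, Shared, or Invalid.
def remote_read(owner_state):
    """Model of a remote CPU reading a line this cache holds.
    Returns (new state in the owning cache, where the reader's
    data comes from). Simplified: dirty data is intervened
    cache-to-cache (the 'migratory lines' case) rather than
    written back to memory first."""
    if owner_state in ("M", "E"):
        # Direct data intervention: supply the line from this L1,
        # both copies end up Shared, no external memory access.
        return ("S", "peer L1")
    if owner_state == "S":
        return ("S", "peer L1")
    return ("I", "L2/memory")  # Invalid here: miss goes below L1
```

The duplicated tag RAMs let the SCU make this decision, and filter coherency traffic, without stealing tag-lookup bandwidth from the CPUs.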
Recommended Reading
• Stallings chapter 18
• ARM web site
Intel Core i7 Block Diagram
Intel Core Duo Block Diagram
Performance Effect of Multiple Cores
Recommended Reading
• Multicore Association web site
• ARM web site
