Advanced Computer Architecture
The Architecture of
Parallel Computers
Computer Systems
• Hardware Architecture
• Operating System
• Application Software
• No component can be treated in isolation from the others
Hardware Issues
• Number and Type of Processors
• Processor Control
• Memory Hierarchy
• I/O devices and Peripherals
• Operating System Support
• Applications Software Compatibility
Operating System Issues
• Allocating and Managing Resources
• Access to Hardware Features
– Multi-Processing
– Multi-Threading
• I/O Management
• Access to Peripherals
• Efficiency
Applications Issues
• Compiler/Linker Support
• Programmability
• OS/Hardware Feature Availability
• Compatibility
• Parallel Compilers
– Preprocessor
– Precompiler
– Parallelizing Compiler
Architecture Evolution
• Scalar Architecture
• Prefetch (Fetch/Execute Overlap)
• Multiple Functional Units
• Pipelining
• Vector Processors
• Lock-Step Processors
• Multi-Processor
Flynn’s Classification
• Consider Instruction Streams and Data
Streams Separately.
• SISD - Single Instruction, Single Data
Stream
• SIMD - Single Instruction, Multiple Data
Streams
• MIMD - Multiple Instruction, Multiple Data
Streams.
• MISD - (rare) Multiple Instruction, Single
Data Stream
SISD
• Conventional Computers.
• Pipelined Systems
• Multiple-Functional Unit Systems
• Pipelined Vector Processors
• Includes most computers encountered in
everyday life
SIMD
• Multiple Processors Execute a Single
Program
• Each Processor operates on its own data
• Vector Processors
• Array Processors
• PRAM Theoretical Model
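The SIMD idea above can be sketched in plain Python: one instruction stream is broadcast to every processing element, each of which applies it to its own local datum. This is only an illustrative simulation (the names `pe_data` and `simd_step` are invented for this sketch), not real SIMD hardware or intrinsics.

```python
# Each "PE" holds its own datum; a single instruction stream is applied
# to all of them in lock-step (simulated sequentially here).
pe_data = [1.0, 2.0, 3.0, 4.0]          # one value per processing element

def simd_step(op, data):
    """Broadcast one operation to every PE's local data."""
    return [op(x) for x in data]

result = simd_step(lambda x: x * 2 + 1, pe_data)
assert result == [3.0, 5.0, 7.0, 9.0]
```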
MIMD
• Multiple Processors cooperate on a single
task
• Each Processor runs a different program
• Each Processor operates on different data
• Many Commercial Examples Exist
MISD
• A Single Data Stream passes through
multiple processors
• Different operations are triggered on
different processors
• Systolic Arrays
• Wave-Front Arrays
Programming Issues
• Parallel Computers are Difficult to Program
• Automatic Parallelization Techniques are
only Partially Successful
• Programming languages are few, not well
supported, and difficult to use.
• Parallel Algorithms are difficult to design.
Performance Issues
• Cycle Time, τ = 1 / Clock Rate
• Cycles Per Instruction (Average) = CPI
• Instruction Count = Ic
• Time, T = Ic × CPI × τ
• p = Processor Cycles, m = Memory Cycles,
k = Memory/Processor cycle ratio
• T = Ic × (p + m × k) × τ
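The two timing formulas can be checked with a short calculation. The specific numbers below (a 1 ns cycle time, 2 million instructions, CPI of 1.5) are assumed purely for illustration; the point is that splitting CPI into p + m × k leaves T unchanged.

```python
def exec_time(ic, cpi, tau):
    """T = Ic * CPI * tau: instruction count x average cycles x cycle time."""
    return ic * cpi * tau

def exec_time_split(ic, p, m, k, tau):
    """T = Ic * (p + m * k) * tau: CPI split into processor and memory cycles."""
    return ic * (p + m * k) * tau

tau = 1e-9         # assumed 1 ns cycle time (1 GHz clock)
ic = 2_000_000     # assumed 2 million instructions
cpi = 1.5

t1 = exec_time(ic, cpi, tau)
# Same CPI decomposed: 1 processor cycle + 0.25 memory cycles at ratio k = 2
t2 = exec_time_split(ic, 1.0, 0.25, 2.0, tau)
assert abs(t1 - t2) < 1e-12   # p + m*k == CPI, so the two forms agree
```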
Performance Issues II
• Ic & p affected by processor design and
compiler technology.
• m affected mainly by compiler technology
• τ affected by processor design
• k affected by memory hierarchy structure
and design
Other Measures
• MIPS rate - Millions of instructions per
second
• Clock Rate for similar processors
• MFLOPS rate - Millions of floating point
operations per second.
• These measures are not necessarily directly comparable between different types of processors.
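The MIPS rate follows directly from the timing formula: MIPS = Ic / (T × 10⁶), which for a fixed CPI equals clock rate / (CPI × 10⁶). A minimal sketch, again with assumed numbers:

```python
def mips(ic, t_seconds):
    """MIPS = instruction count / (execution time in seconds * 10^6)."""
    return ic / (t_seconds * 1e6)

clock_hz = 1e9     # assumed 1 GHz clock
cpi = 1.5
ic = 2_000_000
t = ic * cpi / clock_hz                  # T = Ic * CPI * tau

# The two formulations of the MIPS rate agree:
assert abs(mips(ic, t) - clock_hz / (cpi * 1e6)) < 1e-6
```

This also shows why MIPS comparisons across processor types mislead: the figure depends on CPI, which reflects the instruction set as much as raw speed.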
Parallelizing Code
• Implicitly
– Write Sequential Algorithms
– Use a Parallelizing Compiler
– Rely on compiler to find parallelism
• Explicitly
– Design Parallel Algorithms
– Write in a Parallel Language
– Rely on Human to find Parallelism
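The explicit route above can be sketched with Python's standard library: the programmer, not the compiler, partitions the data and decides what runs concurrently. The partitioning scheme and worker count here are arbitrary choices for illustration.

```python
# Explicit parallelism: the human partitions the work and maps it to workers.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    return sum(x * x for x in chunk)

data = list(range(1000))
chunks = [data[i::4] for i in range(4)]   # hand-partitioned into 4 slices

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))

assert total == sum(x * x for x in data)  # parallel result matches sequential
```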
Multi-Processors
• Multi-Processors generally share memory,
while multi-computers do not.
– Uniform memory model
– Non-Uniform Memory Model
– Cache-Only
• MIMD Machines
Multi-Computers
• Independent Computers that Don’t Share
Memory.
• Connected by High-Speed Communication
Network
• More tightly coupled than a collection of
independent computers
• Cooperate on a single problem
Vector Computers
• Independent Vector Hardware
• May be an attached processor
• Has both scalar and vector instructions
• Vector instructions operate in highly
pipelined mode
• Can be Memory-to-Memory or Register-to-Register
SIMD Computers
• One Control Processor
• Several Processing Elements
• All Processing Elements execute the same
instruction at the same time
• Interconnection network between PEs
determines memory access and PE
interaction
The PRAM Model
• SIMD Style Programming
• Uniform Global Memory
• Local Memory in Each PE
• Memory Conflict Resolution
– CRCW - Concurrent Read, Concurrent Write
– CREW - Concurrent Read, Exclusive Write
– EREW - Exclusive Read, Exclusive Write
– ERCW - (rare) Exclusive Read, Concurrent Write
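The difference between the conflict-resolution variants comes down to detecting when two PEs touch the same cell in one step. A small sketch (the function name and representation are invented for illustration) flags the write conflicts that an exclusive-write model forbids but a concurrent-write model must resolve by rule:

```python
from collections import Counter

def write_conflicts(writes):
    """writes: list of (pe_id, address) pairs for one PRAM step.
    Returns the set of addresses written by more than one PE."""
    counts = Counter(addr for _pe, addr in writes)
    return {addr for addr, n in counts.items() if n > 1}

step = [(0, 10), (1, 10), (2, 11)]        # PEs 0 and 1 both write address 10
conflicts = write_conflicts(step)
# EREW and CREW reject this step; CRCW accepts it and resolves
# the clash by some rule (e.g. all writers must agree on the value).
assert conflicts == {10}
```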
The VLSI Model
• Implement Algorithm as a mostly
combinational circuit
• Determine the area required for
implementation
• Determine the depth of the circuit
