Parallel Processing: Architecture and System Overview. Rajkumar Buyya, Grid Computing and Distributed Systems (GRIDS) Lab, The University of Melbourne, Australia. www.gridbus.org/~raj
Serial vs. Parallel (figure: a queue of customers waiting at a single counter vs. the same queue served by two counters in parallel)
Overview of the Talk: Introduction; Why Parallel Processing?; Parallel System H/W Architecture; Parallel Operating Systems; Parallel Programming Models; Summary
Computing Elements (figure: layered view of a multi-processor computing system, from the processors and hardware up through the operating system/microkernel, threads interface, programming paradigms, and applications, with processes and threads mapped onto processors)
Two Eras of Computing (figure: timeline from 1940 to 2030 showing the sequential era and the parallel era, each maturing through architectures, system software/compilers, applications, and problem-solving environments (P.S.Es), and moving from R&D to commercialization to commodity)
History of Parallel Processing: The notion of parallel processing can be traced to a tablet dated around 100 BC. The tablet has three calculating positions capable of operating simultaneously. From this we can infer that they were aimed at "speed" or "reliability".
Motivating Factor: Human Brain. The human brain consists of a large number (more than a billion) of neural cells that process information. Each cell works like a simple processor, and only the massive interaction between all cells and their parallel processing makes the brain's abilities possible. An individual neuron's response speed is slow (on the order of milliseconds), yet the aggregate speed with which complex calculations are carried out by (billions of) neurons demonstrates the feasibility of parallel processing.
Why Parallel Processing? Computation requirements are ever increasing: simulations, scientific prediction (earthquakes), distributed databases, weather forecasting (will it rain tomorrow?), search engines, e-commerce, Internet service applications, data center applications, finance (investment risk analysis), oil exploration, mining, etc. Silicon-based (sequential) architectures are reaching their limits in processing capability (clock speed), as they are constrained by the speed of light and thermodynamics.
Human Architecture! (figure: growth/performance vs. age from 5 to 45 and beyond; vertical growth dominates early in life, horizontal growth later)
Computational Power Improvement (figure: computational power improvement (C.P.I) vs. number of processors; a multiprocessor keeps improving as processors are added, while a uniprocessor stays flat)
Why Parallel Processing? Hardware improvements such as pipelining and superscalar execution are not scaling well and require sophisticated compiler technology to extract performance from them. Techniques such as vector processing work well only for certain kinds of problems.
Why Parallel Processing? Significant developments in networking technology are paving the way for network-based, cost-effective parallel computing. Parallel processing technology is mature and is being exploited commercially.
Processing Elements Architecture
Processing Elements: Flynn proposed a classification of computer systems based on the number of instruction and data streams that can be processed simultaneously: SISD (Single Instruction, Single Data): conventional computers; SIMD (Single Instruction, Multiple Data): data-parallel, vector computing machines; MISD (Multiple Instruction, Single Data): systolic arrays; MIMD (Multiple Instruction, Multiple Data): general-purpose machines.
SISD: A Conventional Computer. Speed is limited by the rate at which the computer can transfer information internally. Ex: PCs, workstations. (figure: a single processor with one instruction stream, one data input, and one data output)
The MISD Architecture: More of an intellectual exercise than a practical configuration; few have been built, and none are commercially available. (figure: processors A, B, and C, each driven by its own instruction stream, operating on a single data input stream to produce a single data output stream)
SIMD Architecture: a single instruction stream drives multiple processors, each operating on its own data stream, e.g. Ci <= Ai * Bi. Ex: CRAY vector processing machines, Thinking Machines CM*, Intel MMX (multimedia support). (figure: processors A, B, and C sharing one instruction stream, each with its own data input and output streams; a sketch of this style of computation follows below)
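To make the SIMD idea concrete, here is a minimal C sketch of the Ci <= Ai * Bi computation above: a single logical operation applied across whole arrays, which modern compilers can map onto SIMD/vector instructions. The array names and size are illustrative, not from the original slides.

```c
#include <stdio.h>

#define N 8  /* illustrative size; real SIMD workloads are much larger */

int main(void) {
    float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {  /* initialize the input data streams */
        a[i] = (float)i;
        b[i] = 2.0f;
    }

    /* One logical instruction, many data elements: compilers can turn
       this loop into SIMD (vector) instructions at -O2 and above. */
    for (int i = 0; i < N; i++)
        c[i] = a[i] * b[i];

    for (int i = 0; i < N; i++)
        printf("c[%d] = %.1f\n", i, c[i]);
    return 0;
}
```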
MIMD Architecture: Unlike SISD and MISD machines, an MIMD computer works asynchronously. Two flavors: shared memory (tightly coupled) MIMD and distributed memory (loosely coupled) MIMD. (figure: processors A, B, and C, each with its own instruction stream and its own data input and output streams)
Shared Memory MIMD machine: Communication: a source PE writes data to global memory and the destination PE retrieves it. Easy to build, and conventional SISD operating systems can easily be ported. Limitation: reliability and expandability; a memory component or any processor failure affects the whole system, and adding processors leads to memory contention. Ex.: Silicon Graphics supercomputers. (figure: processors A, B, and C connected via memory buses to a global memory system; a threaded sketch of this communication style follows below)
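A minimal sketch of shared-memory communication, assuming POSIX threads (compile with -pthread): one thread plays the source PE writing to a shared variable that stands in for global memory, another plays the destination PE retrieving it, with a mutex and condition variable providing the synchronization.

```c
#include <pthread.h>
#include <stdio.h>

/* "Global memory" shared by all processing elements (threads). */
static int shared_value;
static int ready = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static void *writer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_value = 42;            /* source PE writes to global memory */
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *reader(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!ready)                /* destination PE waits, then retrieves */
        pthread_cond_wait(&cond, &lock);
    printf("reader retrieved %d from shared memory\n", shared_value);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t w, r;
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}
```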
Distributed Memory MIMD: Communication: inter-process communication (IPC) via a high-speed network, which can be configured as a tree, mesh, cube, etc. Unlike shared-memory MIMD, it is easily/readily expandable and highly reliable (a CPU failure does not affect the whole system). (figure: processors A, B, and C, each with its own local memory system, linked by IPC channels; a message-passing sketch follows below)
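A minimal message-passing sketch of the distributed-memory style, assuming an MPI installation (compile with mpicc, run with mpirun -np 2): rank 0 sends a value from its local memory over the network, and rank 1 receives it into its own; nothing is shared.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;  /* lives only in rank 0's local memory system */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d over the IPC channel\n", value);
    }

    MPI_Finalize();
    return 0;
}
```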
Types of Parallel Systems. Tightly coupled systems: shared memory parallel (the smallest extension to existing systems; program conversion is incremental) and distributed memory parallel (completely new systems; programs must be reconstructed). Loosely coupled systems: clusters (now clouds), built from commodity systems under centralised management, and grids, aggregations of distributed systems under decentralized management.
Laws of caution..... Speed of computation is proportional to the square root of system cost, i.e. Speed ∝ √Cost. Speedup by a parallel computer increases as the logarithm of the number of processors: Speedup = log2(no. of processors). (figures: speed S vs. cost C following the √C curve, and speedup S vs. number of processors P following the log2 P curve)
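A quick C sketch showing how pessimistic the logarithmic law above is: even 1024 processors would be predicted to yield only about a 10x speedup.

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    int procs[] = {2, 4, 64, 1024};
    for (int i = 0; i < 4; i++) {
        /* Pessimistic prediction from the law above: speedup = log2(P). */
        printf("P = %4d  ->  predicted speedup = %.1f\n",
               procs[i], log2((double)procs[i]));
    }
    return 0;
}
```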
Caution.... Very fast developments in network computing and related areas have blurred concept boundaries, causing a lot of terminological confusion: concurrent computing, parallel computing, multiprocessing, supercomputing, massively parallel processing, cluster computing, distributed computing, Internet computing, grid computing, cloud computing, etc. At the user level, even well-defined distinctions such as shared memory vs. distributed memory are disappearing due to new advances in technology. Good tools for parallel program development and debugging are yet to emerge.
Caution.... There are no strict delimiters for contributors to the area of parallel processing: computer architecture, operating systems, high-level languages, algorithms, databases, computer networks, … all have a role to play.
Operating Systems for High Performance Computing
Operating Systems for PP: MPP systems with thousands of processors require an OS radically different from current ones. Every CPU needs an OS to manage its resources and to hide its details. Traditional operating systems are heavy and complex, and not suitable for MPP.
Operating System Models: a framework that unifies the features, services, and tasks performed. Three approaches to building an OS: monolithic OS, layered OS, and microkernel-based (client-server) OS; the last is suitable for MPP systems. Simplicity, flexibility, and high performance are crucial for the OS.
Monolithic Operating System: better application performance, but difficult to extend. Ex: MS-DOS. (figure: application programs in user mode calling into system services that run with the hardware in kernel mode)
Layered OS: easier to enhance, since each layer of code accesses the interface of the layer below, but application performance is low. Ex: UNIX. (figure: application programs in user mode above kernel-mode layers for system services, memory and I/O device management, and process scheduling, down to the hardware)
Traditional OS (figure: a single OS, produced by the OS designer, sits in kernel mode between the application programs in user mode and the hardware)
New trend in OS design (figure: a small microkernel in kernel mode above the hardware, with servers and application programs running in user mode)
Microkernel/Client Server OS (for MPP Systems): a tiny OS kernel provides the basic primitives (process, memory, IPC), and traditional services become user-level subsystems: OS = microkernel + user subsystems. (figure: a client application with a thread library sends requests through the microkernel to file, network, and display servers, which reply; a performance-vs-competence trade-off against monolithic designs; a user-level analogy of this interaction follows below)
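A user-level analogy of the client/server send-reply interaction, sketched with POSIX message queues rather than a real microkernel's IPC primitives (the queue name and message text are illustrative); on Linux, compile with -lrt.

```c
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define QUEUE "/demo_server"  /* illustrative queue name */

int main(void) {
    struct mq_attr attr = { .mq_maxmsg = 4, .mq_msgsize = 64 };
    mqd_t q = mq_open(QUEUE, O_CREAT | O_RDWR, 0600, &attr);
    char buf[64];

    if (fork() == 0) {                     /* the "file server" process */
        mq_receive(q, buf, sizeof buf, NULL);
        printf("server handling request: %s\n", buf);
        _exit(0);
    }

    /* The "client" sends a request message through the kernel's IPC. */
    strcpy(buf, "read /etc/hosts");
    mq_send(q, buf, strlen(buf) + 1, 0);

    wait(NULL);
    mq_close(q);
    mq_unlink(QUEUE);
    return 0;
}
```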
A Few Popular Microkernel Systems: Mach (CMU), PARAS (C-DAC), Chorus, QNX, Windows (NT)
Parallel Programs consist of multiple active "processes" simultaneously solving a given problem. The communication and synchronization between these parallel processes form the core of the parallel programming effort.
Parallel Programming Models. Shared memory model: DSM, threads/OpenMP (enabled for clusters), Java threads (HKU JESSICA, IBM cJVM). Message passing model: PVM, MPI. Hybrid model: mixing the shared and distributed memory models, e.g. using OpenMP and MPI together (see the sketch below). Object- and service-oriented models: wide-area distributed computing technologies; OO: CORBA, DCOM, etc.; services: Web Services-based service composition.
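A minimal sketch of the hybrid model, assuming both MPI and OpenMP are available (compile with e.g. mpicc -fopenmp): MPI splits the work across ranks (distributed memory), while OpenMP threads share memory within each rank. The problem size is illustrative.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Message passing across nodes: each rank takes a slice of 0..N-1. */
    const long N = 1000000;                /* illustrative problem size */
    long lo = rank * N / size, hi = (rank + 1) * N / size;
    double local = 0.0, total = 0.0;

    /* Shared memory within a node: threads split the rank's slice. */
    #pragma omp parallel for reduction(+:local)
    for (long i = lo; i < hi; i++)
        local += (double)i;

    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %.0f (expected %.0f)\n", total, (double)N * (N - 1) / 2);

    MPI_Finalize();
    return 0;
}
```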
Summary/Conclusions: Parallel processing has become a reality. E.g., SMPs are used extensively as (Web) servers, and the threads concept is utilized everywhere. Clusters have emerged as popular data centers and processing engines, e.g. the Google search engine. The emergence of commodity high-performance CPUs, networks, and OSes has made parallel computing applicable to enterprise applications, e.g. Oracle {9i,10g} databases on clusters/grids.
