Introduction: 1-1
CS 3006 Parallel and Distributed Computing
Fall 2022
Week # 1 – Lecture # 1, 2, 3
22nd, 23rd, 24th August 2022
23rd, 24th, 25th Muḥarram ul Haram, 1444
Dr. Nadeem Kafi Khan
Lecture # 1 - Topics
• Introduction
• Definition and Architecture block diagram
• Shared Memory Systems
• Distributed Memory Systems
CS 3006
• Key course for BS (CS)
• Most computation today is parallel and distributed, and large-scale storage is now
distributed.
• Course Instructor:
• Dr. Nadeem Kafi Khan, Assistant Professor (CS)
• Office: Main Campus, CS Block. Ext. 131.
• Email: nadeem.kafi@nu.edu.pk (please send email from your @nu.edu.pk account)
• Please pay attention to my emails and Google classroom posts.
• Course slides and other materials will be posted on Google Classroom.
• Participation in class and on Google Classroom is expected.
CS 3006
• Textbook
• Introduction to Parallel Computing, 2nd Ed.
By Ananth Grama, Anshul Gupta, George Karypis,
Vipin Kumar
• Reference Materials
• Will be posted on Google classroom
CS 3006
• Contact Hours
• Lecture: see timetable
• Consultancy Hours: will be posted later
• Interactions: Google Classroom and/or Email
• Course Pre-requisites
• Programming, data structures and Operating Systems
• Computer Organization and Assembly Language
CS 3006
• Evaluation Criteria
• Assignments (14%) – Lab-based (the last two will constitute the Semester Project)
• Quiz (6%)
• Mid Term (15+15=30%)
• Final (50%)
• Active reading of textbook REQUIRED.
• Plagiarized work will be marked zero.
• Late submissions are not allowed.
• Required Attendance: 80%
PDC topics Discussed in Lecture # 1
• Motivation for Parallel and Distributed Computing
• Why do we need PDC? … Real-world example(s).
• Parallel Computing paradigm
• Shared memory architecture, exploited by multi-threaded programs.
• Distributed Memory paradigm
• Distributed memory architecture, exploited by multiple cooperating processes.
• A computational task is submitted to a master process, which distributes the work
(execution of code or processing of data) to slave processes running on
different computers of the cluster. The slave processes execute their portions in
parallel and send the results back to the master, which is responsible for
displaying them (see the sketch below).
• Why is a cluster of 60 computers a distributed system?
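A minimal sketch of this master/worker pattern, assuming an MPI installation (e.g. MPICH or Open MPI); the array contents and problem size are illustrative only:

/* Hypothetical master/worker sketch with MPI: rank 0 (the master) prepares the
 * data and scatters one chunk to every process; each process sums its chunk in
 * parallel; partial results are reduced back to the master, which prints them. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000                     /* illustrative problem size */

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;             /* work assigned to each process */
    double *data = NULL;
    if (rank == 0) {                  /* master prepares the full data set */
        data = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) data[i] = 1.0;
    }

    double *mine = malloc(chunk * sizeof(double));
    MPI_Scatter(data, chunk, MPI_DOUBLE,          /* master distributes the work */
                mine, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double local = 0.0;               /* every process works on its part in parallel */
    for (int i = 0; i < chunk; i++) local += mine[i];

    double total = 0.0;               /* results are sent back and combined at the master */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("total = %f\n", total); /* master displays the result */

    free(mine);
    if (rank == 0) free(data);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with mpirun, each process can run on a different node of the cluster, which is exactly the distributed-memory, multi-process setting described above.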
Lecture # 2 - Topics
• Parallel Execution Terms and their definitions
• Scalability
PDC CLOs as per FAST-NU official document
Some General Parallel Terminology
• Task
• A logically discrete section of computational work. A task is typically a
program or program-like set of instructions that is executed by a processor.
• Parallel Task
• A task that can be executed by multiple processors safely (yields correct
results)
• Serial Execution
• Execution of a program sequentially, one statement at a time. In the simplest
sense, this is what happens on a one processor machine. However, virtually
all parallel tasks will have sections of a parallel program that must be
executed serially.
Symmetric vs. Asymmetric Multiprocessing Architecture
• Same type of processing elements vs. different types of processing elements used in computations.
• Same type of computation vs. different types of computations done on the same processing elements.
• Parallel Execution
• Execution of a program by more than one task, with each task being able to
execute the same or different statement at the same moment in time.
• Shared Memory
• From a strictly hardware point of view, describes a computer architecture
where all processors have direct (usually bus based) access to common
physical memory. In a programming sense, it describes a model where parallel
tasks all have the same "picture" of memory and can directly address and
access the same logical memory locations regardless of where the physical
memory actually exists (see the sketch below).
• Distributed Memory
• In hardware, refers to network based memory access for physical memory
that is not common. As a programming model, tasks can only logically "see"
local machine memory and must use communications to access memory on
other machines where other tasks are executing.
Some General Parallel Terminology
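A minimal shared-memory sketch, assuming an OpenMP-capable compiler (e.g. gcc -fopenmp); the array name and size are illustrative. All threads address the same array through the common address space, so no explicit data exchange is needed:

/* Illustrative shared-memory example with OpenMP: the loop iterations are
 * divided among threads, but every thread reads and writes the single shared
 * copy of 'a' that lives in common memory. */
#include <omp.h>
#include <stdio.h>

#define N 16

int main(void) {
    double a[N];

    #pragma omp parallel for          /* threads share the same view of 'a' */
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("a[%d] = %.1f (max threads available: %d)\n",
           N - 1, a[N - 1], omp_get_max_threads());
    return 0;
}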
Shared Memory vs. Distributed Memory
a) Shared Memory b) Distributed Memory
The network interconnect is either a very high-speed Ethernet switch (~10 Gbit/s) or even faster InfiniBand or other switches.
• Communications
• Parallel tasks typically need to exchange data. There are several ways this can
be accomplished, such as through a shared memory bus or over a network,
however the actual event of data exchange is commonly referred to as
communications regardless of the method employed.
• Synchronization
• The coordination of parallel tasks in real time, very often associated with
communications. Often implemented by establishing a synchronization point
within an application where a task may not proceed further until another
task(s) reaches the same or logically equivalent point.
• Synchronization usually involves waiting by at least one task, and can
therefore cause a parallel application's wall clock execution time to increase (see the sketch below).
Some General Parallel Terminology
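A minimal synchronization sketch, again assuming OpenMP; the two "phases" are placeholders for real work. The barrier is the synchronization point: no thread may proceed to phase 2 until every thread has reached the barrier, so at least one thread ends up waiting:

/* Illustrative synchronization point with an OpenMP barrier. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        printf("thread %d: finished phase 1\n", id);  /* work before the sync point */

        #pragma omp barrier                           /* every thread waits here */

        printf("thread %d: starting phase 2\n", id);  /* runs only after all arrive */
    }
    return 0;
}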
• Scalability
• Refers to a parallel system's (hardware and/or software) ability to
demonstrate a proportionate increase in parallel speedup with the
addition of more processors. Factors that contribute to scalability
include:
• Hardware - particularly memory-CPU bandwidths and network
communications
• Application algorithm
• Parallel overhead
• Characteristics of your specific application and coding (a small measurement sketch follows)
Some General Parallel Terminology
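One hedged way to observe scalability in practice, assuming OpenMP and an illustrative problem size: run the same parallel reduction with 1, 2, 4 and 8 threads and check whether the measured speedup grows roughly in proportion to the thread count:

/* Hypothetical scalability experiment: time the same loop with more and more
 * threads and report the observed speedup relative to the 1-thread run. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 20000000L                   /* illustrative problem size */

int main(void) {
    double *a = malloc(N * sizeof(double));
    for (long i = 0; i < N; i++) a[i] = 1.0;

    double t1 = 0.0;                  /* single-thread baseline time */
    for (int threads = 1; threads <= 8; threads *= 2) {
        double sum = 0.0;
        double start = omp_get_wtime();
        #pragma omp parallel for num_threads(threads) reduction(+:sum)
        for (long i = 0; i < N; i++) sum += a[i];
        double elapsed = omp_get_wtime() - start;

        if (threads == 1) t1 = elapsed;
        printf("threads=%d  time=%.4fs  speedup=%.2f\n",
               threads, elapsed, t1 / elapsed);
    }
    free(a);
    return 0;
}

If memory bandwidth or parallel overhead dominates, the printed speedups flatten out instead of doubling; that flattening is exactly a scalability limit.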
Lecture # 3 - Topics
• Overhead in Parallel and Distributed Computing
• Speed-up and Amdahl's Law
• Flynn’s Taxonomy
• Granularity
• Parallel Overhead
• The amount of time required to coordinate parallel tasks, as opposed to doing
useful work. Parallel overhead can include factors such as:
• Task start-up time
• Synchronizations
• Data communications
• Software overhead imposed by parallel compilers, libraries, tools, operating system, etc.
• Task termination time
• Massively Parallel
• Refers to the hardware that comprises a given parallel system - having many
processors. The meaning of many keeps increasing, but currently IBM Blue
Gene/L pushes this number to 6 digits.
Some General Parallel Terminology
• Observed Speedup
• Observed speedup of a code which has been parallelized, defined as (worked example below):
speedup = wall-clock time of serial execution / wall-clock time of parallel execution
• One of the simplest and most widely used indicators for a parallel program's
performance.
Some General Parallel Terminology
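A worked example with made-up numbers: suppose the serial version of a program takes 120 seconds of wall-clock time and the parallelized version finishes in 20 seconds on 8 processors. The observed speedup is 120 / 20 = 6, less than the ideal speedup of 8 because of parallel overhead and any remaining serial sections.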
Flynn’s Taxonomy
PU = Processing Unit
Single Instruction, Single Data (SISD)
• A serial (non-parallel) computer
• Single instruction: only one instruction stream is
being acted on by the CPU during any one clock
cycle
• Single data: only one data stream is being used as
input during any one clock cycle
• Deterministic execution
• This is the oldest and until recently, the most
prevalent form of computer
• Examples: most PCs, single CPU workstations and
mainframes
Single Instruction, Multiple Data (SIMD)
• A type of parallel computer
• Single instruction: All processing units execute the same instruction at any given clock cycle
• Multiple data: Each processing unit can operate on a different data element
• This type of machine typically has an instruction dispatcher, a very high-bandwidth internal
network, and a very large array of very small-capacity instruction units.
• Best suited for specialized problems characterized by a high degree of regularity, such as image
processing.
• Synchronous (lockstep) and deterministic execution
• Two varieties: Processor Arrays and Vector Pipelines
• Examples: Vectorization is a prime example of SIMD in
which the same instruction is performed across
multiple data. A variant of SIMD is single instruction,
multi-thread (SIMT), which is commonly used to
describe GPU workgroups (see the loop sketch below).
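A small sketch of what vectorization looks like at the source level, assuming a compiler with auto-vectorization enabled (e.g. gcc -O3); the arrays and sizes are illustrative. The loop is regular and dependence-free, so the compiler can emit one vector instruction that adds several elements at a time, i.e. a single instruction applied to multiple data:

/* Illustrative SIMD-friendly loop: with auto-vectorization the element-wise
 * add below is performed by vector (SIMD) instructions. */
#include <stdio.h>

#define N 1024

int main(void) {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { b[i] = (float)i; c[i] = 2.0f * i; }

    for (int i = 0; i < N; i++)   /* same operation across many data elements */
        a[i] = b[i] + c[i];

    printf("a[%d] = %.1f\n", N - 1, a[N - 1]);
    return 0;
}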
Multiple Instruction, Single Data (MISD)
• A single data stream is fed into multiple processing units.
• Each processing unit operates on the data independently via independent
instruction streams. This is not a common architecture.
• Few actual examples of this class of parallel computer have ever existed.
One is the experimental Carnegie-Mellon C.mmp computer (1971).
• Some conceivable uses might be:
• multiple frequency filters operating on a single signal stream
• multiple cryptography algorithms attempting to crack a single coded message.
• Redundant computation on the same data. This is used in
highly fault-tolerant approaches such as spacecraft
controllers. Because spacecraft are in high radiation
environments, these often run two copies of each
calculation and compare the output of the two.
Multiple Instruction, Multiple Data (MIMD)
• Currently, the most common type of parallel computer. Most modern
computers fall into this category.
• Multiple Instruction: every processor may be executing a different
instruction stream
• Multiple Data: every processor may be working with a different data
stream
• Execution can be synchronous or asynchronous, deterministic or non-deterministic
• Examples: most current supercomputers, networked parallel computer
"grids" and multi-processor SMP computers - including some types of
PCs.
• The final category has parallelization in both
instructions and data and is referred to as MIMD.
This category describes multi-core parallel
architectures that comprise the majority of large
parallel systems.