CSC 203 1.5
Computer System Architecture
Budditha Hettige
Department of Statistics and Computer Science
University of Sri Jayewardenepura
Performance of Computers
Improving Performance of Computers
• Increasing clock speed
– Limited by physical constraints (needs new hardware)
• Parallelism (doing more things at once)
– Instruction-level parallelism
• Getting more instructions executed per second
– Processor-level parallelism
• Having multiple CPUs working on the same problem
Instruction-level parallelism
• Pipelining
– Instruction execution speed is limited by the time taken to fetch instructions from memory
– Early computers fetched instructions in advance and stored them in registers (the prefetch buffer)
• Prefetching divides instruction execution into two parts
– Fetching
– Actual execution
– Pipelining divides instruction execution into many parts; each part is handled by dedicated hardware, and all parts can run in parallel
Pipelining example
• Packaging cakes
– W1: Place an empty box on the belt every 10 seconds
– W2: Place the cake in the empty box
– W3: Close and seal the box
– W4: Label the box
– W5: Remove the box and place it in the large container
Computer Pipelines
• S1: Fetch the instruction from memory and place it in a buffer until it is needed
• S2: Decode the instruction; determine its type and the operands it needs
• S3: Locate and fetch the operands from memory (or registers)
• S4: Execute the instruction
• S5: Write the result back to a register
Example
T - Cycle time (in nanoseconds)
N - Number of stages in the pipeline
Latency:
Time taken to execute an instruction = N x T
Processor Bandwidth:
No. of MIPS the CPU has = 1000 / T MIPS
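The two formulas on this slide can be checked with a short sketch. It assumes the cycle time T is given in nanoseconds, so that one instruction completing per cycle gives 1000 / T million instructions per second. The function names and the sample values (N = 5, T = 2 ns) are illustrative and not from the slides.

```python
def pipeline_latency_ns(stages: int, cycle_time_ns: float) -> float:
    """Latency: time for one instruction to pass through all N stages (N x T)."""
    return stages * cycle_time_ns


def bandwidth_mips(cycle_time_ns: float) -> float:
    """Processor bandwidth: one instruction completes every cycle,
    i.e. 10^9 / T instructions per second = 1000 / T MIPS."""
    return 1000.0 / cycle_time_ns


# Hypothetical pipeline: five stages, 2 ns cycle time.
n_stages, t_ns = 5, 2.0
print(f"latency   = {pipeline_latency_ns(n_stages, t_ns):.1f} ns")  # latency   = 10.0 ns
print(f"bandwidth = {bandwidth_mips(t_ns):.1f} MIPS")               # bandwidth = 500.0 MIPS
```

Note that a deeper pipeline raises the latency of a single instruction but leaves the bandwidth unchanged as long as T stays the same.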
Processor - pipeline depth
Dual pipelines
• The instruction fetch unit fetches a pair of instructions and puts each one into its own pipeline
• The Pentium has two five-stage pipelines
– U pipeline (main) can execute an arbitrary Pentium instruction
– V pipeline (second) can execute only simple integer instructions and one simple floating-point instruction
• If the instructions in a pair conflict, only the first one is executed (in the u pipeline). The other instruction is held and paired with the next instruction
Superscalar architecture
• Single pipeline with multiple functional units
Processor-level parallelism
• High bus traffic
• Low bus traffic
Measuring Performance
Moore’s law
• Describes a long-term trend in the history of computing hardware
• Formulated by Dr. Gordon Moore in the 1960s
• Predicts an exponential increase in component density over time, with a commonly quoted doubling time of 18 months
• Applicable to microprocessors, DRAMs, DSPs and other microelectronics
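A fixed doubling time implies growth by a factor of 2^(t / doubling time) after t years. The sketch below works this out for the 18-month figure quoted above. The function name and the chosen time spans are illustrative assumptions.

```python
def density_growth(years: float, doubling_time_years: float = 1.5) -> float:
    """Growth factor in component density after `years`,
    assuming a constant doubling time (18 months by default)."""
    return 2.0 ** (years / doubling_time_years)


print(f"{density_growth(3):.0f}x after 3 years")    # 4x after 3 years
print(f"{density_growth(15):.0f}x after 15 years")  # 1024x after 15 years
```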
Moore's Law and Performance
• The performance of computers is determined
by architecture and clock speed.
• Clock speed doubles over a 3-year period due to on-chip scaling laws.
• Processors using identical or similar
architectures gain performance directly as a
function of Moore's Law.
• Improvements in internal architecture can
yield better gains than predicted by Moore's
Law.
Measuring Performance
• Execution time:
– Time between the start and the completion of a task (including disk accesses and memory accesses)
• Throughput:
– Total amount of work done in a given time
Performance of a Computer
For two computers X and Y:
Performance (X) > Performance (Y)
if and only if
Execution Time (Y) > Execution Time (X)
Performance difference between two computers
X is n times faster than Y:
n = Execution Time (Y) / Execution Time (X) = Performance (X) / Performance (Y)
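As a quick illustration of "n times faster": n is the ratio of the two execution times, with the slower machine's time in the numerator. The timings below are hypothetical and the function name is an assumption made for this sketch.

```python
def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y:
    n = ExecutionTime(Y) / ExecutionTime(X)."""
    return exec_time_y / exec_time_x


# Hypothetical timings: X needs 10 s and Y needs 15 s for the same task.
n = times_faster(exec_time_x=10.0, exec_time_y=15.0)
print(f"X is {n:.1f} times faster than Y")  # X is 1.5 times faster than Y
```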
CPU Time
• Time CPU spends on a task
• User CPU time
– CPU time spent in the program
• System CPU time
– CPU time spent in OS performing tasks on behalf
of the program
CPU Time (Example)
• User CPU time = 90.7 s
• System CPU time = 12.9 s
• Execution time = 2 m 39 s = 159 s
• % of CPU time = (User CPU Time + System CPU Time) / Execution time x 100 %
CPU Time
% CPU time = (90.7 + 12.9) / 159 x 100 %
= 65 %
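The same arithmetic as a short sketch, using the figures from the example. The variable names are illustrative.

```python
user_cpu_s = 90.7          # user CPU time
system_cpu_s = 12.9        # system CPU time
elapsed_s = 2 * 60 + 39    # execution (elapsed) time: 2 m 39 s = 159 s

cpu_percent = (user_cpu_s + system_cpu_s) / elapsed_s * 100
print(f"{cpu_percent:.1f} %")  # 65.2 %
```

The remaining 35 % or so of the elapsed time is spent waiting (for example on I/O) or running other programs.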
Clock Rate
• The computer clock runs at a constant rate and determines when events take place in the hardware
Clock Rate = 1 / Clock Cycle Time
Amdahl’s law
• The performance improvement that can be gained from some faster mode of execution is limited by the fraction of the time the faster mode can be used
Amdahl’s law
• Speedup depends on
– Fraction of the computation time on the original machine that can be converted to take advantage of the enhancement (FractionEnhanced)
– Improvement gained by the enhanced execution mode (SpeedupEnhanced)
Example
Total execution time of a program = 50 s
Execution time that can be enhanced = 30 s
FractionEnhanced = 30 / 50 = 0.6
Speedup
Speedup = Execution time without enhancement / Execution time with enhancement
Example
Normal mode execution time for some portion of a program = 6 s
Enhanced mode execution time for the same portion = 2 s
SpeedupEnhanced = 6 / 2 = 3
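Execution Time
Execution time (new) = Execution time (old) x ((1 - FractionEnhanced) + FractionEnhanced / SpeedupEnhanced)
Speedup (overall) = Execution time (old) / Execution time (new) = 1 / ((1 - FractionEnhanced) + FractionEnhanced / SpeedupEnhanced)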
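A minimal sketch of Amdahl's law: only the enhanced fraction of the original execution time shrinks, the rest is unchanged. Combining the 50 s program (FractionEnhanced = 0.6) with SpeedupEnhanced = 3 from the two earlier examples is an assumption made purely for illustration, since the slides present them separately. The function names are also illustrative.

```python
def new_execution_time(old_time_s: float, fraction_enhanced: float,
                       speedup_enhanced: float) -> float:
    """Execution time after the enhancement: the unenhanced part stays as is,
    the enhanced part is divided by SpeedupEnhanced."""
    return old_time_s * ((1 - fraction_enhanced)
                         + fraction_enhanced / speedup_enhanced)


def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = old execution time / new execution time."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Assumed combination: 50 s program, 60 % of it enhanced, enhanced part 3x faster.
t_new = new_execution_time(50.0, 0.6, 3.0)
print(f"new time = {t_new:.1f} s")                       # new time = 30.0 s
print(f"speedup  = {overall_speedup(0.6, 3.0):.3f}")     # speedup  = 1.667
```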
Example
• Suppose we consider an enhancement to the processor of a server system used for Web serving. The new CPU is 10 times faster on computation in the Web application than the original CPU. Assume the original CPU is busy with computation 40% of the time and is waiting for I/O 60% of the time.
What is the overall speedup gained from the enhancement?
Answer
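One way to work the answer, applying Amdahl's law with FractionEnhanced = 0.4 (the computation share) and SpeedupEnhanced = 10. This is a sketch of the standard calculation, with illustrative variable names.

```python
fraction_enhanced = 0.4   # share of time spent on computation (enhanced part)
speedup_enhanced = 10.0   # new CPU is 10 times faster on computation

# The I/O wait (60 %) is untouched; only the computation part shrinks.
overall = 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)
print(f"overall speedup = {overall:.2f}")  # overall speedup = 1.56
```

So a CPU that is 10 times faster makes the server only about 1.56 times faster, because 60 % of the time is spent waiting for I/O.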
Remark
• If an enhancement is only usable for a fraction of a task, we cannot speed up the task by more than 1 / (1 - FractionEnhanced)
Example
• A common transformation required in graphics engines is square root. Implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics
• Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics program
• Design alternatives
1. Enhance the FPSQR hardware and speed up this operation by a factor of 10
2. Make all FP instructions run faster by a factor of 1.6
Example
• FP instructions are responsible for a total of 50% of the execution time. The design team believes they can make all FP instructions run 1.6 times faster with the same effort as required for the fast square root.
Compare these two design alternatives
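The comparison can be sketched by applying Amdahl's law to each alternative: FractionEnhanced = 0.2 with SpeedupEnhanced = 10 for FPSQR, and FractionEnhanced = 0.5 with SpeedupEnhanced = 1.6 for all FP instructions. The helper name `amdahl` is an assumption of this sketch.

```python
def amdahl(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only `fraction_enhanced` of the time benefits."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


speedup_fpsqr = amdahl(0.20, 10.0)  # alternative 1: FPSQR 10x faster
speedup_fp = amdahl(0.50, 1.6)      # alternative 2: all FP 1.6x faster

print(f"FPSQR only: {speedup_fpsqr:.2f}")  # FPSQR only: 1.22
print(f"all FP:     {speedup_fp:.2f}")     # all FP:     1.23
```

Under these figures, improving all FP instructions is slightly better, because it applies to a larger fraction of the execution time.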
CPU performance equation
CPU time = CPU clock cycles for a program x Clock cycle time
= CPU clock cycles / Clock rate
Example
A program runs in 10 s on computer A, which has a 400 MHz clock. A new machine B, which could run the same program in 6 s, has to be designed. Further, B will need 1.2 times as many clock cycles as A for this program.
What should be the clock rate of B?
Answer
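A sketch of the calculation using CPU time = CPU clock cycles / Clock rate: first find the cycle count on A, scale it by 1.2 for B, then solve for B's clock rate. The variable names are illustrative.

```python
time_a_s = 10.0            # run time on A
clock_rate_a_hz = 400e6    # 400 MHz
time_b_s = 6.0             # target run time on B
cycle_ratio_b_to_a = 1.2   # B needs 1.2x as many clock cycles as A

cycles_a = time_a_s * clock_rate_a_hz      # CPU time x clock rate
cycles_b = cycle_ratio_b_to_a * cycles_a
clock_rate_b_hz = cycles_b / time_b_s      # clock rate = cycles / CPU time

print(f"cycles on A     = {cycles_a:.1e}")                  # cycles on A     = 4.0e+09
print(f"clock rate of B = {clock_rate_b_hz / 1e6:.0f} MHz")  # clock rate of B = 800 MHz
```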
CPU Clock Cycles
CPI (clock cycles per instruction)
average no. of clock cycles each instruction takes to
execute
IC (instruction count)
no. of instructions executed in the program
CPU clock cycles = CPI x IC
Note: CPI can be used to compare two different implementations of the same instruction set architecture (since the IC required for a program is the same on both)
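Putting the pieces together gives CPU time = IC x CPI / Clock rate. The sketch below compares two implementations of the same instruction set. All numbers (instruction count, CPI values, clock rates) and the machine names M1 and M2 are hypothetical and are not taken from the slides.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = IC x CPI x clock cycle time = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program and machines (same ISA, so the same IC on both).
ic = 2_000_000
time_m1 = cpu_time_s(ic, cpi=2.0, clock_rate_hz=500e6)  # faster clock, higher CPI
time_m2 = cpu_time_s(ic, cpi=1.2, clock_rate_hz=400e6)  # slower clock, lower CPI

print(f"M1: {time_m1 * 1e3:.1f} ms, M2: {time_m2 * 1e3:.1f} ms")  # M1: 8.0 ms, M2: 6.0 ms
print(f"M2 is {time_m1 / time_m2:.2f} times faster than M1")      # M2 is 1.33 times faster than M1
```

With these assumed values the machine with the slower clock wins, which is why clock rate alone is not a performance measure.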
Example
• Consider two implementations of the same instruction set architecture. For a certain program, details of time measurements of the two machines are given below
• Which machine is faster for this program, and by how much?
Answer
Measuring components of CPU performance equation
• CPU Time: by running the program
• Clock Cycle Time: published in the documentation
• IC: by software tools or a simulator of the architecture (more difficult to obtain)
• CPI: by simulation of an implementation (more difficult to obtain)
CPU clock cycles
Suppose there are n different types of instruction
Let
ICi – No. of times instruction i is executed in a program
CPIi – Avg. no. of clock cycles for instruction i
Then
CPU clock cycles = CPI1 x IC1 + CPI2 x IC2 + ... + CPIn x ICn
Example
Suppose we have made the following measurements:
– Frequency of FP operations (other than FPSQR) = 25%
– Average CPI of FP operations = 4.0
– Average CPI of other instructions = 1.33
– Frequency of FPSQR= 2%
– CPI of FPSQR = 20
Design alternatives:
1. decrease CPI of FPSQR to 2
2. decrease average CPI of all FP operations to 2.5
Compare these two design alternatives using CPU performance
equation
Answers
• Note that only the CPI changes; the clock rate and IC remain identical
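Because the clock rate and IC are the same, the speedup of each alternative is simply the ratio of the original CPI to the new CPI. The sketch below follows the usual textbook treatment of this example, which is an assumption here: the original CPI is taken as the 25 % FP share at CPI 4.0 plus a 75 % share of other instructions at CPI 1.33 (the 75 % figure is inferred as 100 % - 25 %), and the FPSQR improvement is subtracted from that original CPI.

```python
freq_fp, cpi_fp = 0.25, 4.0          # FP operations
freq_other, cpi_other = 0.75, 1.33   # other instructions (share assumed: 1 - 0.25)
freq_fpsqr, cpi_fpsqr = 0.02, 20.0   # FPSQR

# Original average CPI: frequency-weighted sum of per-class CPIs.
cpi_original = freq_fp * cpi_fp + freq_other * cpi_other

# Alternative 1: FPSQR CPI drops from 20 to 2, saving 18 cycles on 2 % of instructions.
cpi_new_fpsqr = cpi_original - freq_fpsqr * (cpi_fpsqr - 2.0)

# Alternative 2: average CPI of all FP operations drops from 4.0 to 2.5.
cpi_new_fp = freq_other * cpi_other + freq_fp * 2.5

print(f"original CPI  = {cpi_original:.2f}")                                            # original CPI  = 2.00
print(f"alternative 1 = {cpi_new_fpsqr:.2f}, speedup {cpi_original / cpi_new_fpsqr:.2f}")  # alternative 1 = 1.64, speedup 1.22
print(f"alternative 2 = {cpi_new_fp:.2f}, speedup {cpi_original / cpi_new_fp:.2f}")        # alternative 2 = 1.62, speedup 1.23
```

Under these assumptions, lowering the CPI of all FP operations gives the slightly better result.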
MIPS as a performance measure
MIPS = Instruction count / (Execution time x 10^6) = Clock rate / (CPI x 10^6)
Problems
MIPS as a performance measure
• MIPS is dependent on the instruction set
– difficult to compare MIPS of computers with different instruction sets
• MIPS can vary inversely to performance
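The last point can be illustrated with a sketch that uses MIPS = Instruction count / (Execution time x 10^6). The two program versions below (for example, output of two different compilers) and the 500 MHz clock are hypothetical values chosen for the illustration.

```python
def exec_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count: float, time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (time_s * 1e6)


clock_hz = 500e6  # hypothetical 500 MHz machine

# Version 1: fewer, more complex instructions. Version 2: more, simpler ones.
t1 = exec_time_s(5e6, cpi=2.0, clock_rate_hz=clock_hz)
t2 = exec_time_s(9e6, cpi=1.5, clock_rate_hz=clock_hz)

print(f"version 1: {mips(5e6, t1):.0f} MIPS, {t1 * 1e3:.0f} ms")  # version 1: 250 MIPS, 20 ms
print(f"version 2: {mips(9e6, t2):.0f} MIPS, {t2 * 1e3:.0f} ms")  # version 2: 333 MIPS, 27 ms
```

Version 2 has the higher MIPS rating but takes longer to finish, so here MIPS moves in the opposite direction to performance.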
MFLOPS as a performance measure
MFLOPS = Number of floating-point operations in a program / (Execution time x 10^6)
Problems
MFLOPS as a performance measure
• MFLOPS is not a dependable measure, because the set of floating-point operations differs between machines
– The Cray C90 has no divide instruction, while the Pentium has one
• MFLOPS depends on the mixture of fast and slow floating-point operations
– add (fast) and divide (slow) operations

Computer System Architecture Lecture Note 6: hardware performance
