Computer Architecture- 1
Outline
 1.1 Introduction
 1.2 Classes of Computers
 1.3 Defining Computer Architecture
 1.4 Trends in Technology
 1.5 Trends in Power in Integrated Circuits
 1.6 Trends in Cost
 1.7 Dependability
 1.8 Measuring, Reporting, and Summarizing Performance
 1.9 Quantitative Principles of Computer Design
 1.10 Putting It All Together: Performance and Price-Performance
Computer Architecture- 2
1.2 Classes of Computers
 Desktop Computing: optimize price-performance.
 Servers: provide larger-scale and more reliable file and computing services.
 Embedded Computers: real-time performance requirement.
Feature | Desktop | Server | Embedded
Price of system | $500-$5,000 | $5,000-$5,000,000 | $10-$100,000 (including network routers at the high end)
Price of microprocessor module | $50-$500 (per processor) | $200-$10,000 (per processor) | $0.01-$100 (per processor)
Critical system design issues | Price-performance, graphics performance | Throughput, availability, scalability | Price, power consumption, application-specific performance
Computer Architecture- 4
Instruction Set Architecture: Critical Interface
 Properties of a good abstraction
 Lasts through many generations (portability)
 Used in many different ways (generality)
 Provides convenient functionality to higher levels
 Permits an efficient implementation at lower levels
[Figure: the instruction set sits as the interface between software and hardware.]
Computer Architecture- 5
Instruction Set Architecture (ISA)
 ISA is the actual programmer-visible instruction set.
 Class of ISA;
 Memory addressing;
 Addressing modes;
 Types and sizes of operands;
 Operations;
 Control flow instructions;
 Encoding an ISA.
Computer Architecture- 6
Organization, Hardware, and Architecture
 Organization: includes the high-level aspects of a computer’s
design.
 Memory system, the memory interconnect, and the design of the internal
processor or CPU (arithmetic, logic, branching, and data transfer).
 For example: AMD Opteron 64 and Intel P4 have the same ISA, but they have
different internal pipeline and cache organizations.
 Hardware: detailed logic design and the packaging technology.
 For example, P4 and Mobile P4 have the same ISA and organization, but they
have different clock frequencies and memory systems.
 Architecture: covers all three aspects of computer design –
instruction set architecture, organization, and hardware.
 Designer must meet functional requirements as well as price, power,
performance, and availability goals.
Computer Architecture- 8
1.4 Trends in Technology
 A successful new ISA may last decades, for example, IBM
mainframe.
 Four critical technologies
 Integrated circuit logic technology: transistor density increased by about
35% per year, quadrupling in somewhat over four years;
 Semiconductor DRAM (Dynamic Random-Access Memory): capacity
increases by about 40% per year, doubling roughly every two years;
 Magnetic disk technology: a roller coaster of improvement rates; disks are 50-100 times
cheaper per bit than DRAM (chapter 6).
 Network technology: network performance depends both on the
performance of switches and transmission.
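The growth rates quoted above can be sanity-checked with a compound-growth calculation. This is only a sketch: the helper name `years_to_multiply` is made up for illustration, and it assumes the annual rate stays constant.

```python
import math


def years_to_multiply(annual_growth, factor):
    """Years needed to grow by `factor` at a constant compound rate.

    annual_growth is a fraction, e.g. 0.35 for 35% per year.
    """
    return math.log(factor) / math.log(1.0 + annual_growth)


# Transistor density: about 35% per year, time to quadruple.
logic_quadruple_years = years_to_multiply(0.35, 4)
# DRAM capacity: about 40% per year, time to double.
dram_double_years = years_to_multiply(0.40, 2)

print(f"density quadruples in about {logic_quadruple_years:.1f} years")
print(f"DRAM capacity doubles in about {dram_double_years:.1f} years")
```

Both results agree with the slide's wording: somewhat over four years to quadruple, and roughly two years to double.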
Computer Architecture- 9
Performance Trends: Bandwidth over Latency
 Bandwidth or throughput: the total amount of work done in a given time.
 Such as megabytes per second for a disk transfer.
 Latency or response time: the time between the start and the completion of an event.
 Such as milliseconds for a disk access.
Computer Architecture- 10
Scaling of Transistor Performance and Wires
 Feature size: the minimum size of a transistor or a wire in either
the x or y dimension.
 From 10 microns in 1971 to 0.09 microns (90 nm) in 2006;
 The density of transistors increases quadratically with a linear decrease in
feature size;
 Transistor performance improves linearly with decreasing feature size;
 The improvement in transistor density let CPUs move quickly from 4-bit to 8-bit,
to 16-bit, to 32-bit microprocessors;
 However, the signal delay for a wire increases in proportion to the
product of its resistance and capacitance.
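As a rough illustration of the quadratic density rule, the sketch below applies it to the two feature sizes quoted on this slide. It is a first-order estimate only; the function name `density_gain` is hypothetical, and real processes do not scale this cleanly.

```python
def density_gain(old_feature_um, new_feature_um):
    """First-order density gain: density grows with the square of the linear shrink."""
    linear_shrink = old_feature_um / new_feature_um
    return linear_shrink ** 2


# Feature sizes from the slide: 10 microns (1971) and 0.09 microns (2006).
linear = 10.0 / 0.09
gain = density_gain(10.0, 0.09)

print(f"linear shrink: about {linear:.0f}x")
print(f"first-order density gain: about {gain:.0f}x")
```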
Computer Architecture- 12
Power in IC (1/3)
 Power also provides challenges as devices are scaled.
 Dynamic power (watts, W) in a CMOS chip: traditionally, the dominant energy
consumption has been in switching transistors:
$$\text{Power}_{\text{dynamic}} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$
 For mobile devices: battery life matters more than power, so
energy is the proper metric, measured in joules:
$$\text{Energy}_{\text{dynamic}} = \text{Capacitive load} \times \text{Voltage}^2$$
† In modern VLSI, the exact power measurement is the sum of
$$\text{Power}_{\text{total}} = \text{Power}_{\text{dynamic}} + \text{Power}_{\text{static}} + \text{Power}_{\text{leakage}}$$
† Hence, lowering the voltage greatly reduces both $\text{Power}_{\text{dynamic}}$ and $\text{Energy}_{\text{dynamic}}$, since both depend on the square of the voltage.
(Over the past 20 years, supply voltages have dropped from 5 V to about 1 V.)
Computer Architecture- 13
Power in IC (2/3)
 Example 1 (p.22): Some microprocessors today are designed to have adjustable
voltage, so that a 15% reduction in voltage may result in a 15% reduction in
frequency. What would be the impact on dynamic power?
 Answer
Since the capacitance is unchanged, the answer is the ratio of the voltages and
frequencies:
$$\frac{\text{Power}_{\text{new}}}{\text{Power}_{\text{old}}} = \frac{(\text{Voltage} \times 0.85)^2 \times (\text{Frequency switched} \times 0.85)}{\text{Voltage}^2 \times \text{Frequency switched}} = 0.85^3 = 0.61$$
thereby reducing power to about 60% of the original.
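The same ratio can be checked numerically. The sketch below encodes the dynamic power and energy formulas from the previous slide; the function names and the normalised values of 1.0 are assumptions for illustration, since only the ratio matters.

```python
def dynamic_power(capacitive_load, voltage, frequency_switched):
    """Dynamic power = 1/2 * capacitive load * voltage^2 * frequency switched."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency_switched


def dynamic_energy(capacitive_load, voltage):
    """Dynamic energy = capacitive load * voltage^2."""
    return capacitive_load * voltage ** 2


# Normalised baseline (assumed values; they cancel in the ratios).
c, v, f = 1.0, 1.0, 1.0

# 15% lower voltage and 15% lower frequency, capacitance unchanged.
power_ratio = dynamic_power(c, 0.85 * v, 0.85 * f) / dynamic_power(c, v, f)
energy_ratio = dynamic_energy(c, 0.85 * v) / dynamic_energy(c, v)

print(f"power ratio:  {power_ratio:.2f}")
print(f"energy ratio: {energy_ratio:.2f}")
```

Power falls with the cube of the scaling factor here, while energy falls only with the square, because frequency does not appear in the energy formula.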
Computer Architecture- 14
Power in IC (3/3)
 As we move from one process to the next, (60 nm or 45 nm…)
 Transistor switching and frequency ↑;
 Capacitance and voltage ↓;
 However, power consumption and energy ↑.
 Static power: an important issue because leakage current flows
even when a transistor is off:
$$\text{Power}_{\text{static}} = \text{Current}_{\text{static}} \times \text{Voltage}$$
 Thus, number of transistors ↑, power ↑;
 Feature size ↓, power ↑ (why? You can find out in the VLSI area).
Computer Architecture- 16
Silicon Wafer and Dies
 Exponential cost decrease – technology basically the same:
 A wafer is tested and chopped into dies that are packaged.
[Figure: an AMD K8 wafer with a single die and the dies along the wafer edge labeled. Source: http://www.amd.com]
Computer Architecture- 17
Cost of an Integrated Circuit (IC)
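$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$
(Die cost is a greater portion of the cost that varies between machines, and it is sensitive to die size.)
$$\text{Cost of IC} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$$
$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$
(The second term accounts for the # of dies along the edge.)
$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defect density} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$
Today’s technology: α = 4.0, defect density 0.4 ~ 0.8 per cm²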
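A minimal sketch of the two cost formulas as functions. The dollar amounts and the final test yield below are hypothetical inputs chosen only to exercise the code; they are not figures from the slides.

```python
def cost_of_die(wafer_cost, dies_per_wafer, die_yield):
    """Cost of die = cost of wafer / (dies per wafer * die yield)."""
    return wafer_cost / (dies_per_wafer * die_yield)


def cost_of_ic(die_cost, testing_cost, packaging_cost, final_test_yield):
    """Cost of IC = (die + testing + packaging and final test) / final test yield."""
    return (die_cost + testing_cost + packaging_cost) / final_test_yield


# Hypothetical inputs: a $5,000 wafer holding 270 dies at 44% die yield,
# $2 testing, $3 packaging and final test, 95% final test yield.
die = cost_of_die(5000.0, 270, 0.44)
ic = cost_of_ic(die, 2.0, 3.0, 0.95)

print(f"cost per good die: ${die:.2f}")
print(f"cost per IC:       ${ic:.2f}")
```

Because die yield sits in the denominator, a larger die is penalised twice: fewer dies fit on the wafer and a smaller share of them work.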
Computer Architecture- 18
Examples of Cost of an IC
 Example 1 (p.22): Find the number of dies per 300 mm (30 cm) wafer for a die that is
1.5 cm on a side.
 The total die area is 2.25 cm². Thus
$$\text{Dies per wafer} = \frac{\pi \times (30/2)^2}{2.25} - \frac{\pi \times 30}{\sqrt{2 \times 2.25}} = \frac{706.5}{2.25} - \frac{94.2}{2.12} = 270$$
 Example 2 (p.24): Find the die yield for dies that are 1.5 cm on a side and 1.0 cm on a
side, assuming a defect density of 0.4 per cm² and α is 4.
 The total die areas are 2.25 cm² and 1.00 cm². With the wafer yield taken as 100%, the yield for the large die is
$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defect density} \times \text{Die area}}{\alpha}\right)^{-\alpha} = \left(1 + \frac{0.4 \times 2.25}{4.0}\right)^{-4} = 0.44$$
For the small die, it is
$$\text{Die yield} = \left(1 + \frac{0.4 \times 1.00}{4.0}\right)^{-4} = 0.68$$
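The two examples can be reproduced with a short script. This is a sketch under the same assumptions as the slides (wafer yield of 100%, α = 4); the function and parameter names are chosen here for illustration.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area over die area, minus a correction for dies along the edge."""
    radius = wafer_diameter_cm / 2.0
    whole = math.pi * radius ** 2 / die_area_cm2
    edge = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return whole - edge


def die_yield(defect_density, die_area_cm2, alpha=4.0, wafer_yield=1.0):
    """Die yield = wafer yield * (1 + defect density * die area / alpha)^(-alpha)."""
    return wafer_yield * (1.0 + defect_density * die_area_cm2 / alpha) ** (-alpha)


count = dies_per_wafer(30.0, 1.5 * 1.5)   # 300 mm wafer, 1.5 cm die
large = die_yield(0.4, 2.25)              # 1.5 cm on a side
small = die_yield(0.4, 1.00)              # 1.0 cm on a side

print(f"dies per wafer: {count:.0f}")
print(f"yield, large die: {large:.2f}")
print(f"yield, small die: {small:.2f}")
```

Combining the two results, the large die gives roughly 270 × 0.44 good dies per wafer, which is the quantity that enters the cost-of-die formula.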
Computer Architecture- 20
Response Time, Throughput, and Performance
 Response time: the time between the start and the
completion of an event, also referred to as execution time.
 This is what the computer user is interested in.
 Throughput: the total amount of work done in a given
time.
 This is what the administrator of a large data processing center may be interested in.
 In comparing design alternatives,
 The phrase “X is faster than Y” is used here to mean that the response time
or execution time is lower on X than on Y.
 In particular, “X is n times faster than Y” or “the throughput of X is n
times higher than Y” will mean
$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
Computer Architecture- 21
Performance Measuring
 Execution time is the reciprocal of performance,
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$
Computer Architecture- 22
Reliable Measure – User CPU Time
 Response time may include disk access, memory access, input/output
activities, CPU event and operating system overhead – everything…
 In order to get an accurate measure of performance, we use CPU time instead
of using response time.
 CPU time is the time the CPU spends computing a program and does not
include time spent waiting for I/O or running other programs.
 CPU time can also be divided into user CPU time (program) and system CPU
time (OS).
 For example, the UNIX time command might report 90.7s 12.9s 2:39 65%: user CPU time,
system CPU time, total elapsed (response) time, and the percentage of elapsed time that is CPU time.
 In our performance measures, we use user CPU time because of its
independence from the OS and other factors.
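The split between user CPU time, system CPU time and elapsed time can also be observed from inside a program. The sketch below uses Python's standard `os.times()` and `time.perf_counter()`; the `busy_loop` workload and the `measure` helper are hypothetical and exist only for illustration.

```python
import os
import time


def measure(fn):
    """Run fn() and return (user_cpu, system_cpu, elapsed) in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    fn()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return (cpu_after.user - cpu_before.user,
            cpu_after.system - cpu_before.system,
            wall_after - wall_before)


def busy_loop():
    # Hypothetical CPU-bound workload: mostly user CPU time, almost no I/O.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


user, system, elapsed = measure(busy_loop)
print(f"user={user:.3f}s system={system:.3f}s elapsed={elapsed:.3f}s")

# The percentage in the slide's sample output is CPU time over elapsed time:
slide_share = (90.7 + 12.9) / (2 * 60 + 39)
print(f"slide example: {slide_share:.0%}")
```

The measured values vary from run to run and from machine to machine, which is part of why user CPU time is preferred over raw response time when comparing designs.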
Computer Architecture- 24
Four Useful Principles of CA Design
 Take advantage of parallelism
 One of the most important methods for improving performance.
» System level parallelism and Individual processor level parallelism.
 Principle of Locality
 The properties of programs.
» Temporal locality and Spatial locality.
 Focus on the common case
 For power, resource allocation and performance.
 Amdahl’s law
 “The performance improvement to be gained from using some faster mode
of execution is limited by the fraction of the time the faster mode can be
used.”
Computer Architecture- 25
Two Equations to Evaluate Alternatives
 Amdahl’s Law
 The performance gain that can be obtained by improving some portion of a
computer can be calculated using Amdahl’s Law.
 Amdahl’s Law defines the speedup that can be gained by using a particular
feature.
 The CPU Performance Equation
 Essentially all computers are constructed using a clock running at a
constant rate.
 CPU time can then be expressed as a number of clock cycles.
Computer Architecture- 26
Amdahl's Law (1/5)
 Speedup is the ratio
$$\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}$$
 Alternatively,
$$\text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$$
 Speedup from an enhancement depends on two factors
» $\text{Fraction}_{\text{enhanced}}$: the fraction of the execution time in the original machine that can be
converted to take advantage of the enhancement (≤ 1).
» $\text{Speedup}_{\text{enhanced}}$: the improvement gained by the enhanced execution mode (≥ 1).
Computer Architecture- 27
Amdahl's Law (2/5)
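 Thus, $\text{Execution time}_{\text{overall}}$ = the time of the unenhanced portion
of the machine + the time spent using the enhancement, that is,
$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left((1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right)$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
[Figure: execution-time bars labeled ExTime_old and ExTime_new, with the enhanced fraction marked.]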
Computer Architecture- 28
Amdahl's Law (3/5)
 Example 3 (p.40): Suppose that we want to enhance the processor used for
Web serving. The new processor is 10 times faster on computation in the Web
serving application than the original processor. Assuming that the original
processor is busy with computation 40% of the time and is waiting for I/O
60% of the time, what is the overall speedup gained by incorporating the
enhancement?
 Answer
$\text{Fraction}_{\text{enhanced}} = 0.4$, $\text{Speedup}_{\text{enhanced}} = 10$
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.6 + 0.04} = \frac{1}{0.64} \approx 1.56$$
† Amdahl’s Law can serve as a guide to how much an enhancement will
improve performance and how to distribute resources to improve cost-
performance.
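Amdahl's Law is short enough to write as a function. The sketch below reproduces the Web-serving example and also shows the ceiling reached if the enhanced part became infinitely fast; the name `amdahl_speedup` is an illustrative choice.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Web-serving example: computation is 40% of the time and becomes 10x faster.
web_speedup = amdahl_speedup(0.4, 10.0)

# Upper bound: even an infinitely fast enhancement leaves the other 60% untouched.
upper_bound = 1.0 / (1.0 - 0.4)

print(f"overall speedup: {web_speedup:.2f}")
print(f"upper bound:     {upper_bound:.2f}")
```

The gap between 1.56 and the bound of about 1.67 shows how little is left to gain from speeding up the computation further.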
Computer Architecture- 29
Amdahl's Law (4/5)
 Example 4 (p.40): A common transformation required in graphics processors is square root.
Implementations of floating-point (FP) square root vary significantly in performance, especially
among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20%
of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR
hardware and speed up this operation by a factor of 10. The other alternative is just to try to make
all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are
responsible for half of the execution time for the application. The design team believes that they
can make all FP instructions run 1.6 times faster with the same effort as required for the fast
square root. Compare these two design alternatives.
 Answer
We can compare these two alternatives by comparing the speedups:
$$\text{Speedup}_{\text{FPSQR}} = \frac{1}{(1 - 0.2) + \dfrac{0.2}{10}} = \frac{1}{0.82} = 1.22$$
$$\text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{1.6}} = \frac{1}{0.8125} = 1.23$$
Improving the performance of the FP operations overall is slightly better because of the higher
frequency.
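The comparison can be recomputed directly from Amdahl's Law. This is a sketch using the fractions and speedup factors given in the example; the variable names are illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Alternative 1: FPSQR is 20% of execution time and becomes 10x faster.
speedup_fpsqr = amdahl_speedup(0.2, 10.0)
# Alternative 2: all FP instructions are 50% of execution time and become 1.6x faster.
speedup_fp = amdahl_speedup(0.5, 1.6)

print(f"FPSQR only: {speedup_fpsqr:.2f}")
print(f"all FP:     {speedup_fp:.2f}")
print("better choice:", "all FP" if speedup_fp > speedup_fpsqr else "FPSQR only")
```

A modest 1.6x improvement applied to half of the execution time edges out a 10x improvement applied to one fifth of it, which is the "focus on the common case" principle in numbers.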
Computer Architecture- 30
Amdahl's Law (5/5)
 Example 5 (p.41): The calculation of the failure rates of the disk subsystem was
$$\text{Failure rate}_{\text{system}} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{1}{1{,}000{,}000} = \frac{10 + 2 + 5 + 5 + 1}{1{,}000{,}000 \text{ hours}} = \frac{23}{1{,}000{,}000 \text{ hours}}$$
Therefore, the fraction of the failure rate that could be improved is 5 per million hours
out of 23 for the whole system, or 0.22.
 Answer
The reliability improvement would be
$$\text{Improvement}_{\text{power supply pair}} = \frac{1}{(1 - 0.22) + \dfrac{0.22}{4150}} = \frac{1}{0.78} = 1.28$$
Despite an impressive 4150X improvement in reliability of one module, from the
system’s perspective, the change has a measurable but small benefit.
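The failure-rate sum and the resulting improvement can be checked with exact fractions. This sketch assumes that one of the two 200,000-hour terms is the power supply, which matches the "5 per million hours" stated in the example; the variable names are illustrative.

```python
from fractions import Fraction

# MTTF values (hours) behind each term of the failure-rate sum:
# ten components at 1,000,000 hours, then 500,000, 200,000, 200,000 and 1,000,000.
mttf_hours = [1_000_000] * 10 + [500_000, 200_000, 200_000, 1_000_000]

# Failure rates add when components fail independently.
failure_rate = sum(Fraction(1, m) for m in mttf_hours)

# Assumption: the power supply is one of the 200,000-hour components.
power_supply_rate = Fraction(1, 200_000)
fraction_improvable = power_supply_rate / failure_rate

# Amdahl's Law with a 4150x better power supply pair.
improvement = 1 / ((1 - fraction_improvable) + fraction_improvable / 4150)

print(f"system failure rate: {failure_rate * 1_000_000} per million hours")
print(f"improvable fraction: {float(fraction_improvable):.2f}")
print(f"overall improvement: {float(improvement):.2f}")
```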
Computer Architecture- 31
CPU Performance (1/5)
 Essentially all computers are constructed using a clock running at a constant rate.
These discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles.
 Clock rate: today in GHz
 Clock cycle time: clock cycle time = 1/clock rate
 Ex. 1 GHz clock rate = 1 ns cycle time
 Thus, the CPU time for a program can be expressed two ways:
$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
Or,
$$\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$
Computer Architecture- 32
CPU Performance (2/5)
 We can also count the number of instructions executed: the instruction path
length or instruction count (IC).
 If we know the number of clock cycles and IC, then we can calculate the average number of
clock cycles per instruction (CPI).
 CPI is computed as
$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{IC}}$$
† This figure of merit provides insight into different styles of instruction sets and
implementations.
 Thus, clock cycles can be defined as IC × CPI, which allows us to use CPI in the
execution time formula:
$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$$
Computer Architecture- 33
CPU Performance (3/5)
 How the pieces of CPU time fit together
$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}}$$
 An α% improvement in any one of the three pieces leads to an α% improvement in
CPU time.
 Unfortunately, it is difficult to change one parameter in complete isolation from the
others, because the technologies behind them are interdependent:
» Clock cycle time: Hardware technology and organization;
» CPI: Organization and instruction set architecture;
» Instruction count: Instruction set architecture and compiler technology.
† Processor performance is dependent upon three characteristics:
instruction count, clock cycles per instruction, and clock cycle time (or rate).
† Computer architecture focuses on the CPI and IC parameters.
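A small sketch of the CPU performance equation, showing that equal relative improvements in different terms have the same effect on CPU time. The instruction count, CPI and clock rates below are hypothetical numbers picked for easy arithmetic.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time (seconds) = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical baseline: 1e9 instructions, CPI of 2.0, 1 GHz clock.
base = cpu_time(1e9, 2.0, 1e9)

# Reduce one term by 20% at a time while holding the others fixed.
shorter_cycle = cpu_time(1e9, 2.0, 1.25e9)   # cycle time 0.8 ns instead of 1 ns
lower_cpi = cpu_time(1e9, 1.6, 1e9)          # CPI 1.6 instead of 2.0
fewer_instr = cpu_time(0.8e9, 2.0, 1e9)      # IC 0.8e9 instead of 1e9

print(f"baseline:          {base:.2f} s")
print(f"shorter cycle:     {shorter_cycle:.2f} s")
print(f"lower CPI:         {lower_cpi:.2f} s")
print(f"fewer instructions: {fewer_instr:.2f} s")
```

In practice the three terms are not independent, so a change that lowers one of them may raise another; the equation only shows the effect when the others really do stay fixed.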
Computer Architecture- 34
CPU Performance (4/5)
 The total number of processor clock cycles can be calculated as
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i$$
$\text{IC}_i$: the number of times instruction i is executed in a program.
$\text{CPI}_i$: the average number of clocks per instruction for instruction i.
 CPU time can then be expressed as
$$\text{CPU time} = \left(\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i\right) \times \text{Clock cycle time}$$
 And the overall CPI as
$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i$$
† $\text{IC}_i/\text{IC}$ represents the fraction of occurrences of that instruction in a program.
† It is useful in designing the processor.
Hint: $\text{CPI}_i$ should be measured, because it must include pipeline effects, cache
misses, and any other memory system inefficiencies.
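The weighted-sum form of CPI can be sketched as follows. The instruction classes, counts and per-class CPIs in `mix` are hypothetical values for illustration; in practice they would come from measurement.

```python
def total_clock_cycles(mix):
    """Sum of IC_i * CPI_i over all instruction classes."""
    return sum(count * cpi for count, cpi in mix.values())


def overall_cpi(mix):
    """Overall CPI = sum(IC_i * CPI_i) / total instruction count."""
    instruction_count = sum(count for count, _ in mix.values())
    return total_clock_cycles(mix) / instruction_count


# Hypothetical instruction mix: class -> (IC_i, CPI_i).
mix = {
    "alu": (500, 1.0),
    "load_store": (300, 2.0),
    "branch": (200, 3.0),
}

cycles = total_clock_cycles(mix)
cpi = overall_cpi(mix)
clock_cycle_time_ns = 1.0            # assumed 1 GHz clock
cpu_time_ns = cycles * clock_cycle_time_ns

print(f"clock cycles: {cycles:.0f}")
print(f"overall CPI:  {cpi:.2f}")
print(f"CPU time:     {cpu_time_ns:.0f} ns")
```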
Computer Architecture- 35
CPU Performance (5/5)
 Example 6 (p.43): Suppose we have made the following measurements:
Frequency of FP operations = 25%, Average CPI of FP operations = 4.0, Average CPI of other instructions =
1.33, Frequency of FPSQR = 2%, CPI of FPSQR = 20.
Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of
all FP operations to 2.5. Compare these two design alternatives using the processor performance equation.
 Answer
First, observe that only the CPI changes; the clock rate and instruction count remain identical. We start by
finding the original CPI with neither enhancement:
$$\text{CPI}_{\text{original}} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{Instruction count}} = (4 \times 25\%) + (1.33 \times 75\%) = 2.0$$
We can compute the CPI for the enhanced FPSQR by subtracting the cycles saved from the original CPI:
$$\text{CPI}_{\text{with new FPSQR}} = \text{CPI}_{\text{original}} - 2\% \times (\text{CPI}_{\text{old FPSQR}} - \text{CPI}_{\text{of new FPSQR only}}) = 2.0 - 2\% \times (20 - 2) = 1.64$$
The CPI with all FP operations enhanced is
$$\text{CPI}_{\text{new FP}} = (75\% \times 1.33) + (25\% \times 2.5) = 1.62$$
Since the CPI of the overall FP enhancement is slightly lower, its performance is marginally better. The speedup for the overall FP enhancement is
$$\text{Speedup}_{\text{new FP}} = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_{\text{new FP}}} = \frac{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{original}}}{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{new FP}}} = \frac{\text{CPI}_{\text{original}}}{\text{CPI}_{\text{new FP}}} = \frac{2.00}{1.625} = 1.23$$
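The example can be recomputed from the stated measurements. This sketch keeps full floating-point precision, so intermediate values differ slightly from the rounded figures on the slide (for instance, the original CPI comes out as 1.9975 rather than exactly 2.0); the variable names are illustrative.

```python
# Measurements from the example.
fp_freq, fp_cpi = 0.25, 4.0
other_freq, other_cpi = 0.75, 1.33
fpsqr_freq, fpsqr_cpi = 0.02, 20.0

# Original CPI: frequency-weighted sum over the instruction classes.
cpi_original = fp_freq * fp_cpi + other_freq * other_cpi

# Alternative 1: lower the CPI of FPSQR from 20 to 2 (subtract the cycles saved).
cpi_new_fpsqr = cpi_original - fpsqr_freq * (fpsqr_cpi - 2.0)

# Alternative 2: lower the average CPI of all FP operations to 2.5.
cpi_new_fp = other_freq * other_cpi + fp_freq * 2.5

# IC and clock cycle time are unchanged, so speedup is just the CPI ratio.
speedup_fpsqr = cpi_original / cpi_new_fpsqr
speedup_fp = cpi_original / cpi_new_fp

print(f"CPI original:       {cpi_original:.2f}")
print(f"CPI with new FPSQR: {cpi_new_fpsqr:.2f}")
print(f"CPI with new FP:    {cpi_new_fp:.2f}")
print(f"speedup, new FP:    {speedup_fp:.2f}")
```

Both routes, the CPI equation here and Amdahl's Law in the earlier FPSQR example, lead to the same conclusion that improving all FP operations is marginally better.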
Computer Architecture- 36
Amdahl's Law vs. CPU Performance
 The CPU performance equation is often more practical than Amdahl’s Law
 Its constituent parts can usually be measured;
 By contrast, the fraction of execution time for which a set of instructions is
responsible can be difficult to measure;
 For an existing processor, measuring execution time and clock speed is
easy;
 The challenge lies in discovering the instruction count or the CPI.
» Most new processors include counters for both instructions executed and
clock cycles.

287233027-Chapter-1-Fundamentals-of-Computer-Design-ppt.ppt

  • 1. Computer Architecture- 1 Outline  1.1 Introduction  1.2 Classes of Computers  1.3 Defining Computer Architecture  1.4 Trends in Technology  1.5 Trends in Power in Integrated Circuits  1.6 Trends in Cost  1.7 Dependability  1.8 Measuring, Reporting, and Summarizing Performance  1.9 Quantitative Principles of Computer Design  1.10Putting It All Together: Performance and Price-Performance
  • 2. Computer Architecture- 2 1.2 Classes of Computers  Desktop Computing: optimize price-performance.  Servers: provide larger-scale and more reliable file and computing services.  Embedded Computers: real-time performance requirement. Feature Desktop Server Embedded Price of system $500-$5,000 $5,000- $5,000,000 $10-$100,000 (including network routers at the high end) Price of microprocesso r module $50-$500 (per processor) $200-$10,000 (per processor) $0.01-$100 (per processor) Critical system design issues Price- performance, graphics performance Throughput, availability, scalability Price, power consumption, application-specific performance
  • 3. Computer Architecture- 3 Outline  1.1 Introduction  1.2 Classes of Computers  1.3 Defining Computer Architecture  1.4 Trends in Technology  1.5 Trends in Power in Integrated Circuits  1.6 Trends in Cost  1.7 Dependability  1.8 Measuring, Reporting, and Summarizing Performance  1.9 Quantitative Principles of Computer Design  1.10Putting It All Together: Performance and Price-Performance
  • 4. Computer Architecture- 4 Instruction Set Architecture: Critical Interface  Properties of a good abstraction  Lasts through many generations (portability)  Used in many different ways (generality)  Provides convenient functionality to higher levels  Permits an efficient implementation at lower levels instruction set software hardware
  • 5. Computer Architecture- 5 Instruction Set Architecture (ISA)  ISA is the actual programmer-visible instruction set.  Class of ISA;  Memory addressing;  Addressing modes;  Types and sizes of operands;  Operations;  Control flow instructions;  Encoding on ISA.
  • 6. Computer Architecture- 6 Organization, Hardware, and Architecture  Organization: includes the high-level aspects of a computer’s design.  Memory system, the memory interconnect, and the design of the internal processor or CPU (arithmetic, logic, branching, and data transfer).  For example: AMD Opteron 64 and Intel P4 have same ISA, but they have different internal pipeline and cache organizations.  Hardware: detailed logic design and the packaging technology.  For example, P4 and Mobile P4 have same ISA and organization, but they have different clock frequency and memory system.  Architecture: covers all three aspects of computer design – instruction set architecture, organization, and hardware.  Designer must meet functional requirements as well as price, power, performance, and availability goals.
  • 7. Computer Architecture- 7 Outline  1.1 Introduction  1.2 Classes of Computers  1.3 Defining Computer Architecture  1.4 Trends in Technology  1.5 Trends in Power in Integrated Circuits  1.6 Trends in Cost  1.7 Dependability  1.8 Measuring, Reporting, and Summarizing Performance  1.9 Quantitative Principles of Computer Design  1.10Putting It All Together: Performance and Price-Performance
  • 8. 1.4 Trends in Technology ▪ A successful new ISA may last decades; the IBM mainframe is one example. ▪ Four critical technologies: ▪ Integrated circuit logic technology: transistor density increases by about 35% per year, quadrupling in somewhat over four years; ▪ Semiconductor DRAM (Dynamic Random-Access Memory): capacity increases by about 40% per year, doubling roughly every two years; ▪ Magnetic disk technology: a roller coaster of improvement rates; disks are 50-100 times cheaper per bit than DRAM (chapter 6); ▪ Network technology: network performance depends on the performance of both the switches and the transmission system.
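The growth rates on this slide can be cross-checked with compound-growth arithmetic. The sketch below is only an illustration: the helper name `years_to_grow` is an assumption, not something from the slides, and it treats the annual rates as constant.

```python
import math


def years_to_grow(annual_rate: float, factor: float) -> float:
    """Years needed to grow by `factor` at a constant compound annual rate."""
    return math.log(factor) / math.log(1.0 + annual_rate)


# Logic density: about 35% per year, time to quadruple.
logic_years_to_quadruple = years_to_grow(0.35, 4.0)
# DRAM capacity: about 40% per year, time to double.
dram_years_to_double = years_to_grow(0.40, 2.0)

print(f"Logic density quadruples in about {logic_years_to_quadruple:.1f} years")
print(f"DRAM capacity doubles in about {dram_years_to_double:.1f} years")
```

Both results agree with the slide's wording: "somewhat over four years" for logic and "roughly every two years" for DRAM.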
  • 9. Performance Trends: Bandwidth over Latency ▪ Bandwidth or throughput: the total amount of work done in a given time. ▪ For example, megabytes per second for a disk transfer. ▪ Latency or response time: the time between the start and the completion of an event. ▪ For example, milliseconds for a disk access.
  • 10. Scaling of Transistor Performance and Wires ▪ Feature size: the minimum size of a transistor or a wire in either the x or y dimension. ▪ It shrank from 10 microns in 1971 to 0.09 microns (90 nm) in 2006; ▪ The density of transistors increases quadratically with a linear decrease in feature size; ▪ Transistor performance improves linearly with decreasing feature size; ▪ Thanks to the improvement in transistor density, CPUs moved quickly from 4-bit to 8-bit, to 16-bit, to 32-bit microprocessors; ▪ However, the signal delay for a wire increases in proportion to the product of its resistance and capacitance.
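As a rough illustration of the quadratic rule above, the sketch below estimates the density gain implied by the two feature sizes quoted on the slide. It is an idealized calculation (real processes do not scale perfectly), and the function name `density_gain` is an assumption made for this example.

```python
def density_gain(old_feature_um: float, new_feature_um: float) -> float:
    """Idealized transistor-density gain: the square of the linear shrink."""
    linear_shrink = old_feature_um / new_feature_um
    return linear_shrink ** 2


# 10 microns (1971) down to 0.09 microns (2006), as quoted on the slide.
gain = density_gain(10.0, 0.09)
print(f"Idealized density gain: about {gain:,.0f}x")
```

A linear shrink of about 111x therefore corresponds to an idealized density gain of more than four orders of magnitude.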
  • 12. Power in IC (1/3) ▪ Power also provides challenges as devices are scaled. ▪ Dynamic power (watts, W) in a CMOS chip: traditionally, the dominant energy consumption has been in switching transistors: $\text{Power}_{\text{dynamic}} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$ ▪ Mobile devices care about battery life more than power, so energy, measured in joules, is the proper metric: $\text{Energy}_{\text{dynamic}} = \text{Capacitive load} \times \text{Voltage}^2$ † In modern VLSI, the exact power measurement is the sum $\text{Power}_{\text{total}} = \text{Power}_{\text{dynamic}} + \text{Power}_{\text{static}} + \text{Power}_{\text{leakage}}$. † Hence, a lower voltage greatly reduces both $\text{Power}_{\text{dynamic}}$ and $\text{Energy}_{\text{dynamic}}$. (Over the past 20 years, supply voltages have dropped from 5 V to 1 V.)
  • 13. Power in IC (2/3) ▪ Example 1 (p.22): Some microprocessors today are designed to have adjustable voltage, so that a 15% reduction in voltage may result in a 15% reduction in frequency. What would be the impact on dynamic power? ▪ Answer: Since the capacitance is unchanged, the answer is the ratio of the voltages and frequencies: $\frac{\text{Power}_{\text{new}}}{\text{Power}_{\text{old}}} = \frac{(\text{Voltage} \times 0.85)^2 \times (\text{Frequency switched} \times 0.85)}{\text{Voltage}^2 \times \text{Frequency switched}} = 0.85^3 = 0.61$, thereby reducing power to about 60% of the original.
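The dynamic power and energy formulas can be checked numerically. The following is a minimal sketch: the capacitance, voltage, and frequency values are hypothetical placeholders (they cancel out of the ratios), and the function names are assumptions made for this example.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Dynamic power in watts: 1/2 x capacitive load x voltage^2 x frequency switched."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency


def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Dynamic energy in joules: capacitive load x voltage^2."""
    return capacitive_load * voltage ** 2


# Hypothetical baseline values; only the ratios below matter.
C, V, F = 1e-9, 1.2, 2e9

# A 15% reduction in voltage accompanied by a 15% reduction in frequency.
power_ratio = dynamic_power(C, 0.85 * V, 0.85 * F) / dynamic_power(C, V, F)
energy_ratio = dynamic_energy(C, 0.85 * V) / dynamic_energy(C, V)

print(f"Power ratio:  {power_ratio:.2f}")
print(f"Energy ratio: {energy_ratio:.2f}")
```

The power ratio comes out at 0.85 cubed, because voltage enters squared and frequency enters once, while the energy ratio depends on the voltage alone.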
  • 14. Power in IC (3/3) ▪ As we move from one process to the next (60 nm or 45 nm…): ▪ The number of transistors switching and their switching frequency ↑; ▪ Capacitance and voltage ↓; ▪ However, the first effect dominates, so overall power consumption and energy ↑. ▪ Static power: an important issue because leakage current flows even when a transistor is off: $\text{Power}_{\text{static}} = \text{Current}_{\text{static}} \times \text{Voltage}$ ▪ Thus, number of transistors ↑, power ↑; ▪ Feature size ↓, power ↑ (why? You can find out in the VLSI area).
  • 16. Silicon Wafer and Dies ▪ Exponential cost decrease, with the technology basically the same: ▪ A wafer is tested and chopped into dies that are packaged. [Figure: an AMD K8 wafer with one die and the dies along the edge labeled; source: http://www.amd.com]
  • 17. Cost of an Integrated Circuit (IC) ▪ $\text{Cost of IC} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$ ▪ $\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$ (a greater portion of the cost that varies between machines; sensitive to die size) ▪ $\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$ (the second term accounts for the dies along the edge) ▪ $\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defect density} \times \text{Die area}}{\alpha}\right)^{-\alpha}$ ▪ Today's technology: $\alpha \approx 4.0$, defect density 0.4 to 0.8 per cm².
  • 18. Examples of Cost of an IC ▪ Example 1 (p.22): Find the number of dies per 300 mm (30 cm) wafer for a die that is 1.5 cm on a side. ▪ The total die area is 2.25 cm². Thus $\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}} = \frac{\pi \times (30/2)^2}{2.25} - \frac{\pi \times 30}{\sqrt{2 \times 2.25}} = \frac{706.5}{2.25} - \frac{94.2}{2.12} = 270$ ▪ Example 2 (p.24): Find the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a defect density of 0.4 per cm² and α is 4. ▪ The total die areas are 2.25 cm² and 1.00 cm². For the large die the yield is $\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defect density} \times \text{Die area}}{\alpha}\right)^{-\alpha} = \left(1 + \frac{0.4 \times 2.25}{4.0}\right)^{-4} = 0.44$ ▪ For the small die, it is $\text{Die yield} = \left(1 + \frac{0.4 \times 1.00}{4.0}\right)^{-4} = 0.68$
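The two cost examples can be reproduced with a short script. This is a sketch under stated assumptions: the function names are invented for illustration, the wafer yield is taken as 100% as the examples implicitly do, and the wafer cost at the end is a purely hypothetical figure, not a value from the slides.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area divided by die area, minus the dies lost along the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defect_density_per_cm2: float, die_area_cm2: float,
              alpha: float = 4.0, wafer_yield: float = 1.0) -> float:
    """Die yield = wafer yield x (1 + defect density x die area / alpha)^(-alpha)."""
    return wafer_yield * (1 + defect_density_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def cost_of_die(wafer_cost: float, dies: float, yield_fraction: float) -> float:
    """Cost of die = cost of wafer / (dies per wafer x die yield)."""
    return wafer_cost / (dies * yield_fraction)


# Example 1: 30 cm wafer, die of 1.5 cm on a side (2.25 cm^2).
large_die_count = dies_per_wafer(30.0, 2.25)

# Example 2: defect density 0.4 per cm^2, alpha = 4.
large_die_yield = die_yield(0.4, 2.25)
small_die_yield = die_yield(0.4, 1.00)

# Hypothetical wafer cost, used only to show how the pieces combine.
HYPOTHETICAL_WAFER_COST = 5000.0
large_die_cost = cost_of_die(HYPOTHETICAL_WAFER_COST, large_die_count, large_die_yield)

print(f"Dies per wafer:  {large_die_count:.0f}")
print(f"Large die yield: {large_die_yield:.2f}")
print(f"Small die yield: {small_die_yield:.2f}")
```

Because yield falls as die area grows while the die count also falls, the cost per good die rises faster than linearly with die area, which is why the die cost is described as sensitive to die size.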
  • 20. Response Time, Throughput, and Performance ▪ Response time: the time between the start and the completion of an event, also referred to as execution time. ▪ This is what the computer user is interested in. ▪ Throughput: the total amount of work done in a given time. ▪ This is what the administrator of a large data processing center may be interested in. ▪ In comparing design alternatives, ▪ The phrase “X is faster than Y” is used here to mean that the response time or execution time is lower on X than on Y. ▪ In particular, “X is n times faster than Y” or “the throughput of X is n times higher than Y” will mean $n = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$
  • 21. Performance Measuring ▪ Execution time is the reciprocal of performance: $\text{Performance}_X = \frac{1}{\text{Execution time}_X}$, so $n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$
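A small numeric sketch of the relation above, showing that the execution-time ratio and the performance ratio give the same n. The two execution times are hypothetical values chosen for illustration, and the function names are assumptions.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(execution_time_x_s: float, execution_time_y_s: float) -> float:
    """n such that X is n times faster than Y."""
    return execution_time_y_s / execution_time_x_s


# Hypothetical measurements: X takes 10 s, Y takes 15 s.
n_from_times = times_faster(10.0, 15.0)
n_from_performance = performance(10.0) / performance(15.0)

print(f"X is {n_from_times:.1f} times faster than Y")
```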
  • 22. Reliable Measure: User CPU Time ▪ Response time may include disk access, memory access, input/output activities, CPU events, and operating system overhead: everything… ▪ To get an accurate measure of performance, we use CPU time instead of response time. ▪ CPU time is the time the CPU spends computing a program; it does not include time spent waiting for I/O or running other programs. ▪ CPU time can be further divided into user CPU time (spent in the program) and system CPU time (spent in the OS). ▪ Running the UNIX time command may report, for example, 90.7s 12.9s 2:39 65% (user CPU time, system CPU time, total response time, and the percentage of response time that is CPU time). ▪ In our performance measures, we use user CPU time because of its independence from the OS and other factors.
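The 65% figure in the time output can be recovered from the other three numbers. The sketch below simply redoes that arithmetic; the variable names are assumptions made for this example.

```python
# Values reported by the time command on the slide.
user_cpu_s = 90.7
system_cpu_s = 12.9
elapsed_s = 2 * 60 + 39  # 2:39 of total response time

# Fraction of the elapsed time during which the CPU worked on this program.
cpu_fraction = (user_cpu_s + system_cpu_s) / elapsed_s

print(f"CPU time is {cpu_fraction:.0%} of the response time")
```

The remaining third or so of the elapsed time was spent waiting for I/O or running other programs.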
  • 24. Four Useful Principles of CA Design ▪ Take advantage of parallelism ▪ One of the most important methods for improving performance. » System-level parallelism and individual processor-level parallelism. ▪ Principle of locality ▪ A property of programs. » Temporal locality and spatial locality. ▪ Focus on the common case ▪ This applies to power, resource allocation, and performance. ▪ Amdahl’s law ▪ “The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.”
  • 25. Two Equations to Evaluate Alternatives ▪ Amdahl’s Law ▪ The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl’s Law. ▪ Amdahl’s Law defines the speedup that can be gained by using a particular feature. ▪ The CPU Performance Equation ▪ Essentially all computers are constructed using a clock running at a constant rate. ▪ CPU time can then be expressed in terms of the number of clock cycles.
  • 26. Amdahl's Law (1/5) ▪ Speedup is the ratio $\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}$ ▪ Alternatively, $\text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$ ▪ The speedup depends on two factors: » $\text{Fraction}_{\text{enhanced}}$: the fraction of the execution time in the original machine that can be converted to take advantage of the enhancement (≤ 1). » $\text{Speedup}_{\text{enhanced}}$: the improvement gained by the enhanced execution mode (≥ 1).
  • 27. Amdahl's Law (2/5) ▪ Thus, the overall execution time is the time of the unenhanced portion of the machine plus the time spent using the enhancement, that is, $\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left((1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right)$ ▪ The overall speedup is therefore $\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$
  • 28. Amdahl's Law (3/5) ▪ Example 3 (p.40): Suppose that we want to enhance the processor used for Web serving. The new processor is 10 times faster on computation in the Web serving application than the original processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement? ▪ Answer: $\text{Fraction}_{\text{enhanced}} = 0.4$, $\text{Speedup}_{\text{enhanced}} = 10$, so $\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.4) + \frac{0.4}{10}} = \frac{1}{0.6 + 0.04} = \frac{1}{0.64} = 1.56$ † Amdahl’s Law can serve as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost-performance.
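Amdahl's Law is easy to express as a function, which makes it simple to try other fractions and speedups. The sketch below reproduces the Web-serving example; the function name `amdahl_speedup` is an assumption made for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - fraction) + fraction / speedup_enhanced)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Example 3: computation is 40% of the time and becomes 10 times faster.
web_speedup = amdahl_speedup(0.4, 10.0)

# Upper bound: even an infinitely fast enhancement cannot beat 1 / (1 - fraction).
web_speedup_limit = 1.0 / (1.0 - 0.4)

print(f"Overall speedup: {web_speedup:.2f}")
print(f"Limit as the enhancement speedup grows: {web_speedup_limit:.2f}")
```

The limit shows why the 60% of time spent waiting for I/O dominates: no improvement to computation alone can push the overall speedup past about 1.67.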
  • 29. Amdahl's Law (4/5) ▪ Example 4 (p.40): A common transformation required in graphics processors is square root. Implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for half of the execution time for the application. The design team believes that they can make all FP instructions run 1.6 times faster with the same effort as required for the fast square root. Compare these two design alternatives. ▪ Answer: We can compare these two alternatives by comparing the speedups: $\text{Speedup}_{\text{FPSQR}} = \frac{1}{(1 - 0.2) + \frac{0.2}{10}} = \frac{1}{0.82} = 1.22$ and $\text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \frac{0.5}{1.6}} = \frac{1}{0.8125} = 1.23$. Improving the performance of the FP operations overall is slightly better because of the higher frequency.
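The comparison of the two design alternatives can be checked with the same formula. This is a minimal sketch; the function name is an assumption, and the inputs are the fractions and speedups stated in the example.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - fraction) + fraction / speedup_enhanced)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Alternative 1: FPSQR is 20% of execution time and becomes 10 times faster.
speedup_fpsqr = amdahl_speedup(0.2, 10.0)

# Alternative 2: all FP instructions are 50% of execution time, 1.6 times faster.
speedup_fp = amdahl_speedup(0.5, 1.6)

print(f"FPSQR enhancement: {speedup_fpsqr:.2f}")
print(f"All-FP enhancement: {speedup_fp:.2f}")
```

A modest improvement to a frequently used feature edges out a large improvement to a rarely used one, which is the "focus on the common case" principle in action.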
  • 30. Amdahl's Law (5/5) ▪ Example 5 (p.41): The calculation of the failure rates of the disk subsystem was $\text{Failure rate}_{\text{system}} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{1}{1{,}000{,}000} = \frac{10 + 2 + 5 + 5 + 1}{1{,}000{,}000 \text{ hours}} = \frac{23}{1{,}000{,}000 \text{ hours}}$ Therefore, the fraction of the failure rate that could be improved is 5 per million hours out of 23 for the whole system, or 0.22. ▪ Answer: The reliability improvement would be $\text{Improvement}_{\text{power supply pair}} = \frac{1}{(1 - 0.22) + \frac{0.22}{4150}} = \frac{1}{0.78} = 1.28$ Despite an impressive 4150X improvement in the reliability of one module, from the system’s perspective the change has a measurable but small benefit.
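The reliability example combines a sum of failure rates with Amdahl's Law. The sketch below redoes both steps; the list of (count, MTTF) pairs mirrors the terms of the failure-rate sum on the slide, and the variable names are assumptions made for this example.

```python
# (count, MTTF in hours) for each term of the failure-rate sum on the slide.
components = [
    (10, 1_000_000),
    (1, 500_000),
    (1, 200_000),   # the term improved by the power supply pair
    (1, 200_000),
    (1, 1_000_000),
]

# The failure rate of the system is the sum of the component failure rates.
system_failure_rate = sum(count / mttf for count, mttf in components)

# The improvable term contributes 1/200,000, i.e. 5 per million hours.
fraction_improvable = (1 / 200_000) / system_failure_rate

# Amdahl's Law with a 4150x improvement of that fraction.
improvement = 1.0 / ((1.0 - fraction_improvable) + fraction_improvable / 4150.0)

print(f"System failure rate: {system_failure_rate * 1e6:.0f} per million hours")
print(f"Improvable fraction: {fraction_improvable:.2f}")
print(f"Reliability improvement: {improvement:.2f}x")
```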
  • 31. CPU Performance (1/5) ▪ Essentially all computers are constructed using a clock running at a constant rate; its discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles. ▪ Clock rate: today in GHz ▪ Clock cycle time: clock cycle time = 1/clock rate ▪ Ex. 1 GHz clock rate = 1 ns cycle time ▪ Thus, the CPU time for a program can be expressed in two ways: $\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$ or $\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$
  • 32. CPU Performance (2/5) ▪ We can also count the number of instructions executed: the instruction path length or instruction count (IC). ▪ If we know the number of clock cycles and the IC, then we can calculate the average number of clock cycles per instruction (CPI). ▪ CPI is computed as $\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{IC}}$ † This figure provides insight into different styles of instruction sets and implementations. ▪ Thus, clock cycles can be defined as IC × CPI, which allows us to use CPI in the execution time formula: $\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$
  • 33. CPU Performance (3/5) ▪ How the pieces of CPU time fit together: $\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$, and, in terms of units, $\frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}} = \text{CPU time}$ ▪ An α% improvement in any one of the three pieces leads to an α% improvement in CPU time. ▪ Unfortunately, it is difficult to change one parameter in complete isolation from the others, because the underlying technologies are interdependent: » Clock cycle time: hardware technology and organization; » CPI: organization and instruction set architecture; » Instruction count: instruction set architecture and compiler technology. † Processor performance depends on three characteristics: instruction count, clock cycles per instruction, and clock cycle time (or rate). † Computer architecture focuses on the CPI and IC parameters.
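The CPU performance equation can be written as a one-line function. In the sketch below the instruction count, CPI, and clock rate are hypothetical values chosen only to make the arithmetic easy to follow, and the function names are assumptions.

```python
def clock_cycle_time(clock_rate_hz: float) -> float:
    """Clock cycle time in seconds = 1 / clock rate."""
    return 1.0 / clock_rate_hz


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = IC x CPI x clock cycle time = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 2 billion instructions, CPI of 1.5, 1 GHz clock.
baseline_s = cpu_time(2e9, 1.5, 1e9)

# A 10% improvement in CPI alone gives a 10% improvement in CPU time.
improved_s = cpu_time(2e9, 1.5 * 0.9, 1e9)

print(f"Cycle time at 1 GHz: {clock_cycle_time(1e9) * 1e9:.0f} ns")
print(f"Baseline CPU time: {baseline_s:.2f} s")
print(f"With 10% lower CPI: {improved_s:.2f} s")
```

In practice the three factors rarely move independently, as the slide notes, so such a calculation is a first estimate rather than a prediction.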
  • 34. CPU Performance (4/5) ▪ The total number of processor clock cycles can be calculated as $\text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i$ where $\text{IC}_i$ is the number of times instruction i is executed in a program and $\text{CPI}_i$ is the average number of clocks per instruction for instruction i. ▪ CPU time can then be expressed as $\text{CPU time} = \left(\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i\right) \times \text{Clock cycle time}$ ▪ And the overall CPI as $\text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i$ † $\text{IC}_i / \text{Instruction count}$ represents the fraction of occurrences of that instruction in a program. † It is useful in designing the processor. Hint: $\text{CPI}_i$ should be measured, since it must include pipeline effects, cache misses, and any other memory system inefficiencies.
  • 35. CPU Performance (5/5) ▪ Example 6 (p.43): Suppose we have made the following measurements: Frequency of FP operations = 25%, Average CPI of FP operations = 4.0, Average CPI of other instructions = 1.33, Frequency of FPSQR = 2%, CPI of FPSQR = 20. Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of all FP operations to 2.5. Compare these two design alternatives using the processor performance equation. ▪ Answer: First, observe that only the CPI changes; the clock rate and instruction count remain identical. We start by finding the original CPI with neither enhancement: $\text{CPI}_{\text{original}} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{Instruction count}} = (4 \times 25\%) + (1.33 \times 75\%) = 2.0$ We can compute the CPI for the enhanced FPSQR by subtracting the cycles saved from the original CPI: $\text{CPI}_{\text{with new FPSQR}} = \text{CPI}_{\text{original}} - 2\% \times (\text{CPI}_{\text{old FPSQR}} - \text{CPI}_{\text{of new FPSQR only}}) = 2.0 - 2\% \times (20 - 2) = 1.64$ The CPI for the enhancement of all FP instructions is computed the same way as the original CPI: $\text{CPI}_{\text{new FP}} = (75\% \times 1.33) + (25\% \times 2.5) = 1.625$ Since this CPI is slightly lower, the overall FP enhancement performs marginally better: $\text{Speedup}_{\text{new FP}} = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_{\text{new FP}}} = \frac{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{original}}}{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{new FP}}} = \frac{\text{CPI}_{\text{original}}}{\text{CPI}_{\text{new FP}}} = \frac{2.00}{1.625} = 1.23$
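The weighted-CPI calculation in this example can be reproduced in a few lines. The sketch below is an illustration only: the helper `overall_cpi` and the variable names are assumptions, and the code uses 1.33 exactly as given, so intermediate values differ from the rounded figures on the slide in the third decimal place.

```python
def overall_cpi(instruction_mix: list[tuple[float, float]]) -> float:
    """Overall CPI as the sum of (instruction fraction x CPI) over instruction classes."""
    return sum(fraction * cpi for fraction, cpi in instruction_mix)


# Original machine: 25% FP at CPI 4.0, 75% other instructions at CPI 1.33.
cpi_original = overall_cpi([(0.25, 4.0), (0.75, 1.33)])

# Alternative 1: FPSQR (2% of instructions) drops from CPI 20 to CPI 2.
cpi_new_fpsqr = cpi_original - 0.02 * (20.0 - 2.0)

# Alternative 2: all FP operations drop to an average CPI of 2.5.
cpi_new_fp = overall_cpi([(0.25, 2.5), (0.75, 1.33)])

# Instruction count and clock cycle time are unchanged, so speedup is a CPI ratio.
speedup_new_fpsqr = cpi_original / cpi_new_fpsqr
speedup_new_fp = cpi_original / cpi_new_fp

print(f"Original CPI: {cpi_original:.2f}")
print(f"Speedup with new FPSQR: {speedup_new_fpsqr:.2f}")
print(f"Speedup with new FP:    {speedup_new_fp:.2f}")
```

The two speedups match the values obtained with Amdahl's Law for the same pair of alternatives, which is a useful consistency check between the two methods.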
  • 36. Amdahl's Law vs. CPU Performance ▪ The CPU performance equation has a key advantage over Amdahl’s Law: ▪ It is often possible to measure its constituent parts; ▪ For Amdahl’s Law, by contrast, it can be difficult to measure the fraction of execution time for which a set of instructions is responsible; ▪ For an existing processor, measuring execution time and clock speed is easy; ▪ The challenge lies in discovering the instruction count or the CPI. » Most new processors include counters for both instructions executed and clock cycles.

Editor's Notes