Low Power Design
of Integrated Systems
Assoc. Prof. Dimitrios Soudris
dsoudris@ee.duth.gr
Technology Directions:
SIA Roadmap
Year                  1999    2002    2005    2008    2011    2014
Feature size (nm)      180     130     100      70      50      35
Logic trans/cm2       6.2M     18M     39M     84M    180M    390M
Cost/trans (mc)      1.735    .580    .255    .110    .049    .022
#pads/chip            1867    2553    3492    4776    6532    8935
Clock (MHz)           1250    2100    3500    6000   10000   16900
Chip size (mm2)        340     430     520     620     750     900
Wiring levels          6-7       7     7-8     8-9       9      10
Power supply (V)       1.8     1.5     1.2     0.9     0.6     0.5
High-perf pow (W)       90     130     160     170     175     183
Battery pow (W)        1.4       2     2.4     2.8     3.2     3.7
Technology Process Evolution
Technology Directions: SIA Roadmap 2002
[Figures: trends in number of transistors, frequency, performance, and power consumption]
Power Terminology
• Power is the rate at which energy is delivered
or exchanged
» electrical energy is converted to heat energy
during operation
• Power Dissipation - rate at which energy is
taken from the source (Vdd ) and converted
into heat
Why Smaller Power?
• Large Market of Portable devices
– e.g. laptops, mobile phones
• Achieve larger transistor integration
– Pentium IV contains 42 million transistors
– Teraflops chip contains 1.9 billion
transistors
• Need for “green” computers
– 10% of total electrical energy consumed by
PCs
Battery Technology Improvements
The Industry’s Reaction
• Reduce chip capacitance through process scaling
==> Expensive
• Reduce Voltage levels from 5V → 3.3V → 2V
==> Industry is hard to move (microprocessors,
memory,...)
• Better Circuit Techniques
==> Gated clocks, Power-Down of non-operational
units…
• Example: IBM 80 MHz PowerPC RISC (3 W @ 3.3V)
– Power Management Logic determines activity on a per-cycle basis
– Clocks of idle blocks are turned off → 12-30% savings
– Doze, Nap and Sleep modes (5 mW)
Example: Intel Pentium-II processor
• Pentium-1: 15 Watt (5V - 66MHz)
• Pentium-2: 8 Watt (3.3V- 133 MHz)
Where Does Power Go in CMOS?
• The power consumption in digital CMOS circuits:
$P_{avg} = P_{dynamic} + P_{\text{short-circuit}} + P_{leakage}$
• Dynamic Power Consumption: Charging and Discharging Capacitors
• Short Circuit Currents: Short Circuit Path between Supply Rails during Switching
• Leakage (Static): Leaking diodes and transistors
Present & Future in Power
Consumption
Dynamic Power Consumption(1)
$P_{dynamic} = C_L \cdot V_{dd}^2 \cdot N \cdot f$
• where $V_{dd}$ is the supply voltage, $C_L$ the capacitance, $N$ the average number of transitions per clock cycle, and $f$ the operating frequency
[Figure: CMOS inverter with input IN, output OUT and capacitance CL connected to Vdd (a); charging current into CL (b); discharging current out of CL (c)]
Dynamic Power Consumption (2)
• For technologies up to 0.35 μm, the dynamic consumption is about 80% of the total consumption
• Goal ===> reduce dynamic power consumption
– reduction of capacitance
– reduction of supply voltage
– reduction of frequency
– reduction of switching activity
– or a combination of the above factors
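The effect of each factor in the dynamic power relation (capacitance, supply voltage squared, switching activity, frequency) can be checked numerically. The following is a minimal Python sketch; the function name and the component values (1 pF, 3.3 V, activity 0.5, 80 MHz) are illustrative assumptions, not figures from the slides.

```python
def dynamic_power(c_load, v_dd, n_transitions, freq):
    """Dynamic power in watts: C_L * Vdd^2 * N * f."""
    return c_load * v_dd ** 2 * n_transitions * freq

# Hypothetical operating point, chosen only for illustration.
base = dynamic_power(c_load=1e-12, v_dd=3.3, n_transitions=0.5, freq=80e6)

# Halving the supply voltage cuts dynamic power to a quarter (quadratic term).
half_voltage = dynamic_power(1e-12, 3.3 / 2, 0.5, 80e6)

# Halving frequency, capacitance or switching activity each halves it (linear terms).
half_frequency = dynamic_power(1e-12, 3.3, 0.5, 40e6)
half_activity = dynamic_power(1e-12, 3.3, 0.25, 80e6)

print(f"voltage/2   -> {half_voltage / base:.2f} of baseline")
print(f"frequency/2 -> {half_frequency / base:.2f} of baseline")
print(f"activity/2  -> {half_activity / base:.2f} of baseline")
```

The quadratic voltage term is why supply-voltage reduction receives the most attention among the listed options.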
Leakage current consumption
• the reverse-bias diode leakage at the transistor drains, and
• the sub-threshold current through a turned-off transistor channel
[Figure: The leakage of a reverse-biased pMOS transistor: p+ regions in an n-type substrate, gate, Vdd, and the leakage current through the reverse-biased drain-substrate diode]
[Figure: Subthreshold leakage with respect to gate-source voltage: log ID versus VGS (volts), showing the subthreshold and saturated regions for decreasing VDS]
The Design Flow
(a) System Specifications → System-Level Design → Architecture-Level Design → Logic-Level Design → Circuit-Level Design / Layout synthesis
(b) The same flow with analysis/estimation at each level:
– System-Level Design, with System-Level Analysis/Estimation (power models for system-level components)
– Architecture-Level Design, with Architecture-Level Analysis/Estimation (power models for macrocells, control logic)
– Logic-Level Design, with Logic-Level Analysis/Estimation (power models for gates, cells)
– Circuit-Level Design / Layout synthesis, with Circuit-Level Analysis/Estimation
Power savings in terms of the design level
[Figure: design levels (System level, Behavior level, RT level, Logic level, Transistor level, Layout level) with attainable power savings of 10-20x, 2-5x and 20-50%; arrow labeled "Increasing power savings"]
Lower Vdd Increases Delay
$T_d = \frac{C_L \cdot V_{dd}}{I}$, with $I \sim (V_{dd} - V_t)^2$
$\frac{T_d(V_{dd}=2)}{T_d(V_{dd}=5)} = \frac{(2) \cdot (5 - 0.7)^2}{(5) \cdot (2 - 0.7)^2} \approx 4$
Relatively independent of logic function and style.
[Figure: normalized delay versus Vdd (volts) for an adder (SPICE), microcoded DSP chip, multiplier, adder, ring oscillator and clock generator; 2.0 μm technology]
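The delay penalty of a lower supply voltage follows from the first-order model on this slide: delay proportional to Vdd divided by the drive current, with the current proportional to (Vdd - Vt) squared. A minimal Python sketch, assuming Vt = 0.7 V as in the slide's worked ratio; the function name is hypothetical, and the model ignores effects such as velocity saturation.

```python
def relative_delay(v_dd, v_t=0.7):
    """First-order gate delay, up to a constant: Vdd / (Vdd - Vt)^2."""
    if v_dd <= v_t:
        raise ValueError("model only applies for Vdd above the threshold voltage")
    return v_dd / (v_dd - v_t) ** 2

# Slowdown when the supply drops from 5 V to 2 V (the slide rounds this to 4).
slowdown = relative_delay(2.0) / relative_delay(5.0)
print(f"Td(2 V) / Td(5 V) = {slowdown:.2f}")

# Delay normalized to the 5 V case for a few supply voltages.
for v in (5.0, 3.3, 2.0, 1.5):
    print(f"Vdd = {v:.1f} V -> normalized delay {relative_delay(v) / relative_delay(5.0):.2f}")
```

The delay grows rapidly as Vdd approaches Vt, which is what motivates the threshold-lowering and architectural compensation techniques discussed later.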
Reducing Vdd
$P \times T_d = E_t = C_L \cdot V_{dd}^2$
$\frac{E(V_{dd}=2)}{E(V_{dd}=5)} = \frac{(C_L) \cdot (2)^2}{(C_L) \cdot (5)^2}$, so $E(V_{dd}=2) \approx 0.16 \, E(V_{dd}=5)$
Strong function of voltage ($V^2$ dependence).
Relatively independent of logic function and style.
[Figure: normalized power-delay product versus Vdd (volts) for a 51-stage ring oscillator and an 8-bit adder, showing the quadratic dependence]
Power Delay Product Improves with lowering VDD.
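The quadratic dependence of the energy per transition on Vdd can be tabulated directly. A short Python sketch, with a hypothetical function name; the load capacitance cancels in the ratio, so it is set to 1 here.

```python
def switching_energy(c_load, v_dd):
    """Energy per transition (power-delay product): C_L * Vdd^2."""
    return c_load * v_dd ** 2

reference = switching_energy(1.0, 5.0)
energy_ratio = switching_energy(1.0, 2.0) / reference  # 4 / 25 = 0.16

for v in (5.0, 3.3, 2.0):
    print(f"Vdd = {v:.1f} V -> energy relative to 5 V: {switching_energy(1.0, v) / reference:.2f}")
```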
Lowering the Threshold
Reduces the Speed Loss, But Increases Leakage
Interesting Design Approach: DESIGN FOR PLeakage == PDynamic
[Figures: ID versus VGS for Vt = 0 and Vt = 0.2; delay versus Vdd, with 2Vt marked on the Vdd axis]
Transistor Sizing for Power
Minimization
Minimum sized devices are usually optimal for low-power.
• Small W/L’s: Higher Voltage, Lower Capacitance
• Large W/L’s: Lower Voltage, Higher Capacitance
Larger sized devices are useful only when interconnect dominated.
Techniques to reduce supply voltage
• Algorithm: Transformation to exploit concurrency
• Architecture: Parallelism and Pipelining
• Circuit/Logic: Transistor Sizing, Fast Logic Structures
• Technology: Threshold Voltage Reduction, Feature Size scaling
Techniques for minimizing the switched capacitance
• System: Partitioning, Power-down, power states
• Algorithm: Complexity, Concurrency, Regularity, Locality, Data representation
• Architecture: Concurrency, Instruction set selection, Signal correlations, Data representation, Data Encoding
• Circuit/Logic: Transistor sizing, Logic optimization, Power down, Layout Optimization
• Technology: Advanced packaging, SOI
Operation                     Relative energy/operation
16-bit carry-select adder       1
16-bit Multiplier               3.6
8x128x16 SRAM (read)            4.4
8x128x16 SRAM (write)           9
External I/O Access            10
16 bit Memory Access           33
[Figure: relative energy (axis 0.0 to 0.4) of storage, interconnect, clocks and other RISC components]
Power consumption of transfer and storage compared with datapath operations, both in hardware [Men95] and software [Tiw94, Gon96].
Architecture Power Optimization
Techniques
• Architecture-driven voltage reduction: The key idea is to speed up the circuit so that the voltage can be reduced while still meeting the throughput constraints. Voltage reduction can be achieved by introducing parallelism in hardware or by inserting flip-flops (pipelining)
• Switching activity minimization: Try to prevent the generation and propagation of spurious transitions, or to reduce the number of transitions, e.g. retiming, path balancing, data representation
• Switched capacitance minimization: Aim at minimizing the capacitance that is switched
• Dynamic power management: Under certain conditions a part of the circuit is inactive and can be shut down, avoiding unnecessary calculations, e.g. gated clocks, operand isolation, pre-computation, and guarded evaluation
Architecture Trade-offs:
Reference Data Path
• Critical path delay → $T_{adder} + T_{comparator}$ (= 25 ns), → $f_{ref}$ = 40 MHz
• Total capacitance being switched = $C_{ref}$
• $V_{dd} = V_{ref}$ = 5 V
• Power for reference datapath = $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
Voltage Reduction Technique:
Parallelism
• The clock rate can be reduced by half with the same throughput → $f_{par} = f_{ref} / 2$
• $V_{par} = V_{ref} / 1.7$, $C_{par} = 2.15 \, C_{ref}$
• $P_{par} = (2.15 \, C_{ref}) (V_{ref}/1.7)^2 (f_{ref}/2) \approx 0.36 \, P_{ref}$
Voltage Reduction Technique:
Pipeline
• $f_{pipe} = f_{ref}$, $C_{pipe} = 1.1 \, C_{ref}$, $V_{pipe} = V_{ref}/1.7$
• Voltage can be dropped while maintaining the original throughput
• $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1 \, C_{ref}) (V_{ref}/1.7)^2 f_{ref} = 0.37 \, P_{ref}$
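The parallel and pipelined datapaths can be compared with the reference by scaling the three factors of the power relation P = C · V² · f. A minimal Python sketch follows; the function name is hypothetical, and the scaling factors are the ones quoted on the slides. With the rounded voltage divisor 1.7 the ratios come out slightly above the slides' 0.36 and 0.37, presumably because the divisor on the slides is itself a rounded value.

```python
def relative_power(cap_factor, voltage_divisor, freq_factor):
    """Power relative to the reference datapath: (C/Cref) * (V/Vref)^2 * (f/fref)."""
    return cap_factor * (1.0 / voltage_divisor) ** 2 * freq_factor

# Parallel datapath: 2.15x capacitance, Vref/1.7, half the clock rate.
p_parallel = relative_power(cap_factor=2.15, voltage_divisor=1.7, freq_factor=0.5)

# Pipelined datapath: 1.1x capacitance, Vref/1.7, same clock rate.
p_pipeline = relative_power(cap_factor=1.1, voltage_divisor=1.7, freq_factor=1.0)

print(f"parallel : {p_parallel:.3f} x Pref")
print(f"pipelined: {p_pipeline:.3f} x Pref")
```

Both approaches spend extra capacitance (duplicated hardware or pipeline registers) to buy timing slack, and then convert that slack into a lower supply voltage.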
Comparisons
Logic Style and Power Consumption
• Power-delay product improves as voltage decreases
• The “best” logic style minimizes power-delay for a given delay
constraint
The concept of gating clock signals
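[Figure, panels (a) to (c): datapath with a comparator (A < B), a multiplexer selecting between X and Y, and a register REG; two clock-gating schemes (scheme 1 and scheme 2) that derive a gated clock from the clock and the comparator output; timing diagram of the clock, the comparator output, and the gated clocks of scheme 1 and scheme 2 over one clock period]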
Resource Sharing Can Increase
Activity
Reducing Effective Capacitance
[Figure: global bus architecture versus local bus architecture]
Shared Resources incur Switching Overhead
Data representation
• Sign-extension activity significantly reduced using
sign-magnitude representation
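The effect of the number representation on sign-extension activity can be illustrated by counting bit toggles between consecutive data words. The following Python sketch is a minimal illustration; the helper names, the 16-bit word length and the sample sequence (small values that alternate in sign) are assumptions chosen for the example.

```python
def twos_complement(value, bits=16):
    """Encode a signed integer as a two's-complement word."""
    return value & ((1 << bits) - 1)

def sign_magnitude(value, bits=16):
    """Encode a signed integer as sign bit plus magnitude."""
    sign = (1 << (bits - 1)) if value < 0 else 0
    return sign | abs(value)

def toggles(words):
    """Total number of bit flips between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

# Hypothetical data stream: small magnitudes with frequent sign changes.
samples = [1, -1, 2, -2, 1, -1, 3, -3]

tc_toggles = toggles([twos_complement(v) for v in samples])
sm_toggles = toggles([sign_magnitude(v) for v in samples])

print("two's complement toggles:", tc_toggles)
print("sign-magnitude toggles  :", sm_toggles)
```

In two's complement every sign change flips all the sign-extension bits, while in sign-magnitude only the sign bit and the few low-order magnitude bits change.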
Switching Activity in Adders
Switching Activity in Multipliers
Signals and Operations Reordering
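• Example: complex multiplication
Trading a multiplication for an addition
[Figure: (a) direct form with four multipliers, computing Yr from Ar, Ai, Xr, Xi with a subtraction and Yi with an addition; (b) restructured form with three multipliers, using the constants Ai-Ar and Ai+Ar and an adder on Xr and Xi]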
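The trade of one multiplication for one addition can be verified numerically. The sketch below is in Python; the three-multiplier form shown is one restructuring that is consistent with the labels of the figure (constants Ai-Ar and Ai+Ar, shared product of Ar with Xr+Xi), and it is an assumption that this is exactly the arrangement drawn on the slide. Function names are hypothetical.

```python
def complex_mult_direct(ar, ai, xr, xi):
    """Direct form: 4 multiplications, 2 additions/subtractions."""
    yr = ar * xr - ai * xi
    yi = ai * xr + ar * xi
    return yr, yi

def complex_mult_three(ar, ai, xr, xi):
    """Restructured form: 3 multiplications, 3 additions/subtractions on the data.

    For a constant coefficient A, (ai + ar) and (ai - ar) can be precomputed.
    """
    shared = ar * (xr + xi)          # product reused by both outputs
    yr = shared - (ai + ar) * xi
    yi = shared + (ai - ar) * xr
    return yr, yi

direct = complex_mult_direct(3, -2, 5, 7)
restructured = complex_mult_three(3, -2, 5, 7)
print(direct, restructured)
```

Since a multiplier typically switches far more capacitance than an adder, removing one multiplication at the cost of one addition is usually a net saving.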
Module Selection
[Figure: (a) data-flow graph with multiplications *i, *ii, *iii and additions +i, +ii; (c), (d) alternative bindings of these operations to library modules]
(b) RTL Library:
Module                  Area     Latency   Power
ripple adder            2744     30 ns     1199 μW
carry lookahead adder   3959     20 ns     1467 μW
array multiplier        16185    60 ns     18540 μW
wallace multiplier      18443    40 ns     23545 μW
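Module selection can be viewed as picking, for each operation, the library implementation that meets the timing budget at the lowest power. The Python sketch below is a simplified illustration under several assumptions: the library figures are paired with the modules in the order they appear on the slide, the datapath is a hypothetical chain of one multiplication followed by one addition, and module powers are simply summed.

```python
from itertools import product

# (area, latency in ns, power in μW), as listed in the slide's RTL library.
ADDERS = {"ripple": (2744, 30, 1199), "carry lookahead": (3959, 20, 1467)}
MULTIPLIERS = {"array": (16185, 60, 18540), "wallace": (18443, 40, 23545)}

def select_modules(latency_budget_ns):
    """Return (multiplier, adder, power, latency) with minimum power, or None."""
    best = None
    for (m_name, m), (a_name, a) in product(MULTIPLIERS.items(), ADDERS.items()):
        latency = m[1] + a[1]   # multiply then add, executed back to back
        power = m[2] + a[2]     # simplification: powers are summed
        if latency <= latency_budget_ns and (best is None or power < best[2]):
            best = (m_name, a_name, power, latency)
    return best

for budget in (90, 70, 50):
    print(budget, "ns ->", select_modules(budget))
```

A relaxed budget allows the slower, lower-power modules; a tight budget forces the faster and more power-hungry ones, and some budgets cannot be met at all.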
Glitching activity reduction (3)
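Function: if (x < y) then z=c+d else z=a+b
ARCHITECTURE 1 Power Consumption: without glitches 823.9 μW, with glitches 1650 μW
ARCHITECTURE 2 Power Consumption: without glitches 951.7 μW, with glitches 1357.7 μW
[Figure: two implementations of the function, driven by the comparison of x and y: one multiplexes the operands (a or c, b or d) ahead of a single adder, the other adds a, b and c, d separately and multiplexes the results onto z]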
Two-Level Logic Circuits
Switching Activity Minimization (1)
• Taking into account the static and transition probabilities (i.e. temporal correlation) of the primary inputs, we can insert additional input signals into certain gates of the first logic level (i.e. the AND gates), resulting in reduced switching activity
• Appropriately selected input signals force the outputs of the AND gates to logic level zero for a number of combinations of the binary input signals
Two-Level Logic Circuits Switching
Activity Minimization (2)
• Example:
• Signal x3 exhibits low-transition probability and
high static-1 probability, while the signals x0 , x1,
and x2 are characterized by high-transition
probabilities
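[Figure: Initial logic circuit, with AND gates g1, g2, g3 on inputs (x0, x1), (x0, x2), (x0, x3) producing y1, y2, y3, combined by gate g4 into F; Modified logic circuit, with x3 routed as an additional input to two of the AND gates, producing y1', y2', y3' and output F']
$F = x_0 x_1 + x_0 x_2 + x_0 x_3$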
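The idea of inserting an extra input to hold AND-gate outputs at zero can be checked on this example. The Python sketch below assumes that the inserted signal is the complement of x3 and that it is added to gates g1 and g2; the figure text does not state this explicitly. It also assumes independent inputs with P(x0=1) = P(x1=1) = P(x2=1) = 0.5 and P(x3=1) = 0.9. It verifies that the function is unchanged and compares how often a gate output is at logic 1.

```python
from itertools import product

def f_initial(x0, x1, x2, x3):
    return (x0 & x1) | (x0 & x2) | (x0 & x3)

def f_modified(x0, x1, x2, x3):
    not_x3 = 1 - x3  # assumed inserted signal
    return (x0 & x1 & not_x3) | (x0 & x2 & not_x3) | (x0 & x3)

# Exhaustive check over all 16 input combinations.
equivalent = all(f_initial(*v) == f_modified(*v) for v in product((0, 1), repeat=4))

def prob_one(gate, p_x3=0.9):
    """Probability that a gate output is 1 under the assumed input statistics."""
    total = 0.0
    for x0, x1, x2, x3 in product((0, 1), repeat=4):
        weight = 0.125 * (p_x3 if x3 else 1.0 - p_x3)
        total += weight * gate(x0, x1, x2, x3)
    return total

p_y1 = prob_one(lambda x0, x1, x2, x3: x0 & x1)                  # initial g1
p_y1_mod = prob_one(lambda x0, x1, x2, x3: x0 & x1 & (1 - x3))   # modified g1

print("functions equivalent:", equivalent)
print(f"P(y1 = 1) = {p_y1:.3f}, P(y1' = 1) = {p_y1_mod:.3f}")
```

Because x3 is mostly 1 and rarely changes, the modified gates g1 and g2 stay at zero most of the time, so the fast-changing inputs x1 and x2 rarely propagate to their outputs.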
Additional Info
• A. Chandrakasan and R. Brodersen, “Low Power CMOS Design”, Kluwer Academic Publishers, 1995
• Christian Piguet, Editor, “Low-Power Electronics Design”, CRC Press, November 2004
• D. Soudris, C. Piguet, C. Goutis, “Designing CMOS Circuits for Low-Power”, Kluwer Academic Press, October 2002
• F. Catthoor, K. Danckaert, et al., “Data Access and Storage Management for Embedded Programmable Processors”, Kluwer Academic Publishers, 2002
• Stamatis Vassiliadis and Dimitrios Soudris, “Fine- and Coarse-Grain Reconfigurable Computing”, Springer, Dordrecht/London/Boston, August 2007
• http://vlsi.ee.duth.gr/~dsoudris
• AMDREL website: http://vlsi.ee.duh.gr/amdrel