SlideShare a Scribd company logo
VLSI Design Techniques
UNIT IV
2
Multiplexers
• A Multiplexer consists of n input lines and
one output line.
• 2:1 multiplexer chooses between two
inputs
S W1 W0 F
0 X 0 0
0 X 1 1
1 0 X 0
1 1 X 1
0
1
S
W0
W1
F
3
Gate-Level Mux Design
•
• How many transistors are needed? 20
1 0 (too many transistors)
F SW SW
 
4
4
W1
W0
S F
4
2
2
2 F
2
W1
W0
S
4
Transmission Gate Mux
• Non restoring MUX uses two transmission
gates
– Only 4 transistors S
S
W0
W1
F
S
• Verilog: 2x1 MUX
– Uses the conditional ?: operator
• Software designers detest this operator
• Hardware designers revel in its beauty
• Behavioral Style in Verilog
Must be reg type when used as LHS in
an always block
Sensitivity list: statements inside the always
block are only executed when one or more
signals in the list changes value
7
4:1 Multiplexer
• 4:1 mux chooses one of 4 inputs using
two selects
– Two levels of 2:1 muxes
– Or four tristates
S0
D0
D1
0
1
0
1
0
1
Y
S1
D2
D3
D0
D1
D2
D3
Y
S1S0 S1S0 S1S0 S1S0
Design of 4:1 Multiplexer
Contd..
// Structural Description
ME- applied electronics Vlsi UNIT IV.ppt
4:1 Multiplexer Example
• Verilog: 4x1 MUX (Data Flow Style)
– This is getting complicated!
– Need a way to specify a “group” of bits like w[0:3]
– Need a way to replace ?: with “if then else”
• Vectored Signals in Verilog
Signals can be grouped as bit vectors
• The order of the bits is user determined
• W has 4 lines with the MSB = W[0] and the LSB = W[3]
• S has two lines with the MSB = S[1] and the LSB = S[0]
Format is [MSB:LSB]
• Hierarchical Design of a 16x1 MUX
w8
w11
s1
w0
s0
w3
w4
w7
w12
w15
s3
s2
f
The Verilog code for mux4x1 must be either in
the same file as mux16x1, or in a separate file
(called mux4x1.v) in the same directory as
mux16x1
Structural style Verilog
• 4x1 MUX: Behavioral Styles
16
1. Decoder
• Definition
• Logic Diagram
• Application (Universal Set)
• Tree of Decoders
17
1. Decoder: Definition
0
y0
y1
y7
I0
I1
I2
1
2
0
1
2
3
4
5
6
7
EN (enable)
n inputs
n= 3
2n
outputs
23
= 8
yi = 1 if En= 1 & (I2, I1, I0 ) = i
yi= 0 otherwise
n to 2n
decoder
function:
.
.
18
• N inputs, 2N
outputs
• One-hot outputs: only one output HIGH at once
1. Decoder: Definition
2:4
Decoder
A1
A0
Y3
Y2
Y1
Y0
00
01
10
11
0 0
0 1
1 0
1 1
0
0
0
1
Y3 Y2 Y1 Y0
A0
A1
0
0
1
0
0
1
0
0
1
0
0
0
19
Decoder: Logic Diagram
y0
I2’
I1’
I0’
y1
I2
I1’
I0’
En
y7
I2
I1
I0
.
.
yi = mi En
y0 = 1 if (I2, I1, I0 ) = (0,0,0) & En = 1
y7 = I2I1I0En
20
Decoder Application: universal set {Decoder, OR}
Example: Implement functions f1(a,b,c) = m(1,2,4)
f2(a,b,c) = m(2,3), and f3(a,b,c) = m(0,5,6)
with a 3-input decoder and OR gates.
I0
y0
y1
.
.
y7
c
b
a
I1
I2
0
1
2
3
4
5
6
7
En
y1
y2
y4 f1
y2
y3
f2
y0
y6 f3
y5
21
• OR minterms
Decoders
2:4
Decoder
A
B
00
01
10
11
Y = AB + AB
Y
AB
AB
AB
AB
Minterm
= A  B
22
Tree of Decoders
Implement a 4-24
decoder with 3-23
decoders.
I0
y0
y1
y7
I1
I2
0
1
2
3
4
5
6
7
I0
y8
y9
y15
I1
I2
0
1
2
3
4
5
6
7
a
d
c
b
23
Implement a 6-26
decoder with 3-23
decoders.
En
D0
I2, I1, I0
D1
y0
y7
y8
y15
D7
y56
y63
En
I2, I1, I0
I2, I1, I0
I5, I4, I3
Tree of Decoders
…
…
2-into-4 decoder
Gate-level diagram
Block diagram
In Block diagrams:
Circles on the input indicate the logic convention of the input signal
Circles on the output indicate the physical configuration of the output gate
2-into-4 decoder with
P. A. Inputs and a N.A. Output
Gate-level diagram
Block diagram
• 2 to 4 Decoder: Behavioral Style
• Hierarchical Design of a 4 to 16 Decoder
Structural style Verilog
dec2to4 to be presented soon ….
w0
En
y0
w1 y1
y2
y3
y8
y9
y10
y11
w2
w0 y0
y1
y2
y3
w0
En
y0
w1 y1
y2
y3
w0
En
y0
w1 y1
y2
y3
y4
y5
y6
y7
w1
w0
En
y0
w1 y1
y2
y3
y12
y13
y14
y15
w0
En
y0
w1 y1
y2
y3
w3
En w
28
2. Encoder
• Definition
• Logic Diagram
• Priority Encoder
29
2. Encoder: Definition
yn-1 …y0
En
A
I2
n
-1
…
I0
8 inputs 3 outputs
y0
y1
y2
0
1
2
3
4
5
6
7
En
At most one Ii = 1.
(yn-1,.., y0 ) = i if Ii = 1 & n = 1
(yn-1,.., y0 ) = 0 otherwise.
A = 1 if En = 1 and one i s.t. Ii = 1
A = 0 otherwise.
Encoder Description:
A
I0
I7
0
1
2
30
Encoder: Logic Diagram
En
I1
I3
I5
I7
y0
En
I2
I3
I6
I7
y1
31
En
I4
I5
I6
I7
y2
En
I0
I1
I6
I7
A
.
.
Encoder: Logic Diagram
• 4 to 2 Binary Encoder
Left extended by x to fill 2 bits
33
Priority Encoder: Definition
Description: Input (I2
n
-1,…, I0), Output (yn-1 ,…,,y0)
(yn-1 ,…,,y0) = i if Ii = 1 & En = 1 & Ik = 0
for all k > i (high bit priority) or
for all k< i (low bit priority).
Eo = 1 if En = 1 & Ii = 0 for all i,
Gs = 1 if En = 1 & i s.t. Ii = 1.
E
(Gs is like A, and Eo tells us if
enable is true or not).
0
1
2
3
4
5
6
7
En
Eo Gs
I0
I7
y0
y1
y2
0
1
2
34
Priority Encoder: Implement a 32-input priority
encoder w/ 8 input priority encoders (high bit priority).
y32, y31, y30
I31-24
Eo Gs
y22, y21, y20
I25-16
Eo Gs
y12, y11, y10
I15-8
Eo Gs
y02, y01, y00
I7-0
Eo Gs
En
• 4 to 2 Priority Encoder
casex vs case
• 4 to 2 Priority Encoder Using a For Loop
A signal that is assigned a value multiple
times in an always block retains its last value
 priority scheme relies on this for correct
setting of Y and z
Comparators
• Example: 2-bit comparator
Comparators
• K-Maps of the
comparator
outputs
Comparators
• Example: MIS 4-bit magnitude
comparator
Comparators
• 4-bit magnitude comparator function
table
Comparators
• Cascading 4-bit
comparators to obtain a
16-bit comparator
Comparators
Shift Registers
• Basic shift registers are classified by
structure according to the following types:
• Serial-in/serial-out
• Parallel-in/serial-out
• Serial-in/parallel-out
• Universal parallel-in/parallel-out
• Ring counter
SISO Shift Register
PISO Shift Register
SIPO Shift Register
PIPO Shift Register
Ring Counter
Arithmetic Circuits
• Machine Arithmetic
– Arithmetic combinational circuits are required for
• Addition
• Subtraction
• Multiplication
• Division (hardware implementation is optional)
– Addition / subtraction can be done easily using full
adders and a minimum of additional logic
Addition of Binary Numbers
Full Adder. The full adder is the fundamental building block
of most arithmetic circuits:
The sum and carry outputs are described as:
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i c
b
c
a
b
a
c
b
a
c
b
a
c
b
a
c
b
a
c 






1
i
i
i
i
i
i
i
i
i
i
i
i
i c
b
a
c
b
a
c
b
a
c
b
a
s 



Full
Adder
Cin
Cout
si
ai bi
Addition of Binary Numbers
Propagate
Propagate
Generate
Generate
Inputs Outputs
ci
ai
bi
si
ci+1
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
Full-Adder Implementation
Full Adder operations is defined by equations:
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i c
p
c
b
a
c
b
a
c
b
a
c
b
a
c
b
a
s 








i
i
i
i
i
i
i
i
i
i
i
i c
p
g
b
a
c
b
a
c
b
a
c 




1
One-bit adder could be
implemented as shown
Carry-Propagate:
and Carry-Generate gi
i
i
i b
a
p 

i
i
i b
a
g 

cout
cin
si
ai bi
• Behavioral Style Full Adder
data flow styles
behavioral style
• Higher Level Behavioral Style for Adder
– This won't work! Can't loop over “subcircuits”
subcircuits to be instantiated like this
are not allowed in a for loop
High-Speed Addition
i
i
i c
p
s 

i
i
i
i c
p
g
c 

1
One-bit adder could be
implemented more efficiently
because MUX is faster
i
i
i b
a
p 

i
i
i b
a
g 

0
1
s
bi
ai
cout
si
cin
• Ripple-Carry Adder
– A ripple carry adder cascades full adders together
• Simple, but not the most efficient design
• Carry propagate adders are more efficient
FA
xn –1
cn cn 1
-
yn 1
–
sn 1
–
FA
x1
c2
y1
s1
FA
c1
x0 y0
s0
c0
MSB position LSB position
The Ripple-Carry Adder
A0 B0
S0
Co,0
Ci,0
A1 B1
S1
Co,1
A2 B2
S2
Co,2
A3 B3
S3
Co,3
(= Ci,1)
FA FA FA FA
Worst case delay linear with the number of bits
tadder N 1
–
 tcarry tsum
+

td = O(N)
Goal: Make the fastest possible carry path circuit
From Rabaey
Inversion Property
A B
S
Co
Ci FA
A B
S
Co
Ci FA
S A B Ci
 
  S A B Ci
 
 
=
Co A B Ci
 
  Co A B Ci
 
 
=
From Rabaey
Minimize Critical Path by Reducing Inverting
Stages
A0 B0
S0
Co,0
Ci,0
A1 B1
S1
Co,1
A2 B2
S2
Co,2 Co,3
FA’ FA’ FA’ FA’
A3 B3
S3
Odd Cell
Even Cell
Exploit Inversion Property
Note: need 2 different types of cells
From Rabaey
Ripple Carry Adder
Carry-Chain of an RCA implemented using multiplexer from the
standard cell library: ai+1 bi+1 ai bi
ai+2
bi+2
cout
ci+1
ci
si
si+1
si+2
cin
Critical Path
Oklobdzija, ISCAS’88
• Ripple-Carry Adder Using Generate
compiler produces n modules with names
addstage[0].addbit, addstage[1].addbit, …,
addstage[n-1].addbit
Carry Look Ahead Logic (CLA)
i
i
i c
p
s 

i
i
i
i c
p
g
c 

1
i
i
i b
a
p 

i
i
i b
a
g 

0
1
s
bi
ai
cout
si
cin
CLA Definitions: 4-bit Adder
ai bi
Ci
gi pi
ai+1 bi+1
Ci+1
gi+1 pi+1
ai+2 bi+2
Ci+2
gi+2 pi+2
ai+3 bi+3
Ci+3
gi+3 pi+3
Ci+4
1
1
1
1
1
1
1
1
1
1
2 )
(
c
p
p
g
p
g
c
p
g
p
g
c
p
g
c
i
i
i
i
i
i
i
i
i
i
i
i
i

















i
i
i
i
i
i
i
i
i
i
i
i c
p
g
b
a
c
b
a
c
b
a
c 




1
Carry-Lookahead Adder: 4-bits
ai bi
Ci
gi pi
ai+1 bi+1
Ci+1
gi+1 pi+1
ai+2 bi+2
Ci+2
gi+2 pi+2
ai+3 bi+3
Ci+3
gi+3 pi+3
Ci+4
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
c
p
p
p
g
p
p
g
p
g
c
p
p
g
p
g
p
g
c
p
g
c
1
2
1
2
1
2
2
1
1
1
2
2
2
2
2
3 )
(


























i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
i
c
p
p
p
p
g
p
p
p
g
p
p
g
p
g
g
p
p
g
p
g
p
g
c
p
g
c
1
2
3
1
2
3
1
2
3
2
3
3
1
2
1
2
2
3
3
3
3
3
4 )
(



































Gj Pj
Carry-Lookahead Adder
i
i
i
i
i
i
i
i
i
i
j g
p
p
p
g
p
p
g
p
g
G 1
2
3
1
2
3
2
3
3 







 



i
i
i
i
j p
p
p
p
P 1
2
3 



j
j
j
j c
P
G
c 

 )
1
(
4
One gate delay 
to calculate p, g
One to calculate
P and two for G
Three gate delays
To calculate C4(j+1)
Compare that to 8  in RCA !
ai bi
Cin Cj
Gj
Pj
ai+1 bi+1
gi+1
pi+1 gi pi
ai+2 bi+2
ai+3 bi+3
gi+1pi+1
gi+1pi+1
C4(j+1)
C4j+1
C4j+2
C4j+3
P, G Group
Carry-Lookahead Adder
(Weinberger and Smith)
i
i
i
i
i
i
i
i
i
i
j G
P
P
P
G
P
P
G
P
G 1
2
3
1
2
3
2
3
3
*
G 












i
i
i
i
j P
P
P
P
P 1
2
3
*




j
k
k
j c
P
G
c 4
)
1
(
4 *
* 


Pj
G* P*
C4j+1
Gj
Pj+1
Gj+1
Pj+3
Gj+3
Pj+2
Gj+2
C4j
C4(j+1)
C4j+2
C4j+3
Additional two gate delays
C16 will take a total of 5 vs. 32 for RCA !
Carry-Lookahead Adder
(Weinberger and Smith: original derivation, 1958 )
Carry-Lookahead Adder
(Weinberger and Smith: original derivation )
Carry lookahead addition
• Two ways in which carry=1 from the ith bit-slice
– Generated if both ai and bi are 1
•
– A CarryIn (ci) of 1 is propagated if ai or bi are 1
•
bi
ai
gi 

bi
ai
pi 

Are these carries generated or propagated?
Carry lookahead addition
• Two ways in which carry=1 from the ith bit-slice
– Generated if both ai and bi are 1
•
– A CarryIn (ci) of 1 is propagated if ai or bi are 1
•
• The carry out, ci+1, is therefore
• Using substitution, can get carries in parallel
bi
ai
gi 

bi
ai
pi 

ci
pi
gi
ci 


1
0
0
1
2
3
0
1
2
3
1
2
3
2
3
3
4
0
0
1
2
0
1
2
1
2
2
3
0
0
1
0
1
1
2
0
0
0
1
c
p
p
p
p
g
p
p
p
g
p
p
g
p
g
c
c
p
p
p
g
p
p
g
p
g
c
c
p
p
g
p
g
c
c
p
g
c


































.
.
.
Carry lookahead addition
• Drawbacks
– n+1 input OR and AND gates for nth
input
– Irregular structure with many long wires
• Solution: do two levels of carry
lookahead
– First level generates Result (using carry
lookahead to generate the internal
carries) and propagate and generate
signals for a group of 4 bits (Pi and Gi)
– Second level generates carry out’s for
each group based on carry in and Pi and
Gi from previous group
Carry lookahead addition
• Internal equations for group 0
• Group equations for group 0
• C1 output of carry lookahead unit
bi
ai
gi 

bi
ai
pi 

ci
pi
gi
ci 


1
0
1
2
3
1
2
3
2
3
3
0 g
p
p
p
g
p
p
g
p
g
G 









0
1
2
3
0 p
p
p
p
P 



0
0
0
1 c
P
G
C 


Carry lookahead addition
• Internal equations for group 1
• Group equations for group 1
• C2 output of carry lookahead unit
High Speed Adders
• Carry-Skip Circuits
• Carry-Select Adders
• Carry-Save Adders
Carry-Skip Addition
• CLA group generate (Gi) hardware is complex
• Carry skip addition
– Generate Gi’s by ripple carry of a and b inputs with
cin’s = 0 (except c0)
Generate Pi’s as in carry lookahead
– For each group, combine Gi, Pi, and cin as in carry
lookahead to form group carries
– Generate sums by ripple carry from a and b inputs
and group carries
Carry skip addition
• Operation
– Generate Gi’s through ripple carry
Carry skip addition
• Operation
– Generate Gi’s through ripple carry
– In parallel, generate Pi’s as in carry lookahead
Carry skip addition
• Operation
– Group carries to each block are generated or propagated
Carry skip addition
• Operation
– Group carries to each block are generated or propagated
– Sums are generated in parallel from group carries
Carry-Skip Adder
FA FA FA FA
P0 G1 P0 G1 P2 G2 P3 G3
Co,3
Co,2
Co,1
Co,0
Ci,0
FA FA FA FA
P0 G1 P0 G1 P2 G2 P3 G3
Co,2
Co,1
Co,0
Ci,0
Co,3
Multiplexer
BP=PoP1P2P3
Idea: If (P0 and P1 and P2 and P3 = 1)
then Co3 = C0, else “kill” or “generate”.
Bypass
From Rabaey
Carry-Skip Adder:
N-bits, k-bits/group, r=N/k groups
Gr
G r-
1
...
S
N-k-1
SN-1
a N-1bN-1 b N-k-1
aN-k-1
S (r-1)k-1 S (r-2)k
G1
Go
...
Sk
S
2k-1
a 2k-1
b 2k-1 bk
ak
S
k-1
S
0
...
...
a (r-1)k
b(r-1)k a (r-1)kb (r-1)k
...
a k-1
b k-1
a0 b0
...
Cin
... ... ... ... ... ... ... ...
Pr-1
Pr-2 P1 P0
Cout + + + +
AND
OR
OR
OR OR
AND
AND
AND
critical path, delay =2(k-1)+(N/2-2)
Carry-Skip Adder
  SKIP
RCA
d t
N
t
k
t 








 2
2
1
2
N
tp
ripple adder
bypass adder
4..8
k
Carry-Select Addition
• For each group, do two additions in parallel
– One with cin forced to 0
– One with cin forced to 1
• Generate cin in parallel and use a MUX to select the
correct sum outputs
• Example for 8 bits
MUXes
Carry-select addition
• A larger design
– Why different numbers of bits in each block?
• Hint1: it’s to minimize the adder delay
• Hint2: assume a k-input block has k time units of delay, and the
AND- OR logic has 1 time unit of delay
generated carry
propagated carry
Carry-Select Sum Adder
Carry-Select Adder
Addition under assumption of Cin=0 and Cin =1.
Carry Select Adder:
combining two 32-b VBAs in select mode
Delay =VBA32+ MUX
Carry Save Addition
• A full adder sums 3 inputs and produces 2 outputs
– Carry output has twice weight of sum output
• N full adders in parallel are called carry save adder
– Produce N sums and N carry outs
Z4
Y4
X4
S4
C4
Z3
Y3
X3
S3
C3
Z2
Y2
X2
S2
C2
Z1
Y1
X1
S1
C1
XN...1
YN...1
ZN...1
SN...1
CN...1
n-bit CSA
CSA Application
• Use k-2 stages of CSAs
– Keep result in carry-save redundant form
• Final CPA computes actual result
4-bit CSA
5-bit CSA
0001 0111 1101 0010
+
1011
0101_
0001
0111
+1101
1011
0101_
X
Y
Z
S
C
0101_
1011
+0010
X
Y
Z
S
C
A
B
S
CSA Application
• Use k-2 stages of CSAs
– Keep result in carry-save redundant form
• Final CPA computes actual result
4-bit CSA
5-bit CSA
0001 0111 1101 0010
+
1011
0101_
01010_ 00011
0001
0111
+1101
1011
0101_
X
Y
Z
S
C
0101_
1011
+0010
00011
01010_
X
Y
Z
S
C
01010_
+ 00011
A
B
S
CSA Application
• Use k-2 stages of CSAs
– Keep result in carry-save redundant form
• Final CPA computes actual result
4-bit CSA
5-bit CSA
0001 0111 1101 0010
+
1011
0101_
01010_ 00011
0001
0111
+1101
1011
0101_
X
Y
Z
S
C
0101_
1011
+0010
00011
01010_
X
Y
Z
S
C
01010_
+ 00011
10111
A
B
S
10111
Multiplication
• Example: 1100 : 1210
0101 : 510
Multiplication
• Example: 1100 : 1210
0101 : 510
1100
Multiplication
• Example: 1100 : 1210
0101 : 510
1100
0000
Multiplication
• Example: 1100 : 1210
0101 : 510
1100
0000
1100
Multiplication
• Example: 1100 : 1210
0101 : 510
1100
0000
1100
0000
Multiplication
• Example: 1100 : 1210
0101 : 510
1100
0000
1100
0000
00111100 : 6010
Multiplication
• Example:
• M x N-bit multiplication
– Produce N M-bit partial products
– Sum these to produce M+N-bit product
1100 : 1210
0101 : 510
1100
0000
1100
0000
00111100 : 6010
multiplier
multiplicand
partial
products
product
General Form
• Multiplicand: Y = (yM-1, yM-2, …, y1, y0)
• Multiplier: X = (xN-1, xN-2, …, x1, x0)
• Product:
1 1 1 1
0 0 0 0
2 2 2
M N N M
j i i j
j i i j
j i i j
P y x x y
   

   
   
 
   
 
 
  
x0
y5
x0
y4
x0
y3
x0
y2
x0
y1
x0
y0
y5
y4
y3
y2
y1
y0
x5 x4 x3 x2 x1 x0
x1
y5
x1
y4
x1
y3
x1
y2
x1
y1
x1
y0
x2
y5
x2
y4
x2
y3
x2
y2
x2
y1
x2
y0
x3
y5
x3
y4
x3
y3
x3
y2
x3
y1
x3
y0
x4
y5
x4
y4
x4
y3
x4
y2
x4
y1
x4
y0
x5y5 x5y4 x5y3 x5y2 x5y1 x5y0
p0
p1
p2
p3
p4
p5
p6
p7
p8
p9
p10
p11
multiplier
multiplicand
partial
products
product
Dot Diagram
• Each dot represents a bit
partial products
multiplier
x
x0
x15
Array Multiplier
y0
y1
y2
y3
x0
x1
x2
x3
p0
p1
p2
p3
p4
p5
p6
p7
B
A
Sin Cin
Sout
Cout
B
A
Cin
Cout
Sout
Sin
=
CSA
Array
CPA
critical path B
A
Sout
Cout Cin
Cout
Sout
=
Cin
B
A
Rectangular Array
• Squash array to fit rectangular floorplan
y0
y1
y2
y3
x0
x1
x2
x3
p0
p1
p2
p3
p4
p5
p6
p7
Fewer Partial Products
• Array multiplier requires N partial products
• If we looked at groups of r bits, we could
form N/r partial products.
– Faster and smaller?
– Called radix-2r
encoding
• Ex: r = 2: look at pairs of bits
– Form partial products of 0, Y, 2Y, 3Y
– First three are easy, but 3Y requires adder 
Booth Encoding
• Instead of 3Y, try –Y, then increment next
partial product to add 4Y
• Similarly, for 2Y, try –2Y + 4Y in next
partial product
Booth Encoding
• Instead of 3Y, try –Y, then increment next
partial product to add 4Y
• Similarly, for 2Y, try –2Y + 4Y in next
partial product
Booth Encoding
• Instead of 3Y, try –Y, then increment next
partial product to add 4Y
• Similarly, for 2Y, try –2Y + 4Y in next
partial product
Booth Encoding
• Instead of 3Y, try –Y, then increment next
partial product to add 4Y
• Similarly, for 2Y, try –2Y + 4Y in next
partial product
Booth Encoding
• Instead of 3Y, try –Y, then increment next
partial product to add 4Y
• Similarly, for 2Y, try –2Y + 4Y in next
partial product
Booth Encoding
• Instead of 3Y, try –Y, then increment next
partial product to add 4Y
• Similarly, for 2Y, try –2Y + 4Y in next
partial product
Booth Encoding
• Instead of 3Y, try –Y, then increment next
partial product to add 4Y
• Similarly, for 2Y, try –2Y + 4Y in next
partial product
Booth Encoding
• Instead of 3Y, try –Y, then increment next
partial product to add 4Y
• Similarly, for 2Y, try –2Y + 4Y in next
partial product
Booth Hardware
• Booth encoder generates control lines for
each PP
– Booth selectors choose PP bits
Mi
yj
Xi
yj-1
2Xi
PPij
Booth
Selector
Booth
Encoder
x2i+1
x2i
x2i-1
Sign Extension
• Partial products can be negative
– Require sign extension, which is cumbersome
– High fanout on most significant bit
multiplier
x
x0
x15
0
0
0
x-1
x16
x17
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
PP0
PP1
PP2
PP3
PP4
PP5
PP6
PP7
PP8
Simplified Sign Ext.
• Sign bits are either all 0’s or all 1’s
– Note that all 0’s is all 1’s + 1 in proper column
– Use this to reduce loading on MSB
s
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
s
s
1
1
1
1
1
1
1
1
1
1
1
1
1
s
s
1
1
1
1
1
1
1
1
1
1
1
s
s
1
1
1
1
1
1
1
1
1
s
s
1
1
1
1
1
1
1
s
s
1
1
1
1
1
s
s
1
1
1
s
s
1
s
PP0
PP1
PP2
PP3
PP4
PP5
PP6
PP7
PP8
Even Simpler Sign Ext.
• No need to add all the 1’s in hardware
– Precompute the answer!
s
s
s
s
s
s
1
s
s
1
s
s
1
s
s
1
s
s
1
s
s
1
s
s
PP0
PP1
PP2
PP3
PP4
PP5
PP6
PP7
PP8
Advanced Multiplication
• Signed vs. unsigned inputs
• Higher radix Booth encoding
• Array vs. tree CSA networks

More Related Content

Similar to ME- applied electronics Vlsi UNIT IV.ppt (20)

PPT
Combinational circuits
Hareem Aslam
 
PPT
Logic System Design KTU Chapter-4.ppt
Albin562191
 
PPT
combinational circuits dispositivos .ppt
FilibertoMoralesGarc
 
PPTX
Digital VLSI - Unit 2.pptx
SanjaiPrasad
 
PPT
digital electr Chapter-4-Combinational ckt.ppt
HasanujJaman11
 
PPTX
combinational-circuit.pptx it tis creative study of digital electronics for ...
RishabhSingh308993
 
PPT
combinational-circuit.ppt
ShivamRathod34
 
PPT
combinational-circuit presenmtation .ppt
FilibertoMoralesGarc
 
PDF
logic.deghjkl;jldddghbmedgygshedgyssfjfheueh
menna67hassan
 
PDF
Combinational Circuits PPT.pdf
ANKITKUMARSINGH963335
 
PPTX
combinational-circuit (1).pptx for all the digital electronics data
SapnaSingh529565
 
PPT
217456070-Chapter-3_eletrical engineering
KritiArora55
 
PPTX
ENG 202 – Digital Electronics 1 - Chapter 4 (1).pptx
Aishah928448
 
PDF
Combinational and sequential logic
Deepak John
 
PPTX
digital logic circuits, digital component
Rai University
 
PPTX
Lecture 1
MunasarAbdirahman
 
PPT
Decoder encoder
shahzad ali
 
PPT
Digital Design for B.Tech. / B.Sc.
PSK Research Foundation
 
PPT
Unit 2 module-2
praveenabollyjoshi
 
PPTX
6. Combinational Logic with MSI and LSI_1682052077241 (1).pptx
newarboy9
 
Combinational circuits
Hareem Aslam
 
Logic System Design KTU Chapter-4.ppt
Albin562191
 
combinational circuits dispositivos .ppt
FilibertoMoralesGarc
 
Digital VLSI - Unit 2.pptx
SanjaiPrasad
 
digital electr Chapter-4-Combinational ckt.ppt
HasanujJaman11
 
combinational-circuit.pptx it tis creative study of digital electronics for ...
RishabhSingh308993
 
combinational-circuit.ppt
ShivamRathod34
 
combinational-circuit presenmtation .ppt
FilibertoMoralesGarc
 
logic.deghjkl;jldddghbmedgygshedgyssfjfheueh
menna67hassan
 
Combinational Circuits PPT.pdf
ANKITKUMARSINGH963335
 
combinational-circuit (1).pptx for all the digital electronics data
SapnaSingh529565
 
217456070-Chapter-3_eletrical engineering
KritiArora55
 
ENG 202 – Digital Electronics 1 - Chapter 4 (1).pptx
Aishah928448
 
Combinational and sequential logic
Deepak John
 
digital logic circuits, digital component
Rai University
 
Decoder encoder
shahzad ali
 
Digital Design for B.Tech. / B.Sc.
PSK Research Foundation
 
Unit 2 module-2
praveenabollyjoshi
 
6. Combinational Logic with MSI and LSI_1682052077241 (1).pptx
newarboy9
 

Recently uploaded (20)

PDF
Basic_Concepts_in_Clinical_Biochemistry_2018كيمياء_عملي.pdf
AdelLoin
 
PPTX
Depth First Search Algorithm in 🧠 DFS in Artificial Intelligence (AI)
rafeeqshaik212002
 
PDF
MAD Unit - 2 Activity and Fragment Management in Android (Diploma IT)
JappanMavani
 
PDF
Reasons for the succes of MENARD PRESSUREMETER.pdf
majdiamz
 
PPTX
Solar Thermal Energy System Seminar.pptx
Gpc Purapuza
 
PPTX
The Role of Information Technology in Environmental Protectio....pptx
nallamillisriram
 
PPTX
Heart Bleed Bug - A case study (Course: Cryptography and Network Security)
Adri Jovin
 
PDF
International Journal of Information Technology Convergence and services (IJI...
ijitcsjournal4
 
PPTX
美国电子版毕业证南卡罗莱纳大学上州分校水印成绩单USC学费发票定做学位证书编号怎么查
Taqyea
 
PDF
PORTFOLIO Golam Kibria Khan — architect with a passion for thoughtful design...
MasumKhan59
 
PPTX
Introduction to Design of Machine Elements
PradeepKumarS27
 
PPTX
Hashing Introduction , hash functions and techniques
sailajam21
 
PPTX
Shinkawa Proposal to meet Vibration API670.pptx
AchmadBashori2
 
PPTX
原版一样(Acadia毕业证书)加拿大阿卡迪亚大学毕业证办理方法
Taqyea
 
PPTX
Damage of stability of a ship and how its change .pptx
ehamadulhaque
 
PDF
AI TECHNIQUES FOR IDENTIFYING ALTERATIONS IN THE HUMAN GUT MICROBIOME IN MULT...
vidyalalltv1
 
PPTX
DATA BASE MANAGEMENT AND RELATIONAL DATA
gomathisankariv2
 
PDF
Electrical Engineer operation Supervisor
ssaruntatapower143
 
PDF
Pressure Measurement training for engineers and Technicians
AIESOLUTIONS
 
PPTX
265587293-NFPA 101 Life safety code-PPT-1.pptx
chandermwason
 
Basic_Concepts_in_Clinical_Biochemistry_2018كيمياء_عملي.pdf
AdelLoin
 
Depth First Search Algorithm in 🧠 DFS in Artificial Intelligence (AI)
rafeeqshaik212002
 
MAD Unit - 2 Activity and Fragment Management in Android (Diploma IT)
JappanMavani
 
Reasons for the succes of MENARD PRESSUREMETER.pdf
majdiamz
 
Solar Thermal Energy System Seminar.pptx
Gpc Purapuza
 
The Role of Information Technology in Environmental Protectio....pptx
nallamillisriram
 
Heart Bleed Bug - A case study (Course: Cryptography and Network Security)
Adri Jovin
 
International Journal of Information Technology Convergence and services (IJI...
ijitcsjournal4
 
美国电子版毕业证南卡罗莱纳大学上州分校水印成绩单USC学费发票定做学位证书编号怎么查
Taqyea
 
PORTFOLIO Golam Kibria Khan — architect with a passion for thoughtful design...
MasumKhan59
 
Introduction to Design of Machine Elements
PradeepKumarS27
 
Hashing Introduction , hash functions and techniques
sailajam21
 
Shinkawa Proposal to meet Vibration API670.pptx
AchmadBashori2
 
原版一样(Acadia毕业证书)加拿大阿卡迪亚大学毕业证办理方法
Taqyea
 
Damage of stability of a ship and how its change .pptx
ehamadulhaque
 
AI TECHNIQUES FOR IDENTIFYING ALTERATIONS IN THE HUMAN GUT MICROBIOME IN MULT...
vidyalalltv1
 
DATA BASE MANAGEMENT AND RELATIONAL DATA
gomathisankariv2
 
Electrical Engineer operation Supervisor
ssaruntatapower143
 
Pressure Measurement training for engineers and Technicians
AIESOLUTIONS
 
265587293-NFPA 101 Life safety code-PPT-1.pptx
chandermwason
 
Ad

ME- applied electronics Vlsi UNIT IV.ppt

  • 2. 2 Multiplexers • A Multiplexer consists of n input lines and one output line. • 2:1 multiplexer chooses between two inputs S W1 W0 F 0 X 0 0 0 X 1 1 1 0 X 0 1 1 X 1 0 1 S W0 W1 F
  • 3. 3 Gate-Level Mux Design • • How many transistors are needed? 20 1 0 (too many transistors) F SW SW   4 4 W1 W0 S F 4 2 2 2 F 2 W1 W0 S
  • 4. 4 Transmission Gate Mux • Non restoring MUX uses two transmission gates – Only 4 transistors S S W0 W1 F S
  • 5. • Verilog: 2x1 MUX – Uses the conditional ?: operator • Software designers detest this operator • Hardware designers revel in its beauty
  • 6. • Behavioral Style in Verilog Must be reg type when used as LHS in an always block Sensitivity list: statements inside the always block are only executed when one or more signals in the list changes value
  • 7. 7 4:1 Multiplexer • 4:1 mux chooses one of 4 inputs using two selects – Two levels of 2:1 muxes – Or four tristates S0 D0 D1 0 1 0 1 0 1 Y S1 D2 D3 D0 D1 D2 D3 Y S1S0 S1S0 S1S0 S1S0
  • 8. Design of 4:1 Multiplexer
  • 12. • Verilog: 4x1 MUX (Data Flow Style) – This is getting complicated! – Need a way to specify a “group” of bits like w[0:3] – Need a way to replace ?: with “if then else”
  • 13. • Vectored Signals in Verilog Signals can be grouped as bit vectors • The order of the bits is user determined • W has 4 lines with the MSB = W[0] and the LSB = W[3] • S has two lines with the MSB = S[1] and the LSB = S[0] Format is [MSB:LSB]
  • 14. • Hierarchical Design of a 16x1 MUX w8 w11 s1 w0 s0 w3 w4 w7 w12 w15 s3 s2 f The Verilog code for mux4x1 must be either in the same file as mux16x1, or in a separate file (called mux4x1.v) in the same directory as mux16x1 Structural style Verilog
  • 15. • 4x1 MUX: Behavioral Styles
  • 16. 16 1. Decoder • Definition • Logic Diagram • Application (Universal Set) • Tree of Decoders
  • 17. 17 1. Decoder: Definition 0 y0 y1 y7 I0 I1 I2 1 2 0 1 2 3 4 5 6 7 EN (enable) n inputs n= 3 2n outputs 23 = 8 yi = 1 if En= 1 & (I2, I1, I0 ) = i yi= 0 otherwise n to 2n decoder function: . .
  • 18. 18 • N inputs, 2N outputs • One-hot outputs: only one output HIGH at once 1. Decoder: Definition 2:4 Decoder A1 A0 Y3 Y2 Y1 Y0 00 01 10 11 0 0 0 1 1 0 1 1 0 0 0 1 Y3 Y2 Y1 Y0 A0 A1 0 0 1 0 0 1 0 0 1 0 0 0
  • 19. 19 Decoder: Logic Diagram y0 I2’ I1’ I0’ y1 I2 I1’ I0’ En y7 I2 I1 I0 . . yi = mi En y0 = 1 if (I2, I1, I0 ) = (0,0,0) & En = 1 y7 = I2I1I0En
  • 20. 20 Decoder Application: universal set {Decoder, OR} Example: Implement functions f1(a,b,c) = m(1,2,4) f2(a,b,c) = m(2,3), and f3(a,b,c) = m(0,5,6) with a 3-input decoder and OR gates. I0 y0 y1 . . y7 c b a I1 I2 0 1 2 3 4 5 6 7 En y1 y2 y4 f1 y2 y3 f2 y0 y6 f3 y5
  • 21. 21 • OR minterms Decoders 2:4 Decoder A B 00 01 10 11 Y = AB + AB Y AB AB AB AB Minterm = A  B
  • 22. 22 Tree of Decoders Implement a 4-24 decoder with 3-23 decoders. I0 y0 y1 y7 I1 I2 0 1 2 3 4 5 6 7 I0 y8 y9 y15 I1 I2 0 1 2 3 4 5 6 7 a d c b
  • 23. 23 Implement a 6-26 decoder with 3-23 decoders. En D0 I2, I1, I0 D1 y0 y7 y8 y15 D7 y56 y63 En I2, I1, I0 I2, I1, I0 I5, I4, I3 Tree of Decoders … …
  • 24. 2-into-4 decoder Gate-level diagram Block diagram In Block diagrams: Circles on the input indicate the logic convention of the input signal Circles on the output indicate the physical configuration of the output gate
  • 25. 2-into-4 decoder with P. A. Inputs and a N.A. Output Gate-level diagram Block diagram
  • 26. • 2 to 4 Decoder: Behavioral Style
  • 27. • Hierarchical Design of a 4 to 16 Decoder Structural style Verilog dec2to4 to be presented soon …. w0 En y0 w1 y1 y2 y3 y8 y9 y10 y11 w2 w0 y0 y1 y2 y3 w0 En y0 w1 y1 y2 y3 w0 En y0 w1 y1 y2 y3 y4 y5 y6 y7 w1 w0 En y0 w1 y1 y2 y3 y12 y13 y14 y15 w0 En y0 w1 y1 y2 y3 w3 En w
  • 28. 28 2. Encoder • Definition • Logic Diagram • Priority Encoder
  • 29. 29 2. Encoder: Definition yn-1 …y0 En A I2 n -1 … I0 8 inputs 3 outputs y0 y1 y2 0 1 2 3 4 5 6 7 En At most one Ii = 1. (yn-1,.., y0 ) = i if Ii = 1 & n = 1 (yn-1,.., y0 ) = 0 otherwise. A = 1 if En = 1 and one i s.t. Ii = 1 A = 0 otherwise. Encoder Description: A I0 I7 0 1 2
  • 32. • 4 to 2 Binary Encoder Left extended by x to fill 2 bits
  • 33. 33 Priority Encoder: Definition Description: Input (I2 n -1,…, I0), Output (yn-1 ,…,,y0) (yn-1 ,…,,y0) = i if Ii = 1 & En = 1 & Ik = 0 for all k > i (high bit priority) or for all k< i (low bit priority). Eo = 1 if En = 1 & Ii = 0 for all i, Gs = 1 if En = 1 & i s.t. Ii = 1. E (Gs is like A, and Eo tells us if enable is true or not). 0 1 2 3 4 5 6 7 En Eo Gs I0 I7 y0 y1 y2 0 1 2
  • 34. 34 Priority Encoder: Implement a 32-input priority encoder w/ 8 input priority encoders (high bit priority). y32, y31, y30 I31-24 Eo Gs y22, y21, y20 I25-16 Eo Gs y12, y11, y10 I15-8 Eo Gs y02, y01, y00 I7-0 Eo Gs En
  • 35. • 4 to 2 Priority Encoder casex vs case
  • 36. • 4 to 2 Priority Encoder Using a For Loop A signal that is assigned a value multiple times in an always block retains its last value  priority scheme relies on this for correct setting of Y and z
  • 38. • Example: 2-bit comparator Comparators
  • 39. • K-Maps of the comparator outputs Comparators
  • 40. • Example: MIS 4-bit magnitude comparator Comparators
  • 41. • 4-bit magnitude comparator function table Comparators
  • 42. • Cascading 4-bit comparators to obtain a 16-bit comparator Comparators
  • 43. Shift Registers • Basic shift registers are classified by structure according to the following types: • Serial-in/serial-out • Parallel-in/serial-out • Serial-in/parallel-out • Universal parallel-in/parallel-out • Ring counter
  • 50. • Machine Arithmetic – Arithmetic combinational circuits are required for • Addition • Subtraction • Multiplication • Division (hardware implementation is optional) – Addition / subtraction can be done easily using full adders and a minimum of additional logic
  • 51. Addition of Binary Numbers Full Adder. The full adder is the fundamental building block of most arithmetic circuits: The sum and carry outputs are described as: i i i i i i i i i i i i i i i i i i i c b c a b a c b a c b a c b a c b a c        1 i i i i i i i i i i i i i c b a c b a c b a c b a s     Full Adder Cin Cout si ai bi
  • 52. Addition of Binary Numbers Propagate Propagate Generate Generate Inputs Outputs ci ai bi si ci+1 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1
  • 53. Full-Adder Implementation Full Adder operations is defined by equations: i i i i i i i i i i i i i i i i i i c p c b a c b a c b a c b a c b a s          i i i i i i i i i i i i c p g b a c b a c b a c      1 One-bit adder could be implemented as shown Carry-Propagate: and Carry-Generate gi i i i b a p   i i i b a g   cout cin si ai bi
  • 54. • Behavioral Style Full Adder data flow styles behavioral style
  • 55. • Higher Level Behavioral Style for Adder – This won't work! Can't loop over “subcircuits” subcircuits to be instantiated like this are not allowed in a for loop
  • 56. High-Speed Addition i i i c p s   i i i i c p g c   1 One-bit adder could be implemented more efficiently because MUX is faster i i i b a p   i i i b a g   0 1 s bi ai cout si cin
  • 57. • Ripple-Carry Adder – A ripple carry adder cascades full adders together • Simple, but not the most efficient design • Carry propagate adders are more efficient FA xn –1 cn cn 1 - yn 1 – sn 1 – FA x1 c2 y1 s1 FA c1 x0 y0 s0 c0 MSB position LSB position
  • 58. The Ripple-Carry Adder A0 B0 S0 Co,0 Ci,0 A1 B1 S1 Co,1 A2 B2 S2 Co,2 A3 B3 S3 Co,3 (= Ci,1) FA FA FA FA Worst case delay linear with the number of bits tadder N 1 –  tcarry tsum +  td = O(N) Goal: Make the fastest possible carry path circuit From Rabaey
  • 59. Inversion Property A B S Co Ci FA A B S Co Ci FA S A B Ci     S A B Ci     = Co A B Ci     Co A B Ci     = From Rabaey
  • 60. Minimize Critical Path by Reducing Inverting Stages A0 B0 S0 Co,0 Ci,0 A1 B1 S1 Co,1 A2 B2 S2 Co,2 Co,3 FA’ FA’ FA’ FA’ A3 B3 S3 Odd Cell Even Cell Exploit Inversion Property Note: need 2 different types of cells From Rabaey
  • 61. Ripple Carry Adder Carry-Chain of an RCA implemented using multiplexer from the standard cell library: ai+1 bi+1 ai bi ai+2 bi+2 cout ci+1 ci si si+1 si+2 cin Critical Path Oklobdzija, ISCAS’88
  • 62. • Ripple-Carry Adder Using Generate compiler produces n modules with names addstage[0].addbit, addstage[1].addbit, …, addstage[n-1].addbit
  • 63. Carry Look Ahead Logic (CLA) i i i c p s   i i i i c p g c   1 i i i b a p   i i i b a g   0 1 s bi ai cout si cin
  • 64. CLA Definitions: 4-bit Adder ai bi Ci gi pi ai+1 bi+1 Ci+1 gi+1 pi+1 ai+2 bi+2 Ci+2 gi+2 pi+2 ai+3 bi+3 Ci+3 gi+3 pi+3 Ci+4 1 1 1 1 1 1 1 1 1 1 2 ) ( c p p g p g c p g p g c p g c i i i i i i i i i i i i i                  i i i i i i i i i i i i c p g b a c b a c b a c      1
  • 65. Carry-Lookahead Adder: 4-bits ai bi Ci gi pi ai+1 bi+1 Ci+1 gi+1 pi+1 ai+2 bi+2 Ci+2 gi+2 pi+2 ai+3 bi+3 Ci+3 gi+3 pi+3 Ci+4 i i i i i i i i i i i i i i i i i i i i i i c p p p g p p g p g c p p g p g p g c p g c 1 2 1 2 1 2 2 1 1 1 2 2 2 2 2 3 ) (                           i i i i i i i i i i i i i i i i i i i i i i i i i i i c p p p p g p p p g p p g p g g p p g p g p g c p g c 1 2 3 1 2 3 1 2 3 2 3 3 1 2 1 2 2 3 3 3 3 3 4 ) (                                    Gj Pj
  • 66. Carry-Lookahead Adder i i i i i i i i i i j g p p p g p p g p g G 1 2 3 1 2 3 2 3 3              i i i i j p p p p P 1 2 3     j j j j c P G c    ) 1 ( 4 One gate delay  to calculate p, g One to calculate P and two for G Three gate delays To calculate C4(j+1) Compare that to 8  in RCA ! ai bi Cin Cj Gj Pj ai+1 bi+1 gi+1 pi+1 gi pi ai+2 bi+2 ai+3 bi+3 gi+1pi+1 gi+1pi+1 C4(j+1) C4j+1 C4j+2 C4j+3 P, G Group
  • 67. Carry-Lookahead Adder (Weinberger and Smith) i i i i i i i i i i j G P P P G P P G P G 1 2 3 1 2 3 2 3 3 * G              i i i i j P P P P P 1 2 3 *     j k k j c P G c 4 ) 1 ( 4 * *    Pj G* P* C4j+1 Gj Pj+1 Gj+1 Pj+3 Gj+3 Pj+2 Gj+2 C4j C4(j+1) C4j+2 C4j+3 Additional two gate delays C16 will take a total of 5 vs. 32 for RCA !
  • 68. Carry-Lookahead Adder (Weinberger and Smith: original derivation, 1958 )
  • 69. Carry-Lookahead Adder (Weinberger and Smith: original derivation )
  • 70. Carry lookahead addition • Two ways in which carry=1 from the ith bit-slice – Generated if both ai and bi are 1 • – A CarryIn (ci) of 1 is propagated if ai or bi are 1 • bi ai gi   bi ai pi   Are these carries generated or propagated?
  • 71. Carry lookahead addition • Two ways in which carry=1 from the ith bit-slice – Generated if both ai and bi are 1 • – A CarryIn (ci) of 1 is propagated if ai or bi are 1 • • The carry out, ci+1, is therefore • Using substitution, can get carries in parallel bi ai gi   bi ai pi   ci pi gi ci    1 0 0 1 2 3 0 1 2 3 1 2 3 2 3 3 4 0 0 1 2 0 1 2 1 2 2 3 0 0 1 0 1 1 2 0 0 0 1 c p p p p g p p p g p p g p g c c p p p g p p g p g c c p p g p g c c p g c                                   . . .
  • 72. Carry lookahead addition • Drawbacks – n+1 input OR and AND gates for nth input – Irregular structure with many long wires • Solution: do two levels of carry lookahead – First level generates Result (using carry lookahead to generate the internal carries) and propagate and generate signals for a group of 4 bits (Pi and Gi) – Second level generates carry out’s for each group based on carry in and Pi and Gi from previous group
  • 73. Carry lookahead addition • Internal equations for group 0 • Group equations for group 0 • C1 output of carry lookahead unit bi ai gi   bi ai pi   ci pi gi ci    1 0 1 2 3 1 2 3 2 3 3 0 g p p p g p p g p g G           0 1 2 3 0 p p p p P     0 0 0 1 c P G C   
  • 74. Carry lookahead addition • Internal equations for group 1 • Group equations for group 1 • C2 output of carry lookahead unit
  • 75. High Speed Adders • Carry-Skip Circuits • Carry-Select Adders • Carry-Save Adders
  • 76. Carry-Skip Addition • CLA group generate (Gi) hardware is complex • Carry skip addition – Generate Gi’s by ripple carry of a and b inputs with cin’s = 0 (except c0) Generate Pi’s as in carry lookahead – For each group, combine Gi, Pi, and cin as in carry lookahead to form group carries – Generate sums by ripple carry from a and b inputs and group carries
  • 77. Carry skip addition • Operation – Generate Gi’s through ripple carry
  • 78. Carry skip addition • Operation – Generate Gi’s through ripple carry – In parallel, generate Pi’s as in carry lookahead
  • 79. Carry skip addition • Operation – Group carries to each block are generated or propagated
  • 80. Carry skip addition • Operation – Group carries to each block are generated or propagated – Sums are generated in parallel from group carries
  • 81. Carry-Skip Adder FA FA FA FA P0 G1 P0 G1 P2 G2 P3 G3 Co,3 Co,2 Co,1 Co,0 Ci,0 FA FA FA FA P0 G1 P0 G1 P2 G2 P3 G3 Co,2 Co,1 Co,0 Ci,0 Co,3 Multiplexer BP=PoP1P2P3 Idea: If (P0 and P1 and P2 and P3 = 1) then Co3 = C0, else “kill” or “generate”. Bypass From Rabaey
  • 82. Carry-Skip Adder: N-bits, k-bits/group, r=N/k groups Gr G r- 1 ... S N-k-1 SN-1 a N-1bN-1 b N-k-1 aN-k-1 S (r-1)k-1 S (r-2)k G1 Go ... Sk S 2k-1 a 2k-1 b 2k-1 bk ak S k-1 S 0 ... ... a (r-1)k b(r-1)k a (r-1)kb (r-1)k ... a k-1 b k-1 a0 b0 ... Cin ... ... ... ... ... ... ... ... Pr-1 Pr-2 P1 P0 Cout + + + + AND OR OR OR OR AND AND AND critical path, delay =2(k-1)+(N/2-2)
  • 83. Carry-Skip Adder   SKIP RCA d t N t k t           2 2 1 2 N tp ripple adder bypass adder 4..8 k
  • 84. Carry-Select Addition • For each group, do two additions in parallel – One with cin forced to 0 – One with cin forced to 1 • Generate cin in parallel and use a MUX to select the correct sum outputs • Example for 8 bits MUXes
  • 85. Carry-select addition • A larger design – Why different numbers of bits in each block? • Hint1: it’s to minimize the adder delay • Hint2: assume a k-input block has k time units of delay, and the AND- OR logic has 1 time unit of delay generated carry propagated carry
  • 87. Carry-Select Adder Addition under assumption of Cin=0 and Cin =1.
  • 88. Carry Select Adder: combining two 32-b VBAs in select mode Delay =VBA32+ MUX
  • 89. Carry Save Addition • A full adder sums 3 inputs and produces 2 outputs – Carry output has twice weight of sum output • N full adders in parallel are called carry save adder – Produce N sums and N carry outs Z4 Y4 X4 S4 C4 Z3 Y3 X3 S3 C3 Z2 Y2 X2 S2 C2 Z1 Y1 X1 S1 C1 XN...1 YN...1 ZN...1 SN...1 CN...1 n-bit CSA
  • 90. CSA Application • Use k-2 stages of CSAs – Keep result in carry-save redundant form • Final CPA computes actual result 4-bit CSA 5-bit CSA 0001 0111 1101 0010 + 1011 0101_ 0001 0111 +1101 1011 0101_ X Y Z S C 0101_ 1011 +0010 X Y Z S C A B S
  • 91. CSA Application • Use k-2 stages of CSAs – Keep result in carry-save redundant form • Final CPA computes actual result 4-bit CSA 5-bit CSA 0001 0111 1101 0010 + 1011 0101_ 01010_ 00011 0001 0111 +1101 1011 0101_ X Y Z S C 0101_ 1011 +0010 00011 01010_ X Y Z S C 01010_ + 00011 A B S
  • 92. CSA Application • Use k-2 stages of CSAs – Keep result in carry-save redundant form • Final CPA computes actual result 4-bit CSA 5-bit CSA 0001 0111 1101 0010 + 1011 0101_ 01010_ 00011 0001 0111 +1101 1011 0101_ X Y Z S C 0101_ 1011 +0010 00011 01010_ X Y Z S C 01010_ + 00011 10111 A B S 10111
  • 93. Multiplication • Example: 1100 : 1210 0101 : 510
  • 94. Multiplication • Example: 1100 : 1210 0101 : 510 1100
  • 95. Multiplication • Example: 1100 : 1210 0101 : 510 1100 0000
  • 96. Multiplication • Example: 1100 : 1210 0101 : 510 1100 0000 1100
  • 97. Multiplication • Example: 1100 : 1210 0101 : 510 1100 0000 1100 0000
  • 98. Multiplication • Example: 1100 : 1210 0101 : 510 1100 0000 1100 0000 00111100 : 6010
  • 99. Multiplication • Example: • M x N-bit multiplication – Produce N M-bit partial products – Sum these to produce M+N-bit product 1100 : 1210 0101 : 510 1100 0000 1100 0000 00111100 : 6010 multiplier multiplicand partial products product
  • 100. General Form • Multiplicand: Y = (yM-1, yM-2, …, y1, y0) • Multiplier: X = (xN-1, xN-2, …, x1, x0) • Product: 1 1 1 1 0 0 0 0 2 2 2 M N N M j i i j j i i j j i i j P y x x y                           x0 y5 x0 y4 x0 y3 x0 y2 x0 y1 x0 y0 y5 y4 y3 y2 y1 y0 x5 x4 x3 x2 x1 x0 x1 y5 x1 y4 x1 y3 x1 y2 x1 y1 x1 y0 x2 y5 x2 y4 x2 y3 x2 y2 x2 y1 x2 y0 x3 y5 x3 y4 x3 y3 x3 y2 x3 y1 x3 y0 x4 y5 x4 y4 x4 y3 x4 y2 x4 y1 x4 y0 x5y5 x5y4 x5y3 x5y2 x5y1 x5y0 p0 p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 multiplier multiplicand partial products product
  • 101. Dot Diagram • Each dot represents a bit partial products multiplier x x0 x15
  • 103. Rectangular Array • Squash array to fit rectangular floorplan y0 y1 y2 y3 x0 x1 x2 x3 p0 p1 p2 p3 p4 p5 p6 p7
  • 104. Fewer Partial Products • Array multiplier requires N partial products • If we looked at groups of r bits, we could form N/r partial products. – Faster and smaller? – Called radix-2r encoding • Ex: r = 2: look at pairs of bits – Form partial products of 0, Y, 2Y, 3Y – First three are easy, but 3Y requires adder 
  • 105. Booth Encoding • Instead of 3Y, try –Y, then increment next partial product to add 4Y • Similarly, for 2Y, try –2Y + 4Y in next partial product
  • 106. Booth Encoding • Instead of 3Y, try –Y, then increment next partial product to add 4Y • Similarly, for 2Y, try –2Y + 4Y in next partial product
  • 107. Booth Encoding • Instead of 3Y, try –Y, then increment next partial product to add 4Y • Similarly, for 2Y, try –2Y + 4Y in next partial product
  • 108. Booth Encoding • Instead of 3Y, try –Y, then increment next partial product to add 4Y • Similarly, for 2Y, try –2Y + 4Y in next partial product
  • 109. Booth Encoding • Instead of 3Y, try –Y, then increment next partial product to add 4Y • Similarly, for 2Y, try –2Y + 4Y in next partial product
  • 110. Booth Encoding • Instead of 3Y, try –Y, then increment next partial product to add 4Y • Similarly, for 2Y, try –2Y + 4Y in next partial product
  • 111. Booth Encoding • Instead of 3Y, try –Y, then increment next partial product to add 4Y • Similarly, for 2Y, try –2Y + 4Y in next partial product
  • 112. Booth Encoding • Instead of 3Y, try –Y, then increment next partial product to add 4Y • Similarly, for 2Y, try –2Y + 4Y in next partial product
  • 113. Booth Hardware • Booth encoder generates control lines for each PP – Booth selectors choose PP bits Mi yj Xi yj-1 2Xi PPij Booth Selector Booth Encoder x2i+1 x2i x2i-1
  • 114. Sign Extension • Partial products can be negative – Require sign extension, which is cumbersome – High fanout on most significant bit multiplier x x0 x15 0 0 0 x-1 x16 x17 s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s PP0 PP1 PP2 PP3 PP4 PP5 PP6 PP7 PP8
  • 115. Simplified Sign Ext. • Sign bits are either all 0’s or all 1’s – Note that all 0’s is all 1’s + 1 in proper column – Use this to reduce loading on MSB s 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 s s 1 1 1 1 1 1 1 1 1 1 1 1 1 s s 1 1 1 1 1 1 1 1 1 1 1 s s 1 1 1 1 1 1 1 1 1 s s 1 1 1 1 1 1 1 s s 1 1 1 1 1 s s 1 1 1 s s 1 s PP0 PP1 PP2 PP3 PP4 PP5 PP6 PP7 PP8
  • 116. Even Simpler Sign Ext. • No need to add all the 1’s in hardware – Precompute the answer! s s s s s s 1 s s 1 s s 1 s s 1 s s 1 s s 1 s s PP0 PP1 PP2 PP3 PP4 PP5 PP6 PP7 PP8
  • 117. Advanced Multiplication • Signed vs. unsigned inputs • Higher radix Booth encoding • Array vs. tree CSA networks

Editor's Notes

  • #51: Digital computer arithmetic is an aspect of logic design with the objective of developing appropriate algorithms in order to achieve an efficient utilization of the available hardware. Given that the hardware can only perform relatively simple and primitive set of Boolean operations, arithmetic operations are based on a hierarchy of operations that are built upon the simple ones. Since ultimately, speed, power and chip area are the most often used measures of the efficiency of an algorithm, there is a strong link between the algorithms and technology used for its implementation.
  • #52: Digital computer arithmetic is an aspect of logic design with the objective of developing appropriate algorithms in order to achieve an efficient utilization of the available hardware. Given that the hardware can only perform relatively simple and primitive set of Boolean operations, arithmetic operations are based on a hierarchy of operations that are built upon the simple ones. Since ultimately, speed, power and chip area are the most often used measures of the efficiency of an algorithm, there is a strong link between the algorithms and technology used for its implementation.
  • #53: First we should examine a realization of a one-bit adder which represents a basic building block for all the more elaborate addition schemes. Operation of a Full Adder is defined by the Boolean equations for the sum and carry signals shown in this slide: ai, bi, and ci are the inputs to the i-th full adder stage, and si and ci+1 are the sum and carry outputs from the i-th stage, respectively. From the above equation it is clear that the realization of the Sum function requires two XOR logic gates. The expression for Carry function could be rewritten using the Carry-Propagate pi and Carry-Generate gi terms. If Carry-Propagate is 1, the Carry out of the stage will be equal to the Carry signal into the stage: ci+1 = ci regardless of the carry inside the stage. If Carry-Generate is 1, there will be a Carry signal out of the stage will be 1 regardless of the value of the incoming Carry signal. The logical implementation of the full adder stage is shown in figure (a.) of this slide. This implementation results from a direct application of the logic equations. The implementation (b) is more clever because it utilizes a multiplexer in the carry path. Given that the multiplexer block is often faster than a single gate, using multiplexer in the critical path helps to achieve better performance.
  • #56: First we should examine a realization of a one-bit adder which represents a basic building block for all the more elaborate addition schemes. Operation of a Full Adder is defined by the Boolean equations for the sum and carry signals shown in this slide: ai, bi, and ci are the inputs to the i-th full adder stage, and si and ci+1 are the sum and carry outputs from the i-th stage, respectively. From the above equation it is clear that the realization of the Sum function requires two XOR logic gates. The expression for Carry function could be rewritten using the Carry-Propagate pi and Carry-Generate gi terms. If Carry-Propagate is 1, the Carry out of the stage will be equal to the Carry signal into the stage: ci+1 = ci regardless of the carry inside the stage. If Carry-Generate is 1, there will be a Carry signal out of the stage will be 1 regardless of the value of the incoming Carry signal. The logical implementation of the full adder stage is shown in figure (a.) of this slide. This implementation results from a direct application of the logic equations. The implementation (b) is more clever because it utilizes a multiplexer in the carry path. Given that the multiplexer block is often faster than a single gate, using multiplexer in the critical path helps to achieve better performance.
  • #61: A ripple carry adder for N-bit numbers is implemented by concatenating N full adders as shown in this slide. At the i-th bit position, the i-th bits of operands A and B and a carry signal from the preceding adder stage are used to generate the i-th bit of the sum, si, and a carry, ci+1, to the next adder stage. This scheme is called a Ripple Carry Adder, since the carry signal “ripple” from the least significant bit position to the most significant one. If the ripple carry adder is implemented by concatenating N full adders, the delay of such an adder is 2N gate delays from Cin-to-Cout. The path from the input to the output signal that is likely to take the longest time is designated as a "critical path". In the case of a Ripple Carry Adder, this is the path from the least significant input a0 or b0 to the last sum bit sn. Assuming multiplexer based XOR gate implementation, this critical path will consist of N+1 pass transistor delays. However, such a long chain of transistors will significantly degrade the signal, thus some amplification points are necessary. In practice, we can use a multiplexer cell to build this critical path using standard cell library as shown in this slide.
  • #63: First we should examine a realization of a one-bit adder which represents a basic building block for all the more elaborate addition schemes. Operation of a Full Adder is defined by the Boolean equations for the sum and carry signals shown in this slide: ai, bi, and ci are the inputs to the i-th full adder stage, and si and ci+1 are the sum and carry outputs from the i-th stage, respectively. From the above equation it is clear that the realization of the Sum function requires two XOR logic gates. The expression for Carry function could be rewritten using the Carry-Propagate pi and Carry-Generate gi terms. If Carry-Propagate is 1, the Carry out of the stage will be equal to the Carry signal into the stage: ci+1 = ci regardless of the carry inside the stage. If Carry-Generate is 1, there will be a Carry signal out of the stage will be 1 regardless of the value of the incoming Carry signal. The logical implementation of the full adder stage is shown in figure (a.) of this slide. This implementation results from a direct application of the logic equations. The implementation (b) is more clever because it utilizes a multiplexer in the carry path. Given that the multiplexer block is often faster than a single gate, using multiplexer in the critical path helps to achieve better performance.
  • #82: Since the Cin-to-Cout represents the longest path in the ripple-carry-adder an obvious attempt is to accelerate carry propagation through the adder. This is accomplished by using Carry-Propagate pi signals within a group of bits. If all the pi signals within the group are set to pi = 1, the condition exist for the carry to bypass the entire group: Carry Skip Adder divides the words to be added into groups of equal size of k-bits. The basic structure of an N-bit Carry Skip Adder is shown here. Within the group, carry propagates in a ripple-carry fashion. In addition, an AND gate is used to form the group propagate signal. If group propagate signal is “true” the condition exists for carry to bypass, the group as shown in this slide. The maximal delay of a Carry Skip Adder is encountered when carry signal is generated in the least-significant bit position, rippling through k-1 bit positions, skipping over N/k-2 groups in the middle, rippling through the k-1 bits of most significant group and being assimilated in the Nth bit position to produce the sum SN: Thus, Carry Skip Adder is faster than Ripple Carry Adder at the expense of a few relatively simple modifications. The delay of the Carry Skip Adder is still linearly dependent on the size of the adder N, however this linear dependence is reduced by a factor of 1/k.
  • #87: The theoretically fastest scheme for addition of two numbers is "Conditional-Sum Addition" proposed by Sklansky in 1960. The essence of this scheme is in the realization that we can add two numbers without waiting for the carry signal to arrive. Simply, the numbers are added in two instances: one assuming Cin = 0 and the other assuming Cin = 1. The conditionally produced results: Sum0, Sum1 and Carry0, Carry1 are selected by a multiplexer using an incoming carry signal Cin as a multiplexer control. Similarly to the Carry-Lookahead Adder the input bits are divided into groups which are in this case added "conditionally". It is apparent that while building Conditional-Sum Adder the hardware complexity starts to grow rapidly starting from the Least Significant Bit position. Therefore, in practice, the full-blown implementation of the CNSA is not found. However, the idea of adding the Most Significant portion of the operands conditionally and selecting the results once the carry-in signal is computed in the Least Significant portion, is attractive. Such a scheme, which is a subset of Conditional-Sum Adder, is known as "Carry-Select Adder".   Carry Select Adder divides the words to be added into blocks and forms two sums for each block in parallel: -one with a carry in of ZERO and the other with a carry in of ONE. In this slide an example of a 16 bit carry select adder in shown: The carry-out from the Least Significant 4-bit block controls a multiplexer that selects the sum from the Most Significant portion. The carry out is computed using the equation for the carry out of the group, since the group propagate signal Pi is the carry out of an adder with a carry input of ONE and the group generate Gi signal is the carry out of an adder with a carry input of ZERO. This speeds-up the computation of the carry signal which is necessary for selection in the next block. The upper 8-bits are computed conditionally using two Carry-Select Adders similar to the one used in the Least Significant 8-bit portion. The delay of this adder is determined by the speed of the Least Significant k-bit block (4-bit RCA in this example) and delay of multiplexers in the Most Significant path. Generally the delay of such adder is proportional to the log function of the size of the adder.