T.Jyothsna et al Int. Journal of Engineering Research and Application
ISSN : 2248-9622, Vol. 3, Issue 5, Sep-Oct 2013, pp.1203-1207

RESEARCH ARTICLE

www.ijera.com

OPEN ACCESS

Low Power, High Speed Parallel Architecture For Cyclic
Convolution Based On Fermat Number Transform (FNT)
T. Jyothsna1, M.Tech; M. Pradeep2, M.Tech
1 E.C.E. Department, Shri Vishnu Engineering College for Women, Vishnupur, Bhimavaram, India
2 Associate Professor, E.C.E. Department, Shri Vishnu Engineering College for Women, Vishnupur, Bhimavaram, India

Abstract
This paper compares the power consumption, delay and area of a new 4-2 compressor architecture with those of an existing architecture. The proposed architecture uses its intermediate outputs efficiently to achieve low power, high speed, and reduced area and delay. The FNT is exact, with no round-off or truncation errors, because its binary arithmetic performs exact computation.

Two techniques are proposed for performing cyclic convolution with the FNT: code conversion without addition (CCWA) and butterfly operation without addition (BOWA). Within the convolution, these techniques carry out the FNT and its inverse (IFNT), except for their final stages. Modulo 2^n+1 partial product multipliers (MPPM) perform the point-wise multiplication, and their output partial products are the inputs to the IFNT. As a result, modulo 2^n+1 carry-propagation additions are avoided in the FNT and the IFNT (except in their final stages) and in the modulo 2^n+1 multipliers.

These techniques reduce the power and execution delay of the entire FNT. The proposed design therefore has lower power, better throughput and lower hardware complexity. The hardware is implemented with Very Large Scale Integration (VLSI) technology and available CAD tools, and synthesis results are obtained with a 180 nm SoC technology.

I. INTRODUCTION

Cyclic convolution based on the FNT is used in DSP (digital signal processing) applications, for example to secure information during transmission and reception. The existing 4-2 compressor has large area and delay. To overcome these drawbacks and obtain low power, we use a novel XOR-XNOR, MUX-style 4-2 compressor [8].

Convolution is a basic operation in DSP [1]. When it is computed with a finite word length, however, round-off and truncation errors occur, and the operation is computationally expensive. To reduce the computational complexity we use cyclic (circular) convolution. It is simpler, produces fewer output samples, and is one of the most important and efficient operations in DSP.
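For reference, the N-point cyclic convolution p_k = sum over i of a_i * b_((k-i) mod N), for k = 0, ..., N-1, can be computed directly with N^2 multiplications. The short Python sketch below contrasts it with linear convolution, which returns 2N-1 samples instead of N. The function names are illustrative and do not come from the paper.

```python
def cyclic_convolution(a, b):
    """Direct N-point cyclic (circular) convolution: p[k] = sum_i a[i] * b[(k - i) mod N]."""
    n = len(a)
    if len(b) != n:
        raise ValueError("inputs must have the same length")
    return [sum(a[i] * b[(k - i) % n] for i in range(n)) for k in range(n)]


def linear_convolution(a, b):
    """Ordinary (linear) convolution, for comparison: len(a) + len(b) - 1 outputs."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    print(cyclic_convolution([1, 2, 3, 4], [5, 6, 7, 8]))   # [66, 68, 66, 60]
    print(linear_convolution([1, 2, 3, 4], [5, 6, 7, 8]))   # [5, 16, 34, 60, 61, 52, 32]
```

The cyclic result is the linear result folded modulo N. For example, 66 = 5 + 61.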
The FNT performs cyclic convolution more efficiently than either the DFT or the FFT. FFT-based cyclic convolution is widely used in signal processing, but it operates in the complex domain. The number theoretic transform (NTT) computes cyclic convolution and correlation without round-off errors and more efficiently than the FFT. One interesting case of the NTT [3] is the Fermat number transform.
FNT-based cyclic convolution is simple and has low computational complexity. The reason is that the FNT replaces the expensive multiplications of the FFT with multiplications by integer powers of 2. A Fermat number is a positive integer of the form F_t = 2^(2^t)+1, where t is a nonnegative integer. The FNT [4], [5] is well suited to digital computation, and its implementation is exact, without round-off errors. The Fermat number transform has been used in many applications, such as video processing, digital filtering, multiplication of large numbers, and pseudo-random number generation.
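The role of the root 2 can be made concrete. Modulo F_t = 2^(2^t)+1 we have 2^(2^t) ≡ -1, so 2 has multiplicative order 2^(t+1). Multiplying by 2^k therefore needs only shifts, additions and subtractions, not a general multiplier. The Python sketch below checks both facts for small t. It works on ordinary residues rather than the diminished-1 words used in hardware, and the helper names are our own.

```python
def fermat(t):
    """Fermat number F_t = 2**(2**t) + 1."""
    return (1 << (1 << t)) + 1


def order_of_two(m):
    """Smallest e > 0 such that 2**e == 1 (mod m); m must be odd and > 1."""
    e, x = 1, 2 % m
    while x != 1:
        x = (x * 2) % m
        e += 1
    return e


def mul_pow2(x, k, t):
    """Return x * 2**k mod F_t using only shifts, masks, additions and subtractions.

    With b = 2**t we have 2**b == -1 (mod F_t), so the b-bit chunks of
    x << k are added and subtracted alternately instead of multiplied out.
    """
    b = 1 << t
    m = fermat(t)
    k %= 2 * b                      # 2**(2b) == 1 (mod F_t)
    y = x << k
    chunk_mask = (1 << b) - 1
    result, sign = 0, 1
    while y:
        result += sign * (y & chunk_mask)
        y >>= b
        sign = -sign                # each higher chunk carries one more factor of -1
    return result % m


if __name__ == "__main__":
    print(fermat(3), order_of_two(fermat(3)))   # 257 16
    print(mul_pow2(100, 5, 3))                  # 116, i.e. 100 * 32 mod 257
```

Because 2 has order 16 modulo F_3 = 257, a 16-point FNT with root 2 is possible for that modulus.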
The main operations of FNT-based cyclic convolution with unit root 2 are:
i) CCWA (code conversion without addition);
ii) BOWA (butterfly operation without addition);
iii) MPPM (modulo 2^n+1 partial product multiplication).
CCWA and BOWA are both built from the novel modulo 2^n+1 4-2 compressor. The compressor operates in the diminished-1 representation, in which a value X is stored as X* = X - 1 [9].
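In the diminished-1 representation, a nonzero residue X in [1, 2^n] of the modulo 2^n+1 ring is stored as the n-bit word X* = X - 1. Zero needs separate handling. A zero flag is common, but that detail is an assumption here, because the text does not say how zero is treated. A useful consequence, which the butterfly later exploits, is that negation modulo 2^n+1 is just bitwise inversion of X*. The Python sketch below checks this; the function names are illustrative.

```python
def to_dim1(x, n):
    """Encode a nonzero residue x (1 <= x <= 2**n) of Z_(2**n + 1) as x* = x - 1."""
    if not 1 <= x <= (1 << n):
        raise ValueError("diminished-1 encoding needs 1 <= x <= 2**n")
    return x - 1


def from_dim1(xs, n):
    """Decode an n-bit diminished-1 word back to its residue modulo 2**n + 1."""
    return (xs + 1) % ((1 << n) + 1)


def neg_dim1(xs, n):
    """Diminished-1 negation: (-x)* = 2**n - x = NOT(x*) on n bits."""
    return ~xs & ((1 << n) - 1)
```

For n = 8 (modulus 257), to_dim1(1, 8) is 0 and to_dim1(256, 8) is 255, so every nonzero residue fits in 8 bits.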

II. CODE CONVERSION WITHOUT ADDITION

Code conversion (CC) is the first stage of the FNT. It converts the normal binary code (NBC) into the diminished-1 representation. For an n-bit NBC, the delay and area of CC are close to those of an n-bit carry-propagation adder. To reduce this cost we propose CCWA, which is performed by a modulo 2^n+1 4-2 compressor. The four inputs I0, I1, I2 and I3 are applied to the compressor, and its outputs are the sum vector H0* and the carry vector H1* in the diminished-1 representation [5].
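A modulo 2^n+1 4-2 compressor acting on diminished-1 words can be modeled at word level as two carry-save (3-2) layers. Each layer complements the carry leaving the MSB and feeds it back into the LSB, which is the inverted end-around carry of diminished-1 arithmetic. Each layer adds exactly 1 (mod 2^n+1) to the sum of the stored words. Four diminished-1 inputs therefore map to two diminished-1 outputs H0*, H1* with H0 + H1 ≡ I0 + I1 + I2 + I3 (mod 2^n+1), and no carry-propagation chain is needed.

The sketch also shows one way to choose I0..I3 so that normal-binary inputs x and y leave the compressor already as a diminished-1 pair for x + y or x - y. The all-ones word equals -2 modulo 2^n+1 and supplies the constant correction. This choice is derived here and is not taken from the paper; the paper's actual wiring of I0..I3 appears only in its figure and may differ.

```python
def csa_mod(a, b, c, n):
    """One carry-save (3-2) layer modulo 2**n + 1 with inverted end-around carry.

    Returns n-bit words (s, cv) with s + cv == a + b + c + 1 (mod 2**n + 1).
    """
    mask = (1 << n) - 1
    s = a ^ b ^ c                              # bitwise sum
    carry = (a & b) | (a & c) | (b & c)        # bitwise majority, weight 2**(i+1)
    msb = (carry >> (n - 1)) & 1               # carry leaving the MSB (weight 2**n == -1)
    cv = ((carry << 1) & mask) | (msb ^ 1)     # complement it and feed it into the LSB
    return s, cv


def compressor_4_2_mod(i0, i1, i2, i3, n):
    """Modulo 2**n + 1 4-2 compression: h0 + h1 == i0 + i1 + i2 + i3 + 2 (mod 2**n + 1)."""
    s, c = csa_mod(i0, i1, i2, n)
    return csa_mod(s, c, i3, n)


def pair_value(h0, h1, n):
    """Residue carried by a pair of diminished-1 words: (h0 + 1) + (h1 + 1)."""
    return (h0 + h1 + 2) % ((1 << n) + 1)


def ccwa(x, y, n, subtract=False):
    """Hypothetical CCWA input choice: NBC x, y (0 <= x, y < 2**n) -> diminished-1 pair.

    The all-ones word is -2 (mod 2**n + 1) and ~y on n bits is -2 - y; these
    supply the constant offsets so that pair_value(...) equals x + y or x - y.
    """
    mask = (1 << n) - 1
    if subtract:
        return compressor_4_2_mod(x, ~y & mask, mask, 0, n)
    return compressor_4_2_mod(x, y, mask, mask, n)
```

For example, with n = 8, pair_value(*ccwa(200, 100, 8), 8) evaluates to (200 + 100) mod 257 = 43.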


I. The existing 4-2 compressor:
Compressors are basic components in many applications, particularly for summing partial products in multipliers. Multiplication is a basic arithmetic operation in applications such as DSP. These applications rely on efficient ALUs and floating-point units to execute operations like convolution and filtering.

Fig 3: Modulo 2^n+1 4-2 compressor.
The output equations of the proposed architecture are shown below.
Fig 1: Existing 4-2 compressor
The existing design uses two full adders, so its complexity, power and area are higher, and hence its delay is larger. To obtain low power, high speed and less area, we propose a novel XOR-XNOR, MUX-style 4-2 compressor architecture [6].
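For comparison, the conventional 4-2 compressor of Fig 1 can be modeled as two cascaded full adders. The bit-level Python sketch below is our own behavioral model, not the authors' Verilog. It checks exhaustively that the cell satisfies x1 + x2 + x3 + x4 + cin = sum + 2(carry + cout). It also shows that cout does not depend on cin, which is what lets a row of these cells avoid a rippling carry.

```python
from itertools import product


def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2_fa(x1, x2, x3, x4, cin):
    """Conventional 4-2 compressor built from two cascaded full adders.

    The first full adder's carry becomes cout (sent to the next bit position);
    the second adds the first sum to x4 and cin. Returns (sum, carry, cout).
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def check_compressor(cell):
    """Exhaustively verify the 4-2 compressor arithmetic identity for a cell."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        s, carry, cout = cell(x1, x2, x3, x4, cin)
        if x1 + x2 + x3 + x4 + cin != s + 2 * (carry + cout):
            return False
    return True
```

The critical path of this cell passes through two full adders, that is, four XOR delays from x1 to sum.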
II. Proposed novel architecture of 4-2 compressor:
The power, speed, delay and area of this proposed compressor architecture are compared with those of the existing one.

In the modulo 2^n+1 4-2 compressor, the novel 4-2 compressor cell [7], [8] of Fig 2 is instantiated as many times as required to perform the CCWA. The outputs are the sum vector H0* and the carry vector H1*. The most significant carry bit of H1* is complemented and connected back to its LSB, so the result consists of two diminished-1 values.
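The XOR-XNOR/MUX-style cell and its use in the modulo 2^n+1 compressor can be sketched at bit level as follows. The cell equations below are the form commonly published for MUX-based 4-2 compressors:
- sum = x1 ^ x2 ^ x3 ^ x4 ^ cin
- cout = x3 if x1 ^ x2 else x1
- carry = cin if x1 ^ x2 ^ x3 ^ x4 else x4

We assume, without confirmation from the extracted text, that they match the proposed design in Fig 2. The MUX select signals come from the XOR/XNOR pairs and are ready early. The modulo array places n cells side by side. The cout and carry leaving the MSB cell are both complemented and fed back to the LSB, which makes the two output words a diminished-1 pair.

```python
def mux(sel, when1, when0):
    """2:1 multiplexer: when1 if sel else when0."""
    return when1 if sel else when0


def compressor_4_2_mux(x1, x2, x3, x4, cin):
    """XOR-XNOR/MUX-style 4-2 cell (commonly published equations, assumed to match Fig 2).

    Returns (sum, carry, cout). cout depends only on x1, x2 and x3.
    """
    x12 = x1 ^ x2                  # XOR (the XNOR rail is its complement)
    x1234 = x12 ^ x3 ^ x4
    s = x1234 ^ cin
    cout = mux(x12, x3, x1)        # = majority(x1, x2, x3)
    carry = mux(x1234, cin, x4)    # = majority(x1 ^ x2 ^ x3, x4, cin)
    return s, carry, cout


def modulo_4_2_array(i0, i1, i2, i3, n):
    """Row of n cells with inverted end-around cout and carry (modulo 2**n + 1).

    Returns n-bit words (h0, h1) with h0 + h1 == i0 + i1 + i2 + i3 + 2 (mod 2**n + 1),
    i.e. a diminished-1 pair when the four inputs are diminished-1 words.
    """
    def bit(word, k):
        return (word >> k) & 1

    top = n - 1
    # The MSB cell's cout ignores its cin, so its complement can drive the LSB cin
    # without forming a combinational loop.
    _, _, cout_top = compressor_4_2_mux(bit(i0, top), bit(i1, top), bit(i2, top), bit(i3, top), 0)
    cin = cout_top ^ 1
    h0 = h1 = 0
    for k in range(n):
        s, carry, cout = compressor_4_2_mux(bit(i0, k), bit(i1, k), bit(i2, k), bit(i3, k), cin)
        h0 |= s << k
        if k < top:
            h1 |= carry << (k + 1)  # carry has weight 2**(k+1)
        else:
            h1 |= carry ^ 1         # MSB carry: complemented and fed back to the LSB
        cin = cout                  # cout moves one position only; no ripple chain
    return h0, h1
```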

Fig 2: Modulo 2^n+1 4-2 compressor.
In this architecture each full adder is broken into its constituent XOR blocks, and both the XOR and XNOR values are computed and used to reduce delay. The select bit of each MUX block is available before the MUX data inputs arrive, so the time spent switching transistors on the critical path is reduced.

III. BUTTERFLY OPERATION WITHOUT ADDITION

BOWA is the operation performed in the FNT after the CCWA. As shown in Fig 4, it consists of two modulo 2^n+1 4-2 compressors, a multiplier and some inverters. It is performed without a carry-propagation chain, which reduces delay and area. Because it uses the low-power 4-2 compressor of the novel architecture, its power consumption is also lower.

Fig 4: Butterfly operation without Addition
K*, L*, M* and N* correspond to the two inputs and the two outputs of the previous butterfly operation (BO) in the diminished-1 number system, respectively.
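A butterfly of a radix-2 FNT with root 2 computes X + 2^e Y and X - 2^e Y modulo 2^n+1. The sketch below is one possible reading of Fig 4. It assumes that K*, L* form the diminished-1 pair carrying X and that M*, N* form the pair carrying Y.

In diminished-1 form, each piece of the butterfly is cheap:
- Multiplication by 2^e is an e-step left rotation in which each bit leaving the MSB is complemented before it enters the LSB.
- Negation is bitwise inversion (the "inverters").
- Each output pair comes from one modulo 2^n+1 4-2 compressor, so no carry-propagation adder appears.

The function names are hypothetical.

```python
def csa_mod(a, b, c, n):
    """3-2 carry-save layer mod 2**n + 1 with inverted end-around carry (adds 1 to the sum)."""
    mask = (1 << n) - 1
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    msb = (carry >> (n - 1)) & 1
    return s, ((carry << 1) & mask) | (msb ^ 1)


def compressor_4_2_mod(i0, i1, i2, i3, n):
    """Modulo 2**n + 1 4-2 compressor: four diminished-1 words in, a diminished-1 pair out."""
    s, c = csa_mod(i0, i1, i2, n)
    return csa_mod(s, c, i3, n)


def pair_value(h0, h1, n):
    """Residue carried by a diminished-1 pair: (h0 + 1) + (h1 + 1) mod 2**n + 1."""
    return (h0 + h1 + 2) % ((1 << n) + 1)


def times_pow2_dim1(xs, e, n):
    """(2**e * x)* from x*: rotate left e times, complementing each wrapped bit."""
    mask = (1 << n) - 1
    for _ in range(e % (2 * n)):            # 2**(2n) == 1 (mod 2**n + 1)
        msb = (xs >> (n - 1)) & 1
        xs = ((xs << 1) & mask) | (msb ^ 1)
    return xs


def bowa(k_star, l_star, m_star, n_star, e, n):
    """Hypothetical butterfly without addition (one reading of Fig 4).

    Assumes X = (k_star, l_star) and Y = (m_star, n_star) are diminished-1 pairs.
    Returns two pairs carrying X + 2**e * Y and X - 2**e * Y (mod 2**n + 1).
    """
    mask = (1 << n) - 1
    m_rot = times_pow2_dim1(m_star, e, n)   # fixed rotation: wiring plus inverters
    n_rot = times_pow2_dim1(n_star, e, n)
    plus = compressor_4_2_mod(k_star, l_star, m_rot, n_rot, n)
    minus = compressor_4_2_mod(k_star, l_star, ~m_rot & mask, ~n_rot & mask, n)  # inverters negate
    return plus, minus
```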

IV. MODULO 2^n+1 PARTIAL PRODUCT MULTIPLIER

CCWA and BOWA both produce their results in carry-save form. The MPPM [10] then performs the point-wise multiplications, so the final carry-propagation addition of the two partial products in each multiplier is avoided and the execution delay is reduced.

In the modulo 2^n+1 multiplier proposed by Efstathiou et al. [10], there are n+3 partial products. A full-adder-based Dadda tree [7] reduces them to two summands. Because a Dadda tree is faster than many other reduction schemes, the multiplier achieves high speed.

In the proposed parallel architecture for FNT-based cyclic convolution, the BOWA accepts four operands in the diminished-1 number system. Every point-wise multiplication therefore produces two partial products rather than one product. This removes the final modulo 2^n+1 adder of the two partial products from the multiplier, so the modulo 2^n+1 partial product multiplier saves both area and delay.
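The idea of a multiplier that stops at two partial products can be sketched behaviorally. The sketch uses a simplified partial-product set of our own, not the n+3 partial products of Efstathiou et al. [10]. It also uses a Wallace-style grouping of 3-2 layers rather than a true Dadda schedule.

The sketch relies on the identity A·B = A*·B + B for nonzero A and B, where A* = A - 1:
- A set bit i of A* contributes (2^i B)*, obtained by rotation with inversion.
- A clear bit contributes the word 0, which stands for the value 1.
- A data-dependent correction word therefore removes the excess.

All names are hypothetical, and zero operands are not handled.

```python
def csa_mod(a, b, c, n):
    """3-2 carry-save layer mod 2**n + 1 with inverted end-around carry.

    Three diminished-1 words in, two out, same represented value.
    """
    mask = (1 << n) - 1
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    msb = (carry >> (n - 1)) & 1
    return s, ((carry << 1) & mask) | (msb ^ 1)


def pair_value(h0, h1, n):
    """Residue carried by a diminished-1 pair: (h0 + 1) + (h1 + 1) mod 2**n + 1."""
    return (h0 + h1 + 2) % ((1 << n) + 1)


def times_pow2_dim1(xs, e, n):
    """(2**e * x)* from x*: rotate left e times, complementing each wrapped bit."""
    mask = (1 << n) - 1
    for _ in range(e % (2 * n)):
        msb = (xs >> (n - 1)) & 1
        xs = ((xs << 1) & mask) | (msb ^ 1)
    return xs


def partial_products_dim1(a_star, b_star, n):
    """Hypothetical diminished-1 partial products for A*B mod 2**n + 1 (A, B nonzero)."""
    words, z = [], 0
    for i in range(n):
        if (a_star >> i) & 1:
            words.append(times_pow2_dim1(b_star, i, n))   # value 2**i * B
        else:
            words.append(0)                               # value 1, not 0
            z += 1
    words.append(b_star)                                  # the "+ B" term
    if z:
        words.append((1 << n) - z)                        # diminished-1 code of -z
    return words


def reduce_to_two(words, n):
    """Wallace-style carry-save reduction; the final modulo adder is deliberately omitted."""
    words = list(words)
    while len(words) > 2:
        nxt = []
        full = len(words) - len(words) % 3
        for j in range(0, full, 3):
            nxt.extend(csa_mod(words[j], words[j + 1], words[j + 2], n))
        nxt.extend(words[full:])                          # 0-2 leftover words pass through
        words = nxt
    return words[0], words[1]


def mppm(a_star, b_star, n):
    """Point-wise product returned as two diminished-1 partial products."""
    return reduce_to_two(partial_products_dim1(a_star, b_star, n), n)
```

The two words returned by mppm go straight to the IFNT, which already accepts carry-save pairs. This is why the final adder can be dropped.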

V. PARALLEL ARCHITECTURE OF CYCLIC CONVOLUTION

The parallel architecture for cyclic convolution is built from CCWA, BOWA and MPPM, as shown in Fig 5. The point-wise multiplication generates N pairs of partial products. The IFNT of these partial products then produces the sequence {Pi} of the cyclic convolution.
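The data flow of the parallel architecture (two forward FNTs, N point-wise products, one IFNT) can be checked numerically with ordinary residues. The Python sketch below is a behavioral model, not the diminished-1 hardware. It uses the modulus F_3 = 2^8+1 = 257 and root 2, whose order 16 gives N = 16, and it computes each transform with log2(N) radix-2 butterfly stages. The result equals the true cyclic convolution only if every output sample lies in [0, 257), which is an assumption on the input size.

```python
F3 = 257        # Fermat number F_3 = 2**8 + 1
ROOT = 2        # 2 has multiplicative order 16 modulo 257
N = 16


def fnt(x, root, m=F3):
    """Radix-2 decimation-in-time transform modulo m (log2(len(x)) butterfly stages)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fnt(x[0::2], root * root % m, m)
    odd = fnt(x[1::2], root * root % m, m)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % m                     # power of 2: a shift in hardware
        out[k] = (even[k] + t) % m             # butterfly: X + 2**e * Y
        out[k + n // 2] = (even[k] - t) % m    # butterfly: X - 2**e * Y
        w = w * root % m
    return out


def ifnt(y, root, m=F3):
    """Inverse transform: forward transform with root**-1, then scale by N**-1."""
    x = fnt(y, pow(root, -1, m), m)            # modular inverse needs Python 3.8+
    n_inv = pow(len(y), -1, m)
    return [v * n_inv % m for v in x]


def fnt_cyclic_convolution(a, b, root=ROOT, m=F3):
    """Two FNTs, N point-wise products and one IFNT, as in the parallel architecture."""
    A = fnt(a, root, m)
    B = fnt(b, root, m)
    P = [x * y % m for x, y in zip(A, B)]
    return ifnt(P, root, m)


def direct_cyclic_convolution(a, b, m=F3):
    """Reference: direct N-point cyclic convolution reduced modulo m."""
    n = len(a)
    return [sum(a[i] * b[(k - i) % n] for i in range(n)) % m for k in range(n)]
```

The recursion is only a compact way to express the butterfly stages. The architecture in Fig 6 lays the same stages out in parallel hardware.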

(a) Parallel FNT structure (b) Parallel IFNT structure
Fig 6: Structures for FNT and IFNT (F_t = 2^8+1).
The efficient FNT structure involves log2(N)+1 stages of operations:
- CCWA stage: the original operands are converted into the diminished-1 representation. This stage also absorbs the modulo 2^n+1 addition or subtraction of the first butterfly stage of the previous FNT structure.
- BOWA stages: log2(N)-1 stages follow, after which each result is composed of two diminished-1 operands.
- Final stage: modulo 2^n+1 carry-propagation adders evaluate the final results in the diminished-1 representation.
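The final stage has to turn each carry-save pair into a single diminished-1 word. Below is a minimal behavioral sketch of a diminished-1 modulo 2^n+1 adder. It assumes the usual inverted end-around carry: the sum A* + B* is incremented exactly when it produces no carry-out. The sketch returns None when the result is zero, which no n-bit diminished-1 word can represent; the text does not say how the paper handles this case. Practical designs fold the end-around increment into the carry network instead of performing a second addition.

```python
def dim1_add(a_star, b_star, n):
    """Diminished-1 addition modulo 2**n + 1: (a + b)* = a* + b* + NOT(carry-out).

    Returns the n-bit word (a + b)*, or None when a + b == 0 (mod 2**n + 1),
    since zero has no n-bit diminished-1 code.
    """
    mask = (1 << n) - 1
    total = a_star + b_star
    carry_out = total >> n                        # 0 or 1 for n-bit inputs
    result = (total & mask) + (carry_out ^ 1)     # inverted end-around carry
    if result > mask:                             # only when the true sum is 0
        return None
    return result
```

Applying dim1_add to the two words (h0, h1) of a carry-save pair gives the diminished-1 code of the pair's value, because (h0 + 1) + (h1 + 1) is exactly the sum being formed.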
Fig 5: Parallel architecture of cyclic convolution based on FNT [9].
The architecture consists of two FNTs, an IFNT and the modulo 2^n+1 point-wise multiplication. The two FNTs transform the input sequences {ai} and {bi} into the sequences {Ai} and {Bi} (i = 0, 1, ..., N-1). Each pair Ai, Bi is applied to one of N MPPMs, which perform the point-wise multiplication and generate N pairs of partial products. The IFNT of these partial products then produces the sequence {Pi} of the cyclic convolution.

Implementation:
Verilog code is written for both the proposed 4-2 compressor and the existing architecture, so that both designs can be simulated and their power compared. The code is simulated with the QuestaSim tool from Mentor Graphics. A system-on-chip (SoC) approach is adopted with the Cadence SoC Encounter software. Power and area are analyzed and optimized with Xilinx XPower and the Precision RTL Synthesis tool.

RESULTS OF OLD FULL ADDER 4-2 COMPRESSOR:
Fig 7: Results of old full adder 4-2 compressor

Fig 8: Schematic of old full adder 4-2 compressor

RESULTS OF PROPOSED 4-2 COMPRESSOR
Fig 9: Results of proposed 4-2 compressor
Fig 10: Schematic of 4-2 compressor

Fig 11: Existing compressor RTL power
Fig 12: Proposed 4-2 compressor RTL power
Fig 13: The schematic of FNT architecture in QuestaSim software

REFERENCES
[1] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice Hall, New Jersey, 2007.
[2] A. B. O'Donnell and C. J. Bleakley, "Area efficient fault tolerant convolution using RRNS with NTTs and WSCA", Electronics Letters, 2008, 44(10), pp. 648-649.
[3] R. Conway, "Modified overlap technique using Fermat and Mersenne transforms", IEEE Trans. Circuits and Systems II: Express Briefs, 2006, 53(8), pp. 632-636.
[4] H. H. Alaeddine, E. H. Baghious, G. Madre et al., "Realization of multi-delay filter using Fermat number transforms", IEICE Trans. Fundamentals, 2008, E91-A(9), pp. 2571-2577.
[5] L. M. Leibowitz, "A simplified binary arithmetic for the Fermat number transform", IEEE Trans. Acoustics, Speech and Signal Processing, 1976, 24(5), pp. 356-359.
[6] C. Cheng and K. K. Parhi, "Hardware efficient fast DCT based on novel cyclic convolution structures", IEEE Trans. Signal Processing, 2006, 54(11), pp. 4419-4434.
[7] K. Prasad and K. K. Parhi, "Low-power 4-2 and 5-2 compressors", in Proc. 35th Asilomar Conf. on Signals, Systems and Computers, vol. 1, 2001, pp. 129-133.
[8] C. H. Chang, J. Gu and M. Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits", IEEE Trans. Circuits and Systems I: Regular Papers, 2004, 51(10), pp. 1985-1997.
[9] H. T. Vergos, C. Efstathiou and D. Nikolos, "Diminished-one modulo 2^n+1 adder design", IEEE Trans. Computers, 2002, 51(12), pp. 1389-1399.
[10] C. Efstathiou, H. Vergos, G. Dimitrakopoulos et al., "Efficient diminished-1 modulo 2^n+1 multipliers", IEEE Trans. Computers, 2005, 54(4), pp. 491-496.
[11] M. Nagamatsu, S. Tanaka, J. Mori et al., "15-ns 32 × 32-b CMOS multiplier with an improved parallel structure", IEEE Journal of Solid-State Circuits, 1990, 25(2), pp. 494-497.
BOOKS
[12] K. Eshraghian, D. A. Pucknell and S. Eshraghian, Essentials of VLSI Circuits and Systems.
[13] W. Wolf, Modern VLSI Design: System-on-Chip Design, 3rd ed., Pearson Education.
[14] D. F. Barbe (Ed.), Very Large Scale Integration: Fundamentals and Applications, Springer-Verlag, West Germany/USA, 1982.

www.ijera.com

1207 | P a g e

More Related Content

PDF
Aw4102359364
IJERA Editor
 
PDF
A comparative study of different multiplier designs
Hoopeer Hoopeer
 
PDF
IRJET - Distributed Arithmetic Method for Complex Multiplication
IRJET Journal
 
PDF
A Configurable and Low Power Hard-Decision Viterbi Decoder in VLSI Architecture
IRJET Journal
 
PDF
The_Mismatch_Noise_Cancellation_Architecture
Shereef Shehata
 
PDF
HIGH SPEED REVERSE CONVERTER FOR HIGH DYNAMIC RANGE MODULI SET
P singh
 
PDF
Design and Analysis of 4-2 Compressor for Arithmetic Application
Associate Professor in VSB Coimbatore
 
DOCX
FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Res...
AishwaryaRavishankar8
 
Aw4102359364
IJERA Editor
 
A comparative study of different multiplier designs
Hoopeer Hoopeer
 
IRJET - Distributed Arithmetic Method for Complex Multiplication
IRJET Journal
 
A Configurable and Low Power Hard-Decision Viterbi Decoder in VLSI Architecture
IRJET Journal
 
The_Mismatch_Noise_Cancellation_Architecture
Shereef Shehata
 
HIGH SPEED REVERSE CONVERTER FOR HIGH DYNAMIC RANGE MODULI SET
P singh
 
Design and Analysis of 4-2 Compressor for Arithmetic Application
Associate Professor in VSB Coimbatore
 
FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Res...
AishwaryaRavishankar8
 

What's hot (20)

PDF
J0166875
IOSR Journals
 
PDF
A Novel VLSI Architecture for FFT Utilizing Proposed 4:2 & 7:2 Compressor
IJERD Editor
 
PDF
Compressor based approximate multiplier architectures for media processing ap...
IJECEIAES
 
PDF
N046018089
IJERA Editor
 
PDF
High Performance MAC Unit for FFT Implementation
IJMER
 
PDF
Parallel Hardware Implementation of Convolution using Vedic Mathematics
IOSR Journals
 
PDF
Iaetsd design and implementation of pseudo random number generator
Iaetsd Iaetsd
 
PDF
C0421013019
ijceronline
 
PDF
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
IRJET Journal
 
PDF
Multiplier and Accumulator Using Csla
IOSR Journals
 
PDF
Lc3519051910
IJERA Editor
 
PDF
FPGA Implementation of SubByte & Inverse SubByte for AES Algorithm
ijsrd.com
 
PDF
F0213137
IOSR Journals
 
DOCX
A high performance fir filter architecture for fixed and reconfigurable appli...
Ieee Xpert
 
PDF
Design and Implementation of High Speed Area Efficient Double Precision Float...
IOSR Journals
 
PPTX
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Angel Yogi
 
PDF
Ad4103173176
IJERA Editor
 
PDF
A Pipelined Fused Processing Unit for DSP Applications
ijiert bestjournal
 
PDF
Ijarcet vol-2-issue-7-2357-2362
Editor IJARCET
 
DOCX
Flexible dsp accelerator architecture exploiting carry save arithmetic
Nexgen Technology
 
J0166875
IOSR Journals
 
A Novel VLSI Architecture for FFT Utilizing Proposed 4:2 & 7:2 Compressor
IJERD Editor
 
Compressor based approximate multiplier architectures for media processing ap...
IJECEIAES
 
N046018089
IJERA Editor
 
High Performance MAC Unit for FFT Implementation
IJMER
 
Parallel Hardware Implementation of Convolution using Vedic Mathematics
IOSR Journals
 
Iaetsd design and implementation of pseudo random number generator
Iaetsd Iaetsd
 
C0421013019
ijceronline
 
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
IRJET Journal
 
Multiplier and Accumulator Using Csla
IOSR Journals
 
Lc3519051910
IJERA Editor
 
FPGA Implementation of SubByte & Inverse SubByte for AES Algorithm
ijsrd.com
 
F0213137
IOSR Journals
 
A high performance fir filter architecture for fixed and reconfigurable appli...
Ieee Xpert
 
Design and Implementation of High Speed Area Efficient Double Precision Float...
IOSR Journals
 
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Angel Yogi
 
Ad4103173176
IJERA Editor
 
A Pipelined Fused Processing Unit for DSP Applications
ijiert bestjournal
 
Ijarcet vol-2-issue-7-2357-2362
Editor IJARCET
 
Flexible dsp accelerator architecture exploiting carry save arithmetic
Nexgen Technology
 
Ad

Viewers also liked (8)

PDF
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
ux singapore
 
PDF
SEO: Getting Personal
Kirsty Hulse
 
PDF
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Stanford GSB Corporate Governance Research Initiative
 
PDF
The impact of innovation on travel and tourism industries (World Travel Marke...
Brian Solis
 
PDF
Open Source Creativity
Sara Cannon
 
PPSX
Reuters: Pictures of the Year 2016 (Part 2)
maditabalnco
 
PDF
The Six Highest Performing B2B Blog Post Formats
Barry Feldman
 
PDF
The Outcome Economy
Helge Tennø
 
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
ux singapore
 
SEO: Getting Personal
Kirsty Hulse
 
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Stanford GSB Corporate Governance Research Initiative
 
The impact of innovation on travel and tourism industries (World Travel Marke...
Brian Solis
 
Open Source Creativity
Sara Cannon
 
Reuters: Pictures of the Year 2016 (Part 2)
maditabalnco
 
The Six Highest Performing B2B Blog Post Formats
Barry Feldman
 
The Outcome Economy
Helge Tennø
 
Ad

Similar to Gv3512031207 (20)

PDF
Implementation of cyclic convolution based on fnt
eSAT Journals
 
PDF
Implementation of cyclic convolution based on fnt
eSAT Publishing House
 
PDF
Fast Fourier Transform utilizing Modified 4:2 & 7:2 Compressor
IJERD Editor
 
PDF
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET Journal
 
PPT
Basic principle of a systolic system-Convolution
tumulagitham
 
PDF
427 432
Editor IJARCET
 
PDF
Aw25293296
IJERA Editor
 
PDF
Id2514581462
IJERA Editor
 
PDF
Id2514581462
IJERA Editor
 
PDF
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET Journal
 
PDF
Paper id 25201467
IJRAT
 
PDF
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET Journal
 
PDF
High Speed Area Efficient 8-point FFT using Vedic Multiplier
IJERA Editor
 
PDF
IRJET- MAC Unit by Efficient Grouping of Partial Products along with Circular...
IRJET Journal
 
PDF
Iaetsd mac using compressor based multiplier and carry save adder
Iaetsd Iaetsd
 
PPTX
64 point fft chip
ShalyJ
 
PPT
13486500-FFT.ppt
Pratik Gohel
 
PDF
Design and Power Measurement of 2 And 8 Point FFT Using Radix-2 Algorithm for...
IOSRJVSP
 
PDF
Iaetsd pipelined parallel fft architecture through folding transformation
Iaetsd Iaetsd
 
PDF
Gn3311521155
IJERA Editor
 
Implementation of cyclic convolution based on fnt
eSAT Journals
 
Implementation of cyclic convolution based on fnt
eSAT Publishing House
 
Fast Fourier Transform utilizing Modified 4:2 & 7:2 Compressor
IJERD Editor
 
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
IRJET Journal
 
Basic principle of a systolic system-Convolution
tumulagitham
 
Aw25293296
IJERA Editor
 
Id2514581462
IJERA Editor
 
Id2514581462
IJERA Editor
 
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET Journal
 
Paper id 25201467
IJRAT
 
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET Journal
 
High Speed Area Efficient 8-point FFT using Vedic Multiplier
IJERA Editor
 
IRJET- MAC Unit by Efficient Grouping of Partial Products along with Circular...
IRJET Journal
 
Iaetsd mac using compressor based multiplier and carry save adder
Iaetsd Iaetsd
 
64 point fft chip
ShalyJ
 
13486500-FFT.ppt
Pratik Gohel
 
Design and Power Measurement of 2 And 8 Point FFT Using Radix-2 Algorithm for...
IOSRJVSP
 
Iaetsd pipelined parallel fft architecture through folding transformation
Iaetsd Iaetsd
 
Gn3311521155
IJERA Editor
 

Recently uploaded (20)

PDF
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PDF
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
PDF
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
PDF
Event Presentation Google Cloud Next Extended 2025
minhtrietgect
 
PDF
REPORT: Heating appliances market in Poland 2024
SPIUG
 
PDF
BLW VOCATIONAL TRAINING SUMMER INTERNSHIP REPORT
codernjn73
 
PDF
Brief History of Internet - Early Days of Internet
sutharharshit158
 
PDF
A Day in the Life of Location Data - Turning Where into How.pdf
Precisely
 
PDF
Software Development Methodologies in 2025
KodekX
 
PDF
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
PPTX
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
PDF
The Future of Mobile Is Context-Aware—Are You Ready?
iProgrammer Solutions Private Limited
 
PDF
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
PDF
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
PPTX
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
PDF
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
PPTX
Applied-Statistics-Mastering-Data-Driven-Decisions.pptx
parmaryashparmaryash
 
PDF
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
Event Presentation Google Cloud Next Extended 2025
minhtrietgect
 
REPORT: Heating appliances market in Poland 2024
SPIUG
 
BLW VOCATIONAL TRAINING SUMMER INTERNSHIP REPORT
codernjn73
 
Brief History of Internet - Early Days of Internet
sutharharshit158
 
A Day in the Life of Location Data - Turning Where into How.pdf
Precisely
 
Software Development Methodologies in 2025
KodekX
 
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
The Future of Mobile Is Context-Aware—Are You Ready?
iProgrammer Solutions Private Limited
 
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
Applied-Statistics-Mastering-Data-Driven-Decisions.pptx
parmaryashparmaryash
 
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 

Gv3512031207

  • 1. T.Jyothsna et al Int. Journal of Engineering Research and Application ISSN : 2248-9622, Vol. 3, Issue 5, Sep-Oct 2013, pp.1203-1207 RESEARCH ARTICLE www.ijera.com OPEN ACCESS Low Power, High Speed Parallel Architecture For Cyclic Convolution Based On Fermat Number Transform (FNT) T.Jyothsna1 M.Tech, M.Pradeep2 M.Tech 1 E.C.E department, shri Vishnu engineering college for women, Vishnupur, Bhimavaram, India Associate Professor in E.C.E department, shri Vishnu engineering college for women, Vishnupur, Bhimavaram, India 2 Abstract The power consumption, Delay and Area of this new novel 4-2 Compressor Architecture is compared with Existing architecture. In the proposed architecture the outputs are efficiently used to improve Low power, high speed, performance, less and delay. FNT is exact with no round off errors and Truncation errors. The Binary Arithmetic in FNT performs the Exact Computation. To perform the cyclic convolution in FNT some techniques are implemented. The Techniques are code Conversion method without Addition (CCWA) and Butterfly Operation without Addition (BOWA) are proposed to perform the FNT and its Inverse (IFNT) except their final stages in the Convolution. Here the Point wise Multiplication in the Convolution is accomplished by Modulo 2^n+1 Partial Products Multipliers (MPPM) and Output partial products which are Inputs to the IFNT. Thus Modulo 2^n+1 Carry save Propagation Additions are avoided in the FNT and the IFNT except their final stages and the Modulo 2^n+1 multiplier. Thus the Power and Execution delay of the entire FNT will be reduced which is only because of usage of above techniques in the Design. Therefore the proposed one has less Power better Throughput Performance and involves less hardware complexity. This will be done by using Very Large Scale Integration (VLSI) technology and various Cad tools available, so as to implement hardware The synthesis results using 180nm SOC Technology is been used. I. 
INTRODUCTION Here the cyclic convolution is performed based on FNT, is used in DSP (digital signal processing) Applications for Security of information transmission and reception purpose. For obtaining low power we are being using Novel architecture of xor-xnor, mux style 4-2 compressor [8] Area is more, Delay is more .All these are overcome by using xor-xnor, mux style 4-2 compressor Generally Convolution is a basic operation in DSP[1] but when finite word length is calculating for the convolution their exists round off and truncation errors and is very computational expensive operation Therefore to reduce computational complexity we are opting for cyclic convolution or circular convolution, it is simpler and easy and produces less output samples and it is one of the most important and efficient operation in DSP. Cyclic convolution can be performed efficiently using FNT rather than both DFT and FFT. The cyclic based on FFT is widely used operation in signal processing in a complex domain. cyclic convolution and correlation without roundoff errors and better efficiency than the FFT. However there is one interesting case of the NTT [3] is Fermat number transform. The cyclic convolution based on FNT is simple and less computational complexity because the expensive multiplications in FFT in FNT with its integer power 2. Fermat number is a positive integer of www.ijera.com the form Fn=22t+1 where t is nonnegative integer. FNT [4],[5] is suitable to digital computation therefore fnt implementation is exact without roundoff errors.The Fermat number transform has been used in many applications such as video processing, digital filtering, and multiplication of large numbers and also in Pseudo random generator. 
Important operations of cyclic convolution based on FNT with the unit root 2 includes i) ccwa (code convolution without addition) ii) bowa (butterfly operation without addition) and mppm.The CCWA and BOWA both consists of novel modulo 2n+14-2compressor in the diminished-1 representation of X i.e.. X=X-1[9]. II. CODE CONVERSION WITHOUT ADDITION It is first stage in FNT .here CC converts the normal binary code (NBC) into the diminished-1 representation. The delay and area of cc of n-bit NBC is close to the ones of an n-bit carry propagation adder. To reduce the cost we propose the CCWA which is been performed by modulo 2n+1 4-2 compressor. I0, I1, I2, I3 are four inputs applied to modulo 2n+1 4-2 compressor. Outputs obtained are sum vector Ho* and carry vector H1*in the diminished-1 representation [5]. 1203 | P a g e
  • 2. T.Jyothsna et al Int. Journal of Engineering Research and Application ISSN : 2248-9622, Vol. 3, Issue 5, Sep-Oct 2013, pp.1203-1207 www.ijera.com I. The existing 4-2 compressor: Compressors are the basic components in many applications particularly in partial product summation in multipliers. Multiplication is a basic arithmetic operation in applications such as DSP which rely on efficient implementation of ALU and floating point units to execute operations like convolution and filtering. Fig 3: modulo 2n+1 4-2 compressor. The equations of output in the proposed architecture are shown below Fig 1: Existing 4-2 compressor As we used two full adder the complexity increases and power is more, area occupied is more hence delay is more .In order to Obtain Low power high speed, less area we proposed Novel Architecture of xorxnor, mux style 4-2 compressor [6]. II. proposed novel architecture of 4-2 compressor: In this proposed new compressor architecture the design of low power, high speed, delay and area of these new compressor architecture are compared with existing one. In this Modulo 2n+1 4-2 compressor, the novel architecture of 4-2 compressor [7, 8] [fig 2] as shown above is called for required number of times to perform the CCWA. Outputs are sum vector H0* & H1*. The MSB H1* is complimented and connected back to its LSB. The obtained results consisting of two diminished-1 values. III. Fig 2: modulo 2n+1 4-2 compressor. In this each full adder are broken into their constituent XOR blocks .. Both the Xor and Xnor values are computed efficiently used to reduce delay .This is due to availability of the selection bit at the mux block so that before the arrival of input. Thus the time required for switching of transistors is reduced. BUTTERFLY OPERATION WITHOUT ADDITION BOWA is the one of operation performed in FNT after the CCWA has been performed. It consists of two modulo 2n+1 4-2 compressors, a multiplier and some inverters as shown below in fig 4. 
It can be performed without carry propagation chain so as to reduce delay and area. Here the designed low power 42 compressor of novel architecture thus the power generated will be less. Fig 4: Butterfly operation without Addition K*,L*,M*,N* are corresponding to two inputs and two outputs of previous BO in the diminished-1 www.ijera.com 1204 | P a g e
  • 3. T.Jyothsna et al Int. Journal of Engineering Research and Application ISSN : 2248-9622, Vol. 3, Issue 5, Sep-Oct 2013, pp.1203-1207 www.ijera.com number system respectively. IV. MODULO 2N+1 PARTIAL PRODUCT MULTIPLIER After performing the calculations of CCWA and BOWA both of them will produce the carry-save order then MPPM[10] is used perform point wise multiplications so that final carry-propagation addition of two partial products in multiplier is avoided therefore the execution delay will be reduced. Modulo2n+1multiplier is proposed by Efstathiou, there are n+3 partial products. An full adder based Daddatree [7] reduces the n+3 partial products into two summands. Dadda multiplier is faster than other multipliers therefore it gives the fast performance than other multipliers. In the proposed cyclic convolution based on FNT of parallel architecture, the BOWA accepts four operands in diminished-1 number system. Every point wise multiplication produce two partial products rather than one product. It takes away the final modulo 2n+1 adder of two partial products in the multiplier thus the final modulo 2n+1 adder is removed and modulo 2n+1 partial product multiplier is used to save the area and delay. V. PARALLEL ARCHITECTURE OF CYCLIC CONVOLUTION Parallel architecture of cyclic convolution for cyclic is designed by using CCWA, BOWA and MPPM as shown below. Point wise multiplication and generates N pair of partial products. Later IFNT of partial products are performed to produce sequence {Pi} of the cyclic convolution. (a)Parallel FNT structure b) Parallel IFNT structure Fig6: Structures for FNT and IFNT (Ft=28+1) It has log2N+1 stages of operations. The efficient FNT structure involves log2N+ 1 stages of operations. The original operands are converted into the diminished-1 representation in the CCWA stage, containing the information of modulo 2n+1 addition or subtraction in the first butterfly operation stage of the previous FNT structure. 
The results of the CCWA stage are then passed to the BOWA stages. After log2(N) - 1 BOWA stages, each result is obtained as a pair of diminished-1 operands. The final stage of the FNT consists of modulo 2^n+1 carry-propagation adders, which evaluate the final results in the diminished-1 representation. One CCWA stage, log2(N) - 1 BOWA stages and one final adder stage together account for the log2(N) + 1 stages of the FNT structure.

Fig 5: Parallel architecture of cyclic convolution based on FNT [9]

The architecture consists of two FNTs, N point-wise modulo 2^n+1 multipliers and one IFNT. The two FNTs transform the input sequences {ai} and {bi} into the sequences {Ai} and {Bi} (i = 0, 1, ..., N-1). Each pair Ai, Bi is applied to one of the N MPPMs, which performs the point-wise multiplication and generates a pair of partial products. The IFNT of these N pairs of partial products then produces the sequence {Pi} of the cyclic convolution.

Implementation: Verilog code is written for the proposed 4-2 compressor and for the existing architecture, and both are simulated with the QuestaSim tool from Mentor Graphics so that their power consumption can be compared. The System-on-Chip (SoC) approach is adopted using the Cadence SOC Encounter software, and power and area are analysed and optimized with Xilinx XPower and the RTL Precision Synthesis tool.

RESULTS OF COMPRESSOR: OLD FULL ADDER

Fig 7: Results of old full adder 4-2 compressor
Fig 8: Schematic of old full adder 4-2 compressor

RESULTS OF PROPOSED 4-2 COMPRESSOR

Fig 9: Results of proposed 4-2 compressor

Fig 10: Schematic of 4-2 compressor

Fig 11: Existing compressor RTL power

Fig 12: Proposed 4-2 compressor RTL power

Fig 13: Schematic of FNT architecture in QuestaSim software
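The existing compressor compared in Figs 7, 8 and 11 is built from full adders. Neither its gate-level structure nor that of the proposed compressor is given in this section, so the Python sketch below models only the textbook 4-2 compressor made of two cascaded full adders; it is not the authors' proposed design. A bit-level model like this can serve as a reference for the input/output behaviour any 4-2 compressor must satisfy, x1 + x2 + x3 + x4 + cin = sum + 2(carry + cout), which it checks exhaustively. It also shows why a row of compressors has no carry chain: cout depends only on x1, x2 and x3, never on cin. All function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """Conventional 4-2 compressor made of two cascaded full adders.

    Returns (total, carry, cout); total has weight 1, carry and cout have weight 2.
    """
    s1, cout = full_adder(x1, x2, x3)       # cout is independent of cin
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def check_identity():
    """Exhaustively verify x1+x2+x3+x4+cin == total + 2*(carry + cout) over all 32 inputs."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        total, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
        if x1 + x2 + x3 + x4 + cin != total + 2 * (carry + cout):
            return False
    return True


def compress_four_words(w1, w2, w3, w4, width):
    """Reduce four unsigned width-bit words to a (sum word, carry word) pair with one row of 4-2 compressors."""
    sum_word, carry_word, cin = 0, 0, 0
    for i in range(width + 1):              # the extra column absorbs the last lateral carry
        x1, x2, x3, x4 = ((w >> i) & 1 for w in (w1, w2, w3, w4))
        total, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
        sum_word |= total << i
        carry_word |= carry << (i + 1)
        cin = cout                          # lateral carry moves one column only: no ripple chain
    return sum_word, carry_word


if __name__ == "__main__":
    print(check_identity())                 # True
    s, c = compress_four_words(13, 7, 9, 15, 4)
    print(s + c)                            # 44
```

Because each column's cout is fixed before its cin arrives, the delay of a compressor row does not grow with the word width, which is what makes 4-2 compressors attractive for the partial-product reduction in the MPPM.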
REFERENCES

[1] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice Hall, New Jersey, 2007.
[2] A. B. O'Donnell and C. J. Bleakley, "Area efficient fault tolerant convolution using RRNS with NTTs and WSCA", Electronics Letters, 2008, 44(10), pp. 648-649.
[3] R. Conway, "Modified overlap technique using Fermat and Mersenne transforms", IEEE Trans. Circuits and Systems II: Express Briefs, 2006, 53(8), pp. 632-636.
[4] H. H. Alaeddine, E. H. Baghious, G. Madre et al., "Realization of multi-delay filter using Fermat number transforms", IEICE Trans. Fundamentals, 2008, E91-A(9), pp. 2571-2577.
[5] L. M. Leibowitz, "A simplified binary arithmetic for the Fermat number transform", IEEE Trans. Acoustics, Speech and Signal Processing, 1976, 24(5), pp. 356-359.
[6] C. Cheng and K. K. Parhi, "Hardware efficient fast DCT based on novel cyclic convolution structures", IEEE Trans. Signal Processing, 2006, 54(11), pp. 4419-4434.
[7] K. Prasad and K. K. Parhi, "Low-power 4-2 and 5-2 compressors", in Proc. 35th Asilomar Conf. on Signals, Systems and Computers, vol. 1, 2001, pp. 129-133.
[8] C. H. Chang, J. Gu and M. Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits", IEEE Trans. Circuits and Systems I: Regular Papers, 2004, 51(10), pp. 1985-1997.
[9] H. T. Vergos, C. Efstathiou and D. Nikolos, "Diminished-one modulo 2^n+1 adder design", IEEE Trans. Computers, 2002, 51(12), pp. 1389-1399.
[10] C. Efstathiou, H. Vergos, G. Dimitrakopoulos et al., "Efficient diminished-1 modulo 2^n+1 multipliers", IEEE Trans. Computers, 2005, 54(4), pp. 491-496.
[11] M. Nagamatsu, S. Tanaka, J. Mori et al., "15ns 32 × 32-b CMOS multiplier with an improved parallel structure", IEEE Journal of Solid-State Circuits, 1990, 25(2), pp. 494-497.

BOOKS

Kamran Eshraghian, Douglas A. Pucknell and Sholeh Eshraghian, Essentials of VLSI Circuits and Systems.
Wayne Wolf, Modern VLSI Design: System-on-Chip Design, Third Edition, Pearson Education.
D. F. Barbe (Ed.), Very Large Scale Integration: Fundamentals and Applications, Springer-Verlag, West Germany/USA, 1982.