A High-Performance FIR Filter Architecture for Fixed
and Reconfigurable Applications
LUT Optimization for Distributed Arithmetic-Based
Block Least Mean Square Adaptive Filter
Abstract:
In this paper, we analyze the contents of the lookup tables (LUTs) of the distributed arithmetic
(DA)-based block least mean square (BLMS) adaptive filter (ADF) and, based on that analysis,
propose intra-iteration LUT sharing to reduce its hardware resources, energy consumption, and
iteration period. The proposed LUT optimization scheme saves 60% of the LUT content for block
size 8, and more for larger block sizes, compared with the conventional design approach. The
logic size, area, and power consumption of the proposed architecture are analyzed using
Xilinx ISE 14.2.
Enhancement of the project:
Existing System:
The distributed arithmetic (DA)-based design approach has been proposed to derive low-complexity
hardware structures for ADFs. A DA-based ADF uses lookup tables (LUTs) to calculate the
filter output and the weight-increment terms, and these LUTs constitute most of its hardware
resources. One DA-based LMS ADF structure uses two separate LUTs, one for the filter output and
one for the weight-increment terms. A few design schemes have been suggested in the recent past
for the efficient realization of the LMS ADF in FPGA.
A DA-based pipelined structure has been proposed for the realization of the delayed LMS ADF with
low adaptation delay. Subsequently, another DA-based design has been proposed for LMS ADFs,
where a single LUT performs both filtering and weight updating, and a parallel LUT-update
method reduces the LUT-update time. Carry-save accumulation is used to further reduce the
iteration period of the DA-based LMS structure. A few DA-based designs have also been proposed
for the FPGA realization of the BLMS ADF. We have proposed a DA structure for the BLMS ADF.
Although many DA-based designs have been suggested for LMS- and BLMS-based ADFs, we do not find
any LUT optimization scheme in the literature specific to the BLMS DA-LUT. In this paper, we
analyze the intra-iteration LUT contents of the DA-based BLMS ADF design to find the redundant
LUT words that could be shared to minimize hardware resources, the number of LUT accesses,
energy consumption, and iteration period.
Disadvantages:
• The LUT size is large
• LUT update is complex
Proposed System:
Allred et al. have identified the LUT redundancy corresponding to successive iterations of the
DA-based LMS ADF and, based on that, update only half of the auxiliary LUT contents. No LUT
optimization scheme, however, has been proposed to take advantage of redundant LUT values in
the DA-LMS computation. We observe that, in the DA-based LMS ADF, the redundant LUT values
belong to different processing cycles, so they must be stored either inside or outside the LUT,
which consumes the same amount of resources. Therefore, the redundant LUT values of DA-based
LMS offer no LUT optimization beyond reducing the number of LUT words to be updated. In the
case of the DA-based BLMS ADF, however, the redundant LUT values of L successive iterations are
created within a single processing cycle, which allows LUT optimization, where L is the block
size.
Conventionally, 16NP LUT words are required to implement the NP LUTs of the LU matrix. For
filter length N = 16, 256 LUT words are required to implement the LU matrix for L = 4. The
contents of the LU matrix of the BLMS filter for block size L = 4 are shown in Fig. 1. The LUT
content is represented by the function E(.), which enumerates the sums of the 16 possible
combinations of the elements of an input vector.
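To make the enumeration function E(.) concrete, the following Python sketch builds one 16-word LUT from a four-sample input vector and uses it in a bit-serial DA inner product. The function names, the restriction to unsigned weights, and the reading of E(.) as the table of all 16 partial sums of four samples are assumptions made for illustration; they are not taken from the paper.

```python
def enumerate_lut(samples):
    """Return the 16 partial sums of a 4-sample vector.

    Bit i of the LUT address selects samples[i] (assumed convention).
    """
    if len(samples) != 4:
        raise ValueError("this sketch assumes 4 samples per LUT")
    return [
        sum(x for i, x in enumerate(samples) if (addr >> i) & 1)
        for addr in range(16)
    ]


def da_inner_product(samples, weights, bits):
    """Bit-serial DA inner product, unsigned weights assumed for simplicity."""
    lut = enumerate_lut(samples)
    total = 0
    for b in range(bits):
        # Build the 4-bit address from bit b of each of the four weights.
        addr = sum(((w >> b) & 1) << i for i, w in enumerate(weights))
        # Shift-accumulate the selected partial sum.
        total += lut[addr] << b
    return total


def conventional_lut_words(N, P):
    """Word count 16NP quoted in the text for the conventional LU matrix."""
    return 16 * N * P
```

For N = 16 the 256-word figure quoted above corresponds to `conventional_lut_words(16, 1)`; the value P = 1 for L = 4 is inferred from those numbers rather than stated in the text.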
Fig. 1. LUT content of the LU matrix for block size L = 4 over four consecutive iterations [kth,
(k + 1)th, (k + 2)th, and (k + 3)th]. Light gray: LUTs of successive iterations with identical
content. Gray: succeeding LUTs with overlapped input vectors. The input argument $s_{i,0}^{k}$,
for 0 ≤ i ≤ 3, of the first column of LU is defined for the kth-iteration input block
{x(n) → x(n − 3)}, where n = kL and {x(n) → x(n − 3)} denotes the input sequence
{x(n), x(n − 1), x(n − 2), x(n − 3)}.
Intra-iteration LUT Sharing
The LUT content depends on the argument $s_{i,j}^{k,p}$ of the LUT enumeration function E, which
does not change during an iteration. We analyze the arguments $s_{i,j}^{k,p}$ corresponding to
one column of the LU matrix to find the redundant values in the LUTs of that column.
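The redundancy within one column can be illustrated by counting distinct LUT words. The sketch below assumes, following the Fig. 1 caption, that for L = 4 row i of a column enumerates the four-sample window starting at x(n − i), so adjacent rows overlap in three samples. This window model and the helper name `column_words` are assumptions; the sketch only counts shareable words and does not reproduce the paper's derivation.

```python
from itertools import combinations


def column_words(L, taps=4):
    """Distinct non-zero partial sums needed by one LU column.

    Each word is identified by the set of sample delays it adds up.
    Row i is assumed to cover delays i .. i + taps - 1.
    """
    words = set()
    for i in range(L):
        window = range(i, i + taps)
        for r in range(1, taps + 1):
            for subset in combinations(window, r):
                # Identical subsets from different rows collapse to one word.
                words.add(frozenset(subset))
    return words


shared = len(column_words(4))   # words needed when rows share entries
unshared = 4 * 15               # non-zero words when each row keeps its own copy
```

Under this model, L = 4 needs 39 distinct non-zero words instead of 60, and 39 coincides with 16L − 25 evaluated at L = 4. For larger block sizes the paper's formula depends on details that this simple window model does not capture.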
Inter-iteration LUT Reuse
As shown in Fig. 1, the LUT contents of the first (M − 1) columns of LUs in any given iteration
can be reused by the last (M − 1) columns of LUs during the next iteration, so those columns
need not be updated.
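The column reuse can be sketched as a shift of columns from one iteration to the next. The Python model below assumes that the newest column at iteration k is built from windows anchored at n = kL and that older columns simply move one position over; the names `make_column` and `run` are hypothetical, and sample windows are represented only by their indices.

```python
from collections import deque


def make_column(k, L, taps=4):
    """Sample-index windows of the newest column at iteration k (n = k * L)."""
    n = k * L
    return [tuple(range(n - i, n - i - taps, -1)) for i in range(L)]


def run(iterations, L, M):
    """Keep M columns and compute only the newest one in each iteration."""
    lu = deque(maxlen=M)
    history = []
    for k in range(iterations):
        # appendleft on a full deque discards the oldest (rightmost) column.
        lu.appendleft(make_column(k, L))
        history.append(list(lu))
    return history


M = 4
history = run(iterations=6, L=4, M=M)
prev, curr = history[4], history[5]
# The first M - 1 columns of one iteration reappear as the last M - 1 of the next.
reused = prev[:M - 1] == curr[1:]
```

In this model only one column is computed per iteration, and `reused` evaluates to `True` once the matrix is full.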
Proposed Design Strategy
The entire LUT content needs to be available in the same cycle for the sharing of LUT words.
The conventional RAM-based LUTs are not suitable for LUT sharing, since in any given cycle,
they allow access to only one (or a few in the case of multiported RAM) of the stored LUT
values. A register-based LUT (REG-LUT) could be used instead for the proposed DA-based
design.
Based on these facts, we have arrived at the following design strategy to derive an
area-delay-power-efficient structure for the DA-based BLMS ADF.
1) A register-based shared LUT is used instead of the conventional RAM-based LUT to
exploit intra-iteration LUT sharing.
2) Based on the inter-iteration LUT reuse provision of the BLMS ADF, only one column out of
the (N/L) columns of the LU matrix is updated in every iteration.
3) A fully parallel design of the LUT-update unit is used to generate the update values of one
LU column and update its contents in one cycle.
The proposed structure is similar to the conventional DA-based BLMS structure at the block
level. However, the internal structures of the LUT-update block and the processing element (PE)
of the DA module differ from the conventional ones because of the shared LUTs used in the
proposed design.
The structure of the DA module of the proposed structure is shown in Fig. 2. Each PE of the DA
module uses REG-LUTs instead of RAM-LUTs in order to exploit the LUT-sharing property. It
requires only (16L − 25) registers instead of the 16PL RAM words required conventionally. The
LUT-update unit of the DA module computes a set of (16L − 25) values to update the LUTs of a PE
in one cycle, against the 16 cycles required conventionally.
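As a quick arithmetic check of these word counts, the Python sketch below compares (16L − 25) registers with 16PL RAM words. The relation P = L/4 is an assumption: it is chosen because it reproduces both the 256-word figure for N = 16, L = 4 and the roughly 60% saving stated in the abstract for L = 8. The function names are hypothetical.

```python
def reg_lut_words(L):
    """Registers per PE with shared REG-LUTs, as quoted in the text."""
    return 16 * L - 25


def ram_lut_words(L, P):
    """RAM words per PE in the conventional design, as quoted in the text."""
    return 16 * P * L


def saving(L):
    """Fractional saving in LUT words, assuming P = L / 4 (not stated in the text)."""
    P = L // 4
    return 1 - reg_lut_words(L) / ram_lut_words(L, P)


for L in (4, 8, 16):
    print(f"L = {L:2d}: {reg_lut_words(L)} vs {ram_lut_words(L, L // 4)} words, "
          f"saving {saving(L):.1%}")
```

Under the assumed P = L/4, block size 8 needs 103 words instead of 256, a saving of about 60%, and the saving grows with L.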
Fig. 2. Structure of the DA module of the proposed DA-based BLMS ADF of filter length N and
block size L, where N = ML.
Advantages:
• Reduces the LUT size
• Reduces LUT-update complexity
Software implementation:
• ModelSim
• Xilinx ISE