A High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications
LUT Optimization for Distributed Arithmetic-Based Block Least Mean Square Adaptive Filter
Abstract:
In this paper, we analyze the contents of the lookup tables (LUTs) of a distributed arithmetic (DA)-based block least mean square (BLMS) adaptive filter (ADF) and, based on that analysis, propose intra-iteration LUT sharing to reduce its hardware resources, energy consumption, and iteration period. Compared with the conventional design approach, the proposed LUT optimization scheme saves 60% of the LUT content for block size 8, and even more for larger block sizes. The logic size, area, and power consumption of the proposed architecture are analyzed using Xilinx ISE 14.2.
Enhancement of the project:
Existing System:
A distributed arithmetic (DA)-based design approach has been proposed to derive low-complexity hardware structures for ADFs. A DA-based ADF uses lookup tables (LUTs) to calculate the filter output and the weight-increment terms, and these LUTs constitute most of its hardware resources. An earlier DA-based LMS ADF structure uses two separate LUTs, one for the filter output and one for the weight-increment terms. A few design schemes have been suggested in the recent past for the efficient realization of LMS ADFs on FPGAs.
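To make the LUT-based computation concrete, the following Python sketch evaluates one inner product with bit-serial distributed arithmetic. For illustration only, it assumes that a 16-word LUT holds every combination of a 4-sample input vector (the E(·) enumeration described later) and that the bits of the two's-complement weights, taken one bit position at a time, form the LUT address. The function names `build_lut` and `da_dot` and the word length `bits` are hypothetical and are not taken from any of the cited designs.

```python
def build_lut(x):
    """E(x): entry `addr` holds the sum of x[j] over every set bit j of addr."""
    lut = [0] * (1 << len(x))
    for addr in range(1 << len(x)):
        lut[addr] = sum(x[j] for j in range(len(x)) if (addr >> j) & 1)
    return lut


def da_dot(x, w, bits):
    """Return sum(x[j] * w[j]) using bit-serial distributed arithmetic.

    Each w[j] must be a two's-complement integer in
    [-2**(bits-1), 2**(bits-1)); x may hold any numbers.
    """
    lut = build_lut(x)
    acc = 0
    for b in range(bits):
        # Bit b of every weight forms the LUT address (address bit j <- w[j]).
        addr = 0
        for j, wj in enumerate(w):
            addr |= ((wj >> b) & 1) << j
        partial = lut[addr] * (1 << b)
        # The MSB of a two's-complement number carries weight -2**(bits-1).
        acc += -partial if b == bits - 1 else partial
    return acc


if __name__ == "__main__":
    x = [3, -1, 4, 2]            # one 4-sample input vector -> 16-word LUT
    w = [5, -7, 2, -8]           # 4-bit two's-complement weights
    print(da_dot(x, w, bits=4))  # 14, same as sum(a * b for a, b in zip(x, w))
```

Each bit position costs one LUT read and one shift-add instead of a set of multiplications, which is why the LUTs, rather than multipliers, account for most of the hardware of a DA-based ADF.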
A DA-based pipelined structure has been proposed for the realization of a delayed LMS ADF with low adaptation delay. Subsequently, another DA-based design was proposed for LMS ADFs, in which a single LUT performs both filtering and weight updating and a parallel LUT-update method reduces the LUT-update time. Carry-save accumulation further reduces the iteration period of the DA-based LMS structure. A few DA-based designs have also been proposed for the FPGA realization of the BLMS ADF, and we have proposed a DA structure for the BLMS ADF. Although many DA-based designs have been suggested for LMS- and BLMS-based ADFs, we have not found any LUT optimization scheme in the literature specific to the BLMS DA-LUT. In this paper, we analyze the intra-iteration LUT contents of the DA-based BLMS ADF to find redundant LUT words that can be shared to minimize hardware resources, the number of LUT accesses, energy consumption, and iteration period.
Disadvantages:
- The LUT size is large.
- The LUT-update process is complex.
Proposed System:
Allred et al. identified the LUT redundancy across successive iterations of the DA-based LMS ADF and, based on it, only half of the auxiliary LUT contents is updated. However, no LUT optimization scheme has been proposed to exploit the redundant LUT values in the DA-LMS computation. We observe that, in a DA-based LMS ADF, the redundant LUT values belong to different processing cycles, so they must be stored either in the LUT or outside it, which consumes the same amount of resources. Therefore, the redundancy in DA-based LMS does not reduce the LUT size; it only reduces the number of LUT words to be updated. In a DA-based BLMS ADF, however, the redundant LUT values of L successive iterations, where L is the block size, are created within one processing cycle, which makes LUT optimization possible.
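For reference, the block LMS recursion itself can be sketched in a few lines of NumPy. This is a floating-point software model, not the DA hardware: the function `blms` and its parameters are hypothetical, and the update uses the plain sum of the L gradient terms (some formulations divide by L). It shows why all L regressor vectors of a block are available in the same processing cycle: they are the rows of one block data matrix, and consecutive rows are the same samples shifted by one position.

```python
import numpy as np


def blms(x, d, N, L, mu):
    """Block LMS: weights stay fixed for L outputs, then one update per block.

    x, d : input and desired signals (1-D arrays, len(x) a multiple of L)
    N    : filter length, L : block size, mu : step size
    """
    w = np.zeros(N)
    xpad = np.concatenate([np.zeros(N - 1), x])  # x(m) = 0 for m < 0
    y = np.zeros(len(x))
    for k in range(len(x) // L):
        n0 = k * L
        # Row l is the regressor [x(m), x(m-1), ..., x(m-N+1)] for m = n0 + l.
        # Consecutive rows hold the same samples shifted by one position.
        X = np.stack([xpad[m:m + N][::-1] for m in range(n0, n0 + L)])
        yk = X @ w                    # L outputs computed with frozen weights
        ek = d[n0:n0 + L] - yk        # L errors
        w = w + mu * (X.T @ ek)       # a single weight update per block
        y[n0:n0 + L] = yk
    return w, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8000)
    h = rng.standard_normal(8)              # unknown 8-tap system to identify
    d = np.convolve(x, h)[:len(x)]          # noiseless desired signal
    w, _ = blms(x, d, N=8, L=4, mu=0.01)
    print(np.max(np.abs(w - h)))            # close to zero after convergence
```

Because the weights are frozen for the L outputs of a block, only one weight update is performed per L input samples.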
Conventionally, 16NP LUT words are required to implement the NP LUTs of the LU matrix, since each LUT holds 16 words. For filter length N = 16 and block size L = 4, 256 LUT words are required to implement the LU matrix. The contents of the LU matrix of the BLMS filter for block size L = 4 are shown in Fig. 1. The LUT content is represented by the function E(·), which enumerates the sums of all 16 possible combinations of the elements of an input vector.
Fig. 1. LUT content of the LU matrix of block size L = 4 for four consecutive iterations [kth, (k + 1)th, (k + 2)th, and (k + 3)th]. Light gray: LUTs of successive iterations with identical content. Gray: succeeding LUTs with overlapped input vectors. The input argument $s_{i,0}^{k}$, for 0 ≤ i ≤ 3, of the first column of LU is defined for the kth-iteration input block {x(n) → x(n − 3)}, where $n = kL$ and {x(n) → x(n − 3)} denotes the input sequence {x(n), x(n − 1), x(n − 2), x(n − 3)}.
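The conventional word count can be reproduced with a short calculation, and the same sketch lists the 16 entries of E(·) for the input vector {x(n), x(n − 1), x(n − 2), x(n − 3)}. It assumes that the LU matrix has L rows and M = N/L columns, and that each row's L-sample vector is split into L/4 four-input LUTs of 16 words each; this split is an assumption, chosen because it is consistent with the 256-word figure for N = 16 and L = 4. The names `lut_entries` and `conventional_lut_words` are hypothetical helpers.

```python
def lut_entries(labels):
    """Symbolic E(.): entry `addr` lists the samples selected by the set bits of addr."""
    out = []
    for addr in range(1 << len(labels)):
        terms = [labels[b] for b in range(len(labels)) if (addr >> b) & 1]
        out.append(" + ".join(terms) if terms else "0")
    return out


def conventional_lut_words(N, L, width=4):
    """Words stored when every LUT of the LU matrix is kept separately.

    Assumptions: L rows and M = N // L columns (N = ML), and each row's
    L-sample vector is split into L // width LUTs of 2**width words.
    """
    if N % L or L % width:
        raise ValueError("need N = M*L and L a multiple of width")
    M = N // L
    return M * L * (L // width) * (1 << width)


if __name__ == "__main__":
    E = lut_entries(["x(n)", "x(n-1)", "x(n-2)", "x(n-3)"])
    print(len(E), E[5])                   # 16 x(n) + x(n-2)
    print(conventional_lut_words(16, 4))  # 256
```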
Intra-iteration LUT Sharing
The LUT content depends on the argument $s_{ij}^{k,p}$ of the LUT enumeration function E(·), which does not change during an iteration. We analyze the arguments corresponding to one column of the LU matrix to find the redundant values among the LUTs of that column.
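The redundancy within one column can be counted symbolically. The sketch below identifies every LUT word by the set of sample delays it adds, so two words are shared exactly when they sum the same samples. It assumes a hypothetical argument layout: row i of column j enumerates {x(n − i − jL), ..., x(n − i − jL − L + 1)}, split into L/4 four-sample sub-vectors with one 16-word LUT each, and the all-zero word is assumed to need no register. Under these assumptions the number of distinct non-zero words comes out as 16L − 25, which agrees with the register count and the roughly 60% saving for L = 8 stated elsewhere in this document; that agreement supports, but does not prove, the assumed layout. The names `column_words` and `sharing_stats` are illustrative.

```python
def column_words(L, j=0, width=4):
    """Symbolic LUT words of column j of the LU matrix (hypothetical layout).

    Row i enumerates {x(n-i-jL), ..., x(n-i-jL-L+1)}, split into L // width
    sub-vectors, each held in its own 2**width-word LUT. A word is identified
    by the frozenset of sample delays it adds: frozenset({1, 3}) stands for
    x(n-1) + x(n-3).
    """
    if L % width:
        raise ValueError("L must be a multiple of the LUT width")
    luts = []
    for i in range(L):
        for p in range(L // width):
            base = i + j * L + p * width
            luts.append([frozenset(base + b for b in range(width) if (a >> b) & 1)
                         for a in range(1 << width)])
    return luts


def sharing_stats(L):
    """Return (words stored without sharing, distinct non-zero words) for one column."""
    luts = column_words(L)
    stored = sum(len(t) for t in luts)
    # Assumption: the all-zero word (empty combination) needs no register.
    distinct = set().union(*luts) - {frozenset()}
    return stored, len(distinct)


if __name__ == "__main__":
    for L in (4, 8, 16):
        stored, shared = sharing_stats(L)
        print(f"L={L}: {stored} words without sharing, {shared} shared, "
              f"saving {1 - shared / stored:.1%}")
```

For L = 4, 8, and 16 this reports 39, 103, and 231 shared words against 64, 256, and 1024 stored without sharing, i.e., per-column savings of about 39%, 60%, and 77%.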
Inter-iteration LUT Reuse
As shown in Fig. 1, the LUT contents of the first (M − 1) columns of the LU matrix in any given iteration can be reused as the last (M − 1) columns during the next iteration, so those columns need not be updated.
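Inter-iteration reuse amounts to shifting LU columns. The sketch below uses a hypothetical argument layout (LUT (i, p) of column j enumerates x(n − i − jL − 4p − b) for b = 0, ..., 3, with n = kL), builds only the new first column at iteration k + 1, moves the remaining columns over by one position, and checks the result against a full recomputation. The helper names `lut`, `lu_column`, `lu_matrix`, and `next_lu` are illustrative.

```python
def lut(vec):
    """E(vec): entry a is the sum of vec[b] over every set bit b of a."""
    return [sum(v for b, v in enumerate(vec) if (a >> b) & 1) for a in range(1 << len(vec))]


def lu_column(x, n, j, L, width=4):
    """Contents of column j of the LU matrix at time n (assumed layout):
    LUT (i, p) enumerates x(n-i-jL-p*width-b) for b = 0 .. width-1."""
    return [lut([x(n - i - j * L - p * width - b) for b in range(width)])
            for i in range(L) for p in range(L // width)]


def lu_matrix(x, n, L, M):
    """Full recomputation: every one of the M columns is rebuilt."""
    return [lu_column(x, n, j, L) for j in range(M)]


def next_lu(prev, x, n_next, L):
    """Inter-iteration reuse: build only the new column 0; columns 0..M-2 of the
    previous iteration become columns 1..M-1 without being updated."""
    return [lu_column(x, n_next, 0, L)] + prev[:-1]


if __name__ == "__main__":
    samples = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]

    def x(m):
        return samples[m] if 0 <= m < len(samples) else 0  # x(m) = 0 outside the record

    L, M = 4, 4        # N = ML = 16
    k = 2              # iteration index, n = kL
    lu_k = lu_matrix(x, k * L, L, M)
    lu_next = next_lu(lu_k, x, (k + 1) * L, L)
    print(lu_next == lu_matrix(x, (k + 1) * L, L, M))  # True
```

Columns 1 to M − 1 of the new matrix are the previous iteration's columns 0 to M − 2 reused unchanged, so only one column's LUTs are written per iteration, which is the basis of item 2 of the design strategy.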
Proposed Design Strategy
The entire LUT content needs to be available in the same cycle for LUT words to be shared. Conventional RAM-based LUTs are not suitable for LUT sharing, since in any given cycle they allow access to only one (or, with multiported RAM, a few) of the stored LUT values. A register-based LUT (REG-LUT) is therefore used instead in the proposed DA-based design.
Based on these facts, we have arrived at the following design strategy to derive an area-delay-power-efficient structure for the DA-based BLMS ADF:
1) A register-based shared LUT is used instead of the conventional RAM-based LUT to exploit intra-iteration LUT sharing.
2) Exploiting the inter-iteration LUT reuse of the BLMS ADF, only one of the (N/L) columns of the LU matrix is updated in every iteration.
3) A full-parallel LUT-update unit generates the update values of one LU column so that its contents are updated in one cycle.
At the block level, the proposed structure is similar to the existing DA-based BLMS structure. However, the internal structures of the LUT-update block and of the processing element (PE) of the DA module differ from those of the existing structure because of the shared LUTs used in the proposed design.
The structure of the DA module of the proposed design is shown in Fig. 2. Each PE of the DA module uses REG-LUTs instead of the RAM-LUTs of the existing design in order to exploit the LUT-sharing property. It requires only (16L − 25) registers instead of the 16PL RAM words required by the existing design. The LUT-update unit of the DA module computes the set of (16L − 25) values needed to update the LUTs of a PE in one cycle, whereas the existing design requires 16 cycles.
Fig. 2. Structure of the DA module of the proposed DA-based BLMS ADF with filter length N and block size L, where N = ML.
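The following sketch models how the LUTs of one PE could read a shared REG-LUT. It stores each distinct non-zero word once, builds an address map from (row i, sub-LUT p, address a) to a register index, and checks that every read returns the value an unshared LUT would. The layout (window[t] = x(n − t), four-input sub-LUTs) and the names `build_shared_column` and `read` are assumptions for illustration; the actual register organization and read multiplexing of the proposed PE may differ. In hardware, the entries of `registers` would be produced together by the full-parallel LUT-update unit; the loop here only computes the same values sequentially.

```python
def build_shared_column(window, L, width=4):
    """Hypothetical shared REG-LUT for one PE (one LU column).

    window[t] holds x(n - t) for the 2L - 1 samples the column touches.
    LUT (i, p) is addressed by a width-bit word; bit b selects window[i + p*width + b].
    Returns (registers, addr_map): every distinct non-zero word is stored once,
    and addr_map[(i, p)][a] gives its register index (None for the zero word).
    """
    index, registers, addr_map = {}, [], {}
    for i in range(L):
        for p in range(L // width):
            row = []
            for a in range(1 << width):
                key = frozenset(i + p * width + b for b in range(width) if (a >> b) & 1)
                if not key:
                    row.append(None)  # zero word: no register needed
                    continue
                if key not in index:  # first occurrence allocates a register
                    index[key] = len(registers)
                    registers.append(sum(window[t] for t in key))
                row.append(index[key])
            addr_map[(i, p)] = row
    return registers, addr_map


def read(registers, addr_map, i, p, a):
    """Value seen by LUT (i, p) of the PE when addressed with a."""
    r = addr_map[(i, p)][a]
    return 0 if r is None else registers[r]


if __name__ == "__main__":
    L = 8
    window = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9]  # x(n) .. x(n-14)
    regs, amap = build_shared_column(window, L)
    direct_ok = all(
        read(regs, amap, i, p, a)
        == sum(window[i + 4 * p + b] for b in range(4) if (a >> b) & 1)
        for i in range(L) for p in range(L // 4) for a in range(16))
    print(len(regs), direct_ok)
```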
Advantages:
- Reduced LUT size
- Reduced LUT-update complexity
Software implementation:
- ModelSim
- Xilinx ISE
