NXFEE INNOVATION
(SEMICONDUCTOR IP &PRODUCT DEVELOPMENT)
(ISO : 9001:2015Certified Company),
# 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam,
Pondicherry– 605004, India.
Buy Project on Online :www.nxfee.com | contact : +91 9789443203 |
email : nxfee.innovation@gmail.com
_________________________________________________________________
Approximate Sum-of-Products Designs Based on Distributed Arithmetic
Abstract:
Approximate circuits provide high performance and require low power. Sum-of-products
(SOP) units are key elements in many digital signal processing applications. In this brief,
three approximate SOP (ASOP) models based on distributed arithmetic are proposed,
each designed for a different level of accuracy. The first ASOP model achieves an
improvement of up to 64% in area and 70% in power compared with the conventional
unit. The other two models provide improvements of 32% and 48% in area and 54% and
58% in power, respectively, with a lower error rate than the first model. The third model
achieves a mean relative error and normalized error distance as low as 0.05% and 0.009%,
respectively. The performance of the approximate units is evaluated with a noisy image
smoothing application, where the proposed models achieve a higher peak signal-to-noise
ratio than existing state-of-the-art techniques. The proposed approximate models thus
achieve higher processing accuracy than existing works together with significant
improvements in power and performance.
Software Implementation:
• ModelSim
• Xilinx 14.2
Existing System:
Approximate computing provides an efficient solution for the design of power-efficient
digital systems. For applications such as multimedia and data processing, approximate
circuits are a promising alternative for reducing area and power in digital systems that
can tolerate some loss of precision. Although they are key components in arithmetic
circuits, sum-of-products (SOP) units have received less attention in terms of
approximate implementation. Distributed arithmetic is a very efficient means of
calculating the inner product of two vectors.
It implements multiplication through a series of table lookups and shift-and-accumulate
operations. Because the level of parallelism in the distributed arithmetic structure is
flexible, the area-speed tradeoff can be adjusted. Distributed arithmetic is a bit-serial
operation that computes the inner product of two vectors in parallel. It requires no
multiplication and provides an efficient mechanism to perform the SOP operation.
Bit-parallel versions of distributed arithmetic have also been proposed. In this brief, three
models of SOP units based on parallel distributed arithmetic are proposed. The existing
approximate distributed arithmetic scheme [5] simply truncates the number of lookup
tables by eliminating the least significant part of the distributed arithmetic operation.
Multipliers have been extensively studied for approximate implementation. Two models
of approximate compressors with reduced erroneous outputs have been proposed to
accumulate the partial products of the Dadda tree multiplier [6].
The probability-based multiplier alters the partial products and reduces the generated
partial product tree based on their probability. The partial product perforation (PPP)
multiplier [8] omits k partial products starting from the jth position, which in turn
reduces the number of adders used in the accumulation of partial products. In this brief,
novel ASOP designs are proposed using the efficient distributed arithmetic structure.
Approximation involves changes to the word length, the number of lookup tables, and
the number of elements in the final accumulator. Three models are proposed. The first
model provides significant power reduction with low mean relative error (MRE) and
normalized error distance (NED).
The second and third models provide better accuracy than the first model at the cost of
increased area and power. In the proposed approximate structures, reductions in the
number of lookup tables, the length of the adders, and the accumulator size are employed
for approximation. Compared to the exact SOP unit, the proposed models have reduced
circuit complexity.
NED is an effective metric for quantifying the approximation irrespective of the size of
the circuit.
The traditional MRE metric is also used to evaluate the impact of approximation. Error
distance is the difference between the exact value and the approximate value, whereas
relative error is the error distance divided by the exact value. NED is calculated
by normalizing the error distance by the maximum possible exact output. MRE is the
mean of the relative errors over all possible input values.
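As a rough illustration of these definitions, the Python sketch below computes MRE and NED for any pair of exact and approximate functions. The name `error_metrics`, the choice to average the error distance before normalizing, and the skipping of inputs whose exact value is zero are assumptions made for this example; the brief does not spell out these details. The 4-bit truncated multiplier in the usage lines is likewise hypothetical.

```python
from itertools import product


def error_metrics(exact_fn, approx_fn, inputs, max_exact):
    """Return (MRE, NED) of approx_fn measured against exact_fn over inputs."""
    distances = []
    relative_errors = []
    for args in inputs:
        exact = exact_fn(*args)
        ed = abs(exact - approx_fn(*args))    # error distance
        distances.append(ed)
        if exact != 0:                        # relative error is undefined for 0
            relative_errors.append(ed / exact)
    mre = sum(relative_errors) / len(relative_errors) if relative_errors else 0.0
    # Assumption: NED is the mean error distance over the maximum exact output.
    ned = (sum(distances) / len(distances)) / max_exact
    return mre, ned


def exact_mul(a, b):
    return a * b


def truncated_mul(a, b):
    # Hypothetical approximate unit: drop the 2 LSBs of each 4-bit operand.
    return ((a >> 2) * (b >> 2)) << 4


mre, ned = error_metrics(exact_mul, truncated_mul,
                         list(product(range(16), repeat=2)), 15 * 15)
print(f"MRE = {mre:.4f}, NED = {ned:.4f}")
```

Evaluating every input pair, as done here, is only practical for small word lengths; for 16-bit operands a sampled input set would be the usual substitute.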
Disadvantages:
• Low processing accuracy
• Poor performance
• Requires high power
Proposed System:
Proposed Approximate Sum-of-Products
In this brief, the number of input pairs K is 3 and the word length N is 16. A conventional
implementation of the SOP unit based on parallel distributed arithmetic [4] requires three
two-input 16-bit adders, one three-input 16-bit adder, 16 lookup tables with eight cases,
and a final accumulator with 16 elements. In our approximation models, hardware
requirements are considerably reduced.
Three ASOP models are proposed: ASOP1, ASOP2, and ASOP3.
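The conventional structure described above can be sketched in Python as a behavioral model. This is a minimal sketch assuming unsigned operands; the function name `da_sop_exact` and the ordering of the lookup table index bits are choices made for the example, not taken from the brief or from [4].

```python
def da_sop_exact(a, b, n_bits=16):
    """Exact a1*b1 + a2*b2 + a3*b3 for unsigned n_bits-wide operands (K = 3)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    # Lookup table with eight cases. The three two-input sums and the one
    # three-input sum correspond to the adders of the conventional unit.
    lut = [0, a1, a2, a1 + a2, a3, a1 + a3, a2 + a3, a1 + a2 + a3]
    acc = 0
    for n in range(n_bits):  # one lookup per bit position of b
        index = ((b1 >> n) & 1) | (((b2 >> n) & 1) << 1) | (((b3 >> n) & 1) << 2)
        acc += lut[index] << n  # shift-and-accumulate
    return acc


a = (0b0011001000101110, 0b0001011000101011, 0b0010011001101000)
b = (0b0001001011101001, 0b0001101000101110, 0b0000101011101011)
print(da_sop_exact(a, b) == sum(x * y for x, y in zip(a, b)))
```

The loop models the 16 lookup tables and the 16-element accumulator sequentially; in hardware these lookups operate in parallel.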
Proposed Approximate Sum-of-Products Model ASOP1
In approximate model 1, K is 3 and N is reduced: the m least significant bits of a_k and
b_k for k = 1, 2, and 3 are truncated. Designs with m = 8, 6, and 4 bits are implemented.
This implementation requires three two-input (16 − m)-bit adders, one three-input
(16 − m)-bit adder, 16 − m lookup tables with eight cases, and a final accumulator with
16 − m elements. This considerably reduces the hardware utilization at all levels. The
approximate model with reduced elements is shown in Fig. 1. By implementing the
distributed arithmetic summation over bit positions m to N − 1, the number of lookup
tables reduces to 16 − m, and 16 − m elements are sent to the final accumulator
((16 − m) × 18). It should be noted that in ASOP1 the number of input bits to the adders
is also reduced, which further reduces the complexity of the accumulator to
(16 − m) × (18 − m), compared to [5].
Fig. 1. Approximate lookup table and corresponding ASOP (ASOP1) structure for K = 3 and N = 16.
Fig. 2. Approximate lookup table and corresponding ASOP (ASOP2) structure for K = 3 and N = 16.
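The ASOP1 truncation can be sketched as a behavioral Python model. This is a sketch under stated assumptions: operands are unsigned, the function name `asop1` is invented for the example, and the final left shift by 2m bits (to bring the result back to the weight of the exact output) is an assumption, since the brief does not describe how the output is rescaled.

```python
def asop1(a, b, m, n_bits=16):
    """Approximate a1*b1 + a2*b2 + a3*b3 by dropping the m LSBs of each operand."""
    a1, a2, a3 = (x >> m for x in a)  # (n_bits - m)-bit adder inputs
    b1, b2, b3 = (x >> m for x in b)  # n_bits - m control bits per operand
    lut = [0, a1, a2, a1 + a2, a3, a1 + a3, a2 + a3, a1 + a2 + a3]
    acc = 0
    for n in range(n_bits - m):  # only n_bits - m lookup tables remain
        index = ((b1 >> n) & 1) | (((b2 >> n) & 1) << 1) | (((b3 >> n) & 1) << 2)
        acc += lut[index] << n
    return acc << (2 * m)  # assumed rescaling to the exact result's weight


a = (0b0011001000101110, 0b0001011000101011, 0b0010011001101000)
b = (0b0001001011101001, 0b0001101000101110, 0b0000101011101011)
exact = sum(x * y for x, y in zip(a, b))
for m in (4, 6, 8):
    approx = asop1(a, b, m)
    print(f"m = {m}: relative error = {(exact - approx) / exact:.4f}")
```

With m = 0 the model reduces to the exact unit, which gives a simple sanity check on the lookup and accumulation logic.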
Proposed Approximate Sum-of-Products Model ASOP2
ASOP2 is similar to ASOP1 with the addition of an m-bit leading-one predictor. This
increases the accuracy and makes the design more suitable for DSP applications, as
discussed later in this section. In our method, leading-one prediction for a_k and b_k,
k = 1, 2, and 3, requires an OR operation over the most significant m bits of the a_k and
of the b_k, followed by a priority encoder. The function of the OR gates can be given as
a_mor = a_1m | a_2m | a_3m and b_mor = b_1m | b_2m | b_3m, where the subscript km
denotes the first m bits of the kth element, for m = 4, 6, or 8.
After the leading-one prediction, the ASOP1 structure is used for the computation on the
bits starting from the leading-one position. Fig. 2 shows the approximate lookup table
and corresponding ASOP (ASOP2) structure for K = 3 and N = 16.
For example, consider the input elements a_1 = “0011001000101110,” a_2 =
“0001011000101011,” a_3 = “0010011001101000,” b_1 = “0001001011101001,” b_2 =
“0001101000101110,” and b_3 = “0000101011101011.” For m = 4, a_mor = 0011, so the
leading-one predictor finds zeros in the first two bit positions, “15” and “14,” of a_1, a_2,
and a_3. The 12-bit (16 − m) fields from bit position “13” to “2” of a_1, a_2, and a_3
(“110010001011,” “010110001010,” and “100110011010”) are taken and fed to the
inputs of the lookup tables. For m = 4, b_mor = 0001, so the leading-one predictor finds
zeros in the first three bit positions, “15,” “14,” and “13,” of b_1, b_2, and b_3. The 12-bit
(16 − m) fields from bit position “12” to “1” of b_1, b_2, and b_3 (“100101110100,”
“110100010111,” and “010101110101”) are taken and fed as control signals of the lookup
tables. The overall structure of ASOP2 is given in Fig. 2, where LZA refers to the leading
zeros in a_mor and LZB refers to the leading zeros in b_mor. ASOP2 reduces the negative
effects of truncation, especially when there is information only in the least significant
parts of the inputs. In DSP applications, pixel values are highly correlated, and the
numbers of leading zeros of a_k and b_k for k = 1, 2, 3 have a high chance of being the
same. Using OR gates to combine the elements and a single leading-one predictor
afterward reduces the hardware resources required.
Fig. 3. Least significant part of the ASOP (ASOP3) structure.
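The field selection in the ASOP2 example can be reproduced with a short Python sketch. The function names `leading_zeros_of_or` and `select_window` are hypothetical, and the handling of the case where all top m bits are zero (the window then falls on the lowest 16 − m bits) is an assumption; the brief only describes the cases shown in its example.

```python
def leading_zeros_of_or(values, m, n_bits=16):
    """Count leading zeros in the OR of the top m bits of all values."""
    top_or = 0
    for v in values:
        top_or |= v >> (n_bits - m)  # OR gate over the most significant m bits
    return m - top_or.bit_length()   # priority encoder on the m-bit OR result


def select_window(values, m, n_bits=16):
    """Return the leading-zero count and the (n_bits - m)-bit field of each value."""
    lz = leading_zeros_of_or(values, m, n_bits)
    width = n_bits - m
    shift = n_bits - lz - width      # number of LSBs that fall below the window
    mask = (1 << width) - 1
    return lz, [format((v >> shift) & mask, f"0{width}b") for v in values]


A = (0b0011001000101110, 0b0001011000101011, 0b0010011001101000)
B = (0b0001001011101001, 0b0001101000101110, 0b0000101011101011)
print(select_window(A, 4))  # fields from bit 13 down to bit 2
print(select_window(B, 4))  # fields from bit 12 down to bit 1
```

For the operands of the example, the sketch yields two leading zeros for the a inputs and three for the b inputs, and the same 12-bit strings that the text lists.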
Proposed Approximate Sum-of-Products Model ASOP3
In ASOP1, the m = 8, 6, or 4 least significant bits are truncated: m bits are truncated from
the 18-bit outputs of the lookup table contents, and the m control signals b_1n, b_2n, and
b_3n of the lookup tables for n = 0, 1, ..., m − 1 are also truncated. In ASOP3,
approximation is employed instead of truncation. Fig. 3 shows the least significant
part of the ASOP (ASOP3) structure. The lookup table output contents are divided into
18 − m bits and m bits, and the inputs b are divided into a 16 − m bit group and an m bit
group. ASOP1 is used for the first 16 − m bit group. For the least significant m-bit group
of b_k, k = 1, 2, 3, the control signals are grouped in pairs, so the m lookup tables are
reduced to m/2 tables. The additional hardware required for ASOP3 is the part shown in
Fig. 3. For example, consider the input elements a_1 = “0011001000101110,” a_2 =
“0001011000101011,” a_3 = “0010011001101000,” b_1 = “0001001011101001,” b_2 =
“0001101000101110,” and b_3 = “0000101011101011.” For m = 4, the adder outputs
a_23, a_13, a_12, and a_123 are calculated. All bits except the least significant m bits are
given to the ASOP1 structure, and the 12-bit (16 − m) fields starting from the most
significant bit of b_1, b_2, and b_3 are taken and fed as control signals of the lookup
tables. For the least significant part, the least significant m bits of a_23, a_13, a_12, and
a_123 are used as inputs to the lookup tables. The number of lookup tables is reduced by
half by ORing each pair of control signals. In this scenario, for the lookup table of
n = 1 | 0, the control signals would be 111.
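The pairing of control signals in the least significant part can be illustrated with a small Python sketch. It covers only the ORing of adjacent control bits; the function name `paired_controls` and the tuple ordering (b_1, b_2, b_3) are assumptions for the example, and the lookup table contents of ASOP3 are not modeled.

```python
def paired_controls(b, m):
    """OR adjacent pairs of the m least significant control bits of each b_k.

    Returns one (b_1, b_2, b_3) control tuple per remaining lookup table,
    starting with the pair n = 1 | 0.
    """
    controls = []
    for n in range(0, m, 2):  # m lookup tables collapse to m / 2
        controls.append(tuple(((x >> n) | (x >> (n + 1))) & 1 for x in b))
    return controls


B = (0b0001001011101001, 0b0001101000101110, 0b0000101011101011)
print(paired_controls(B, 4))
```

For the b operands of the example and m = 4, the first tuple (the pair n = 1 | 0) is all ones, which matches the control signals 111 stated in the text.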
Advantages:
• Higher processing accuracy
• High performance
• Requires low power
References:
[1] J. Han and M. Orshansky, “Approximate computing: An emerging paradigm for energy-efficient
design,” in Proc. IEEE ETS, May 2013, pp. 1–6.
[2] S. A. White, “Applications of distributed arithmetic to digital signal processing: A tutorial review,”
IEEE ASSP Mag., vol. 6, no. 3, pp. 4–19, Jul. 1989.
[3] L. Yuan, S. Sana, H. J. Pottinger, and V. S. Rao, “Distributed arithmetic implementation of
multivariable controllers for smart structural systems,” Smart Mater. Struct., vol. 9, no. 4, p. 402, Jan.
2000.
[4] W. Li, J. B. Burr, and A. M. Peterson, “A fully parallel VLSI implementation of distributed
arithmetic,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 2. Jun. 1988, pp. 1511–1515.
[5] R. Amirtharajah and A. P. Chandrakasan, “A micropower programmable DSP using approximate
signal processing based on distributed arithmetic,” IEEE J. Solid-State Circuits, vol. 39, no. 2,
pp. 337–347, Feb. 2004.
[6] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and analysis of approximate
compressors for multiplication,” IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
[7] S. Venkatachalam and S.-B. Ko, “Design of power and area efficient approximate multipliers,” IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 5, pp. 1782–1786, May 2017.
[8] G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris, and K. Pekmestzi, “Design-efficient approximate
multiplication circuits through partial product perforation,” IEEE Trans. Very Large Scale Integr.
(VLSI) Syst., vol. 24, no. 10, pp. 3105–3117, Oct. 2016.
[9] J. Liang, J. Han, and F. Lombardi, “New metrics for the reliability of approximate and probabilistic
adders,” IEEE Trans. Comput., vol. 62, no. 9, pp. 1760–1771, Sep. 2013.
[10] J. Babaud, A. P. Witkin, M. Baudin, and R. O. Duda, “Uniqueness of the Gaussian kernel for scale-
space filtering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 1, pp. 26–33, Jan. 1986.

More Related Content

Similar to Approximate sum of-products designs based on distributed arithmetic (20)

PDF
Feedback based low-power soft-error-tolerant design for dual-modular redundancy
Nxfee Innovation
 
PDF
A high accuracy programmable pulse generator with a 10-ps timing resolution
Nxfee Innovation
 
PDF
Algorithm and vlsi architecture design of proportionate type lms adaptive fil...
Nxfee Innovation
 
PDF
The implementation of the improved omp for aic reconstruction based on parall...
Nxfee Innovation
 
PDF
Vector processing aware advanced clock-gating techniques for low-power fused ...
Nxfee Innovation
 
PDF
Efficient fpga mapping of pipeline sdf fft cores
Nxfee Innovation
 
PDF
IRJET- Design of 16 Bit Low Power Vedic Architecture using CSA & UTS
IRJET Journal
 
PDF
Low complexity methodology for complex square-root computation
Nxfee Innovation
 
PDF
A 12 bit 40-ms s sar adc with a fast-binary-window dac switching scheme
Nxfee Innovation
 
PDF
IRJET- Distribution Selection for Pump Manufacturing Companies
IRJET Journal
 
PDF
Al04605265270
IJERA Editor
 
PDF
IRJET- Efficient Design of Radix Booth Multiplier
IRJET Journal
 
PDF
DESIGN OF LOW POWER MULTIPLIER
IRJET Journal
 
PDF
A reconfigurable ldpc decoder optimized applications
Nxfee Innovation
 
PDF
Parallel Processing Technique for Time Efficient Matrix Multiplication
IJERA Editor
 
PDF
Optimization and implementation of parallel squarer
eSAT Publishing House
 
PDF
IRJET- Image and Signal Filtering using Fir Filter Made using Approximate Hyb...
IRJET Journal
 
PPTX
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
Hari M
 
PDF
Improvement of Process and Product Layout for Metro Coach using Craft Method...
IRJET Journal
 
PDF
Improvement of Process and Product Layout for Metro Coach using Craft Method...
IRJET Journal
 
Feedback based low-power soft-error-tolerant design for dual-modular redundancy
Nxfee Innovation
 
A high accuracy programmable pulse generator with a 10-ps timing resolution
Nxfee Innovation
 
Algorithm and vlsi architecture design of proportionate type lms adaptive fil...
Nxfee Innovation
 
The implementation of the improved omp for aic reconstruction based on parall...
Nxfee Innovation
 
Vector processing aware advanced clock-gating techniques for low-power fused ...
Nxfee Innovation
 
Efficient fpga mapping of pipeline sdf fft cores
Nxfee Innovation
 
IRJET- Design of 16 Bit Low Power Vedic Architecture using CSA & UTS
IRJET Journal
 
Low complexity methodology for complex square-root computation
Nxfee Innovation
 
A 12 bit 40-ms s sar adc with a fast-binary-window dac switching scheme
Nxfee Innovation
 
IRJET- Distribution Selection for Pump Manufacturing Companies
IRJET Journal
 
Al04605265270
IJERA Editor
 
IRJET- Efficient Design of Radix Booth Multiplier
IRJET Journal
 
DESIGN OF LOW POWER MULTIPLIER
IRJET Journal
 
A reconfigurable ldpc decoder optimized applications
Nxfee Innovation
 
Parallel Processing Technique for Time Efficient Matrix Multiplication
IJERA Editor
 
Optimization and implementation of parallel squarer
eSAT Publishing House
 
IRJET- Image and Signal Filtering using Fir Filter Made using Approximate Hyb...
IRJET Journal
 
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
Hari M
 
Improvement of Process and Product Layout for Metro Coach using Craft Method...
IRJET Journal
 
Improvement of Process and Product Layout for Metro Coach using Craft Method...
IRJET Journal
 

More from Nxfee Innovation (10)

PDF
VLSI IEEE Transaction 2018 - IEEE Transaction
Nxfee Innovation
 
DOCX
Noise insensitive pll using a gate-voltage-boosted source-follower regulator ...
Nxfee Innovation
 
PDF
Securing the present block cipher against combined side channel analysis and ...
Nxfee Innovation
 
PDF
Combating data leakage trojans in commercial and asic applications with time ...
Nxfee Innovation
 
PDF
Analysis and design of cost effective, high-throughput ldpc decoders
Nxfee Innovation
 
PDF
An energy efficient programmable many core accelerator for personalized biome...
Nxfee Innovation
 
PDF
A flexible wildcard pattern matching accelerator via simultaneous discrete fi...
Nxfee Innovation
 
PDF
A closed form expression for minimum operating voltage of cmos d flip-flop
Nxfee Innovation
 
PDF
A 128 tap highly tunable cmos if finite impulse response filter for pulsed ra...
Nxfee Innovation
 
PDF
Nxfee Innovation Brochure
Nxfee Innovation
 
VLSI IEEE Transaction 2018 - IEEE Transaction
Nxfee Innovation
 
Noise insensitive pll using a gate-voltage-boosted source-follower regulator ...
Nxfee Innovation
 
Securing the present block cipher against combined side channel analysis and ...
Nxfee Innovation
 
Combating data leakage trojans in commercial and asic applications with time ...
Nxfee Innovation
 
Analysis and design of cost effective, high-throughput ldpc decoders
Nxfee Innovation
 
An energy efficient programmable many core accelerator for personalized biome...
Nxfee Innovation
 
A flexible wildcard pattern matching accelerator via simultaneous discrete fi...
Nxfee Innovation
 
A closed form expression for minimum operating voltage of cmos d flip-flop
Nxfee Innovation
 
A 128 tap highly tunable cmos if finite impulse response filter for pulsed ra...
Nxfee Innovation
 
Nxfee Innovation Brochure
Nxfee Innovation
 
Ad

Recently uploaded (20)

PDF
Unified_Cloud_Comm_Presentation anil singh ppt
anilsingh298751
 
PDF
Water Design_Manual_2005. KENYA FOR WASTER SUPPLY AND SEWERAGE
DancanNgutuku
 
PPTX
drones for disaster prevention response.pptx
NawrasShatnawi1
 
DOCX
8th International Conference on Electrical Engineering (ELEN 2025)
elelijjournal653
 
PPTX
Structural Functiona theory this important for the theorist
cagumaydanny26
 
PPTX
Green Building & Energy Conservation ppt
Sagar Sarangi
 
PPTX
Electron Beam Machining for Production Process
Rajshahi University of Engineering & Technology(RUET), Bangladesh
 
PPTX
Benefits_^0_Challigi😙🏡💐8fenges[1].pptx
akghostmaker
 
PPTX
原版一样(Acadia毕业证书)加拿大阿卡迪亚大学毕业证办理方法
Taqyea
 
PPTX
REINFORCEMENT AS CONSTRUCTION MATERIALS.pptx
mohaiminulhaquesami
 
PPTX
Hashing Introduction , hash functions and techniques
sailajam21
 
PPTX
Heart Bleed Bug - A case study (Course: Cryptography and Network Security)
Adri Jovin
 
PDF
ARC--BUILDING-UTILITIES-2-PART-2 (1).pdf
IzzyBaniquedBusto
 
PPTX
EC3551-Transmission lines Demo class .pptx
Mahalakshmiprasannag
 
PDF
A presentation on the Urban Heat Island Effect
studyfor7hrs
 
PPTX
NEUROMOROPHIC nu iajwojeieheueueueu.pptx
knkoodalingam39
 
PPTX
Presentation on Foundation Design for Civil Engineers.pptx
KamalKhan563106
 
PPT
inherently safer design for engineering.ppt
DhavalShah616893
 
PPTX
Thermal runway and thermal stability.pptx
godow93766
 
PPTX
artificial intelligence applications in Geomatics
NawrasShatnawi1
 
Unified_Cloud_Comm_Presentation anil singh ppt
anilsingh298751
 
Water Design_Manual_2005. KENYA FOR WASTER SUPPLY AND SEWERAGE
DancanNgutuku
 
drones for disaster prevention response.pptx
NawrasShatnawi1
 
8th International Conference on Electrical Engineering (ELEN 2025)
elelijjournal653
 
Structural Functiona theory this important for the theorist
cagumaydanny26
 
Green Building & Energy Conservation ppt
Sagar Sarangi
 
Electron Beam Machining for Production Process
Rajshahi University of Engineering & Technology(RUET), Bangladesh
 
Benefits_^0_Challigi😙🏡💐8fenges[1].pptx
akghostmaker
 
原版一样(Acadia毕业证书)加拿大阿卡迪亚大学毕业证办理方法
Taqyea
 
REINFORCEMENT AS CONSTRUCTION MATERIALS.pptx
mohaiminulhaquesami
 
Hashing Introduction , hash functions and techniques
sailajam21
 
Heart Bleed Bug - A case study (Course: Cryptography and Network Security)
Adri Jovin
 
ARC--BUILDING-UTILITIES-2-PART-2 (1).pdf
IzzyBaniquedBusto
 
EC3551-Transmission lines Demo class .pptx
Mahalakshmiprasannag
 
A presentation on the Urban Heat Island Effect
studyfor7hrs
 
NEUROMOROPHIC nu iajwojeieheueueueu.pptx
knkoodalingam39
 
Presentation on Foundation Design for Civil Engineers.pptx
KamalKhan563106
 
inherently safer design for engineering.ppt
DhavalShah616893
 
Thermal runway and thermal stability.pptx
godow93766
 
artificial intelligence applications in Geomatics
NawrasShatnawi1
 
Ad

Approximate sum of-products designs based on distributed arithmetic

  • 1. NXFEE INNOVATION (SEMICONDUCTOR IP &PRODUCT DEVELOPMENT) (ISO : 9001:2015Certified Company), # 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam, Pondicherry– 605004, India. Buy Project on Online :www.nxfee.com | contact : +91 9789443203 | email : [email protected] _________________________________________________________________ Approximate Sum-of-Products Designs Based on Distributed Arithmetic Abstract: Approximate circuits provide high performance and require low power. Sum-of-products (SOP) units are key elements in many digital signal processing applications. In this brief, three approximate SOP (ASOP) models which are based on the distributed arithmetic are proposed. They are designed for different levels of accuracy. First model of ASOP achieves an improvement up to 64% on area and 70% on power, when compared with conventional unit. Other two models provide an improvement of 32% and 48% on area and 54% and 58% on power, respectively, with a reduced error rate compared with the first model. Third model achieves the mean relative error and normalized error distance as low as 0.05% and 0.009%, respectively. Performance of approximate units is evaluated with a noisy image smoothing application, where the proposed models are capable of achieving higher peak signal to-noise ratio than the existing state-of-the-art techniques. It is shown that the proposed approximate models achieve higher processing accuracy than existing works but with significant improvements in power and performance. Software Implementation:  Modelsim  Xilinx 14.2 Existing System: Approximate computing provides an efficient solution for the design of power efficient digital systems. For applications, such as multimedia and data processing, approximate circuits play an important role as a promising alternative for reducing area and power in digital systems that can tolerate some loss of precision. 
As one of the key components in arithmetic circuits, sum-of products (SOP) units have received less attention in terms of
  • 2. NXFEE INNOVATION (SEMICONDUCTOR IP &PRODUCT DEVELOPMENT) (ISO : 9001:2015Certified Company), # 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam, Pondicherry– 605004, India. Buy Project on Online :www.nxfee.com | contact : +91 9789443203 | email : [email protected] _________________________________________________________________ approximate implementation. Distributed arithmetic is a very efficient means for calculation of the inner products between vectors. It implements multiplication by doing a series of table-lookups and shift-and-accumulate operations. Due to the flexibility of the level of parallelism in the distributed arithmetic structure, the area-speed tradeoff can be adjusted. Distributed arithmetic is a bit-serial operation that computes the inner product of two vectors in parallel. It requires no multiplication and it has an efficient mechanism to perform the SOP operation. Bit- parallel versions of distributed arithmetic are proposed. In this brief, three models of SOP units based on parallel distributed arithmetic are proposed. Their scheme simply involves truncation in the number of lookup tables, by eliminating the least significant part of the distributed arithmetic operation. Multipliers have been extensively studied for approximate implementation. Two models of approximate compressors with reduced erroneous outputs to accumulate partial products of the Dadda tree multiplier. The probability-based multiplier is based on the altering the partial products and reducing the generated partial product tree based on their probability. In partial product perforation (PPP) multiplier reduces k partial products starting from j th position, which in turn reduces the number of adders used in the accumulation of partial products. In this brief, the novel ASOP designs are proposed using the efficient distributed arithmetic structure. 
Approximation involves changes with respect to word length, number of lookup tables, and number of elements in the final accumulator. Three models are proposed. First model provides significant power reduction with lower mean relative error (MRE) and normalized error distance (NED). Second and third models with increased area and power compared to first model provide better accuracy. In the proposed approximate structures, reductions in the number of lookup tables, length of adders, and accumulator size are employed for approximation. Compared to the exact SOP unit, the proposed models have reduced circuit complexity.
  • 3. NXFEE INNOVATION (SEMICONDUCTOR IP &PRODUCT DEVELOPMENT) (ISO : 9001:2015Certified Company), # 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam, Pondicherry– 605004, India. Buy Project on Online :www.nxfee.com | contact : +91 9789443203 | email : [email protected] _________________________________________________________________ NED is an effective metric to quantify the approximation irrespective of the size of the circuit. Also, traditional MRE error metric is used to evaluate the impact of approximation. Error distance is the difference between the exact value and the approximate value, whereas relative error is the value of error distance divided by the exact value. NED is calculated by normalizing the error distance by maximum possible exact output. MRE is calculated from the mean of relative errors for all possible values. Disadvantages:  Low processing accuracy  Poor performance  Require High power Proposed System: Proposed approximate sum -of-products In this brief, K is 3 and N is 16. For conventional implementation of SOP unit based on the parallel distributed arithmetic [4], three two-input 16-bit adders, one three-input 16- bit adder, 16 lookup tables with eight cases, and final accumulator with 16 elements are required. In our approximation models, hardware requirements are considerably reduced. Three models of ASOP: ASOP1, ASOP2, and ASOP3 are proposed. Proposed Approximate Sum-of-Products Model ASOP1 In approximate model 1, K is 3 and N is reduced. m bits at the least significant part of a k and b k for k = 1, 2, and 3 are truncated. m = 8, 6, and 4 bits are implemented. For this implementation, three two-input 16 − m bit adders, one three-input 16 − m bit adder, 16 − m lookup tables with eight cases, and final accumulator with 16−m elements are required. This considerably reduces the hardware utilization at all the levels. The approximate
  • 4. NXFEE INNOVATION (SEMICONDUCTOR IP &PRODUCT DEVELOPMENT) (ISO : 9001:2015Certified Company), # 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam, Pondicherry– 605004, India. Buy Project on Online :www.nxfee.com | contact : +91 9789443203 | email : [email protected] _________________________________________________________________ model with reduced elements is shown in Fig. 1. In by implementing with limits m to N −1, the number of lookup tables reduces to 16−m and 16−m elements are sent to the final accumulator (16 − m × 18). It should be noted that in ASOP1, the number of input bits to the adders Fig. 1. Approximate lookup table and corresponding ASOP (ASOP1) structure for K = 3 and N = 16.
  • 5. NXFEE INNOVATION (SEMICONDUCTOR IP &PRODUCT DEVELOPMENT) (ISO : 9001:2015Certified Company), # 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam, Pondicherry– 605004, India. Buy Project on Online :www.nxfee.com | contact : +91 9789443203 | email : [email protected] _________________________________________________________________ Fig. 2. Approximate lookup table and corresponding ASOP (ASOP2) structure for K = 3 and N = 16. is reduced, which further reduces the complexity of accumulator (16 − m × 18 − m), compared to [5]. Proposed Approximate Sum-of-Products Model ASOP2 ASOP2 is similar to ASOP1 with the addition of m-bit leading one predictor. This increases the accuracy, and more suitable for DSP application which will be discussed later in this section. In our method, leading one prediction of a k and b k for k = 1, 2, and 3 requires OR operation of most significant m bits of a k and b k for k = 1, 2, and 3 followed by the priority encoder. The function of OR gates can be given as a mor = a 1m|a 2m|a 3m and b mor = b 1m|b 2m|b 3m where km represents first m bits of k th element, for m = 4, 6, or 8. After the leading one prediction, ASOP1 structure is used for the computation of elements starting from the leading one position. Fig. 2 shows Approximate lookup table and corresponding ASOP (ASOP2) structure for K = 3 and N = 16. is reduced, which further reduces the complexity of accumulator (16 − m × 18 − m), compared to [5].
  • 6. For example, consider the input elements a_1 = "0011001000101110", a_2 = "0001011000101011", a_3 = "0010011001101000", b_1 = "0001001011101001", b_2 = "0001101000101110", and b_3 = "0000101011101011". For m = 4, a_mor = 0011, so the leading one predictor predicts zeros in the first two bit positions, "15" and "14", of a_1, a_2, and a_3. The 12-bit (16 − m) information from bit position "13" to "2" of a_1, a_2, and a_3 ("110010001011", "010110001010", and "100110011010") is taken and fed to the inputs of the lookup tables.
    Fig. 3. Least significant part of the ASOP (ASOP3) structure.
    For m = 4, b_mor = 0001, so the leading one predictor predicts zeros in the first three bit positions, "15", "14", and "13", of b_1, b_2, and b_3. The 12-bit (16 − m) information from bit position "12" to "1" of b_1, b_2, and b_3 ("100101110100", "110100010111", and "010101110101") is taken and fed as control signals of the lookup
  • 7. tables. The overall structure of ASOP2 is given in Fig. 2, where LZA refers to the leading zeros in a_mor and LZB refers to the leading zeros in b_mor. ASOP2 reduces the negative effects of truncation, especially when there is information only in the least significant parts of the inputs. In DSP applications, pixel values are highly correlated, and the numbers of leading zeros of a_k and b_k for k = 1, 2, 3 have a high chance of being the same. Using OR gates to combine the elements and a single leading one predictor afterward reduces the hardware resources required.
    Proposed Approximate Sum-of-Products Model ASOP3
    In ASOP1, the least significant m = 8, 6, or 4 bits are truncated: m bits are truncated from the 18-bit outputs of the lookup tables, and the m control signals b_1n, b_2n, and b_3n for n = 0, 1, ..., m − 1 are also truncated. In ASOP3, approximation is employed instead of truncation. The lookup table outputs are divided into 18 − m bits and m bits, and the inputs b are divided into a (16 − m)-bit group and an m-bit group. ASOP1 is used for the first (16 − m)-bit group. For the least significant m-bit group of b_k for k = 1, 2, 3, the control signals are grouped in pairs, so the m lookup tables are reduced to m/2 tables. The additional hardware required for ASOP3 is given in Fig. 3.
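As a reference point for the truncation that ASOP3 replaces, the ASOP1 behaviour described above can be sketched in Python. This is a minimal behavioural sketch under stated assumptions, not the authors' circuit: operands are taken to be unsigned, each table entry is modelled as the sum of the a_k selected by the control bits, and truncation is modelled by zeroing the m least significant bits of each table output. The names `lut_entry` and `sop_da` are hypothetical.

```python
N = 16  # operand width in bits (K = 3 operands per vector, as in the text)

def lut_entry(a, bits):
    """One table entry: the sum of the a_k whose control bit b_kn is 1."""
    return sum(a_k for a_k, bit in zip(a, bits) if bit)

def sop_da(a, b, m=0, n_bits=N):
    """Distributed-arithmetic sum of products.

    m = 0 gives the exact result; m > 0 applies the assumed ASOP1-style
    truncation of the m lowest tables and the m LSBs of each table output.
    """
    total = 0
    for n in range(m, n_bits):                # tables for n < m are dropped
        bits = [(b_k >> n) & 1 for b_k in b]  # control signals b_1n, b_2n, b_3n
        entry = lut_entry(a, bits)            # fits in 18 bits for three 16-bit inputs
        entry = (entry >> m) << m             # discard the m LSBs of the table output
        total += entry << n                   # weight the entry by 2^n
    return total

a = [0b0011001000101110, 0b0001011000101011, 0b0010011001101000]
b = [0b0001001011101001, 0b0001101000101110, 0b0000101011101011]

exact = sum(x * y for x, y in zip(a, b))
approx = {m: sop_da(a, b, m) for m in (4, 6, 8)}
rel_err = {m: (exact - v) / exact for m, v in approx.items()}
```

With m = 0 the function reproduces the exact sum of products, so `rel_err` shows how much each truncation level costs for one input set. In this model the truncated result never exceeds the exact one, because every dropped term is non-negative.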
For example, consider the input elements a_1 = "0011001000101110", a_2 = "0001011000101011", a_3 = "0010011001101000", b_1 = "0001001011101001", b_2 = "0001101000101110", and b_3 = "0000101011101011". For m = 4, a_23, a_13, a_12, and a_123 are calculated. Except for the least significant m bits, their bits are given to the ASOP1 structure, and the 12-bit (16 − m) information starting from the most significant bit of b_1, b_2, and b_3 is taken and fed as control signals of the lookup tables. For the least significant bits calculation, the least significant m bits of a_23, a_13, a_12, and a_123 are used as inputs to the lookup tables. The number of lookup tables is reduced by half by ORing each pair of control signals. In this scenario, for the lookup table of n = 1 | 0, the control signals would be 111.
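The pairing of control signals in this example can be checked with a short Python sketch. It models only the ORing of adjacent control bits, not the lookup tables themselves. The function name `paired_controls` and the ordering of the control string (b_1, b_2, b_3) are assumptions made for illustration.

```python
def paired_controls(b, m):
    """OR control bits n and n + 1 of each b_k over the lowest m bit positions.

    Returns a dict mapping each bit pair (n + 1, n) to the control string
    for (b_1, b_2, b_3), so m tables collapse to m / 2 entries.
    """
    pairs = {}
    for n in range(0, m, 2):
        pairs[(n + 1, n)] = "".join(
            str(((b_k >> n) | (b_k >> (n + 1))) & 1) for b_k in b
        )
    return pairs

b = [0b0001001011101001, 0b0001101000101110, 0b0000101011101011]
controls = paired_controls(b, 4)
```

For m = 4 the dictionary has two entries, one for the pair n = 1 | 0 and one for n = 3 | 2, which matches the halving of the number of lookup tables stated above.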
  • 8. Advantages:
    • Higher processing accuracy
    • High performance
    • Low power requirement
    References:
    [1] J. Han and M. Orshansky, “Approximate computing: An emerging paradigm for energy-efficient design,” in Proc. IEEE ETS, May 2013, pp. 1–6.
    [2] S. A. White, “Applications of distributed arithmetic to digital signal processing: A tutorial review,” IEEE ASSP Mag., vol. 6, no. 3, pp. 4–19, Jul. 1989.
    [3] L. Yuan, S. Sana, H. J. Pottinger, and V. S. Rao, “Distributed arithmetic implementation of multivariable controllers for smart structural systems,” Smart Mater. Struct., vol. 9, no. 4, p. 402, Jan. 2000.
    [4] W. Li, J. B. Burr, and A. M. Peterson, “A fully parallel VLSI implementation of distributed arithmetic,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 2, Jun. 1988, pp. 1511–1515.
    [5] R. Amirtharajah and A. P. Chandrakasan, “A micropower programmable DSP using approximate signal processing based on distributed arithmetic,” IEEE J. Solid-State Circuits, vol. 39, no. 2, pp. 337–347, Feb. 2004.
    [6] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and analysis of approximate compressors for multiplication,” IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
    [7] S. Venkatachalam and S.-B. Ko, “Design of power and area efficient approximate multipliers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 5, pp. 1782–1786, May 2017.
    [8] G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris, and K. Pekmestzi, “Design-efficient approximate multiplication circuits through partial product perforation,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 10, pp. 3105–3117, Oct. 2016.
  • 9. [9] J. Liang, J. Han, and F. Lombardi, “New metrics for the reliability of approximate and probabilistic adders,” IEEE Trans. Comput., vol. 62, no. 9, pp. 1760–1771, Sep. 2013.
    [10] J. Babaud, A. P. Witkin, M. Baudin, and R. O. Duda, “Uniqueness of the Gaussian kernel for scale-space filtering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 1, pp. 26–33, Jan. 1986.