NXFEE INNOVATION
(SEMICONDUCTOR IP & PRODUCT DEVELOPMENT)
(ISO 9001:2015 Certified Company),
# 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam,
Pondicherry– 605004, India.
Buy Project Online: www.nxfee.com | Contact: +91 9789443203 |
email : nxfee.innovation@gmail.com
_________________________________________________________________
Approximate Sum-of-Products Designs Based on Distributed Arithmetic
Abstract:
Approximate circuits provide high performance and require low power. Sum-of-products
(SOP) units are key elements in many digital signal processing applications. In this brief,
three approximate SOP (ASOP) models based on distributed arithmetic are proposed,
each designed for a different level of accuracy. The first ASOP model achieves an
improvement of up to 64% in area and 70% in power compared with the conventional
unit. The other two models provide improvements of 32% and 48% in area and 54% and
58% in power, respectively, with a reduced error rate compared with the first model. The
third model achieves a mean relative error and normalized error distance as low as 0.05%
and 0.009%, respectively. The performance of the approximate units is evaluated with a
noisy image smoothing application, where the proposed models achieve a higher peak
signal-to-noise ratio than the existing state-of-the-art techniques. It is shown that the
proposed approximate models achieve higher processing accuracy than existing works
while also delivering significant improvements in power and performance.
Software Implementation:
• ModelSim
• Xilinx 14.2
Existing System:
Approximate computing provides an efficient solution for the design of power-efficient
digital systems. For applications such as multimedia and data processing, approximate
circuits are a promising alternative for reducing area and power in digital systems that
can tolerate some loss of precision. Although sum-of-products (SOP) units are key
components in arithmetic circuits, they have received less attention in terms of
approximate implementation. Distributed arithmetic is a very efficient means of
calculating the inner product between vectors.
It implements multiplication through a series of table lookups and shift-and-accumulate
operations. Because the level of parallelism in the distributed arithmetic structure is
flexible, the area-speed tradeoff can be adjusted. Distributed arithmetic is a bit-serial
operation that computes the inner product of two vectors in parallel. It requires no
multiplication and provides an efficient mechanism for the SOP operation. Bit-parallel
versions of distributed arithmetic have also been proposed. In this brief, three models of
SOP units based on parallel distributed arithmetic are proposed. An existing approximate
scheme simply truncates the number of lookup tables by eliminating the least significant
part of the distributed arithmetic operation. Multipliers have been extensively studied for
approximate implementation. Two models of approximate compressors with reduced
erroneous outputs have been used to accumulate the partial products of a Dadda tree
multiplier.
The probability-based multiplier alters the partial products and reduces the generated
partial product tree based on their probability. The partial product perforation (PPP)
multiplier omits k partial products starting from the jth position, which in turn reduces
the number of adders used in the accumulation of partial products. In this brief, novel
ASOP designs are proposed using the efficient distributed arithmetic structure.
Approximation involves changes with respect to word length, number of lookup tables,
and number of elements in the final accumulator. Three models are proposed. The first
model provides significant power reduction with lower mean relative error (MRE) and
normalized error distance (NED).
The second and third models provide better accuracy at the cost of increased area and
power compared with the first model. In the proposed approximate structures, reductions
in the number of lookup tables, length of adders, and accumulator size are employed for
approximation. Compared with the exact SOP unit, the proposed models have reduced
circuit complexity.

NED is an effective metric for quantifying the approximation irrespective of the size of
the circuit. The traditional MRE metric is also used to evaluate the impact of
approximation. Error distance is the difference between the exact value and the
approximate value, whereas relative error is the error distance divided by the exact value.
NED is calculated by normalizing the error distance by the maximum possible exact
output. MRE is calculated as the mean of the relative errors over all possible values.
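The metric definitions above can be illustrated with a minimal Python sketch. The function name `error_metrics` and its arguments are hypothetical, not taken from the brief. Two assumptions are made here: NED is taken as the mean error distance divided by the maximum exact output, and inputs whose exact value is zero are skipped when averaging relative errors.

```python
def error_metrics(exact, approx, max_exact):
    """Return (MRE, NED) for paired exact and approximate outputs.

    exact, approx: equal-length sequences of non-negative results.
    max_exact: maximum possible exact output, used to normalize.
    """
    # Error distance: absolute difference between exact and approximate value.
    eds = [abs(e - a) for e, a in zip(exact, approx)]
    # Relative error: error distance divided by the exact value.
    # Assumption: cases with exact == 0 are skipped (relative error undefined).
    res = [ed / e for ed, e in zip(eds, exact) if e != 0]
    mre = sum(res) / len(res)
    # Assumption: NED is the mean error distance over the maximum exact output.
    ned = (sum(eds) / len(eds)) / max_exact
    return mre, ned


if __name__ == "__main__":
    mre, ned = error_metrics([10, 20, 40], [9, 20, 44], max_exact=40)
    print(f"MRE = {mre:.4f}, NED = {ned:.4f}")
```

In an exhaustive evaluation, `exact` and `approx` would cover every input combination; for 16-bit operands a random sample is a more practical stand-in.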
Disadvantages:
• Low processing accuracy
• Poor performance
• Requires high power
Proposed System:
Proposed Approximate Sum-of-Products
In this brief, K (the number of product terms) is 3 and N (the word length) is 16. A
conventional implementation of the SOP unit based on parallel distributed arithmetic [4]
requires three two-input 16-bit adders, one three-input 16-bit adder, 16 lookup tables with
eight cases each, and a final accumulator with 16 elements. In our approximation models,
these hardware requirements are considerably reduced. Three ASOP models are
proposed: ASOP1, ASOP2, and ASOP3.
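As a behavioural reference for the conventional unit, the following Python sketch computes an exact SOP with the lookup-and-shift scheme of distributed arithmetic for K = 3 and N = 16. It is a software model only, assuming unsigned operands; the names `build_lut` and `da_sop` are hypothetical, and the adder-level hardware structure is not modelled.

```python
K, N = 3, 16  # number of products and word length used in the brief


def build_lut(a):
    """Eight-case table: entry c is the sum of every a[k] whose control bit k is set."""
    return [sum(a[k] for k in range(K) if (c >> k) & 1) for c in range(1 << K)]


def da_sop(a, b, n_bits=N):
    """Unsigned sum of a[k] * b[k] using table lookups and shift-accumulate."""
    lut = build_lut(a)
    acc = 0
    for n in range(n_bits):
        # Control word for lookup table n: bit n of each b[k].
        c = sum(((b[k] >> n) & 1) << k for k in range(K))
        # Each table output is weighted by 2**n in the final accumulator.
        acc += lut[c] << n
    return acc


if __name__ == "__main__":
    a = [int(s, 2) for s in ("0011001000101110", "0001011000101011", "0010011001101000")]
    b = [int(s, 2) for s in ("0001001011101001", "0001101000101110", "0000101011101011")]
    print(da_sop(a, b), sum(x * y for x, y in zip(a, b)))
```

The table holds the eight possible sums of a_1, a_2, and a_3 (the "eight cases"), and one lookup per bit position of b corresponds to the 16 lookup tables of the parallel structure.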
Proposed Approximate Sum-of-Products Model ASOP1
In approximate model 1, K is 3 and N is reduced. The m least significant bits of a_k
and b_k, for k = 1, 2, and 3, are truncated. Truncation by m = 8, 6, and 4 bits is
implemented. This implementation requires three two-input (16 − m)-bit adders, one
three-input (16 − m)-bit adder, 16 − m lookup tables with eight cases each, and a final
accumulator with 16 − m elements. This considerably reduces the hardware utilization at
all levels. The approximate model with reduced elements is shown in Fig. 1. In [5], by
evaluating the sum only over bit positions m to N − 1, the number of lookup tables
reduces to 16 − m and 16 − m elements are sent to the final accumulator ((16 − m) × 18).
It should be noted that in ASOP1, the number of input bits to the adders is also reduced,
which further reduces the complexity of the accumulator ((16 − m) × (18 − m)),
compared to [5].
Fig. 1. Approximate lookup table and corresponding ASOP (ASOP1) structure for K = 3 and N = 16.
Fig. 2. Approximate lookup table and corresponding ASOP (ASOP2) structure for K = 3 and N = 16.
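The ASOP1 truncation can be modelled in a few lines of Python. This is a behavioural sketch under stated assumptions, not the circuit itself: operands are unsigned, the function name `asop1` is hypothetical, and the final left shift by 2m (to restore the weight of the dropped bits so the result is comparable with the exact SOP) is an assumption that the text does not spell out.

```python
K, N = 3, 16  # number of products and word length used in the brief


def asop1(a, b, m):
    """Approximate SOP: drop the m LSBs of every a[k] and b[k], then run
    distributed arithmetic on the remaining (N - m)-bit operands."""
    at = [x >> m for x in a]  # truncated table inputs
    bt = [x >> m for x in b]  # truncated control inputs
    # Eight-case table built from the truncated a values.
    lut = [sum(at[k] for k in range(K) if (c >> k) & 1) for c in range(1 << K)]
    acc = 0
    for n in range(N - m):  # only N - m lookup tables remain
        c = sum(((bt[k] >> n) & 1) << k for k in range(K))
        acc += lut[c] << n
    # Assumption: rescale by 2**(2m) to compare against the exact result.
    return acc << (2 * m)


if __name__ == "__main__":
    a = [int(s, 2) for s in ("0011001000101110", "0001011000101011", "0010011001101000")]
    b = [int(s, 2) for s in ("0001001011101001", "0001101000101110", "0000101011101011")]
    exact = sum(x * y for x, y in zip(a, b))
    for m in (4, 6, 8):
        approx = asop1(a, b, m)
        print(m, approx, (exact - approx) / exact)
```

Because truncation only discards bits of unsigned operands, this model never overestimates the exact SOP, and the error grows with m.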
ASOP2 is similar to ASOP1 with the addition of an m-bit leading one predictor. This
increases the accuracy and makes the design more suitable for DSP applications, as
discussed later in this section. In our method, leading one prediction of a_k and b_k for
k = 1, 2, and 3 requires an OR operation on the most significant m bits of a_k and b_k,
followed by a priority encoder. The function of the OR gates can be given as
a_mor = a_1m | a_2m | a_3m and b_mor = b_1m | b_2m | b_3m, where the subscript km
denotes the first m bits of the kth element, for m = 4, 6, or 8.
After the leading one prediction, the ASOP1 structure is used for the computation of
elements starting from the leading one position. Fig. 2 shows the approximate lookup
table and corresponding ASOP (ASOP2) structure for K = 3 and N = 16.

For example, consider the input elements a_1 = “0011001000101110,” a_2 =
“0001011000101011,” a_3 = “0010011001101000,” b_1 = “0001001011101001,” b_2 =
“0001101000101110,” and b_3 = “0000101011101011.” For m = 4, a_mor = 0011, so the
leading one predictor detects zeros in the first two bit positions, “15” and “14,” of a_1,
a_2, and a_3. The 12-bit (16 − m) information from bit position “13” to “2” of a_1, a_2,
and a_3 (“110010001011,” “010110001010,” and “100110011010”) is taken and fed to the
inputs of the lookup tables. For m = 4, b_mor = 0001, so the leading one predictor detects
zeros in the first three bit positions, “15,” “14,” and “13,” of b_1, b_2, and b_3. The 12-bit
(16 − m) information from bit position “12” to “1” of b_1, b_2, and b_3
(“100101110100,” “110100010111,” and “010101110101”) is taken and fed as control
signals of the lookup tables. The overall structure of ASOP2 is given in Fig. 2, where
LZA refers to the leading zeros in a_mor and LZB refers to the leading zeros in b_mor.
ASOP2 reduces the negative effects of truncation, especially when there is information
only in the least significant parts of the inputs. In DSP applications, pixel values are
highly correlated, and the numbers of leading zeros of a_k and b_k for k = 1, 2, 3 have a
high chance of being the same. Combining the elements with OR gates and then applying
a single leading one predictor reduces the hardware resources required.
Fig. 3. Least significant part of the ASOP (ASOP3) structure.
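The ASOP2 worked example above can be reproduced with a short Python sketch. The function names `leading_zeros` and `asop2_window` are hypothetical, and the priority encoder is modelled as a simple loop; the sketch only selects the (16 − m)-bit window and does not model the lookup tables or the accumulator.

```python
def leading_zeros(x, m):
    """Number of leading zeros in an m-bit value (m if the value is zero)."""
    for i in range(m):
        if (x >> (m - 1 - i)) & 1:
            return i
    return m


def asop2_window(words, m, n_bits=16):
    """OR the top m bits of all words, count the shared leading zeros, and
    return that count with the (n_bits - m)-bit window of each word that
    starts at the predicted leading one position."""
    ored = 0
    for w in words:
        ored |= w >> (n_bits - m)  # most significant m bits of each word
    lz = leading_zeros(ored, m)
    shift = m - lz  # number of least significant bits dropped
    mask = (1 << (n_bits - m)) - 1
    width = n_bits - m
    return lz, [format((w >> shift) & mask, f"0{width}b") for w in words]


if __name__ == "__main__":
    a = [int(s, 2) for s in ("0011001000101110", "0001011000101011", "0010011001101000")]
    b = [int(s, 2) for s in ("0001001011101001", "0001101000101110", "0000101011101011")]
    print(asop2_window(a, 4))  # LZA and the three table inputs
    print(asop2_window(b, 4))  # LZB and the three control words
```

With m = 4 this yields two leading zeros for the a inputs and three for the b inputs, matching the 12-bit strings quoted in the example.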
In ASOP1, the least significant m = 8, 6, and 4 bits are truncated: m bits are truncated
from the 18-bit outputs of the lookup table contents, and the m control signals b_1n,
b_2n, and b_3n of the lookup tables for n = 0, 1, ..., m − 1 are also truncated. In ASOP3,
approximation is employed instead of truncation. Fig. 3 shows the least significant part
of the ASOP (ASOP3) structure. The lookup table output contents are divided into
18 − m bits and m bits. The inputs b are divided into a (16 − m)-bit group and an m-bit
group. ASOP1 is used for the first (16 − m)-bit group. For the least significant m-bit
group of b_k for k = 1, 2, 3, the control signals are grouped in pairs, so that m lookup
tables are reduced to m/2 tables. The additional hardware required for ASOP3 is given in
Fig. 3. For example, consider the input elements a_1 = “0011001000101110,” a_2 =
“0001011000101011,” a_3 = “0010011001101000,” b_1 = “0001001011101001,” b_2 =
“0001101000101110,” and b_3 = “0000101011101011.” For m = 4, a_23, a_13, a_12, and
a_123 are calculated; then, except for the least significant m bits, the other bits are given
to the ASOP1 structure, and the 12-bit (16 − m) information starting from the most
significant bit of b_1, b_2, and b_3 is taken and fed as control signals of the lookup
tables. For the least significant bits calculation, the least significant m bits of a_23, a_13,
a_12, and a_123 are used as inputs to the lookup tables. The number of lookup tables is
reduced by half by ORing each pair of control signals. In this scenario, for the lookup
table of n = 1 | 0, the control signals would be 111.
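The pairing of control signals in the least significant part can be sketched in Python as follows. The function name `paired_controls` is hypothetical, and only the ORing of adjacent control bits is modelled; how the paired table outputs are weighted and accumulated is not specified in the text above and is left out.

```python
def paired_controls(b, m):
    """OR adjacent bit pairs (n+1 | n) in the m least significant bits of each
    b[k]. Returns one control word per pair, lowest pair first, written in the
    order b_1, b_2, b_3."""
    controls = []
    for n in range(0, m, 2):
        # One control bit per input: bit n OR bit n+1.
        bits = [((x >> n) | (x >> (n + 1))) & 1 for x in b]
        controls.append("".join(str(bit) for bit in bits))
    return controls


if __name__ == "__main__":
    b = [int(s, 2) for s in ("0001001011101001", "0001101000101110", "0000101011101011")]
    print(paired_controls(b, 4))  # m = 4 gives m/2 = 2 control words
```

For the example inputs with m = 4, the first entry corresponds to the lookup table of n = 1 | 0 and evaluates to 111, in agreement with the text.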

Advantages:
• Higher processing accuracy
• High performance
• Requires low power
References:
[1] J. Han and M. Orshansky, “Approximate computing: An emerging paradigm for energy-efficient
design,” in Proc. IEEE ETS, May 2013, pp. 1–6.
[2] S. A. White, “Applications of distributed arithmetic to digital signal processing: A tutorial review,”
IEEE ASSP Mag., vol. 6, no. 3, pp. 4–19, Jul. 1989.
[3] L. Yuan, S. Sana, H. J. Pottinger, and V. S. Rao, “Distributed arithmetic implementation of
multivariable controllers for smart structural systems,” Smart Mater. Struct., vol. 9, no. 4, p. 402, Jan.
2000.
[4] W. Li, J. B. Burr, and A. M. Peterson, “A fully parallel VLSI implementation of distributed
arithmetic,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 2. Jun. 1988, pp. 1511–1515.
[5] R. Amirtharajah and A. P. Chandrakasan, “A micropower programmable DSP using approximate
signal processing based on distributed arithmetic,” IEEE J. Solid-State Circuits, vol. 39, no. 2, pp. 337–
347, Feb. 2004.
[6] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and analysis of approximate
compressors for multiplication,” IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
[7] S. Venkatachalam and S.-B. Ko, “Design of power and area efficient approximate multipliers,” IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 5, pp. 1782–1786, May 2017.
[8] G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris, and K. Pekmestzi, “Design-efficient approximate
multiplication circuits through partial product perforation,” IEEE Trans. Very Large Scale Integr.
(VLSI) Syst., vol. 24, no. 10, pp. 3105–3117, Oct. 2016.

[9] J. Liang, J. Han, and F. Lombardi, “New metrics for the reliability of approximate and probabilistic
adders,” IEEE Trans. Comput., vol. 62, no. 9, pp. 1760–1771, Sep. 2013.
[10] J. Babaud, A. P. Witkin, M. Baudin, and R. O. Duda, “Uniqueness of the Gaussian kernel for scale-
space filtering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 1, pp. 26–33, Jan. 1986.
