A fast and low complexity operator for the computation of the arctangent of a complex number

NXFEE INNOVATION
(SEMICONDUCTOR IP & PRODUCT DEVELOPMENT)
(ISO 9001:2015 Certified Company),
# 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam,
Pondicherry – 605004, India.
Buy Project Online: www.nxfee.com | Contact: +91 9789443203 |
Email: nxfee.innovation@gmail.com
_________________________________________________________________
A Fast and Low-Complexity Operator for the Computation of the
Arctangent of a Complex Number
Abstract:
The computation of the arctangent of a complex number, i.e., the atan2 function, is
frequently needed in hardware systems that could profit from an optimized operator. In
this brief, we present a novel method to compute the atan2 function and a hardware
architecture for its implementation. The method is based on a first stage that performs a
coarse approximation of the atan2 function and a second stage that improves the output
accuracy by means of a lookup table. We present results for fixed-point implementations
in a field-programmable gate array device, all of them guaranteeing last-bit accuracy,
which provide an advantage in latency, speed, and use of resources, when compared with
well-established fixed-point options.
Software Implementation:
• ModelSim
• Xilinx 14.2
Existing System:
The computation of the arctangent function atan2(a, b) (see Fig. 1), i.e., obtaining the
angle of a complex number c = b + ja, has been the subject of extensive study, because
this computation is required in many applications. In hardware approximations for
atan2(a, b), there is often a tradeoff between the use of resources and the computation
speed and/or latency. For example, the fastest option for the computation of any function
is usually the direct implementation with a lookup table (LUT), but, since the
atan2(a, b) is a function of two input variables, in such a case, if the precision of the input
data increases by one bit, the amount of memory needed increases by a factor of four. On
the other hand, iterative algorithms, such as the Coordinate Rotation Digital Computer
(CORDIC), can be implemented with a minimal use of resources, but at the cost of a low
processing speed. If more parallelism is introduced in the implementation, much higher
throughputs can be achieved, but the use of resources and the computational delay are
increased.
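The factor-of-four growth mentioned above follows from the table being addressed by both input words at once. A minimal sketch of that arithmetic, where the function name and the choice of an 8-bit output word are illustrative assumptions rather than figures from the brief:

```python
def direct_lut_bits(input_bits, output_bits):
    """Storage, in bits, of a table addressed by two words of input_bits each."""
    entries = 1 << (2 * input_bits)   # one entry per (a, b) pair
    return entries * output_bits


# Each extra input bit doubles the range of a AND of b, so the table quadruples.
for n in (8, 9, 10):
    print(n, direct_lut_bits(n, output_bits=8))
```

With 8-bit inputs and outputs the table already needs 2^16 × 8 = 524,288 bits, which is why the methods discussed next try to avoid a two-input table.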
Fig. 1(a). 3-D plot of atan2(a, b)/2π for the first octant
Fig. 1(b). Simplified scheme of the proposed approximation for atan2(a, b)/2π
Smaller LUTs can be achieved by using the recip-mult-atan method (RMAM). First, z =
a/b is calculated by computing 1/b with an LUT (thus avoiding the large LUT needed for
a two-variable function) and multiplying the result by a. Then, the one-variable
function atan(z) is computed using another LUT. Another option is to use high-order
algebraic polynomials, like Chebyshev polynomials or Taylor series. These methods offer
good precision, but, since the arctangent is highly nonlinear, they lead to long
polynomials and intensive computations. In other cases, approximations based on rational
functions are used, as they may provide good enough results with a few elementary
operations. As a general rule, in this kind of approximation, the division operation is the
main contributor to the computational cost, and these methods usually require one or
more multiplications in addition to that division.
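The recip-mult-atan method described above can be modelled in a few lines. This is a floating-point sketch, not a fixed-point design: the table address width (ADDR_BITS), the restriction to 0 <= a <= b with b already normalized to [0.5, 1), and all names are assumptions made for illustration.

```python
import math

ADDR_BITS = 8                       # hypothetical address width of both tables
STEPS = 1 << ADDR_BITS

# Table 1: 1/b for b in [0.5, 1), sampled at the truncated value of b.
RECIP = [1.0 / (0.5 + k / (2 * STEPS)) for k in range(STEPS)]
# Table 2: atan(z)/(2*pi) for z in [0, 1].
ATAN = [math.atan(k / STEPS) / (2 * math.pi) for k in range(STEPS + 1)]


def rmam_first_octant(a, b):
    """Approximate atan2(a, b)/(2*pi) for 0 <= a <= b and 0.5 <= b < 1."""
    k = int((b - 0.5) * 2 * STEPS)  # recip: truncate b to a table address
    z = min(a * RECIP[k], 1.0)      # mult: z is roughly a/b, clamped to 1
    return ATAN[round(z * STEPS)]   # atan: one-variable table lookup
```

Two tables of 2^ADDR_BITS entries replace a single table of 2^(2·ADDR_BITS) entries, at the cost of one multiplication.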
The architecture we propose is essentially a two-stage method, as shown in Fig. 1(b).
The First Stage uses a low-complexity coarse approximation for the two-input
atan2(a, b)/2π. The Second Stage improves the accuracy by means of a small LUT that
stores precomputed error values, as a function of the output of the First Stage. This table
does not depend on the two inputs a and b of the atan2 operator and is, therefore,
comparatively much smaller. As will be shown, the resulting operator is small and can
compute the arctangent faster than other popular options, for the same output accuracy.
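The two-stage idea can be illustrated with a small floating-point model. The coarse approximation used here, (a/b)/8, is an assumption chosen for simplicity and is not necessarily the one used in the brief; TABLE_BITS and the function names are likewise hypothetical. The point of the sketch is that the correction table is addressed only by the first-stage output.

```python
import math

TABLE_BITS = 8                      # hypothetical address width of the error table
STEPS = 1 << TABLE_BITS


def coarse(a, b):
    """First stage (assumed form): linear estimate of atan2(a, b)/(2*pi), 0 <= a <= b."""
    return (a / b) / 8.0            # exact at a = 0 and at a = b (1/8 of a turn)


# Second stage: error of the coarse estimate, tabulated against the coarse output.
ERROR_TABLE = [
    math.atan(k / STEPS) / (2 * math.pi) - (k / STEPS) / 8.0
    for k in range(STEPS + 1)
]


def atan2_turns_first_octant(a, b):
    """Coarse estimate plus tabulated correction, for 0 <= a <= b and b > 0."""
    f = coarse(a, b)
    k = round(f * 8.0 * STEPS)      # address depends on f only, not on a and b
    return f + ERROR_TABLE[k]
```

Because the error of this coarse estimate is a function of f alone, a one-input table of 257 entries is enough here, whereas a direct table would need one entry per (a, b) pair.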
Disadvantages:
• Higher latency
• Lower speed and efficiency
Proposed System:
Hardware Architecture
Fig. 2(a) shows the scheme of the proposed hardware architecture. It works with the
absolute values of the inputs a and b, and their signs are used later in the scheme. The
“sel” signal selects the operands for the division operation according to the signs of a + b
and a − b. The division required for the computation of |fr| is implemented with a table
that stores 1/x and a multiplier that completes the division. A scaling stage is added in
order to reduce the size of the reciprocal table. The most relevant implementation details
are commented upon in Section IV-A–E. Specifically, we give details for three
accuracies: w = {8, 12, 16} bit.
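Working on absolute values and restoring the signs afterwards is a standard octant-folding scheme. The sketch below shows one common way to do it. It selects the division operands by comparing |a| and |b|, which is an assumption: the brief derives its "sel" signal from the signs of a + b and a − b instead. The function names are hypothetical, and the first-octant kernel is a floating-point stand-in.

```python
import math


def exact_first_octant(y, x):
    """Stand-in kernel: atan(y/x)/(2*pi) for 0 <= y <= x, x > 0."""
    return math.atan(y / x) / (2 * math.pi)


def atan2_turns(a, b, first_octant=exact_first_octant):
    """atan2(a, b)/(2*pi) in (-0.5, 0.5], built from a first-octant kernel."""
    abs_a, abs_b = abs(a), abs(b)
    swap = abs_a > abs_b                 # keep the quotient y/x at or below 1
    y, x = (abs_b, abs_a) if swap else (abs_a, abs_b)
    if x == 0:
        return 0.0                       # both inputs zero: angle undefined, return 0
    t = first_octant(y, x)               # result in [0, 1/8]
    if swap:
        t = 0.25 - t                     # reflect about the 45-degree line
    if b < 0:
        t = 0.5 - t                      # left half-plane
    if a < 0:
        t = -t                           # lower half-plane
    return t
```

Any first-octant approximation with the same signature can be passed as first_octant in place of the exact stand-in.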
Fig. 2. (a) Implementation scheme for the proposed approximation for the atan2(a, b)/2π function. (b)
Simplified model for error analysis.
Datapath Dimensioning
Fig. 2(a) shows the binary formats used in the different buses of the system, where s(q, t)
and u(q, t) denote signed and unsigned fixed-point formats, respectively; 2^q is the weight
of the most significant bit (MSB) and 2^t is the weight of the least significant bit (LSB).
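As an illustration of this notation, the following sketch handles the unsigned case only. It computes the word length of a u(q, t) format and truncates a real value to it. The helper names are hypothetical, and truncation (rather than rounding) is an assumption made for the example.

```python
import math


def word_length(q, t):
    """Bits in a u(q, t) word: one bit per weight from 2^q down to 2^t."""
    return q - t + 1


def quantize_u(value, q, t):
    """Truncate value to u(q, t): a multiple of 2^t in [0, 2^(q+1) - 2^t]."""
    lsb = 2.0 ** t
    code = math.floor(value / lsb)               # integer stored in the word
    max_code = (1 << word_length(q, t)) - 1
    if not 0 <= code <= max_code:
        raise OverflowError("value does not fit in u(%d, %d)" % (q, t))
    return code * lsb
```

For example, u(−1, −8) is an 8-bit word covering [0, 1 − 2^−8], so quantize_u(0.3, -1, -8) keeps the code 76 and returns 76/256.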
Note that truncating the output of the multiplier does not introduce an additional error
unless the truncated bits are used in the following processing steps. In a more realistic
scenario, last-bit accuracy (LBA) can still be achieved with smaller LUTs that either do
not use all the available bits as their address word or are implemented as bipartite tables.
Although in these cases the tables introduce bigger errors (as explained in the following),
LBA could still be achieved, since the analytical error estimates are upper bounds that
need not be reached. For
this reason, we performed exhaustive tests for different LUT sizes looking for optimized
implementations. Fig. 3 is an example of the error pattern obtained in one of the w = 12
operators.

Fig. 3. Absolute error for one of our w = 12 atan2(a, b)/2π operators.
Range Reduction
Obtaining |fr| requires the computation of y/x [see Fig. 2(a)], which involves computing
1/x with an LUT. Since 1/x can be extremely large for small values of x, a scaling
operation is performed on |a| and |b|. This block detects the leading zeros in both |a| and
|b| and scales both by the same factor (2^s), so that the MSB of x, the larger of the two
outputs, is always 1. Thanks to this block, x is always in the [0.5, 1) range and the largest
possible value of 1/x is 2.
Computation of the Reciprocal
For the computation of 1/x, two different table-based strategies are used: direct
tabulation for the w = 8 bit operator and bipartite tables for w = {12, 16} bits. In both
cases, all the bits of the input word are used. Therefore, for direct tabulation the only
errors are those created by rounding the words stored in the table, whereas for the
bipartite tables the maximum absolute error can be estimated from the second derivative
of the stored function and from the rounding errors. The value stored in the first address
of this table should be 2, but 2 − 2^(−n+1) is stored for a table with n-bit words, so that the
MSB of all the stored words is the same and, therefore, does not need to be stored. The
size of this table is 64 × 6 for w = 8, 128 × 14 + 128 × 7 for w = 12, and 1k × 18 + 512 × 8
for w = 16 bits.
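The scaling step and a directly tabulated reciprocal can be sketched with integer arithmetic as follows. The function names and the width parameter are hypothetical. The parameters in the usage note (6 address bits, n = 7) are chosen only because they give a 64 × 6 table, the same shape as the w = 8 table quoted above; the actual table contents in the brief may differ.

```python
def normalize(abs_a, abs_b, width):
    """Shift both magnitudes left by the same amount s so the larger has its MSB set."""
    m = max(abs_a, abs_b)
    if m == 0:
        return 0, 0, 0
    s = width - m.bit_length()          # leading zeros of the larger operand
    return abs_a << s, abs_b << s, s


def build_recip_table(addr_bits, n):
    """Table of 1/x for x in [0.5, 1), as n-bit words with the MSB left implicit."""
    scale = 1 << (n - 1)                # word LSB has weight 2^-(n-1)
    max_word = (1 << n) - 1             # represents 2 - 2^(-n+1)
    table = []
    for addr in range(1 << addr_bits):
        x = 0.5 + addr / (1 << (addr_bits + 1))
        word = min(round(scale / x), max_word)   # saturates only at x = 0.5
        table.append(word - scale)      # MSB is always 1, so it is not stored
    return table


def reciprocal(x_int, width, table, addr_bits, n):
    """Look up 1/x for a width-bit x_int whose MSB is set (x = x_int / 2^width)."""
    addr = (x_int >> (width - 1 - addr_bits)) & ((1 << addr_bits) - 1)
    scale = 1 << (n - 1)
    return (table[addr] + scale) / scale  # restore the implicit MSB
```

With table = build_recip_table(6, 7), every stored entry fits in 6 bits, and the first address returns 2 − 2^−6 instead of 2, as described in the text.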

Advantages:
• Lower latency
• Higher speed and efficiency
