The implementation of the improved omp for aic reconstruction based on parallel index selection

NXFEE INNOVATION
(SEMICONDUCTOR IP & PRODUCT DEVELOPMENT)
(ISO 9001:2015 Certified Company),
# 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam,
Pondicherry - 605004, India.
Buy Project Online: www.nxfee.com | Contact: +91 9789443203 |
Email: nxfee.innovation@gmail.com
_________________________________________________________________
The Implementation of the Improved OMP for AIC Reconstruction
Based on Parallel Index Selection
Abstract:
Sparse signal recovery is extremely challenging for a variety of real-time
applications. In this paper, we improve the orthogonal matching pursuit (OMP) algorithm
using a parallel correlation-index selection mechanism in each iteration and the
Goldschmidt algorithm. Simulation results show that the improved OMP algorithm, with a
reduced number of iterations and low hardware complexity of matrix operations, has a
higher success rate and recovery signal-to-noise ratio (RSNR) for sparse signal recovery.
This paper presents an efficient complex-valued hardware architecture of the
recovery algorithm for an analog-to-information structure based on compressive sensing.
The proposed architecture is implemented and validated on a Xilinx Virtex-6
field-programmable gate array (FPGA) for signal reconstruction with N = 1024, K = 36, and M
= 256. The implementation results show that the improved OMP algorithm achieves a
higher RSNR, 31.04 dB, than the original OMP algorithm. The synthesized
design consumes only a few percent of the hardware resources of the FPGA chip, with a
clock frequency of 135.4 MHz and a reconstruction time of 170 µs, which is faster than the
existing design.
Software Implementation:
• ModelSim
• Xilinx 14.2
Existing System:
Sensing and processing information have traditionally relied on the Shannon sampling
theorem, one of the central tenets of digital signal processing. However, in many
applications, including digital image and video cameras and high-speed analog-to-digital
converters (ADCs), increasing the sampling rate is very expensive. The recent theory of
compressive sensing (CS) makes it possible to capture and represent compressible signals at a
rate significantly below the Nyquist rate. This new sampling approach is aimed at
implementing sub-Nyquist signal acquisition systems such as analog-to-information
converters (AICs). In this structure, the high-dimensional signal is
projected into a low-dimensional space with an incoherent measurement matrix by
exploiting the sparsity of the signal. The signal is then reconstructed from these
projections using an optimization process.
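The projection step described above can be illustrated with a minimal sketch. The function names, the toy sizes, and the random ±1 measurement matrix are assumptions chosen for illustration; the text does not say which measurement matrix the design uses.

```python
import random

def sparse_signal(n, k, rng):
    """Return a length-n list with exactly k nonzero entries."""
    x = [0.0] * n
    for idx in rng.sample(range(n), k):
        x[idx] = rng.choice([-1.0, 1.0]) * rng.uniform(1.0, 2.0)
    return x

def measurement_matrix(m, n, rng):
    """Random +/-1 matrix scaled by 1/sqrt(m) (an assumed, commonly used choice)."""
    scale = m ** -0.5
    return [[rng.choice([-scale, scale]) for _ in range(n)] for _ in range(m)]

def measure(phi, x):
    """Compute y = phi * x, projecting n samples down to m measurements."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

rng = random.Random(0)
N, K, M = 64, 4, 16   # toy sizes; the design in this paper uses N = 1024, K = 36, M = 256
x = sparse_signal(N, K, rng)
phi = measurement_matrix(M, N, rng)
y = measure(phi, x)
print(len(x), "samples compressed to", len(y), "measurements")
```

Recovering x from y is the reconstruction problem that the OMP algorithm addresses.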

At present, there are different structures of the AIC model, including the random
demodulator structure, that relate to hardware implementation. However, these
works do not realize the complete system and mainly focus on the hardware
implementation of the front-end analog information processing. The back-end
digitization and reconstruction are carried out using external units such as oscilloscopes
(Agilent Infiniium 54855A, Tektronix oscilloscopes, and 8-bit oscilloscopes). The collected
information is sent to a PC for reconstruction. Therefore, current research mostly
performs signal reconstruction on CPU, GPU, or DSP platforms.
However, offline processing results in excessive I/O data rates and storage requirements.
More importantly, it prohibits timely (or real-time) decisions based on the recovered
information and prevents the use of adaptive sensing strategies. Hence, it is necessary to
design dedicated hardware architectures for very large-scale integration (VLSI) or
field-programmable gate arrays (FPGAs) connected to the analog front-end hardware
for real-time processing. Recently, various algorithms have been proposed for
reconstructing signals from compressively sensed samples. Because efficient methods for
calculating a matrix decomposition exist, the orthogonal matching pursuit (OMP) algorithm is
suitable for VLSI implementation, mostly due to the regular structure of the
least-squares (LS) optimization.
In recent years, many hardware implementations of the OMP algorithm have been
proposed. However, few designs achieve both low complexity and high speed. An
FPGA implementation of the OMP algorithm has been proposed, but that design uses a
vector of length 128 with a sparsity of 5, and the architecture has a longer clock
period because of the path delay in the matrix dot product.
Stanislaus and Mohsenin proposed and improved a hardware architecture
of the OMP algorithm based on QR decomposition. This design finds the inverse of a matrix
using square-root units. A more efficient approach is to use the modified Cholesky
decomposition, which avoids square roots. Numerous works related to the
acceleration of these operations individually have been proposed. In one design,
the matrix inversion is based on a CORDIC divider, which introduces latency, and several
parts of the matrix multiplication are executed sequentially. Another design bases the
inversion on Newton–Raphson iteration, but the two multiplication modules in that
algorithm are also executed sequentially. Besides, existing designs are mainly
implemented as real-valued systems. They are mostly aimed at the hardware implementation
of the OMP algorithm rather than at actual applications. The signals they process are
usually sparse in the time domain and do not need to be converted to the frequency domain. However,
a complex-valued reconstruction system is required for practical
applications because many man-made or natural signals are sparse in an appropriate
orthonormal basis (such as the Fourier basis). The sensing matrix of the OMP algorithm is then
usually a complex-valued matrix, so the design proposed in this paper is a complex-valued
system for practical applications. Furthermore, we find that the high computational
complexity of the matrix operations in OMP is a major obstacle to real-time
reconstruction of compressively sensed signals.
Fig. 1. Complete structure of signal recovery. Chky-mat denotes the Cholesky decomposition, which is
used to obtain the inverse matrix.
In this paper, we mainly propose a complex-valued signal reconstruction system
for the CS-based AIC structure. Additionally, we improve the OMP algorithm by reducing
the computational complexity of its matrix operations and the number of iterations. We have
developed MATLAB code for the algorithm and compared the success rate and recovery
signal-to-noise ratio (RSNR) of correctly recovered input signals at different
sparsity levels. The improved OMP algorithm has a higher success rate and
RSNR than the OMP algorithm. To validate the improved OMP algorithm
efficiently, a complete signal recovery structure for the CS-based AIC model is built
in SIMULINK in this paper. Besides, we design a VLSI architecture of the improved OMP
algorithm that targets real-time, low-power, and low-complexity requirements. Additionally, we
propose an efficient matrix inversion design based on the Goldschmidt algorithm.
Compared with the Newton–Raphson inversion method, the inversion
circuit based on the Goldschmidt algorithm can improve the parallelism of the operation
and reduce the quantization error of each iteration.
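To make the parallelism argument concrete, here is a minimal scalar sketch of the two reciprocal iterations. It is an illustration only: the design described here applies the Goldschmidt algorithm inside a matrix inversion circuit, and the function names, iteration count, and floating-point arithmetic are assumptions. In the Goldschmidt loop the two products of an iteration are independent of each other, whereas in the Newton–Raphson loop the second product needs the result of the first.

```python
def goldschmidt_reciprocal(d, iterations=5):
    """Approximate 1/d for d already scaled into the interval (0, 2)."""
    if not 0.0 < d < 2.0:
        raise ValueError("d must be pre-scaled into (0, 2)")
    num, den = 1.0, d          # num converges to 1/d while den converges to 1
    for _ in range(iterations):
        f = 2.0 - den          # shared correction factor
        num = num * f          # these two products do not depend on each other,
        den = den * f          # so hardware can evaluate them side by side
    return num

def newton_raphson_reciprocal(d, iterations=5):
    """Approximate 1/d for d in (0, 2), starting from the guess x = 1."""
    if not 0.0 < d < 2.0:
        raise ValueError("d must be pre-scaled into (0, 2)")
    x = 1.0
    for _ in range(iterations):
        t = d * x              # first product
        x = x * (2.0 - t)      # second product must wait for the first
    return x

print(goldschmidt_reciprocal(0.75), newton_raphson_reciprocal(0.75))
```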
Disadvantages:
• Number of iterations is higher
• Hardware complexity is higher
Proposed System:
VLSI implementation of the improved OMP algorithm
In this paper, the proposed structure implements the improved algorithm
with a degree of sparsity K = 36, M = 256 measurements, and an
original signal length of N = 1024. In the hardware implementation, we completed the complex-valued
system of the improved algorithm. The whole design can be divided into a real part and an imaginary
part with a data precision of 24 bits.
The hardware implementation of the OMP algorithm is divided into three main units, as
shown in Fig. 2: 1) parallel complex multiplication; 2) matrix inversion; and 3) signal
estimation and residual calculation. In addition, it has dual-port RAMs and a control
circuit. The measurement matrix and the measurement vector y are stored in their respective RAMs.
To increase the throughput, the correlation is computed on the
parallel complex multiplication unit. The two most relevant columns are found by the comparison
circuit (sorting algorithm unit) and extracted from the RAM holding the measurement matrix
according to the associated indices λmax = {λ(i1, i2)}. We save only the addresses of the two columns
to a new RAM (index unit) and update it after each iteration. The control unit in this
design operates on the RAM units by generating read or write addresses, and generates
enable signals (en) for some computing units to initiate their operations.
Fig. 2. Architecture of the improved OMP algorithm
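The data flow through the three units (correlation, inversion, estimation and residual update) can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the hardware design: the function names are hypothetical, and the least-squares step uses Gauss-Jordan elimination on the normal equations in place of the Cholesky decomposition and Goldschmidt-based inversion described in this paper.

```python
def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def improved_omp(phi, y, k):
    """Recover a k-sparse vector from y = phi x, adding two indices per iteration."""
    m, n = len(phi), len(phi[0])
    support, coeffs, residual = [], [], list(y)
    for _ in range((k + 1) // 2):          # an odd k selects k + 1 indices
        # 1) correlate every column with the residual (squared magnitude only)
        power = []
        for j in range(n):
            c = sum(phi[i][j].conjugate() * residual[i] for i in range(m))
            power.append(-1.0 if j in support else c.real ** 2 + c.imag ** 2)
        # 2) keep the two most correlated columns not yet selected
        support += sorted(range(n), key=power.__getitem__, reverse=True)[:2]
        # 3) least squares on the selected columns via the normal equations
        cols = [[phi[i][j] for i in range(m)] for j in support]
        gram = [[sum(u.conjugate() * v for u, v in zip(p, q)) for q in cols]
                for p in cols]
        rhs = [sum(u.conjugate() * v for u, v in zip(p, y)) for p in cols]
        coeffs = solve(gram, rhs)
        # 4) update the residual
        residual = [y[i] - sum(z * col[i] for z, col in zip(coeffs, cols))
                    for i in range(m)]
    x_hat = [0j] * n
    for j, z in zip(support, coeffs):
        x_hat[j] = z
    return x_hat

# Sanity check on a 4-point Fourier-type basis (unitary, so not compressive).
units = [1, 1j, -1, -1j]
phi = [[0.5 * units[(i * j) % 4] for j in range(4)] for i in range(4)]
x = [0, 3 + 1j, 0, -2j]
y = [sum(phi[i][j] * x[j] for j in range(4)) for i in range(4)]
x_hat = improved_omp(phi, y, 2)
print(x_hat)
```

With K = 36, selecting two indices per iteration in this sketch takes 18 iterations instead of 36.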
Parallel Complex-Valued Multiplication
In this design, the computationally most expensive operation is the matrix-vector product
over the real and imaginary parts, which correlates the measurements
with the dictionary elements. To increase the throughput, the correlation is
computed on parallel complex-valued multiply accumulators. The real part
and the imaginary part use the same multiplication operation, and both the real
part and the imaginary part of the inner product are needed. One complex-valued multiply-add unit can
be divided into three real-valued multipliers and three adders. We compare the sum of squares
of the real part and the imaginary part to find the index of the maximum value of the
inner product. Note that no extra square-root operation is needed in this unit to obtain
the modulus of the inner product, because the largest sum of squares also has
the largest square root.
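A minimal sketch of a complex multiplication built from three real multiplications, followed by a selection on the sum of squares, is given here. The function names and the exact factorization are assumptions; the text only states the multiplier and adder counts. This form uses five additions at run time. If the sums b_im - b_re and b_re + b_im were precomputed for stored coefficients, three would remain, but whether the design does this is not stated.

```python
def complex_mul_3(a_re, a_im, b_re, b_im):
    """Return (a_re + j a_im) * (b_re + j b_im) using three real multiplications."""
    k1 = b_re * (a_re + a_im)
    k2 = a_re * (b_im - b_re)
    k3 = a_im * (b_re + b_im)
    return k1 - k3, k1 + k2

def correlate(column, residual):
    """Inner product sum(conj(column[i]) * residual[i]) as a (re, im) pair."""
    acc_re = acc_im = 0.0
    for (c_re, c_im), (r_re, r_im) in zip(column, residual):
        # negate the imaginary part to conjugate the column entry
        p_re, p_im = complex_mul_3(c_re, -c_im, r_re, r_im)
        acc_re += p_re
        acc_im += p_im
    return acc_re, acc_im

def strongest_column(columns, residual):
    """Index of the column whose correlation has the largest sum of squares."""
    best_index, best_power = -1, -1.0
    for j, column in enumerate(columns):
        re, im = correlate(column, residual)
        power = re * re + im * im      # ordering matches the modulus, no square root
        if power > best_power:
            best_index, best_power = j, power
    return best_index

# Two toy columns and a residual, each entry given as (real, imaginary).
columns = [[(1.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)]]
residual = [(1.0, 1.0), (3.0, -4.0)]
print(strongest_column(columns, residual))
```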

Fig. 3. Structure of the sorting algorithm
Compared with a real-valued system, the extra operations in the complex-valued system are
the multiplication of the imaginary part and the sum of squares. The multiplication circuit requires the coefficients of the
measurement matrix as parallel inputs. Therefore, we store the matrix in a
RAM with each column of the matrix at a single address, which ensures that all the
inputs required by the multiplication circuit are available in one clock cycle.
During the first N cycles of each iteration, the N columns of the matrix
are retrieved from the memory unit in serial order and fed to this unit. Finally, the result
is sent to the comparison circuit to obtain the indices of the relevant columns. To
improve the operating speed of the circuit, a pipelined circuit model is used. The input
data to the sorting unit are positive and arrive serially, so they can be scanned sequentially.
In this design, we select only the two maximum values to find the two most correlated
columns. Two registers (reg1 and reg2) are required to store the two most relevant values,
the maximum and the second largest value.
First, the numerical comparison unit compares the input data with the maximum value and
then controls the data distributor. If the input data is larger than the maximum value, we
move the maximum value into the secondary register (reg2) and store the input data
in the first register (reg1). If the input data is smaller than the maximum value and
larger than the second largest value, we replace the contents of the secondary register (reg2) with the
input data. In all other cases, the registers remain unchanged. The internal structure of the
sorting system is shown in Fig. 3.
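The two-register scan described above can be sketched in a few lines. The function name, the index bookkeeping, and the initial register value of -1 are assumptions for illustration; the comparison rules follow the text, so an input equal to the current maximum leaves both registers unchanged.

```python
def top_two(values):
    """Scan nonnegative values once; return the indices of the two largest."""
    reg1 = reg2 = -1.0            # below any valid input, since inputs are positive
    idx1 = idx2 = None
    for i, v in enumerate(values):
        if v > reg1:
            reg2, idx2 = reg1, idx1    # old maximum moves to the secondary register
            reg1, idx1 = v, i
        elif reg2 < v < reg1:
            reg2, idx2 = v, i          # new second-largest value
        # otherwise both registers keep their contents
    return idx1, idx2

print(top_two([4.0, 9.0, 1.0, 7.0, 9.5]))
```

Because the values arrive serially, one comparison stage per input is enough and no full sort of the N correlation values is needed.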
Advantages
• Number of iterations is reduced
• Hardware complexity is lower