Analysis and Design of Cost-Effective, High-Throughput LDPC Decoders

NXFEE INNOVATION
(SEMICONDUCTOR IP & PRODUCT DEVELOPMENT)
(ISO 9001:2015 Certified Company),
# 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam,
Pondicherry – 605004, India.
Buy Project Online: www.nxfee.com | Contact: +91 9789443203 |
Email: nxfee.innovation@gmail.com
_________________________________________________________________
Analysis and Design of Cost-Effective, High-Throughput LDPC Decoders
Abstract:
This paper introduces a new approach to cost-effective, high-throughput hardware
designs for low-density parity-check (LDPC) decoders. The proposed approach, called
nonsurjective finite alphabet iterative decoders (NS-FAIDs), exploits the robustness of
message-passing LDPC decoders to inaccuracies in the calculation of exchanged
messages, and it is shown to provide a unified framework for several designs previously
proposed in the literature. NS-FAIDs are optimized by density evolution for regular and
irregular LDPC codes, and are shown to provide different tradeoffs between hardware
complexity and decoding performance. Two hardware architectures targeting
high-throughput applications are also proposed, integrating both Min-Sum (MS) and NS-FAID
decoding kernels. ASIC post-synthesis implementation results on 65-nm CMOS
technology show that NS-FAIDs yield significant improvements in the throughput-to-area
ratio, by up to 58.75% with respect to the MS decoder, with even better or only slightly
degraded error correction performance.
Software Implementation:
• ModelSim
• Xilinx 14.2
Existing System:
The increasing demand for massive data rates in wireless communication systems will
require significantly higher processing speed of the baseband signal, as compared with
conventional solutions. This is especially challenging for forward error correction (FEC)
mechanisms, since FEC decoding is one of the most computationally intensive baseband
processing tasks, consuming a large amount of hardware resources and energy. The use
of very large bandwidths will also result in stringent, application-specific requirements
in terms of both throughput and latency. The conventional approach to increase
throughput is to use massively parallel architectures. In this context, low-density
parity-check (LDPC) codes are recognized as the foremost solution, due to the intrinsic
capacity of their decoders to accommodate various degrees of parallelism. They have found
extensive applications in modern communication systems, due to their excellent decoding
performance, high-throughput capabilities, and power efficiency, and have been adopted
in several recent communication standards. This paper targets the design of
cost-effective, high-throughput LDPC decoders. One important characteristic of LDPC
decoders is that the memory and interconnect blocks dominate the overall
area/delay/power performance of the hardware design. To address this issue, we build
upon the previously introduced concept of finite alphabet iterative decoders (FAIDs).
While FAIDs have been previously investigated for variable-node (VN) regular LDPC codes
over the binary symmetric channel, this paper extends their use to any channel model and
to both regular and irregular LDPC codes.
The approach considered in this paper, referred to as nonsurjective FAIDs (NS-FAIDs),
is to allow storing the exchanged messages using a lower precision (a smaller
number of bits) than that used by the processing units. The basic idea is to reduce the size
of the exchanged messages, once they have been updated by the processing units. Hence,
to some extent, the proposed approach is akin to the use of imprecise storage, which is
seen as an enabler for cost and throughput optimizations. Moreover, NS-FAIDs are
shown to provide a unified framework for several designs previously proposed in the
literature, including the normalized and offset Min-Sum (OMS) decoders, the partially
OMS (POMS) decoder, other previously proposed MS-based decoders, and the recently
introduced dual-quantization domain MS decoder.
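The idea of storing messages with fewer bits than the processing units use can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the 4-bit processing precision, the table FRAMING, and the helper names frame and deframe are hypothetical and are not taken from the paper. The table maps each message magnitude to one of a few allowed values (the map is nonsurjective), so only the index of the allowed value needs to be stored.

```python
import math

# Assumption: processing units use 4-bit messages (1 sign bit + 3 magnitude bits).
Q = 7  # largest magnitude the processing units handle

# Hypothetical nonsurjective framing table on magnitudes 0..Q.
# Only four distinct output values are used, so the map is not onto 0..Q.
FRAMING = [0, 1, 1, 3, 3, 3, 7, 7]

IMAGE = sorted(set(FRAMING))                   # allowed magnitudes after framing
MAG_BITS = math.ceil(math.log2(len(IMAGE)))    # magnitude bits needed in storage

def frame(msg):
    """Reduce a signed message to (sign_bit, index) for low-precision storage."""
    sign = 1 if msg < 0 else 0
    mag = min(abs(msg), Q)                     # saturate to the processing range
    return sign, IMAGE.index(FRAMING[mag])

def deframe(stored):
    """Expand a stored (sign_bit, index) pair back to a signed message."""
    sign, idx = stored
    mag = IMAGE[idx]
    return -mag if sign else mag

if __name__ == "__main__":
    for m in (-6, -2, 0, 4, 5):
        print(m, "->", frame(m), "->", deframe(frame(m)))
    print("stored bits per message:", 1 + MAG_BITS)
```

With this hypothetical table, each message occupies 3 bits in memory instead of 4, at the price of a coarser set of message values.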
This paper refines and extends some of the concepts we previously introduced. In
particular, the definition of NS-FAIDs is extended so as to cover a larger class of
decoders, which is shown to significantly improve the decoding performance when
the exchanged messages are quantized on a small number of bits (e.g., 2 bits per
exchanged message). We show that NS-FAIDs can be optimized by using the density
evolution (DE) technique, so as to obtain
Disadvantages:
• Cost is high
• Error correction performance is poor
Proposed System:
Full-Layer Architecture
A different possibility to increase throughput is to increase the hardware parallelism, by
including several nonoverlapping rows of the base matrix in one decoding layer. For
instance, for the base matrix, we may consider RPL = 4 consecutive rows per decoding
layer, and thus the number of decoding layers is L = 3. In this case, each column of the
base matrix has one (and only one) nonzero entry in each decoding layer; such a decoding
layer is referred to as being full. Full layers correspond to the maximum hardware
parallelism that can be exploited by layered architectures, but they also prevent the
pipelining of the data path.
Fig. 1. Mapping between VNs and VNUs. Black: VNs of degree 2. Red: VNs of degree 3. Blue: VNs of
degree 6.
One possibility to implement a full-layer decoder is to use an architecture similar
to the pipelined one, by removing the registers inserted after the VNU (since
pipelining is incompatible with the use of full layers), and updating the control unit.
However, in such an architecture, read/write operations from/to the β_memory would
occur at the same memory location, corresponding to the current layer being processed.
This would require the use of asynchronous dual-port RAM to implement the β_memory,
which in general is known to be slower than synchronous dual-port RAM. The
architecture proposed in this section is aimed at avoiding the use of asynchronous RAM,
while providing an effective way to benefit from the increased hardware parallelism
enabled by the use of full layers. We discuss below the main changes with respect to the
pipelined architecture, which concern the α_memory and the barrel shifter blocks (the
other blocks are the same as for the pipelined architecture), as well as a complete
reorganization of the data path. However, it can be easily verified that both architectures
are logically equivalent, i.e., they both implement the same decoding algorithm.
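The notion of a full layer can be checked mechanically. The Python sketch below is a minimal illustration, assuming the common convention that an entry of -1 in the base matrix denotes an all-zero block and a nonnegative entry denotes a circulant shift value. The 4 x 4 matrix B and the helper names are hypothetical and much smaller than the base matrix considered in the text.

```python
# Hypothetical toy base matrix: -1 marks an all-zero block (assumed convention),
# a nonnegative entry is a circulant shift value.
B = [
    [ 0, -1,  2, -1],
    [-1,  1, -1,  3],
    [ 1, -1, -1,  0],
    [-1,  2,  0, -1],
]

def split_layers(base, rpl):
    """Group consecutive rows of the base matrix into layers of rpl rows each."""
    return [base[i:i + rpl] for i in range(0, len(base), rpl)]

def is_full_layer(layer):
    """A layer is full if every column has exactly one non-empty entry in it."""
    n_cols = len(layer[0])
    return all(
        sum(1 for row in layer if row[c] >= 0) == 1
        for c in range(n_cols)
    )

if __name__ == "__main__":
    for rpl in (1, 2, 4):
        layers = split_layers(B, rpl)
        print("RPL =", rpl, "layers =", len(layers),
              "all full:", all(is_full_layer(l) for l in layers))
```

For this toy matrix, grouping two rows per layer gives two full layers, while taking all four rows at once fails the check because each column would then contribute two entries to the same layer.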
1) α_Memory: This memory is used to store the VN-messages for the current decoding
layer (unlike the previous architecture, the AP-LLR values are not stored in memory).
Since only one -bit (unsaturated) VN-message is stored for each VN, this memory has
exactly the same size as the _memory used within the previous pipelined architecture.
VN-messages for the current layer are read from the α_memory, then saturated or framed
depending on the decoding kernel, and supplied to the corresponding CNUs.
CN-messages computed by the CNUs are stored in the β_memory (at the location corresponding
to the current layer), and also forwarded to the AP-LLR unit, through the DCP (decompress)
and DE-FRA (deframing) blocks, according to the CNU implementation (compressed or
uncompressed) and the decoding kernel (MS or NS-FAID). The AP-LLR unit computes the sum
of the incoming VN- and CN-messages, which corresponds to the AP-LLR value to be used at
the next layer (since it has already been updated by the current layer). The AP-LLR value
is forwarded to the VNU, through the corresponding BS and PER blocks. Eventually, the
VN-message for the next layer is computed as the difference between the incoming AP-LLR
and the corresponding next-layer CN-message computed at the previous iteration, the
latter being read from the β_memory.
Fig. 2. High-level description of the proposed HW architectures, with both MS and NS-FAID kernels.
(a) Pipelined architecture. (b) Full-layer architecture.
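The arithmetic of this data path can be sketched in a few lines of Python. This is a simplified illustration, not the paper's implementation: it uses the MS kernel only, ignores saturation, framing, compression, and the PER/BS permutations, and assumes for simplicity that the same VNs are involved in the current and the next layer. The function names and the example values are hypothetical.

```python
def min_sum_cn(vn_msgs):
    """Min-Sum check-node update: each output excludes its own input."""
    out = []
    for i in range(len(vn_msgs)):
        others = vn_msgs[:i] + vn_msgs[i + 1:]
        negatives = sum(1 for m in others if m < 0)
        sign = -1 if negatives % 2 else 1
        out.append(sign * min(abs(m) for m in others))
    return out

def full_layer_step(alpha, beta_next_old):
    """One layer of the simplified data path.

    alpha         : VN-messages of the current layer (as read from the alpha memory)
    beta_next_old : next-layer CN-messages from the previous iteration (beta memory)
    Returns the new CN-messages, the AP-LLR values, and the next-layer VN-messages.
    """
    beta_new = min_sum_cn(alpha)                          # CNU output, written to beta memory
    ap_llr = [a + b for a, b in zip(alpha, beta_new)]     # AP-LLR unit: VN-msg + CN-msg
    alpha_next = [g - b for g, b in zip(ap_llr, beta_next_old)]  # VNU: AP-LLR - old CN-msg
    return beta_new, ap_llr, alpha_next

if __name__ == "__main__":
    beta_new, ap_llr, alpha_next = full_layer_step([3, -2, 5], [1, -1, 0])
    print(beta_new, ap_llr, alpha_next)
```

Note that only the VN-messages need to be kept between layers in this sketch; the AP-LLR values are intermediate results, which matches the statement that they are not stored in memory.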
2) PER/BS Blocks: PER_1 / BS_1 blocks permute / shift the data read from the input
buffer, according to the positions / values of the nonnegative entries in the first decoding
layer. Similar to the BS_R blocks in the pipelined architecture, the PER_WR / BS_WR
blocks permute / shift the AP-LLR values, according to the difference between the
positions / values of the current layer's nonnegative entries and those of the next
layer. This way, VN-messages stored in the α_memory are already permuted and shifted
for the subsequent decoding layer. Finally, PER_L / BS_L blocks permute / shift the hard
decision bits (sign of AP-LLR values), according to the positions / values of the
nonnegative entries in the last decoding layer.
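Shifting by the difference between the current and the next layer's shift values can be illustrated as follows. This Python sketch assumes a left-rotation convention for the circulant shift and a block of 8 values; both choices, as well as the function names, are hypothetical and only serve to show why a single shift by the difference is enough.

```python
def barrel_shift(block, shift):
    """Cyclically rotate a block of values to the left by shift positions."""
    k = shift % len(block)
    return block[k:] + block[:k]

def realign(block_cur, s_cur, s_next):
    """Re-align a block already shifted by s_cur so that it is shifted by s_next."""
    return barrel_shift(block_cur, s_next - s_cur)

if __name__ == "__main__":
    natural = list(range(8))              # values in their natural (unshifted) order
    cur = barrel_shift(natural, 3)        # aligned for the current layer (shift 3)
    nxt = realign(cur, 3, 5)              # one shift by the difference 5 - 3
    print(nxt == barrel_shift(natural, 5))
```

Because the rotation amount is taken modulo the block size, the same function also handles the case where the next layer's shift value is smaller than the current one.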
Advantages:
• Cost effective
• Error correction performance is good
