VCNPU: An Algorithm-Hardware Co-Optimized Framework for Accelerating Neural Video Compression
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2024•ieeexplore.ieee.org
Video compression is essential for storing and transmitting video content, and real-time decoding is indispensable for delivering a seamless user experience. Neural video compression (NVC) integrates traditional coding techniques with deep learning, resulting in impressive compression efficiency. However, the real-time deployment of advanced NVC models encounters challenges due to their high complexity and extensive off-chip memory access. This article presents a novel NVC accelerator, called video compression neural processing unit (VCNPU), via an algorithm-hardware co-design framework. First, at the algorithmic level, a reparameterizable video compression network (RepVCN) is proposed to aggregate multiscale features and boost video compression quality. RepVCN can be equivalently transformed into a streamlined structure without extra computations after training. Second, a mask-sharing pruning strategy is proposed to compress RepVCN in the fast transform domain. It effectively prevents the destruction of sparse patterns caused by model simplification, maintaining the model capacity. Third, at the hardware level, a reconfigurable sparse computing module is designed to flexibly support sparse fast convolutions and deconvolutions of the compact RepVCN. In addition, a hybrid layer fusion pipeline is advocated to reduce off-chip data communication caused by extensive motion and residual features. Finally, based on the joint optimization of computation and communication, our VCNPU is constructed to realize adaptive adjustments of various decoding qualities and is implemented under TSMC 28-nm CMOS technology. Extensive experiments demonstrate that our RepVCN provides superior coding quality over other video compression baselines. Meanwhile, our VCNPU achieves improvements in throughput, area efficiency, and energy efficiency compared to prior video processors.
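The abstract states that RepVCN can be "equivalently transformed into a streamlined structure without extra computations after training." A minimal sketch of this general idea, in the style of RepVGG-type structural reparameterization (the paper's exact branch topology is not given in the abstract, so the 3x3 + 1x1 + identity layout below is an illustrative assumption): parallel branches are folded into a single 3x3 kernel, so inference runs one plain convolution.

```python
import numpy as np

def fuse_branches(w3x3, w1x1, channels):
    """Fold a 1x1 branch and an identity branch into a single 3x3 kernel.

    w3x3: (C, C, 3, 3) weights of the 3x3 branch
    w1x1: (C, C, 1, 1) weights of the 1x1 branch
    The identity branch carries the input through unchanged.
    This is an illustrative sketch, not the paper's exact procedure.
    """
    fused = w3x3.copy()
    # A 1x1 conv is a 3x3 conv whose weight lives at the centre tap.
    fused[:, :, 1, 1] += w1x1[:, :, 0, 0]
    # The identity branch adds 1.0 at each channel's own centre tap.
    for c in range(channels):
        fused[c, c, 1, 1] += 1.0
    return fused

C = 4
w3 = np.random.randn(C, C, 3, 3)
w1 = np.random.randn(C, C, 1, 1)
fused = fuse_branches(w3, w1, C)

# Verify equivalence on one 3x3 input patch (stride 1, valid position):
x = np.random.randn(C, 3, 3)
multi_branch = (np.einsum('oihw,ihw->o', w3, x)          # 3x3 branch
                + np.einsum('oi,i->o', w1[:, :, 0, 0], x[:, 1, 1])  # 1x1
                + x[:, 1, 1])                            # identity
single_branch = np.einsum('oihw,ihw->o', fused, x)
assert np.allclose(multi_branch, single_branch)
```

Because the fusion is exact, training can use the richer multi-branch structure to aggregate multiscale features while the deployed accelerator only ever sees the compact single-branch form.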
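The mask-sharing pruning strategy operates "in the fast transform domain," i.e., on weights after a fast-convolution transform such as Winograd, where a shared sparsity pattern keeps the hardware's sparse dataflow regular. The sketch below is only a plausible illustration under that assumption (F(2x2, 3x3) Winograd tiles, one mask shared by all kernels); the paper's actual tile size and mask-sharing rule are not specified in the abstract.

```python
import numpy as np

# Winograd F(2x2, 3x3) weight-transform matrix G (4x3), so a 3x3
# kernel w maps to the 4x4 transform-domain weight G @ w @ G.T.
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])

def winograd_weights(w):
    """Transform a stack of 3x3 kernels (n, 3, 3) into (n, 4, 4)."""
    return np.einsum('ij,njk,lk->nil', G, w, G)

def shared_mask(tw, sparsity):
    """One binary (4, 4) mask shared by every kernel: prune the
    transform-domain positions whose mean magnitude is smallest."""
    importance = np.abs(tw).mean(axis=0)            # (4, 4)
    n_prune = int(round(sparsity * importance.size))
    thresh = np.sort(importance, axis=None)[n_prune]
    return (importance >= thresh).astype(tw.dtype)

kernels = np.random.randn(8, 3, 3)
tw = winograd_weights(kernels)           # (8, 4, 4) transformed weights
mask = shared_mask(tw, sparsity=0.5)     # single mask for all kernels
sparse_tw = tw * mask                    # identical zero pattern everywhere
```

Sharing one mask means every kernel skips the same multiplications, so a reconfigurable sparse computing module can exploit the sparsity without per-kernel index bookkeeping.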