VCNPU: An Algorithm-Hardware Co-Optimized Framework for Accelerating Neural Video Compression
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2024•ieeexplore.ieee.org
Video compression is essential for storing and transmitting video content, and real-time decoding is indispensable for delivering a seamless user experience. Neural video compression (NVC) integrates traditional coding techniques with deep learning, resulting in impressive compression efficiency. However, the real-time deployment of advanced NVC models encounters challenges due to their high complexity and extensive off-chip memory access. This article presents a novel NVC accelerator, called video compression neural processing unit (VCNPU), via an algorithm-hardware co-design framework. First, at the algorithmic level, a reparameterizable video compression network (RepVCN) is proposed to aggregate multiscale features and boost video compression quality. RepVCN can be equivalently transformed into a streamlined structure without extra computations after training. Second, a mask-sharing pruning strategy is proposed to compress RepVCN in the fast transform domain. It effectively prevents the destruction of sparse patterns caused by model simplification, maintaining the model capacity. Third, at the hardware level, a reconfigurable sparse computing module is designed to flexibly support sparse fast convolutions and deconvolutions of the compact RepVCN. In addition, a hybrid layer fusion pipeline is advocated to reduce off-chip data communication caused by extensive motion and residual features. Finally, based on the joint optimization of computation and communication, our VCNPU is constructed to realize adaptive adjustments of various decoding qualities and is implemented under TSMC 28-nm CMOS technology. Extensive experiments demonstrate that our RepVCN provides superior coding quality over other video compression baselines. Meanwhile, our VCNPU achieves improvements in throughput, area efficiency, and energy efficiency compared to prior video processors.
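The abstract states that RepVCN can be "equivalently transformed into a streamlined structure without extra computations after training." A minimal sketch of this general idea, in the style of RepVGG-type structural reparameterization (the paper's exact branch topology is not given in the abstract, so the 3x3 + 1x1 + identity layout below is an illustrative assumption): parallel branches are folded into a single 3x3 kernel, so inference runs one plain convolution.

```python
import numpy as np

def fuse_branches(w3x3, w1x1, channels):
    """Fold a 1x1 branch and an identity branch into a single 3x3 kernel.

    w3x3: (C, C, 3, 3) weights of the 3x3 branch
    w1x1: (C, C, 1, 1) weights of the 1x1 branch
    The identity branch carries the input through unchanged.
    This is an illustrative sketch, not the paper's exact procedure.
    """
    fused = w3x3.copy()
    # A 1x1 conv is a 3x3 conv whose weight lives at the centre tap.
    fused[:, :, 1, 1] += w1x1[:, :, 0, 0]
    # The identity branch adds 1.0 at each channel's own centre tap.
    for c in range(channels):
        fused[c, c, 1, 1] += 1.0
    return fused

C = 4
w3 = np.random.randn(C, C, 3, 3)
w1 = np.random.randn(C, C, 1, 1)
fused = fuse_branches(w3, w1, C)

# Verify equivalence on one 3x3 input patch (stride 1, valid position):
x = np.random.randn(C, 3, 3)
multi_branch = (np.einsum('oihw,ihw->o', w3, x)          # 3x3 branch
                + np.einsum('oi,i->o', w1[:, :, 0, 0], x[:, 1, 1])  # 1x1
                + x[:, 1, 1])                            # identity
single_branch = np.einsum('oihw,ihw->o', fused, x)
assert np.allclose(multi_branch, single_branch)
```

Because the fusion is exact, training can use the richer multi-branch structure to aggregate multiscale features while the deployed accelerator only ever sees the compact single-branch form.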
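The mask-sharing pruning strategy operates "in the fast transform domain," i.e., on weights after a fast-convolution transform such as Winograd, where a shared sparsity pattern keeps the hardware's sparse dataflow regular. The sketch below is only a plausible illustration under that assumption (F(2x2, 3x3) Winograd tiles, one mask shared by all kernels); the paper's actual tile size and mask-sharing rule are not specified in the abstract.

```python
import numpy as np

# Winograd F(2x2, 3x3) weight-transform matrix G (4x3), so a 3x3
# kernel w maps to the 4x4 transform-domain weight G @ w @ G.T.
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])

def winograd_weights(w):
    """Transform a stack of 3x3 kernels (n, 3, 3) into (n, 4, 4)."""
    return np.einsum('ij,njk,lk->nil', G, w, G)

def shared_mask(tw, sparsity):
    """One binary (4, 4) mask shared by every kernel: prune the
    transform-domain positions whose mean magnitude is smallest."""
    importance = np.abs(tw).mean(axis=0)            # (4, 4)
    n_prune = int(round(sparsity * importance.size))
    thresh = np.sort(importance, axis=None)[n_prune]
    return (importance >= thresh).astype(tw.dtype)

kernels = np.random.randn(8, 3, 3)
tw = winograd_weights(kernels)           # (8, 4, 4) transformed weights
mask = shared_mask(tw, sparsity=0.5)     # single mask for all kernels
sparse_tw = tw * mask                    # identical zero pattern everywhere
```

Sharing one mask means every kernel skips the same multiplications, so a reconfigurable sparse computing module can exploit the sparsity without per-kernel index bookkeeping.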