
Efficient FFT-Based CNN Acceleration with Intra-Patch Parallelization and Flex-Stationary Dataflow


SP Sunny, S Das

2024 IEEE International Symposium on Circuits and Systems (ISCAS), 2024 • ieeexplore.ieee.org

This paper presents a novel Hadamard Product Generator (HPG) for FFT-based CNN acceleration that addresses computation and energy bottlenecks. The proposed block uses Intra-Patch parallelization to optimize Complex Multiply and Accumulate (CMAC) unit utilization and maintains identical reuse behavior across patch elements. It also supports multiple spatial unrolling schemes to increase resource reuse. Additionally, the HPG leverages the Flex-Stationary dataflow to adaptively store tensors with high reuse opportunities in on-chip memory. The prototype is implemented on the Zynq MPSoC (XCZU7CG). It achieves throughput gains of 8.16× for VGG-16 and 9.30× for AlexNet over the state-of-the-art frequency-domain accelerator, with an area overhead of only 28.51%. In energy-delay product (EDP), it achieves an 8.82× average improvement over Eyeriss and a 7.87× improvement over Flexflow with a similar hardware configuration.
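As a rough illustration of the operation the abstract refers to, the sketch below shows the general FFT convolution identity in NumPy: each input channel and kernel is transformed, multiplied element-wise (a Hadamard product), and accumulated across channels, which is the role a complex multiply-accumulate unit would play. This is a software sketch only, not the paper's HPG, its Intra-Patch parallelization, or its Flex-Stationary dataflow; the function names, tensor shapes, and the use of full linear convolution are all assumptions made for the example.

```python
import numpy as np

def direct_conv2d(x, w):
    """Reference: full linear 2-D convolution, summed over input channels.

    x has shape (C, H, W); w has shape (C, K, K). Shapes are assumptions.
    """
    C, H, W = x.shape
    K = w.shape[1]
    out = np.zeros((H + K - 1, W + K - 1))
    for c in range(C):
        for i in range(K):
            for j in range(K):
                # Each kernel tap adds a shifted, scaled copy of the input.
                out[i:i + H, j:j + W] += w[c, i, j] * x[c]
    return out

def fft_conv2d(x, w):
    """Same result via the frequency domain (hypothetical helper)."""
    C, H, W = x.shape
    K = w.shape[1]
    size = (H + K - 1, W + K - 1)      # zero-pad so circular == linear
    X = np.fft.fft2(x, s=size)         # FFT over the last two axes, per channel
    Wf = np.fft.fft2(w, s=size)
    acc = np.zeros(size, dtype=complex)
    for c in range(C):
        acc += X[c] * Wf[c]            # Hadamard product + complex accumulate
    return np.fft.ifft2(acc).real      # imaginary part is rounding noise

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))     # 3 input channels, 8x8 feature map
w = rng.standard_normal((3, 3, 3))     # one 3x3 kernel per input channel
y_fft = fft_conv2d(x, w)
y_ref = direct_conv2d(x, w)
print(np.allclose(y_fft, y_ref))
```

The accumulation loop is where the per-frequency complex multiplies happen, so in this sketch it is the part whose scheduling and data reuse a hardware design of this kind would be concerned with.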


