The document proposes a low-power, high-speed parallel architecture for cyclic convolution based on the Fermat Number Transform (FNT). It introduces two techniques, Code Conversion without Addition (CCWA) and Butterfly Operation without Addition (BOWA), which perform the FNT and inverse FNT without additions except in their final stages. This avoids the modulo 2^n + 1 carry-save additions that would otherwise be needed in the intermediate stages, reducing both power and delay. Modulo 2^n + 1 Partial Product Multipliers are used for the pointwise multiplications to further improve efficiency. Simulation results show that the proposed architecture, built on 4-2 compressors, consumes less power than existing designs.
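
To make the underlying arithmetic concrete, the following is a minimal software sketch of cyclic convolution through an FNT. It does not model the CCWA, BOWA, or multiplier hardware described above; it only illustrates the transform, pointwise multiplication, and inverse transform steps. The modulus 2^8 + 1 = 257, the function names, and the naive O(N^2) transform are assumptions chosen for illustration and do not come from the document.

```python
# Illustrative sketch only: naive FNT-based cyclic convolution in software.
# Assumption: modulus is the Fermat number F_3 = 2**8 + 1 = 257.
# Requires Python 3.8+ for pow(x, -1, m) modular inverses.
MOD = 2**8 + 1


def fnt(x, root):
    """Naive transform of x modulo MOD using the given root of unity."""
    n = len(x)
    return [
        sum(x[j] * pow(root, j * k, MOD) for j in range(n)) % MOD
        for k in range(n)
    ]


def ifnt(X, root):
    """Inverse transform: forward transform with root^-1, scaled by n^-1."""
    n = len(X)
    inv_n = pow(n, -1, MOD)
    inv_root = pow(root, -1, MOD)
    return [(inv_n * v) % MOD for v in fnt(X, inv_root)]


def cyclic_convolution(a, b):
    """Cyclic convolution of a and b modulo MOD via the FNT."""
    n = len(a)
    if len(b) != n or n not in (1, 2, 4, 8, 16):
        raise ValueError("length must be a power of two dividing 16")
    # 2 has order 16 modulo 257 (since 2**8 = -1 mod 257), so 2**(16/n)
    # is a root of unity of order n. Being a power of two, multiplying by
    # it corresponds to a bit shift in hardware.
    root = pow(2, 16 // n, MOD)
    A, B = fnt(a, root), fnt(b, root)
    C = [(u * v) % MOD for u, v in zip(A, B)]  # pointwise multiplication
    return ifnt(C, root)


def direct_cyclic_convolution(a, b):
    """Reference implementation computed straight from the definition."""
    n = len(a)
    return [
        sum(a[j] * b[(k - j) % n] for j in range(n)) % MOD
        for k in range(n)
    ]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 0, 0, 0, 0]
    b = [5, 6, 7, 8, 0, 0, 0, 0]
    print(cyclic_convolution(a, b))
    print(direct_cyclic_convolution(a, b))
```

Because every twiddle factor here is a power of two, a hardware implementation can replace the transform's multiplications with shifts, which is what makes the FNT attractive for this kind of architecture. Results are exact only while the true convolution values stay below the modulus.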