
Commit e22cf50

pittma and nebeid authored
AVX-512 support for RSA Signing (#1273)
This change adds AVX-512 support for RSA 2k, 3k and 4k signing. It is built around the use of AVX512_IFMA within the [(Almost) Montgomery Multiplication](https://eprint.iacr.org/2011/239) implementation that performs the modular exponentiation at the core of the RSA private-key operation. It is ported from the [OpenSSL patch](openssl/openssl#13750).

Benchmarks on a C6i instance, clang 12, Release build:

Before:
Did 832 RSA 2048 signing operations in 1009511us (824.2 ops/sec)
Did 41000 RSA 2048 verify (same key) operations in 1019103us (40231.5 ops/sec)
Did 30000 RSA 2048 verify (fresh key) operations in 1007956us (29763.2 ops/sec)
Did 3684 RSA 2048 private key parse operations in 1067692us (3450.4 ops/sec)
Did 340 RSA 3072 signing operations in 1051690us (323.3 ops/sec)
Did 13000 RSA 3072 verify (same key) operations in 1087695us (11951.9 ops/sec)
Did 16000 RSA 3072 verify (fresh key) operations in 1005781us (15908.0 ops/sec)
Did 1870 RSA 3072 private key parse operations in 1017467us (1837.9 ops/sec)
Did 128 RSA 4096 signing operations in 1015724us (126.0 ops/sec)
Did 10000 RSA 4096 verify (same key) operations in 1071670us (9331.2 ops/sec)
Did 6952 RSA 4096 verify (fresh key) operations in 1016484us (6839.3 ops/sec)
Did 1110 RSA 4096 private key parse operations in 1092991us (1015.6 ops/sec)

After:
Did 1690 RSA 2048 signing operations in 1025072us (1648.7 ops/sec)
Did 63000 RSA 2048 verify (same key) operations in 1008785us (62451.4 ops/sec)
Did 54000 RSA 2048 verify (fresh key) operations in 1000298us (53983.9 ops/sec)
Did 8000 RSA 2048 private key parse operations in 1000938us (7992.5 ops/sec)
Did 550 RSA 3072 signing operations in 1012078us (543.4 ops/sec)
Did 30000 RSA 3072 verify (same key) operations in 1022061us (29352.5 ops/sec)
Did 27000 RSA 3072 verify (fresh key) operations in 1037663us (26020.0 ops/sec)
Did 4140 RSA 3072 private key parse operations in 1006526us (4113.2 ops/sec)
Did 253 RSA 4096 signing operations in 1050767us (240.8 ops/sec)
Did 18000 RSA 4096 verify (same key) operations in 1057742us (17017.4 ops/sec)
Did 15000 RSA 4096 verify (fresh key) operations in 1000483us (14992.8 ops/sec)
Did 2510 RSA 4096 private key parse operations in 1004408us (2499.0 ops/sec)

There is currently no support for 8k, so its performance is unchanged. That could be a follow-up if there is interest.

Call-outs: This patch is primarily additive, apart from a small logic change in `mod_exp()` in `rsa_impl.c`. Previously, the calls to `mod_montgomery` and `BN_mod_exp_mont_consttime` were interleaved. To make it possible to run the two CRT exponentiations (mod p and mod q) in parallel, `r1` is now kept around and a new `BIGNUM`, `r2`, is obtained from the context.

---------

Co-authored-by: Nevine Ebeid <[email protected]>
Co-authored-by: Nevine Ebeid <[email protected]>
1 parent 9d21f38 commit e22cf50

31 files changed (+17998, -2730 lines)

crypto/fipsmodule/CMakeLists.txt

Lines changed: 10 additions & 1 deletion
@@ -38,6 +38,9 @@ if(ARCH STREQUAL "x86_64")
 p256_beeu-x86_64-asm.${ASM_EXT}
 rdrand-x86_64.${ASM_EXT}
 rsaz-avx2.${ASM_EXT}
+rsaz-2k-avx512.${ASM_EXT}
+rsaz-3k-avx512.${ASM_EXT}
+rsaz-4k-avx512.${ASM_EXT}
 sha1-x86_64.${ASM_EXT}
 sha256-x86_64.${ASM_EXT}
 sha512-x86_64.${ASM_EXT}
@@ -147,6 +150,9 @@ if(PERL_EXECUTABLE)
 perlasm(p256_beeu-armv8-asm.${ASM_EXT} ec/asm/p256_beeu-armv8-asm.pl)
 perlasm(rdrand-x86_64.${ASM_EXT} rand/asm/rdrand-x86_64.pl)
 perlasm(rsaz-avx2.${ASM_EXT} bn/asm/rsaz-avx2.pl)
+perlasm(rsaz-2k-avx512.${ASM_EXT} bn/asm/rsaz-2k-avx512.pl)
+perlasm(rsaz-3k-avx512.${ASM_EXT} bn/asm/rsaz-3k-avx512.pl)
+perlasm(rsaz-4k-avx512.${ASM_EXT} bn/asm/rsaz-4k-avx512.pl)
 perlasm(sha1-586.${ASM_EXT} sha/asm/sha1-586.pl)
 perlasm(sha1-armv4-large.${ASM_EXT} sha/asm/sha1-armv4-large.pl)
 perlasm(sha1-armv8.${ASM_EXT} sha/asm/sha1-armv8.pl)
@@ -175,6 +181,9 @@ if (CLANG AND (CMAKE_ASM_COMPILER_ID MATCHES "Clang" OR CMAKE_ASM_COMPILER MATCH
 (CMAKE_C_COMPILER_VERSION VERSION_LESS "7.0.0") AND (ARCH STREQUAL "x86_64"))
 set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/aesni-gcm-avx512.${ASM_EXT} PROPERTIES COMPILE_FLAGS "-mavx512f -mavx512bw -mavx512dq -mavx512vl")
 set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/aesni-xts-avx512.${ASM_EXT} PROPERTIES COMPILE_FLAGS "-mavx512f -mavx512bw -mavx512dq -mavx512vl")
+set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/rsaz-2k-avx512.${ASM_EXT} PROPERTIES COMPILE_FLAGS "-mavx512f -mavx512bw -mavx512dq -mavx512vl -mavx512ifma")
+set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/rsaz-3k-avx512.${ASM_EXT} PROPERTIES COMPILE_FLAGS "-mavx512f -mavx512bw -mavx512dq -mavx512vl -mavx512ifma")
+set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/rsaz-4k-avx512.${ASM_EXT} PROPERTIES COMPILE_FLAGS "-mavx512f -mavx512bw -mavx512dq -mavx512vl -mavx512ifma")
 endif()
 
 # s2n-bignum files can be compiled on Unix platforms only (except Apple),
@@ -384,7 +393,7 @@ if(FIPS_DELOCATE)
 # The flags are not required for any other compiler we are running in the CI.
 if (CLANG AND (CMAKE_ASM_COMPILER_ID MATCHES "Clang" OR CMAKE_ASM_COMPILER MATCHES "clang") AND
 (CMAKE_C_COMPILER_VERSION VERSION_LESS "7.0.0") AND (ARCH STREQUAL "x86_64"))
-set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/bcm-delocated.S PROPERTIES COMPILE_FLAGS "-mavx512f -mavx512bw -mavx512dq -mavx512vl")
+set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/bcm-delocated.S PROPERTIES COMPILE_FLAGS "-mavx512f -mavx512bw -mavx512dq -mavx512vl -mavx512ifma")
 endif()
 
 add_library(

crypto/fipsmodule/bcm.c

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@
 #include "bn/prime.c"
 #include "bn/random.c"
 #include "bn/rsaz_exp.c"
+#include "bn/rsaz_exp_x2.c"
 #include "bn/shift.c"
 #include "bn/sqrt.c"
 #include "cipher/aead.c"
