Disabling AES-NI on Linux OpenSSL

Starting with the Westmere generation in 2010, many Intel CPUs have come with hardware-accelerated AES support, known as “AES-NI” (AES New Instructions). I figured it would be interesting to see a comparison between AES with and without the hardware acceleration on my Intel Core i5-3317U CPU (Ivy Bridge) on Arch Linux.
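Before benchmarking, it is worth confirming that the CPU advertises AES-NI at all. On Linux the kernel lists it as the aes flag in /proc/cpuinfo. The small helper below tests for that flag. The name has_aesni is hypothetical, and the check assumes the x86 "flags" line format. It only shows that the CPU and kernel report the feature, not that a particular OpenSSL build will use it.

```shell
#!/bin/sh
# has_aesni [FILE]: succeed if the first "flags" line of a cpuinfo-style file
# contains the whole word "aes" (the kernel's name for AES-NI).
# Defaults to /proc/cpuinfo; assumes the x86 "flags : ..." format.
has_aesni() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" | grep -qw aes
}

if has_aesni; then
    echo "CPU reports AES-NI"
else
    echo "CPU does not report AES-NI"
fi
```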

According to a post on the OpenSSL Users mailing list, you can force OpenSSL to avoid the hardware AES instructions by setting the OPENSSL_ia32cap environment variable.
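The mask used in the benchmarks can be derived by hand. As documented for OPENSSL_ia32cap, the 64-bit capability vector keeps EDX from CPUID leaf 1 in bits 0-31 and ECX in bits 32-63. That puts AES-NI (ECX bit 25) on bit 57 and PCLMULQDQ (ECX bit 1) on bit 33. A leading ~ tells OpenSSL to clear those bits from the capabilities it detected, rather than replace the whole vector. Here is a minimal shell sketch of that arithmetic:

```shell
#!/bin/sh
# Layout assumed here (per the OPENSSL_ia32cap documentation):
#   vector bits 0-31  <- EDX from CPUID leaf 1
#   vector bits 32-63 <- ECX from CPUID leaf 1
aesni_bit=$((32 + 25))    # CPUID.1:ECX bit 25 = AES-NI    -> vector bit 57
pclmul_bit=$((32 + 1))    # CPUID.1:ECX bit 1  = PCLMULQDQ -> vector bit 33

mask=$(( (1 << aesni_bit) | (1 << pclmul_bit) ))

# The leading "~" means "detected capabilities AND NOT mask".
printf 'OPENSSL_ia32cap="~0x%x"\n' "$mask"
```

Clearing bit 33 should only affect the carry-less multiplication used for GHASH (AES-GCM). For the aes-128-cbc runs here, bit 57 is the one that matters.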

Benchmarks

First, with AES-NI enabled (the default, on hardware that supports it):

$ openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 57196857 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 15343650 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 3897351 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 978726 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 122310 aes-128-cbc's in 3.00s
OpenSSL 1.0.1e 11 Feb 2013
built on: Sun Oct 20 14:49:13 CEST 2013
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     305049.90k   327331.20k   332573.95k   334071.81k   333987.84k
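Each figure in the table follows from the "Doing" lines above it: number of operations × block size ÷ elapsed seconds ÷ 1000. For 16-byte blocks, that is 57196857 × 16 ÷ 3.00 ÷ 1000 ≈ 305049.90, which is the first entry of the aes-128-cbc row. The sketch below redoes that arithmetic for one line of output. The helper speed_k is hypothetical and assumes the exact field layout of these 1.0.1 lines.

```shell
#!/bin/sh
# speed_k LINE: turn one "Doing ..." line printed by openssl speed into the
# figure shown in its summary table (1000s of bytes per second).
# Field layout assumed:
#   Doing <alg> for 3s on <size> size blocks: <count> <alg>'s in <secs>s
speed_k() {
    printf '%s\n' "$1" | awk '{
        size = $6; count = $9; secs = $12
        sub(/s$/, "", secs)                 # "3.00s" -> "3.00"
        printf "%.2fk\n", count * size / secs / 1000
    }'
}

speed_k "Doing aes-128-cbc for 3s on 16 size blocks: 57196857 aes-128-cbc's in 3.00s"
speed_k "Doing aes-128-cbc for 3s on 8192 size blocks: 122310 aes-128-cbc's in 3.00s"
```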

Then, setting the capability mask to turn off the hardware AES features:

$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 27883366 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 7736907 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 1949328 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 498847 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 62446 aes-128-cbc's in 3.00s
OpenSSL 1.0.1e 11 Feb 2013
built on: Sun Oct 20 14:49:13 CEST 2013
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     148711.29k   165054.02k   166342.66k   170273.11k   170519.21k

You can see that hardware-accelerated AES is consistently about twice as fast as the implementation without AES-NI, between roughly 1.96x and 2.05x depending on block size. So it’s not an order-of-magnitude win, but doubling the throughput is still a serious improvement! This is great not only for servers using AES encryption (SSL/TLS, hello!), but also for clients connecting to those servers, and for things like full-disk encryption.
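The "twice as fast" figure can be checked column by column by dividing each entry of the first table by the matching entry of the second. The awk sketch below uses the numbers copied from the two runs above. The helper name ratios is hypothetical.

```shell
#!/bin/sh
# Throughput in 1000s of bytes/s, copied from the two aes-128-cbc tables above.
sizes="16 64 256 1024 8192"
with_aesni="305049.90 327331.20 332573.95 334071.81 333987.84"
without_aesni="148711.29 165054.02 166342.66 170273.11 170519.21"

# ratios: print the per-block-size speedup (with AES-NI / without).
ratios() {
    awk -v s="$sizes" -v a="$with_aesni" -v b="$without_aesni" 'BEGIN {
        n = split(s, size, " "); split(a, on, " "); split(b, off, " ")
        for (i = 1; i <= n; i++)
            printf "%s bytes: %.2fx\n", size[i], on[i] / off[i]
    }'
}

ratios
```

The ratios run from about 1.96x for 1024- and 8192-byte blocks to 2.05x for 16-byte blocks.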

Note: It seems Arch Linux’s OpenSSL is built with AES-NI support, but not as an engine. Plain openssl speed aes-128-cbc can therefore be misleading, i.e., you’d see no difference with or without the capabilities masked. To reach the AES-NI code path, add -evp (“envelope”). This benchmarks the cipher through EVP, OpenSSL’s high-level interface to its cryptographic functions.

11 thoughts on “Disabling AES-NI on Linux OpenSSL”

  1. This is some seriously good stuff.
    It seems to scale in a similar fashion on older chips such as Arrandale, with and without AES-NI acceleration.

    I'll post benchmarks once I get off the bus.

  2. As promised, here are my benchmarks, with AES-NI HWaccel activated:

    openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 113937936 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 33138098 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 8888395 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 2265871 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 283941 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1e-fips 11 Feb 2013
    built on: Tue Oct 29 16:13:07 UTC 2013
    options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
    compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -m64 -DL_ENDIAN -DTERMIO -Wall -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Wa,--noexecstack -DPURIFY -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     607668.99k   706946.09k   758476.37k   773417.30k   775348.22k

    Seems my Intel Core i7 640M is 2.32x faster than your ULV Core i5 (8192-byte blocks).

  3. Without AES-NI acceleration, my scores are:

    OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 41539249 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 12116789 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 3137564 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 806232 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 99932 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1e-fips 11 Feb 2013
    built on: Tue Oct 29 16:13:07 UTC 2013
    options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
    compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -m64 -DL_ENDIAN -DTERMIO -Wall -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Wa,--noexecstack -DPURIFY -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     221542.66k   258491.50k   267738.79k   275193.86k   272880.98k

    With AES-NI hardware acceleration enabled, the OpenSSL benchmark is 2.84x faster (8192-byte blocks). That’s on a Nehalem 😉
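Both factors quoted in these two comments appear to come from the 8192-byte column. Here is a quick check with awk. The helper ratio is hypothetical.

```shell
#!/bin/sh
# ratio A B: print A / B to two decimal places.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

ratio 775348.22 333987.84   # i7 640M (Fedora, AES-NI) vs. i5-3317U (Arch, AES-NI)
ratio 775348.22 272880.98   # i7 640M (Fedora): with vs. without AES-NI
```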

  4. Finally, on Arch Linux.
    With AES-NI hardware acceleration badassery:

    openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 118663645 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 31966150 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 8718466 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 2136029 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 269127 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1e 11 Feb 2013
    built on: Sat Nov  2 22:31:48 CET 2013
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
    compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     632872.77k   681944.53k   743975.77k   729097.90k   734896.13k

    Without AES-NI H/W Acceleration:

    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 41971398 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 12335693 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 3235378 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 839065 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 103946 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1e 11 Feb 2013
    built on: Sat Nov  2 22:31:48 CET 2013
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
    compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     223847.46k   263161.45k   276085.59k   286400.85k   283841.88k

    On Arch (everything up to date and using the latest OpenSSL build).
    Processor: Intel Core i7 640M, Arrandale.

    It seems the build on Arch performs much, much faster than what Fedora 19 packages, haha.
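Because both sets of figures come from the same i7 640M, the two builds can be compared column by column. The sketch below divides the Arch numbers by the Fedora 19 numbers posted earlier. The helper compare is hypothetical. Values above 1.00 mean the Arch build was faster in that column. For the runs quoted here, every ratio lands between 0.94 and 1.04, so run-to-run variation is worth ruling out before reading much into the difference.

```shell
#!/bin/sh
# compare LABEL "ARCH FIGURES" "FEDORA FIGURES":
# print the per-column ratio Arch / Fedora 19 (figures in 1000s of bytes/s).
compare() {
    awk -v label="$1" -v a="$2" -v f="$3" 'BEGIN {
        n = split(a, arch, " "); split(f, fed, " ")
        line = label ":"
        for (i = 1; i <= n; i++) line = line sprintf(" %.2f", arch[i] / fed[i])
        print line
    }'
}

compare "with AES-NI" \
    "632872.77 681944.53 743975.77 729097.90 734896.13" \
    "607668.99 706946.09 758476.37 773417.30 775348.22"
compare "without AES-NI" \
    "223847.46 263161.45 276085.59 286400.85 283841.88" \
    "221542.66 258491.50 267738.79 275193.86 272880.98"
```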

  5. After an update today:

    openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 123841672 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 34140840 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 8555372 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 2246614 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 282338 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1f 6 Jan 2014
    built on: Mon Jan 6 21:23:11 CET 2014
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
    compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     660488.92k   728337.92k   730058.41k   766844.25k   770970.97k

    With AES-NI only.

  6. Without AES-NI, just to catch up with your slow ULV:

    OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 46593943 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 12691229 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 3338377 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 854280 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 107370 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1f 6 Jan 2014
    built on: Mon Jan 6 21:23:11 CET 2014
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
    compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     248501.03k   270746.22k   284874.84k   291594.24k   293191.68k

  7. And to mess with the results, hail unto:

    Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz

    Results with all the zestiness:

    openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 71153330 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 24093729 aes-128-cbc's in 3.01s
    Doing aes-128-cbc for 3s on 256 size blocks: 6118901 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 1536165 aes-128-cbc's in 3.01s
    Doing aes-128-cbc for 3s on 8192 size blocks: 192017 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1 14 Mar 2012
    built on: Mon Apr 15 15:27:18 UTC 2013
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: cc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_NO_TLS1_2_CLIENT -DOPENSSL_MAX_TLS1_2_CIPHER_LENGTH=50 -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     379484.43k   512291.91k   522146.22k   522602.31k   524334.42k

    Now, without AES-NI:

    OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 30915053 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 12543885 aes-128-cbc's in 3.01s
    Doing aes-128-cbc for 3s on 256 size blocks: 3204918 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 812027 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 80782 aes-128-cbc's in 3.01s
    OpenSSL 1.0.1 14 Mar 2012
    built on: Mon Apr 15 15:27:18 UTC 2013
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: cc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_NO_TLS1_2_CLIENT -DOPENSSL_MAX_TLS1_2_CIPHER_LENGTH=50 -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     164880.28k   266713.83k   273486.34k   277171.88k   219855.86k

    That’s on Ubuntu 12.04 LTS.

    And yes... my Arrandale kicks its 16-core monstrous ass.

  8. Alan,

    It’s all about the code optimizations. Here are my results in cygwin64 on Windows, where these geniuses saw fit to compile OpenSSL with mtune=generic and, probably, without AES-NI acceleration. Very interesting:

    Brainiarc7@Brainiarc7-PC ~
    $ openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 23181988 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 6403056 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 1653828 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 420094 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 52060 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1e 11 Feb 2013
    built on: Thu Mar 7 05:51:56 CST 2013
    options:bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: x86_64-pc-cygwin-gcc -D_WINDLL -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -DDSO_DLFCN -DHAVE_DLFCN_H -DTERMIOS -DL_ENDIAN -O3 -Wall
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     123637.27k   136462.07k   140985.67k   143248.84k   141969.21k

    AES-NI acceleration disabled via OPENSSL_ia32cap:

    Brainiarc7@Brainiarc7-PC ~
    $ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 23114849 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 6451259 aes-128-cbc's in 3.01s
    Doing aes-128-cbc for 3s on 256 size blocks: 1658972 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 418238 aes-128-cbc's in 3.01s
    Doing aes-128-cbc for 3s on 8192 size blocks: 52383 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1e 11 Feb 2013
    built on: Thu Mar 7 05:51:56 CST 2013
    options:bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: x86_64-pc-cygwin-gcc -D_WINDLL -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -DDSO_DLFCN -DHAVE_DLFCN_H -DTERMIOS -DL_ENDIAN -O3 -Wall
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     123279.19k   137352.15k   141377.11k   142473.62k   142850.05k

    Conclusion: with AES-NI disabled in the cygwin64 build of OpenSSL, I get better results. Which just goes to show that cygwin sucks.
    But so does Windows.
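The compiler line of this cygwin build offers a hint. Unlike the Linux builds above, it shows no -DAES_ASM, or any other *_ASM define. That suggests the build was configured without assembly code. In that case OpenSSL 1.0.1 would likely have no AES-NI code path for the mask to switch off, and the two runs would differ only by noise. A check along those lines against the output of openssl version -f is sketched below. The helper has_aes_asm is hypothetical, and the presence of the define is only a heuristic.

```shell
#!/bin/sh
# has_aes_asm FLAGS: succeed if an OpenSSL compiler-flags string contains
# -DAES_ASM, i.e. the build includes assembler AES code. (Heuristic only.)
has_aes_asm() {
    printf '%s\n' "$1" | grep -q -e '-DAES_ASM'
}

# "openssl version -f" prints the "compiler: ..." line of the installed build.
if has_aes_asm "$(openssl version -f 2>/dev/null)"; then
    echo "assembler AES present: OPENSSL_ia32cap masking should have an effect"
else
    echo "no -DAES_ASM: this build probably has no AES-NI path to disable"
fi
```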

  9. Let’s see how Ubuntu 12.04 fares on an Ivy Bridge Intel Core i5:

    model name : Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz

    Scores with AES-NI:

    openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 107115069 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 28602044 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 7269322 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 1824450 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 227851 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1 14 Mar 2012
    built on: Wed Jan 8 20:45:51 UTC 2014
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: cc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_NO_TLS1_2_CLIENT -DOPENSSL_MAX_TLS1_2_CIPHER_LENGTH=50 -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     571280.37k   610176.94k   620315.48k   622745.60k   622185.13k

    With AES-NI explicitly disabled:

    OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 20751858 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 5770370 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 1497761 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 801904 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 102059 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1 14 Mar 2012
    built on: Wed Jan 8 20:45:51 UTC 2014
    options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: cc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_NO_TLS1_2_CLIENT -DOPENSSL_MAX_TLS1_2_CIPHER_LENGTH=50 -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     110676.58k   123101.23k   127808.94k   273716.57k   278689.11k

    Meh.
