MbedTLS optimization through vectorization
This guide walks you through using Codee to optimize MbedTLS, an open-source library of cryptographic algorithms intended for embedded systems.
Getting started
Start by cloning the codee-demos repository and navigating to the source code:
git clone https://github.com/codee-com/codee-demos.git && \
cd codee-demos/C/MbedTLS && \
git submodule update --init .
Walkthrough
1. Generate the compile_commands.json
This project uses CMake, which has native support for exporting compilation
databases. Add the -DCMAKE_EXPORT_COMPILE_COMMANDS=ON flag to the CMake invocation:
cmake -DENABLE_TESTING=ON \
-DUSE_SHARED_MBEDTLS_LIBRARY=ON -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DMBEDTLS_FATAL_WARNINGS=OFF \
-DCMAKE_C_FLAGS=-fopenmp-simd -B build && \
cmake --build build -j
2. Run the global screening report
To explore which recommendations of the Open Catalog apply to MbedTLS, let's
run Codee's screening report; use --compile-commands to point to the
compilation database:
codee screening --compile-commands build/compile_commands.json
Configuration file 'build/compile_commands.json' successfully parsed.
Date: 2025-01-02 Codee version: 2024.4.2 License type: Full
Performing Fortran module dependency analysis... Done
[ 1/264] tests/src/asn1_helpers.c ... Done
[ 2/264] tests/src/certs.c ... Done
<...>
[264/264] build/tests/test_suite_x509write.c ... Done
SCREENING REPORT
----Number of files----
Total | C C++ Fortran
----- | --- --- -------
264 | 264 0 0
RANKING OF QUALITY CHECKERS
Checker Category Priority AutoFixes # Title
------- ----------- -------- --------- --- ---------------------------------------------------------
PWR003 modern P6 (L2) 119 Explicitly declare pure functions
PWR002 correctness P3 (L3) 31 Declare scalar variables in the smallest possible scope
PWR012 modern P2 (L3) 108 Pass only required fields from derived type as parameters
PWR001 correctness P1 (L3) 458 Declare global variables as function parameters
------- ----------- -------- --------- --- ---------------------------------------------------------
Total 716
RANKING OF OPTIMIZATION CHECKERS
Checker Category Priority AutoFixes # Title
------- -------- -------- --------- --- ----------------------------------------------------------------------------------------------------------------------------
RMK015 other P18 (L1) 264 Tune compiler optimization flags to increase the speed of the code
PWR053 vector P12 (L1) 72 72 Consider applying vectorization to forall loop
PWR054 vector P12 (L1) 4 4 Consider applying vectorization to scalar reduction loop
PWR023 other P6 (L2) 3 Add 'restrict' for pointer function parameters to hint the compiler that vectorization is safe
PWR018 control P6 (L2) 1 Call to recursive function within a loop inhibits vectorization
PWR010 memory P4 (L3) 91 Avoid column-major array access in C/C++
PWR034 memory P4 (L3) 12 Avoid strided array access to improve performance
PWR024 other P4 (L3) 3 Loop can be rewritten in OpenMP canonical form
PWR029 other P3 (L3) 1 Remove integer increment preventing performance optimization
PWR035 memory P2 (L3) 95 Avoid non-consecutive array access to improve performance
PWR016 other P2 (L3) 63 Use separate arrays instead of an Array-of-Structs
PWR028 other P2 (L3) 17 Remove pointer increment preventing performance optimization
PWR036 memory P2 (L3) 14 Avoid indirect array access to improve performance
PWR049 control P2 (L3) 6 Move iterator-dependent condition outside of the loop
RMK010 vector P0 (L4) 6 The vectorization cost model states the loop is not a SIMD opportunity due to strided memory accesses in the loop body
RMK014 vector P0 (L4) 2 The vectorization cost model states the loop is not a SIMD opportunity due to unpredictable memory accesses in the loop body
------- -------- -------- --------- --- ----------------------------------------------------------------------------------------------------------------------------
Total 76 654
SUGGESTIONS
Use 'roi' to get a return of investment estimation report:
codee roi --compile-commands build/compile_commands.json
Get a breakdown per file of the Screening Report (--verbose), focusing on one specific checker (--check-id), e.g.:
codee screening --verbose --check-id RMK015 --compile-commands build/compile_commands.json
264 files, 5462 functions, 12980 loops, 248231 LOCs successfully analyzed (1370 checkers) and 0 non-analyzed files in 36.73 s
All the source files were successfully analyzed and 1370 checkers were
reported. The different types of checkers reported can be seen in the RANKING
section of the output.
3. Run the screening report for specific files
When using Codee for performance optimization, it helps to identify the code hotspots first so that Codee's reports can target them. Let's run the screening report again, restricting the analysis to one of those hotspots:
codee screening --compile-commands build/compile_commands.json library/aes.c
Configuration file 'build/compile_commands.json' successfully parsed.
Date: 2025-01-02 Codee version: 2024.4.2 License type: Full
Performing Fortran module dependency analysis... Done
[1/1] library/aes.c (2 entries) ... Done
[Dep] library/aes.c ... Done
SCREENING REPORT
---Number of files---
Total | C C++ Fortran
----- | - --- -------
1 | 1 0 0
RANKING OF QUALITY CHECKERS
Checker Category Priority AutoFixes # Title
------- ----------- -------- --------- -- -------------------------------------------------------
PWR002 correctness P3 (L3) 5 Declare scalar variables in the smallest possible scope
PWR001 correctness P1 (L3) 6 Declare global variables as function parameters
------- ----------- -------- --------- -- -------------------------------------------------------
Total 11
RANKING OF OPTIMIZATION CHECKERS
Checker Category Priority AutoFixes # Title
------- -------- -------- --------- -- ----------------------------------------------------------------------------------------------------------------------------
RMK015 other P18 (L1) 1 Tune compiler optimization flags to increase the speed of the code
PWR053 vector P12 (L1) 7 7 Consider applying vectorization to forall loop
PWR024 other P4 (L3) 1 Loop can be rewritten in OpenMP canonical form
PWR036 memory P2 (L3) 9 Avoid indirect array access to improve performance
PWR028 other P2 (L3) 5 Remove pointer increment preventing performance optimization
RMK010 vector P0 (L4) 1 The vectorization cost model states the loop is not a SIMD opportunity due to strided memory accesses in the loop body
RMK014 vector P0 (L4) 1 The vectorization cost model states the loop is not a SIMD opportunity due to unpredictable memory accesses in the loop body
------- -------- -------- --------- -- ----------------------------------------------------------------------------------------------------------------------------
Total 7 25
SUGGESTIONS
Use 'roi' to get a return of investment estimation report:
codee roi --compile-commands build/compile_commands.json library/aes.c
Use 'checks' to find out details about the detected checks:
codee checks --compile-commands build/compile_commands.json library/aes.c
1 file, 21 functions, 88 loops, 1657 LOCs successfully analyzed (36 checkers) and 0 non-analyzed files in 428 ms
Note how the checker rankings list the different types of checkers reported, ordered by priority. In this case, the screening report shows that PWR053 was reported 7 times, has high priority (P12, level L1), and that Codee provides an AutoFix for all 7 occurrences.
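For context, the Open Catalog uses "forall" for a loop in which every iteration updates a different array element and no iteration depends on another. The sketch below is not taken from MbedTLS: xor_block and its parameters are hypothetical names chosen to mirror the shape of the 16-byte XOR loops reported in aes.c. It assumes a build with -fopenmp-simd, as in step 1; without that flag the pragma is ignored and the loop still computes the same result.

```c
#include <stddef.h>

/* Hypothetical helper: out[i] = a[i] ^ b[i] over one 16-byte block.
 * Each iteration writes a different element of out and reads nothing
 * written by another iteration, which is what makes it a forall loop. */
static void xor_block(unsigned char *out,
                      const unsigned char *a,
                      const unsigned char *b)
{
    size_t i;

    /* Ask the compiler to vectorize; honored with -fopenmp-simd. */
    #pragma omp simd
    for (i = 0; i < 16; i++)
        out[i] = (unsigned char)(a[i] ^ b[i]);
}
```

Calling xor_block(out, a, b) with a[i] = i and b[i] = 0xFF would leave out[0] == 0xFF and out[15] == 0xF0.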
4. Run the checks report
Next, we need the full list of PWR053 occurrences, each pointing at a specific
line of code. Codee's checks report provides that list; add the
--check-id PWR053 flag to show only the PWR053 results:
codee checks --compile-commands build/compile_commands.json library/aes.c --check-id PWR053
Configuration file 'build/compile_commands.json' successfully parsed.
Date: 2024-11-14 Codee version: 2024.4 License type: Full
[1/1] library/aes.c (2 entries) ... Done
CHECKS REPORT
library/aes.c:1048:13 [PWR053] (level: L1): Consider applying vectorization to forall loop
library/aes.c:1062:13 [PWR053] (level: L1): Consider applying vectorization to forall loop
library/aes.c:1162:9 [PWR053] (level: L1): Consider applying vectorization to forall loop
library/aes.c:1169:9 [PWR053] (level: L1): Consider applying vectorization to forall loop
library/aes.c:1194:9 [PWR053] (level: L1): Consider applying vectorization to forall loop
library/aes.c:1202:9 [PWR053] (level: L1): Consider applying vectorization to forall loop
library/aes.c:1211:9 [PWR053] (level: L1): Consider applying vectorization to forall loop
SUGGESTIONS
Use --verbose to get more details, e.g:
codee checks --verbose --compile-commands build/compile_commands.json library/aes.c --check-id PWR053
1 file, 21 functions, 88 loops successfully analyzed (7 checkers) and 0 non-analyzed files in 193 ms
5. Run the checks report in verbose mode
Re-run the checks report with --verbose to get more details for each checker,
including the different autofix options:
codee checks --compile-commands build/compile_commands.json library/aes.c --check-id PWR053 --verbose
Configuration file 'build/compile_commands.json' successfully parsed.
Date: 2024-11-14 Codee version: 2024.4 License type: Full
[1/1] library/aes.c (2 entries) ... Done
CHECKS REPORT
library/aes.c:1048:13 [PWR053] (level: L1): Consider applying vectorization to forall loop
Suggestion: Use 'rewrite' to automatically optimize the code
Documentation: https://github.com/codee-com/open-catalog/tree/main/Checks/PWR053
AutoFix (choose one option):
* Using OpenMP pragmas (recommended):
codee rewrite --vector omp --in-place library/aes.c:1048:13 --compile-commands build/compile_commands.json
* Using Clang compiler pragmas:
codee rewrite --vector clang --in-place library/aes.c:1048:13 --compile-commands build/compile_commands.json
* Using GCC pragmas:
codee rewrite --vector gcc --in-place library/aes.c:1048:13 --compile-commands build/compile_commands.json
* Using ICC pragmas:
codee rewrite --vector icc --in-place library/aes.c:1048:13 --compile-commands build/compile_commands.json
* Using combined pragmas, for example (for GCC and Clang pragmas):
codee rewrite --vector gcc,clang --in-place library/aes.c:1048:13 --compile-commands build/compile_commands.json
<...>
1 file, 21 functions, 88 loops successfully analyzed (7 checkers) and 0 non-analyzed files in 182 ms
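The autofix options above differ only in which pragma family annotates the loop. As a rough illustration, the sketch below selects one hint per compiler through a macro. VEC_HINT and USE_OMP_SIMD are hypothetical names, and the compiler-specific pragmas shown are common vectorization hints chosen as an assumption; the walkthrough does not show what codee rewrite emits for the clang, gcc or icc options, so the actual output may differ.

```c
#include <stddef.h>

/* Pick one vectorization hint per compiler. VEC_HINT and USE_OMP_SIMD are
 * hypothetical names for this sketch; the pragmas are common choices and
 * may differ from what `codee rewrite` inserts. */
#if defined(USE_OMP_SIMD)          /* build with -DUSE_OMP_SIMD -fopenmp-simd */
#  define VEC_HINT _Pragma("omp simd")
#elif defined(__clang__)           /* checked before __GNUC__, which Clang also defines */
#  define VEC_HINT _Pragma("clang loop vectorize(enable)")
#elif defined(__INTEL_COMPILER)
#  define VEC_HINT _Pragma("ivdep")
#elif defined(__GNUC__)
#  define VEC_HINT _Pragma("GCC ivdep")
#else
#  define VEC_HINT                 /* unknown compiler: no hint */
#endif

/* dst[i] ^= src[i]. dst and src must not partially overlap, because some
 * of the hints above assert that the iterations are independent. */
static void xor_into(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;

    VEC_HINT
    for (i = 0; i < n; i++)
        dst[i] = (unsigned char)(dst[i] ^ src[i]);
}
```

The OpenMP form is the portable one, which is presumably why it is the recommended option; the other forms tie the source to a single compiler.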
6. Autofix
Use Codee's autofix capabilities to automatically optimize the code. The recommended rewriting option is to apply vectorization with OpenMP pragmas.
Additionally, we will remove the loop filter. This way, the autofix is
applied to all the loops reported as vectorizable within aes.c, saving us
from having to run codee rewrite for each loop individually.
codee rewrite --vector omp --in-place library/aes.c --compile-commands build/compile_commands.json
Configuration file 'build/compile_commands.json' successfully parsed.
Date: 2024-11-14 Codee version: 2024.4 License type: Full
Results for file 'library/aes.c':
Successfully applied AutoFix to the loop at 'library/aes.c:aes_gen_tables:424:5' [using SIMD]:
<...>
Successfully applied AutoFix to the loop at 'library/aes.c:mbedtls_aes_setkey_enc:568:5' [using SIMD]:
<...>
Successfully applied AutoFix to the loop at 'library/aes.c:mbedtls_aes_crypt_cbc:1048:13' [using SIMD]:
<...>
Successfully applied AutoFix to the loop at 'library/aes.c:mbedtls_aes_crypt_cbc:1062:13' [using SIMD]:
<...>
Successfully applied AutoFix to the loop at 'library/aes.c:mbedtls_aes_crypt_xts:1162:9' [using SIMD]:
<...>
Successfully applied AutoFix to the loop at 'library/aes.c:mbedtls_aes_crypt_xts:1169:9' [using SIMD]:
<...>
Successfully applied AutoFix to the loop at 'library/aes.c:mbedtls_aes_crypt_xts:1194:9' [using SIMD]:
<...>
Could not apply AutoFix to the loop at 'library/aes.c:mbedtls_aes_crypt_xts:1202:9' [using SIMD]:
[WARNING] library/aes.c:1202:9 Loop is not OpenMP compliant
<...>
Successfully applied AutoFix to the loop at 'library/aes.c:mbedtls_aes_crypt_xts:1211:9' [using SIMD]:
<...>
Successfully updated library/aes.c
Minimum software stack requirements: OpenMP version 4.0 with simd capabilities
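The one failure above, "Loop is not OpenMP compliant" at line 1202, lines up with PWR024 ("Loop can be rewritten in OpenMP canonical form") being reported for aes.c. OpenMP only accepts loops in canonical form, which among other things requires an initialization expression for the loop variable. The source of that loop is not shown in this walkthrough, so the following is a sketch under the assumption that the loop continues from an index left by the preceding loop; merge_block and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical two-phase fill of a 16-byte block: the first `leftover`
 * bytes come from `head`, the remaining ones from `tail`.
 * Requires leftover <= 16. */
static void merge_block(unsigned char *out,
                        const unsigned char *head,
                        const unsigned char *tail,
                        size_t leftover)
{
    size_t i;

    #pragma omp simd
    for (i = 0; i < leftover; i++)
        out[i] = head[i];

    /* Non-canonical form:  for (; i < 16; i++) out[i] = tail[i];
     * It has no init expression, so '#pragma omp simd' cannot be applied.
     * Canonical rewrite: give the loop an explicit start value. */
    #pragma omp simd
    for (i = leftover; i < 16; i++)
        out[i] = tail[i];
}
```

Once a loop is in canonical form, the same OpenMP pragma used for the other loops can be applied to it.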
Review the source code changes, for instance with a version control system:
git diff .
diff --git a/library/aes.c b/library/aes.c
index 4afc3c48a..9b51789e9 100644
--- a/library/aes.c
+++ b/library/aes.c
@@ -421,6 +421,9 @@ static void aes_gen_tables( void )
/*
* generate the forward and reverse tables
*/
+ // Codee: Loop modified by Codee (2024-09-03 00:49:13)
+ // Codee: Technique applied: vectorization with 'omp' pragmas
+ #pragma omp simd private(x, y, z)
for( i = 0; i < 256; i++ )
{
x = FSb[i];
@@ -565,6 +568,9 @@ int mbedtls_aes_setkey_enc( mbedtls_aes_context *ctx, const unsigned char *key,
return( mbedtls_aesni_setkey_enc( (unsigned char *) ctx->rk, key, keybits ) );
#endif
+ // Codee: Loop modified by Codee (2024-09-03 00:49:13)
+ // Codee: Technique applied: vectorization with 'omp' pragmas
+ #pragma omp simd
for( i = 0; i < ( keybits >> 5 ); i++ )
{
RK[i] = MBEDTLS_GET_UINT32_LE( key, i << 2 );
@@ -1045,6 +1051,9 @@ int mbedtls_aes_crypt_cbc( mbedtls_aes_context *ctx,
if( ret != 0 )
goto exit;
+ // Codee: Loop modified by Codee (2024-09-03 00:49:13)
+ // Codee: Technique applied: vectorization with 'omp' pragmas
+ #pragma omp simd
for( i = 0; i < 16; i++ )
output[i] = (unsigned char)( output[i] ^ iv[i] );
@@ -1059,6 +1068,9 @@ int mbedtls_aes_crypt_cbc( mbedtls_aes_context *ctx,
{
while( length > 0 )
{
+ // Codee: Loop modified by Codee (2024-09-03 00:49:13)
+ // Codee: Technique applied: vectorization with 'omp' pragmas
+ #pragma omp simd
for( i = 0; i < 16; i++ )
output[i] = (unsigned char)( input[i] ^ iv[i] );
@@ -1159,6 +1171,9 @@ int mbedtls_aes_crypt_xts( mbedtls_aes_xts_context *ctx,
mbedtls_gf128mul_x_ble( tweak, tweak );
}
+ // Codee: Loop modified by Codee (2024-09-03 00:49:13)
+ // Codee: Technique applied: vectorization with 'omp' pragmas
+ #pragma omp simd
for( i = 0; i < 16; i++ )
tmp[i] = input[i] ^ tweak[i];
@@ -1166,6 +1181,9 @@ int mbedtls_aes_crypt_xts( mbedtls_aes_xts_context *ctx,
if( ret != 0 )
return( ret );
+ // Codee: Loop modified by Codee (2024-09-03 00:49:13)
+ // Codee: Technique applied: vectorization with 'omp' pragmas
+ #pragma omp simd
for( i = 0; i < 16; i++ )
output[i] = tmp[i] ^ tweak[i];
@@ -1191,6 +1209,9 @@ int mbedtls_aes_crypt_xts( mbedtls_aes_xts_context *ctx,
* byte of cyphertext we won't steal. At the same time, copy the
* remainder of the input for this final round (since the loop bounds
* are the same). */
+ // Codee: Loop modified by Codee (2024-09-03 00:49:13)
+ // Codee: Technique applied: vectorization with 'omp' pragmas
+ #pragma omp simd lastprivate(i)
for( i = 0; i < leftover; i++ )
{
output[i] = prev_output[i];
@@ -1208,6 +1229,9 @@ int mbedtls_aes_crypt_xts( mbedtls_aes_xts_context *ctx,
/* Write the result back to the previous block, overriding the previous
* output we copied. */
+ // Codee: Loop modified by Codee (2024-09-03 00:49:13)
+ // Codee: Technique applied: vectorization with 'omp' pragmas
+ #pragma omp simd
for( i = 0; i < 16; i++ )
prev_output[i] = tmp[i] ^ t[i];
}
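The diff uses three variants of the pragma: plain, with private(x, y, z), and with lastprivate(i). In OpenMP, private gives each iteration its own copy of a temporary, while lastprivate copies the value from the sequentially last iteration back to the original variable, presumably here because i is read after the loop. The sketch below shows both clauses on a hypothetical function, scale_and_copy, that is not part of MbedTLS.

```c
#include <stddef.h>

/* Hypothetical example: dst[i] = 2 * src[i] + 1, returning the index
 * reached by the loop. Intended for n > 0. */
static size_t scale_and_copy(unsigned *dst, const unsigned *src, size_t n)
{
    size_t i;
    unsigned t;

    /* t is a per-iteration temporary, so it is private.
     * i is read after the loop, so its final value is copied back out. */
    #pragma omp simd private(t) lastprivate(i)
    for (i = 0; i < n; i++) {
        t = src[i] * 2u;
        dst[i] = t + 1u;
    }

    return i; /* same value a sequential run would leave in i */
}
```

Without the clauses, a temporary declared outside the loop would be shared between SIMD lanes, which is the kind of data race the private clause is there to prevent.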
7. Execution
Compile the code with the optimizations applied by Codee:
cmake --build build -j
Then run the AES_XTS benchmark, which corresponds to aes.c:
./build/programs/test/benchmark aes_xts
AES-XTS-128 : 874306 KiB/s, 4 cycles/byte
AES-XTS-256 : 715399 KiB/s, 4 cycles/byte
Lastly, revert the changes applied by Codee so that the code returns to the original version:
git restore .
Recompile so that the binaries return to their original version as well:
cmake --build build -j
And run the benchmark AES_XTS, to compare the results with the ones obtained earlier with the code optimized by Codee:
./build/programs/test/benchmark aes_xts
AES-XTS-128 : 715509 KiB/s, 4 cycles/byte
AES-XTS-256 : 653419 KiB/s, 5 cycles/byte
Note how the version optimized by Codee is faster than the original: throughput improves by about 22% for AES-XTS-128 (874306 vs. 715509 KiB/s) and by about 9.5% for AES-XTS-256 (715399 vs. 653419 KiB/s).
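The relative gain can be computed directly from the throughput figures printed by the two benchmark runs. The helper below is a minimal sketch; speedup_percent is a hypothetical name, not part of the MbedTLS benchmark program.

```c
/* Relative throughput gain in percent:
 *   (optimized / baseline - 1) * 100
 * Both arguments are throughputs in the same unit (here KiB/s). */
static double speedup_percent(double baseline_kib_s, double optimized_kib_s)
{
    return (optimized_kib_s / baseline_kib_s - 1.0) * 100.0;
}
```

For the runs above, speedup_percent(715509.0, 874306.0) is the AES-XTS-128 gain and speedup_percent(653419.0, 715399.0) is the AES-XTS-256 gain.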
Testing machine: AMD Ryzen 7 7840HS laptop.
Appendix
Compiler Efficiency Report for GCC-11
Codee provides the Compiler Efficiency Report (codee diagnose --compiler-efficiency)
for comparing the optimizations that the compiler performs under the current
build settings with those it performs under the optimization flags suggested by
Codee. In a nutshell, it lets the user compare the compiler optimizations
performed under the current -OX flag with the results of using the optimization
flag suggested by Codee, for example -O2 vs -O3.
Let's try the Compiler Efficiency Report with an example. Invoke
codee diagnose --compiler-efficiency to see the different optimizations that the
compiler applies to the loops of the function mbedtls_aes_crypt_xts, which is
located in the library/aes.c file:
codee diagnose --compile-commands build/compile_commands.json --compiler-efficiency library/aes.c:mbedtls_aes_crypt_xts
Configuration file 'build/compile_commands.json' successfully parsed.
Date: 2024-11-14 Codee version: 2024.4 License type: Full
[1/1] /user/codee-demos/C/MbedTLS/library/aes.c (2 entries) ... Done
COMPILER EFFICIENCY REPORT
Compiler: gcc-11.4
Optimization flags detected in the build: -fopenmp-simd -O2
Maximum optimization flags suggested by Codee: -ftree-vectorize -O3
Loop Current Suggested Codee optimizations
---------------------------------- ------- --------- --------------------
/user/codee-demos/C/MbedTLS/library/aes.c
|- mbedtls_aes_crypt_xts:1126:5 n/a n/a
|- mbedtls_aes_crypt_xts:1127:5 n/a n/a
|- mbedtls_aes_crypt_xts:1129:5 n/a n/a
|- mbedtls_aes_crypt_xts:1130:5 n/a n/a
|- mbedtls_aes_crypt_xts:1131:5 n/a n/a
|- mbedtls_aes_crypt_xts:1147:5 n/a n/a
| |- mbedtls_aes_crypt_xts:1162:9 n/a no (unrl) PWR053(1)
| `- mbedtls_aes_crypt_xts:1169:9 n/a no (unrl) PWR053(1)
|- mbedtls_aes_crypt_xts:1194:9 n/a auto* PWR053(1)
|- mbedtls_aes_crypt_xts:1202:9 n/a auto* PWR053(1), PWR024(1)
`- mbedtls_aes_crypt_xts:1211:9 n/a auto PWR053(1)
SUMMARY
Vectorized Current Suggested
---------- ------- ---------
auto 0 3
no 0 2
n/a 11 6
---------- ------- ---------
Total number of loops: 11
The Codee compiler efficiency analysis revealed +27.27% increase in the number of loops optimized by the compilers, switching from the current optimization flags to the ones suggested by Codee (from 0/11 up to 3/11 optimized loops)
LEGEND
Current: Compiler optimization report with the optimization level used on the build
Suggested: Compiler optimization report with the optimization level suggested by Codee
Codee optimizations: list of Codee checkers reported for each loop
auto: loop automatically vectorized by the compiler
no: loop not vectorized by the compiler. Could happen for different reasons:
no (cost): the compiler's cost model recommends so
no (ctrl): complex control flow inhibits vectorization
no (dep) : there is (or seems to be) a dependency inhibiting vectorization
no (prec): potential precision loss if vectorized
no (vgen): SIMD instruction generator not supported by the compiler
no (outr): unsupported outer loop
no (unrl): the loop was fully unrolled by the compiler
no (call): the loop was replaced by a library call
no (othr): any other reason
n/a: no information was provided by the compiler for this loop
SUGGESTIONS
Use --show-messages to get details on the messages reported by each compiler:
codee diagnose --show-messages --compile-commands build/compile_commands.json --compiler-efficiency library/aes.c:mbedtls_aes_crypt_xts
Find out the actionable insights related to vectorization:
codee checks --only-categories vector --verbose --compile-commands build/compile_commands.json library/aes.c:mbedtls_aes_crypt_xts
1 file, 1 function, 11 loops, 1657 LOCs successfully analyzed (6 checkers) and 0 non-analyzed files in 1163 ms
In this example we can see how the compiler changed its optimization behavior
when replacing the -O2 flag with -O3.
With -O2 the compiler did not auto-vectorize any of the loops, while with -O3
it vectorized three of them (the loops at lines 1194, 1202 and 1211).
Compiler Efficiency Report for GCC-13
codee diagnose --compile-commands build/compile_commands.json --compiler-efficiency library/aes.c:mbedtls_aes_crypt_xts
Configuration file 'build/compile_commands.json' successfully parsed.
Date: 2024-11-14 Codee version: 2024.4 License type: Full
[1/1] library/aes.c (2 entries) ... Done
COMPILER EFFICIENCY REPORT
Compiler: gcc-13.2
Optimization flags detected in the build: -fopenmp-simd -O2
Maximum optimization flags suggested by Codee: -ftree-vectorize -O3
Loop Current Suggested Codee optimizations
---------------------------------- --------- --------- --------------------
/home/tamara/Escritorio/codee-demos/C/MbedTLS/library/aes.c
|- mbedtls_aes_crypt_xts:1126:5 n/a n/a
|- mbedtls_aes_crypt_xts:1127:5 n/a n/a
|- mbedtls_aes_crypt_xts:1129:5 n/a n/a
|- mbedtls_aes_crypt_xts:1130:5 n/a n/a
|- mbedtls_aes_crypt_xts:1131:5 n/a n/a
|- mbedtls_aes_crypt_xts:1147:5 n/a n/a
| |- mbedtls_aes_crypt_xts:1162:9 auto no (unrl) PWR053(1)
| `- mbedtls_aes_crypt_xts:1169:9 auto no (unrl) PWR053(1)
|- mbedtls_aes_crypt_xts:1194:9 no (othr) auto* PWR053(1)
|- mbedtls_aes_crypt_xts:1202:9 no (othr) auto* PWR053(1), PWR024(1)
`- mbedtls_aes_crypt_xts:1211:9 auto auto PWR053(1)
SUMMARY
Vectorized Current Suggested
---------- ------- ---------
auto 3 3
no 2 2
n/a 6 6
---------- ------- ---------
Total number of loops: 11
LEGEND
Current: Compiler optimization report with the optimization level used on the build
Suggested: Compiler optimization report with the optimization level suggested by Codee
Codee optimizations: list of Codee checkers reported for each loop
auto: loop automatically vectorized by the compiler
no: loop not vectorized by the compiler. Could happen for different reasons:
no (cost): the compiler's cost model recommends so
no (ctrl): complex control flow inhibits vectorization
no (dep) : there is (or seems to be) a dependency inhibiting vectorization
no (prec): potential precision loss if vectorized
no (vgen): SIMD instruction generator not supported by the compiler
no (outr): unsupported outer loop
no (unrl): the loop was fully unrolled by the compiler
no (call): the loop was replaced by a library call
no (othr): any other reason
n/a: no information was provided by the compiler for this loop
SUGGESTIONS
Use --show-messages to get details on the messages reported by each compiler:
codee diagnose --show-messages --compile-commands build/compile_commands.json --compiler-efficiency library/aes.c:mbedtls_aes_crypt_xts
Find out the actionable insights related to vectorization:
codee checks --only-categories vector --verbose --compile-commands build/compile_commands.json library/aes.c:mbedtls_aes_crypt_xts
1 file, 1 function, 11 loops, 1657 LOCs successfully analyzed (6 checkers) and 0 non-analyzed files in 1633 ms
In this case, with GCC-13, the loops at lines 1162 and 1169 were auto-vectorized
under -O2, but under -O3 the vectorization was replaced by aggressive loop
unrolling.
On the other hand, the loops at lines 1194 and 1202 were not auto-vectorized
under -O2, but the compiler vectorizes them under -O3.
The loop at line 1211 is vectorized at both levels, and the compiler reports no
information for the remaining loops of the function under either -O2 or -O3.
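The difference between the GCC-11 and GCC-13 reports at -O2 is consistent with a change in GCC itself: starting with GCC 12, the auto-vectorizer is enabled at -O2 with a very cheap cost model, whereas earlier releases only enabled it at -O3 or with -ftree-vectorize. To inspect a single loop by hand, GCC's -fopt-info family of flags prints which loops were vectorized and which were missed. The file below is a hypothetical probe, not part of MbedTLS, and the output of these commands depends on the compiler version and target.

```c
/* vec_probe.c (hypothetical file name)
 *
 * Compare what GCC reports at each optimization level:
 *   gcc -O2 -fopt-info-vec-optimized-missed -c vec_probe.c
 *   gcc -O3 -fopt-info-vec-optimized-missed -c vec_probe.c
 */
#include <stddef.h>
#include <stdint.h>

/* Element-wise sum; 'restrict' tells the compiler the arrays do not
 * overlap, removing one common obstacle to vectorization. */
void add_arrays(uint32_t *restrict out, const uint32_t *restrict a,
                const uint32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Running both commands with each GCC version in use gives a small-scale view of the same Current vs. Suggested comparison that the Compiler Efficiency Report automates.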