MbedTLS optimization through vectorization

Goal

Walk through using Codee to optimize MbedTLS, an open-source library of cryptographic algorithms intended for embedded systems.

Getting started

Start by cloning the codee-demos repository and navigating to the source code:

git clone https://github.com/codee-com/codee-demos.git && \
    cd codee-demos/C/MbedTLS && \
    git submodule update --init .

Walkthrough

1. Generate the `compile_commands.json`

This project uses CMake, which has native support for exporting compilation databases. Add the -DCMAKE_EXPORT_COMPILE_COMMANDS=ON flag to the CMake invocation:

CMake invocation
cmake -DENABLE_TESTING=ON \
    -DUSE_SHARED_MBEDTLS_LIBRARY=ON -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DMBEDTLS_FATAL_WARNINGS=OFF \
    -DCMAKE_C_FLAGS=-fopenmp-simd -B build && \
    cmake --build build -j

2. Run the global screening report

New feature

Running codee commands with the additional --db codee.db flag enables Incremental Static Analysis. This reduces runtime by storing analysis results and reusing them in subsequent analyses, so that only the source code that has changed is reanalyzed.

To explore the recommendations of the Open Catalog that are applicable to MbedTLS, let's run Codee's screening report; use --compile-commands to point to the compilation database:

Codee command
codee screening --compile-commands build/compile_commands.json --db codee.db

Codee output
Date: 2026-03-25 Codee version: 2026.1 License type: Team

Searching Incremental Static Analysis database... Enabled

[  1/264] /home/user/codee-demos/C/MbedTLS/tests/src/asn1_helpers.c ... Done: new
[  2/264] /home/user/codee-demos/C/MbedTLS/tests/src/certs.c ... Done: new
<...>
[264/264] /home/user/codee-demos/C/MbedTLS/build/tests/test_suite_x509write.c ... Done: new

SCREENING REPORT

-------Number of files-------
Total | C   C++ Fortran Other
----- | --- --- ------- -----
264   | 264 0   0       0

RANKING OF QUALITY CHECKERS

Checker Category                      Priority AutoFixes #    Title
------- ----------------------------- -------- --------- ---- ---------------------------------------------------------------------------------
PWR001  correctness, modern, security P4 (L3)            458  Pass global variables as function arguments
PWR082  correctness, security         P4 (L3)            371  Remove unused variables
PWR085  security                      P4 (L3)            1    Favor iterative implementations over recursion to prevent stack overflows
PWR003  modern, security              P3 (L3)            122  Explicitly declare pure functions
PWR002  correctness, modern, security P3 (L3)            32   Declare scalar variables in the smallest possible scope
PWR086  security                      P3 (L3)            17   Prefer array-based notation over pointer-based notation for readability
PWR074  modern                        P2 (L3)            109  Pass only required fields from derived type as arguments to increase code clarity
------- ----------------------------- -------- --------- ---- ---------------------------------------------------------------------------------
Total                                                    1110

RANKING OF OPTIMIZATION CHECKERS

Checker Category Priority AutoFixes #   Title
------- -------- -------- --------- --- ---------------------------------------------------------------------------------------------
RMK015  other    P18 (L1)           264 Tune compiler optimization flags to increase the speed of the code
PWR035  memory   P12 (L1)           96  Avoid non-consecutive array access to improve performance
PWR053  vector   P12 (L1) 72        72  Consider applying vectorization to forall loop
PWR034  memory   P12 (L1)           13  Avoid strided array access to improve performance
PWR023  memory   P12 (L1)           3   Add 'restrict' for pointer function arguments to hint the compiler that vectorization is safe
PWR054  vector   P12 (L1) 2         2   Consider applying vectorization to scalar reduction loop
PWR036  memory   P8 (L2)            14  Avoid indirect array access to improve performance
PWR049  control  P6 (L2)            6   Move iterator-dependent condition outside of the loop
PWR016  memory   P4 (L3)            63  Use separate arrays instead of an Array-of-Structs
PWR018  control  P4 (L3)            1   Call to recursive function within a loop inhibits vectorization
PWR028  control  P3 (L3)            17  Remove pointer increment preventing performance optimization
PWR024  multi    P3 (L3)            3   Loop can be rewritten in OpenMP canonical form
PWR029  control  P3 (L3)            1   Remove integer increment preventing performance optimization
PWR012  memory   P2 (L3)            109 Pass only required fields from derived type as arguments to minimize data movements
RMK010  memory   P0 (L4)            6   Strided memory accesses in the loop body may prevent vectorization
RMK014  memory   P0 (L4)            2   Unpredictable memory accesses in the loop body may prevent vectorization
------- -------- -------- --------- --- ---------------------------------------------------------------------------------------------
Total                     74        672

SUGGESTIONS

  Get a breakdown per file of the Screening Report (--verbose), focusing on one specific checker (--check-id), e.g.:
        codee screening --verbose --check-id RMK015 --compile-commands build/compile_commands.json --db codee.db

264 built files (0 results taken from cache), 0 dependencies (0 reused from cache)
264 target files, 5462 functions, 12980 loops, 248231 SLOCs successfully analyzed (1782 checkers) and 0 non-analyzed files in 1 m 17 s

All the source files were successfully analyzed and 1782 checkers were reported (1110 quality and 672 optimization). The different types of checkers reported can be seen in the two RANKING sections of the output.

3. Run the screening report for specific files

When using Codee for performance optimization, it helps to identify the code hotspots first so that Codee's reports can target them. Let's run the screening report again, restricting the analysis to one of those hotspots:

Codee command
codee screening --compile-commands build/compile_commands.json library/aes.c --db codee.db

Codee output
Date: 2026-03-25 Codee version: 2026.1 License type: Team

Searching Incremental Static Analysis database... Enabled

[1/1] /home/user/codee-demos/C/MbedTLS/library/aes.c (2 entries) ... Cached

SCREENING REPORT

------Number of files------
Total | C C++ Fortran Other
----- | - --- ------- -----
1     | 1 0   0       0

RANKING OF QUALITY CHECKERS

Checker Category                      Priority AutoFixes #  Title
------- ----------------------------- -------- --------- -- -----------------------------------------------------------------------
PWR001  correctness, modern, security P4 (L3)            6  Pass global variables as function arguments
PWR002  correctness, modern, security P3 (L3)            5  Declare scalar variables in the smallest possible scope
PWR086  security                      P3 (L3)            5  Prefer array-based notation over pointer-based notation for readability
------- ----------------------------- -------- --------- -- -----------------------------------------------------------------------
Total                                                    16

RANKING OF OPTIMIZATION CHECKERS

Checker Category Priority AutoFixes #  Title
------- -------- -------- --------- -- ------------------------------------------------------------------------
RMK015  other    P18 (L1)           1  Tune compiler optimization flags to increase the speed of the code
PWR053  vector   P12 (L1) 7         7  Consider applying vectorization to forall loop
PWR036  memory   P8 (L2)            9  Avoid indirect array access to improve performance
PWR028  control  P3 (L3)            5  Remove pointer increment preventing performance optimization
PWR024  multi    P3 (L3)            1  Loop can be rewritten in OpenMP canonical form
RMK010  memory   P0 (L4)            1  Strided memory accesses in the loop body may prevent vectorization
RMK014  memory   P0 (L4)            1  Unpredictable memory accesses in the loop body may prevent vectorization
------- -------- -------- --------- -- ------------------------------------------------------------------------
Total                     7         25

SUGGESTIONS

  Use 'checks' to find out details about the detected checks:
        codee checks --compile-commands build/compile_commands.json library/aes.c --db codee.db

1 built file (1 result taken from cache), 0 dependencies (0 reused from cache)
1 target file, 21 functions, 88 loops, 1657 SLOCs successfully analyzed (41 checkers) and 0 non-analyzed files in 91 ms

Note how the rankings list the different types of checkers reported, ordered by priority. In this case, the screening report shows that the PWR053 checker was reported 7 times, has a high priority (P12, level L1), and that Codee provides AutoFixes for all 7 occurrences.
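For readers unfamiliar with the term in the PWR053 title, a "forall" loop is, roughly, a loop in which every iteration updates a different array element and no iteration depends on the result of another. The sketch below contrasts that pattern with a loop that carries a dependency between iterations. Both functions (double_all, prefix_sum) are hypothetical illustrations and are not taken from MbedTLS or from Codee's documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example of the forall pattern: iteration i reads in[i] and
 * writes out[i] only, so iterations are independent of each other.
 * Assumption: out and in do not partially overlap. */
void double_all(unsigned *out, const unsigned *in, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = 2u * in[i];
}

/* Counter-example: iteration i needs the value written by iteration i - 1
 * (a loop-carried dependency), so this is not a forall loop. */
void prefix_sum(unsigned *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 1; i < n; i++)
        buf[i] += buf[i - 1];
}
```

Only the first kind of loop is a direct candidate for the vectorization discussed in this walkthrough; the second would need a different algorithm before SIMD execution is safe.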

4. Run the checks report

Now we need the full list of PWR053 occurrences, each one pointing at a specific line of code. We can use Codee's checks report to obtain that list, adding the --check-id PWR053 flag to filter the results:

Codee command
codee checks --compile-commands build/compile_commands.json library/aes.c --check-id PWR053 --db codee.db

Codee output
Date: 2026-03-20 Codee version: 2025.4.9-51 License type: Team

Searching Incremental Static Analysis database... Enabled

[1/1] /home/user/codee-demos/C/MbedTLS/library/aes.c (2 entries) ... Cached

QUALITY CHECKS REPORT

No actionable items were found

OPTIMIZATION CHECKS REPORT

/home/user/codee-demos/C/MbedTLS/library/aes.c:1048:13 [PWR053] (level: L1): Consider applying vectorization to forall loop
/home/user/codee-demos/C/MbedTLS/library/aes.c:1062:13 [PWR053] (level: L1): Consider applying vectorization to forall loop
/home/user/codee-demos/C/MbedTLS/library/aes.c:1162:9 [PWR053] (level: L1): Consider applying vectorization to forall loop
/home/user/codee-demos/C/MbedTLS/library/aes.c:1169:9 [PWR053] (level: L1): Consider applying vectorization to forall loop
/home/user/codee-demos/C/MbedTLS/library/aes.c:1194:9 [PWR053] (level: L1): Consider applying vectorization to forall loop
/home/user/codee-demos/C/MbedTLS/library/aes.c:1202:9 [PWR053] (level: L1): Consider applying vectorization to forall loop
/home/user/codee-demos/C/MbedTLS/library/aes.c:1211:9 [PWR053] (level: L1): Consider applying vectorization to forall loop

SUGGESTIONS

  Use --check-id and --verbose to focus on specific subsets of checkers, e.g.:
        codee checks --check-id PWR053 --verbose --compile-commands build/compile_commands.json library/aes.c --db codee.db

1 built file (1 result taken from cache), 0 dependencies (0 reused from cache)
1 target file, 21 functions, 88 loops, 1657 SLOCs successfully analyzed (7 checkers) and 0 non-analyzed files in 92 ms

5. Run the checks report in verbose mode

Re-run the checks report with --verbose to get more details for each checker, including the different autofix options:

Codee command
codee checks --compile-commands build/compile_commands.json library/aes.c --check-id PWR053 --verbose --db codee.db

Codee output
Date: 2026-03-25 Codee version: 2026.1 License type: Team

Searching Incremental Static Analysis database... Enabled

[1/1] /home/user/codee-demos/C/MbedTLS/library/aes.c (2 entries) ... Cached

QUALITY CHECKS REPORT

No actionable items were found

OPTIMIZATION CHECKS REPORT

/home/user/codee-demos/C/MbedTLS/library/aes.c:1048:13 [PWR053] (level: L1): Consider applying vectorization to forall loop
  Suggestion: Use 'rewrite' to automatically optimize the code
  Documentation:
    https://open-catalog.codee.com/Checks/PWR053
  AutoFix (choose one option):
    * Using OpenMP pragmas (recommended):
        codee rewrite --check-id pwr053 --variant omp --in-place /home/user/codee-demos/C/MbedTLS/library/aes.c:1048:13 --compile-commands build/compile_commands.json --db codee.db
    * Using GNU pragmas:
        codee rewrite --check-id pwr053 --variant gnu --in-place /home/user/codee-demos/C/MbedTLS/library/aes.c:1048:13 --compile-commands build/compile_commands.json --db codee.db
    * Using Intel pragmas:
        codee rewrite --check-id pwr053 --variant intel --in-place /home/user/codee-demos/C/MbedTLS/library/aes.c:1048:13 --compile-commands build/compile_commands.json --db codee.db
    * Using LLVM compiler pragmas:
        codee rewrite --check-id pwr053 --variant llvm --in-place /home/user/codee-demos/C/MbedTLS/library/aes.c:1048:13 --compile-commands build/compile_commands.json --db codee.db
    * Using combined pragmas, for example (for GNU and Intel pragmas):
        codee rewrite --check-id pwr053 --variant gnu,intel --in-place /home/user/codee-demos/C/MbedTLS/library/aes.c:1048:13 --compile-commands build/compile_commands.json --db codee.db

<...>

1 built file (1 result taken from cache), 0 dependencies (0 reused from cache)
1 target file, 21 functions, 88 loops, 1657 SLOCs successfully analyzed (7 checkers) and 0 non-analyzed files in 91 ms

6. Autofix

Use Codee's autofix capabilities to automatically optimize the code. The recommended rewriting option is to apply vectorization with OpenMP pragmas.

Additionally, we will remove the loop filter so that the AutoFix is applied to every loop reported as vectorizable within aes.c, instead of running codee rewrite once per loop.
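As a rough illustration of what the OpenMP variant of this rewrite amounts to, the sketch below annotates a 16-byte XOR loop with #pragma omp simd. The function name xor_block16 is hypothetical; it only mimics the shape of the block loops in aes.c and is not MbedTLS code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 16-byte block XOR loop.
 * Assumption: dst may be the same buffer as a or b, but must not
 * partially overlap them, so that iterations stay independent. */
void xor_block16(unsigned char *dst, const unsigned char *a,
                 const unsigned char *b)
{
    assert(dst != NULL && a != NULL && b != NULL);

    /* The pragma asks the compiler to vectorize the loop. GCC honors it
     * with -fopenmp-simd (or -fopenmp); without such a flag the pragma is
     * ignored and the loop still runs correctly as scalar code. */
    #pragma omp simd
    for (int i = 0; i < 16; i++)
        dst[i] = (unsigned char)(a[i] ^ b[i]);
}
```

This is why the CMake invocation in step 1 passes -fopenmp-simd in CMAKE_C_FLAGS: it enables the simd pragmas without pulling in the OpenMP threading runtime.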

Codee command
codee rewrite --check-id pwr053 --variant omp --in-place library/aes.c --all --compile-commands build/compile_commands.json --db codee.db

Codee output
Date: 2026-03-25 Codee version: 2026.1 License type: Team

Searching Incremental Static Analysis database... Enabled

[1/1] /home/user/codee-demos/C/MbedTLS/library/aes.c (2 entries) ... Done
[2/1] /home/user/codee-demos/C/MbedTLS/library/aes.c (2 entries) ... Done

Results for file '/home/user/codee-demos/C/MbedTLS/library/aes.c':
  Successfully applied AutoFix to the loop at '/home/user/codee-demos/C/MbedTLS/library/aes.c:aes_gen_tables:424:5' [using SIMD]:
      <...>

  Successfully applied AutoFix to the loop at '/home/user/codee-demos/C/MbedTLS/library/aes.c:mbedtls_aes_setkey_enc:568:5' [using SIMD]:
      <...>

  Successfully applied AutoFix to the loop at '/home/user/codee-demos/C/MbedTLS/library/aes.c:mbedtls_aes_crypt_cbc:1048:13' [using SIMD]:
      <...>

  Successfully applied AutoFix to the loop at '/home/user/codee-demos/C/MbedTLS/library/aes.c:mbedtls_aes_crypt_cbc:1062:13' [using SIMD]:
      <...>

Review the source code changes, for instance with a version control system such as Git:

git diff .

diff --git a/library/aes.c b/library/aes.c
index 4afc3c48ae..84516f62d7 100644
--- a/library/aes.c
+++ b/library/aes.c
@@ -421,6 +421,9 @@ static void aes_gen_tables( void )
     /*
      * generate the forward and reverse tables
      */
+    // Codee: Loop modified by Codee (2026-03-25 11:47:12)
+    // Codee: Technique applied: vectorization with 'omp' pragmas
+    #pragma omp simd private(x, y, z)
     for( i = 0; i < 256; i++ )
     {
         x = FSb[i];
@@ -565,6 +568,9 @@ int mbedtls_aes_setkey_enc( mbedtls_aes_context *ctx, const unsigned char *key,
         return( mbedtls_aesni_setkey_enc( (unsigned char *) ctx->rk, key, keybits ) );
 #endif

+    // Codee: Loop modified by Codee (2026-03-25 11:47:12)
+    // Codee: Technique applied: vectorization with 'omp' pragmas
+    #pragma omp simd
     for( i = 0; i < ( keybits >> 5 ); i++ )
     {
         RK[i] = MBEDTLS_GET_UINT32_LE( key, i << 2 );
@@ -1045,6 +1051,9 @@ int mbedtls_aes_crypt_cbc( mbedtls_aes_context *ctx,
             if( ret != 0 )
                 goto exit;

+            // Codee: Loop modified by Codee (2026-03-25 11:47:12)
+            // Codee: Technique applied: vectorization with 'omp' pragmas
+            #pragma omp simd
             for( i = 0; i < 16; i++ )
                 output[i] = (unsigned char)( output[i] ^ iv[i] );

@@ -1059,6 +1068,9 @@ int mbedtls_aes_crypt_cbc( mbedtls_aes_context *ctx,
     {
         while( length > 0 )
         {
+            // Codee: Loop modified by Codee (2026-03-25 11:47:12)
+            // Codee: Technique applied: vectorization with 'omp' pragmas
+            #pragma omp simd
             for( i = 0; i < 16; i++ )
                 output[i] = (unsigned char)( input[i] ^ iv[i] );

@@ -1159,6 +1171,9 @@ int mbedtls_aes_crypt_xts( mbedtls_aes_xts_context *ctx,
             mbedtls_gf128mul_x_ble( tweak, tweak );
         }

+        // Codee: Loop modified by Codee (2026-03-25 11:47:12)
+        // Codee: Technique applied: vectorization with 'omp' pragmas
+        #pragma omp simd
         for( i = 0; i < 16; i++ )
             tmp[i] = input[i] ^ tweak[i];

@@ -1166,6 +1181,9 @@ int mbedtls_aes_crypt_xts( mbedtls_aes_xts_context *ctx,
         if( ret != 0 )
             return( ret );

+        // Codee: Loop modified by Codee (2026-03-25 11:47:12)
+        // Codee: Technique applied: vectorization with 'omp' pragmas
+        #pragma omp simd
         for( i = 0; i < 16; i++ )
             output[i] = tmp[i] ^ tweak[i];

@@ -1191,6 +1209,9 @@ int mbedtls_aes_crypt_xts( mbedtls_aes_xts_context *ctx,
          * byte of cyphertext we won't steal. At the same time, copy the
          * remainder of the input for this final round (since the loop bounds
          * are the same). */
+        // Codee: Loop modified by Codee (2026-03-25 11:47:12)
+        // Codee: Technique applied: vectorization with 'omp' pragmas
+        #pragma omp simd lastprivate(i)
         for( i = 0; i < leftover; i++ )
         {
             output[i] = prev_output[i];
@@ -1208,6 +1229,9 @@ int mbedtls_aes_crypt_xts( mbedtls_aes_xts_context *ctx,

         /* Write the result back to the previous block, overriding the previous
          * output we copied. */
+        // Codee: Loop modified by Codee (2026-03-25 11:47:12)
+        // Codee: Technique applied: vectorization with 'omp' pragmas
+        #pragma omp simd
         for( i = 0; i < 16; i++ )
             prev_output[i] = tmp[i] ^ t[i];
     }
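Two of the pragmas in the diff carry data-sharing clauses: private(x, y, z) in aes_gen_tables and lastprivate(i) in mbedtls_aes_crypt_xts. The sketch below shows the same two clauses on small loops. The functions rotl8_all and copy_prefix are hypothetical and only imitate the structure of the original loops.

```c
#include <assert.h>
#include <stddef.h>

/* private(t): t is declared outside the loop but is assigned before it is
 * read in every iteration, so each SIMD lane can work on its own copy. */
void rotl8_all(unsigned char *out, const unsigned char *in, size_t n)
{
    size_t i;
    unsigned char t;
    assert(out != NULL && in != NULL);

    #pragma omp simd private(t)
    for (i = 0; i < n; i++) {
        t = in[i];
        out[i] = (unsigned char)((t << 1) | (t >> 7)); /* rotate left by 1 */
    }
}

/* lastprivate(i): after the loop, i holds the value it would have after
 * sequential execution, so code following the loop can keep using it. */
size_t copy_prefix(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    assert(dst != NULL && src != NULL);

    /* Guard the empty case explicitly instead of relying on the clause
     * when there is no last iteration to copy the value back from. */
    if (n == 0)
        return 0;

    #pragma omp simd lastprivate(i)
    for (i = 0; i < n; i++)
        dst[i] = src[i];

    return i;
}
```

In the XTS code the value of i matters because the statements after the leftover loop continue from where it stopped, which is presumably why the rewrite adds lastprivate(i) there and not on the fixed 16-iteration loops.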

7. Execution

Compile the code with the optimizations applied by Codee:

cmake --build build -j

Then run the AES-XTS benchmark (aes_xts), which exercises aes.c:

MbedTLS optimized benchmark invocation
./build/programs/test/benchmark aes_xts

Optimized benchmark output
  AES-XTS-128              :     874306 KiB/s,          4 cycles/byte
  AES-XTS-256              :     715399 KiB/s,          4 cycles/byte

Lastly, revert the changes applied by Codee so that the code returns to the original version:

git command
git restore .

Recompile so that the binaries return to their original version as well:

cmake --build build -j

Run the AES-XTS benchmark again to compare the results with those obtained earlier from the code optimized by Codee:

MbedTLS original benchmark invocation
./build/programs/test/benchmark aes_xts

Original benchmark output
  AES-XTS-128              :     715509 KiB/s,          4 cycles/byte
  AES-XTS-256              :     653419 KiB/s,          5 cycles/byte

Note how the version optimized by Codee achieves higher throughput than the original: about 22% for AES-XTS-128 (874306 vs 715509 KiB/s) and about 9.5% for AES-XTS-256 (715399 vs 653419 KiB/s).

Testing machine: AMD Ryzen 7 7840HS laptop.
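The relative gain can be checked directly from the two benchmark outputs above. The helper below is a hypothetical convenience function, not part of MbedTLS or Codee; it simply divides the optimized throughput by the original one.

```c
#include <assert.h>

/* Relative throughput gain, in percent, of an optimized run over the
 * original run. Hypothetical helper, named here for illustration only. */
double speedup_percent(double optimized_kib_s, double original_kib_s)
{
    assert(original_kib_s > 0.0);
    return (optimized_kib_s / original_kib_s - 1.0) * 100.0;
}
```

With the figures reported above, speedup_percent(874306, 715509) gives roughly 22.2 for AES-XTS-128 and speedup_percent(715399, 653419) gives roughly 9.5 for AES-XTS-256. Results on other machines and compilers will differ.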

Appendix

Compiler Efficiency Report for GCC-11

Codee provides the Compiler Efficiency Report (codee diagnose --compiler-efficiency) to compare the optimizations that the compiler performs under the current build settings with those it performs under the optimization flags suggested by Codee. In a nutshell, it lets the user compare the compiler optimizations obtained with the current -OX flag against those obtained with the suggested one, for example -O2 vs -O3.

Let's try the Compiler Efficiency Report with an example. Invoke codee diagnose --compiler-efficiency to see the different optimizations that the compiler applies to the loops of the function mbedtls_aes_crypt_xts, which is located in the library/aes.c file:

Codee command
codee diagnose --compile-commands build/compile_commands.json --compiler-efficiency library/aes.c:mbedtls_aes_crypt_xts

Codee output
Configuration file 'build/compile_commands.json' successfully parsed.
Date: 2025-03-20 Codee version: 2025.1.3 License type: Full

[1/1] /user/codee-demos/C/MbedTLS/library/aes.c (2 entries) ... Done

COMPILER EFFICIENCY REPORT

Compiler: gcc-11.4
Optimization flags detected in the build: -fopenmp-simd -O2
Maximum optimization flags suggested by Codee: -ftree-vectorize -O3

Loop                               Current Suggested    Codee optimizations
---------------------------------- ------- ---------    --------------------
/user/codee-demos/C/MbedTLS/library/aes.c
|- mbedtls_aes_crypt_xts:1126:5    n/a     n/a
|- mbedtls_aes_crypt_xts:1127:5    n/a     n/a
|- mbedtls_aes_crypt_xts:1129:5    n/a     n/a
|- mbedtls_aes_crypt_xts:1130:5    n/a     n/a
|- mbedtls_aes_crypt_xts:1131:5    n/a     n/a
|- mbedtls_aes_crypt_xts:1147:5    n/a     n/a
|  |- mbedtls_aes_crypt_xts:1162:9 n/a     no (unrl)    PWR053(1)
|  `- mbedtls_aes_crypt_xts:1169:9 n/a     no (unrl)    PWR053(1)
|- mbedtls_aes_crypt_xts:1194:9    n/a     auto*        PWR053(1)
|- mbedtls_aes_crypt_xts:1202:9    n/a     auto*        PWR053(1), PWR024(1)
`- mbedtls_aes_crypt_xts:1211:9    n/a     auto         PWR053(1)

SUMMARY

Vectorized Current Suggested
---------- ------- ---------
auto       0       3
no         0       2
n/a        11      6
---------- ------- ---------
Total number of loops: 11

The Codee compiler efficiency analysis revealed +27.27% increase in the number of loops optimized by the compilers, switching from the current optimization flags to the ones suggested by Codee (from 0/11 up to 3/11 optimized loops)

LEGEND

  Current: Compiler optimization report with the optimization level used on the build
  Suggested: Compiler optimization report with the optimization level suggested by Codee
  Codee optimizations: list of Codee checkers reported for each loop
    auto: loop automatically vectorized by the compiler
    no: loop not vectorized by the compiler. Could happen for different reasons:
      no (cost): the compiler's cost model recommends so
      no (ctrl): complex control flow inhibits vectorization
      no (dep) : there is (or seems to be) a dependency inhibiting vectorization
      no (prec): potential precision loss if vectorized
      no (vgen): SIMD instruction generator not supported by the compiler
      no (outr): unsupported outer loop
      no (unrl): the loop was fully unrolled by the compiler
      no (call): the loop was replaced by a library call
      no (othr): any other reason
    n/a: no information was provided by the compiler for this loop

SUGGESTIONS

  Use --show-messages to get details on the messages reported by each compiler:
        codee diagnose --show-messages --compile-commands build/compile_commands.json --compiler-efficiency library/aes.c:mbedtls_aes_crypt_xts

  Find out the actionable insights related to vectorization:
        codee checks --only-categories vector --verbose --compile-commands build/compile_commands.json library/aes.c:mbedtls_aes_crypt_xts

1 file, 1 function, 11 loops, 1657 LOCs successfully analyzed (6 checkers) and 0 non-analyzed files in 1163 ms

In this example we can see how the compiler changed its optimization behavior when replacing the -O2 flag with -O3.

With -O2 the compiler reported no auto-vectorized loops, while with -O3 it vectorized three of them (the loops at lines 1194, 1202 and 1211).

Compiler Efficiency Report for GCC-13

Codee command
codee diagnose --compile-commands build/compile_commands.json --compiler-efficiency library/aes.c:mbedtls_aes_crypt_xts

Codee output
Configuration file 'build/compile_commands.json' successfully parsed.
Date: 2025-03-20 Codee version: 2025.1.3 License type: Full

[1/1] library/aes.c (2 entries) ... Done

COMPILER EFFICIENCY REPORT

Compiler: gcc-13.2
Optimization flags detected in the build: -fopenmp-simd -O2
Maximum optimization flags suggested by Codee: -ftree-vectorize -O3

Loop                               Current   Suggested    Codee optimizations
---------------------------------- --------- ---------    --------------------
/home/user/codee-demos/C/MbedTLS/library/aes.c
|- mbedtls_aes_crypt_xts:1126:5    n/a       n/a
|- mbedtls_aes_crypt_xts:1127:5    n/a       n/a
|- mbedtls_aes_crypt_xts:1129:5    n/a       n/a
|- mbedtls_aes_crypt_xts:1130:5    n/a       n/a
|- mbedtls_aes_crypt_xts:1131:5    n/a       n/a
|- mbedtls_aes_crypt_xts:1147:5    n/a       n/a
|  |- mbedtls_aes_crypt_xts:1162:9 auto      no (unrl)    PWR053(1)
|  `- mbedtls_aes_crypt_xts:1169:9 auto      no (unrl)    PWR053(1)
|- mbedtls_aes_crypt_xts:1194:9    no (othr) auto*        PWR053(1)
|- mbedtls_aes_crypt_xts:1202:9    no (othr) auto*        PWR053(1), PWR024(1)
`- mbedtls_aes_crypt_xts:1211:9    auto      auto         PWR053(1)

SUMMARY

Vectorized Current Suggested
---------- ------- ---------
auto       3       3
no         2       2
n/a        6       6
---------- ------- ---------
Total number of loops: 11

LEGEND

  Current: Compiler optimization report with the optimization level used on the build
  Suggested: Compiler optimization report with the optimization level suggested by Codee
  Codee optimizations: list of Codee checkers reported for each loop
    auto: loop automatically vectorized by the compiler
    no: loop not vectorized by the compiler. Could happen for different reasons:
      no (cost): the compiler's cost model recommends so
      no (ctrl): complex control flow inhibits vectorization
      no (dep) : there is (or seems to be) a dependency inhibiting vectorization
      no (prec): potential precision loss if vectorized
      no (vgen): SIMD instruction generator not supported by the compiler
      no (outr): unsupported outer loop
      no (unrl): the loop was fully unrolled by the compiler
      no (call): the loop was replaced by a library call
      no (othr): any other reason
    n/a: no information was provided by the compiler for this loop

SUGGESTIONS

  Use --show-messages to get details on the messages reported by each compiler:
        codee diagnose --show-messages --compile-commands build/compile_commands.json --compiler-efficiency library/aes.c:mbedtls_aes_crypt_xts

  Find out the actionable insights related to vectorization:
        codee checks --only-categories vector --verbose --compile-commands build/compile_commands.json library/aes.c:mbedtls_aes_crypt_xts

1 file, 1 function, 11 loops, 1657 LOCs successfully analyzed (6 checkers) and 0 non-analyzed files in 1633 ms

In this case, with GCC-13, the loops at lines 1162 and 1169 were auto-vectorized under -O2, but under -O3 the compiler fully unrolled them instead of vectorizing them.

On the other hand, the loops at lines 1194 and 1202 were not auto-vectorized under -O2, but the compiler vectorizes them under -O3.

The remaining loops of the function show no change between -O2 and -O3.
