
Codee Advanced Reports

Goal

This guide walks you through some of Codee's advanced reports, which provide insights to help you optimize and parallelize code manually.

Overview

Codee provides reports (codee diagnose) aimed at developers experienced in performance optimization. These reports show hints and insights that help you determine whether a loop is parallelizable and choose the right OpenMP/OpenACC pragmas.

These reports are most valuable when Codee cannot generate the pragmas automatically for a given loop, leaving the user to write them by hand. In that case, the codee diagnose advanced reports can guide the manual process.

Codee Diagnose Summary Report

The codee diagnose --summary report shows insights at loop level. Here is an example with MATMUL:

Codee command
codee diagnose --summary --target-arch cpu -- gfortran matmul.f90
Codee output
Date: 2024-09-20 Codee version: 2024.3.2 License type: Full
Compiler invocation: gfortran matmul.f90

[1/1] matmul.f90 ... Done

LOOP SUMMARY REPORT

Loop                            Analyzable Compute patterns Has AutoFix Checks
------------------------------- ---------- ---------------- ----------- --------------
matmul.f90
|- calculate_matmul:11:7        x          forall                       PWR035
|  `- calculate_matmul:12:10    x          forall                       RMK010
`- calculate_matmul:17:7        x          forall           Available   PWR035, PWR050
   `- calculate_matmul:18:10    x          forall
      `- calculate_matmul:19:13 x          reduction                    RMK010

Loop : loop name following the syntax <file>:<function>:<line>:<column>
Analyzable : all C/C++/Fortran language features present in the loop are supported by Codee
Compute patterns : compute patterns found in the loop ('forall', 'scalar' or 'sparse' reduction, 'recurrence', 'dep(endency)')
Has AutoFix : loop can be optimized by Codee
Checks : list of checks reported for the loop

SUGGESTIONS

Get more details about the data scoping of each variable within a loop, e.g.:
codee diagnose --datascoping matmul.f90:11:7 --target-arch cpu -- gfortran matmul.f90

1 file, 1 function, 5 loops successfully analyzed (9 checkers) and 0 non-analyzed files in 20 ms

The table legend explains each column. The most important column in this report is Compute patterns, which summarizes the dependencies among loop iterations. For example, the outermost loop of the second loop nest, shown as calculate_matmul:17:7 (the loop at line 17, column 7, inside the function calculate_matmul), has a forall compute pattern. This means that there are no dependencies among its iterations, which is the key property to check before parallelizing a loop.

For more information about the compute patterns that Codee reports, see the Open Catalog: Patterns for Performance Optimization.
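
To make the difference between the forall and reduction patterns concrete, here is a minimal C sketch. It is not the matmul.f90 source, which is not shown on this page; the functions scale_vector and sum_vector are hypothetical, and the sketch only assumes an OpenMP-capable C compiler (for example, gcc -fopenmp). If OpenMP is not enabled, the pragmas are ignored and the loops simply run serially.

```c
/* forall: each iteration writes a distinct element and reads nothing that
   another iteration writes, so the iterations can run in any order. */
void scale_vector(int n, const double *in, double *out, double factor) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = factor * in[i];
    }
}

/* scalar reduction: every iteration updates the same accumulator, so the
   loop needs a reduction clause to be parallelized correctly. */
double sum_vector(int n, const double *in) {
    double total = 0.0;
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; i++) {
        total += in[i];
    }
    return total;
}
```

A loop that Codee reports as forall corresponds to the first case; a loop reported as reduction corresponds to the second, where a plain parallel for without extra synchronization would be incorrect.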

Codee Diagnose Datascoping Report

The codee diagnose --datascoping report is another advanced report. For each loop, it lists how every variable is accessed and which OpenMP/OpenACC data scoping applies to it:

Codee command
codee diagnose --datascoping --target-arch cpu -- gfortran matmul.f90
Codee output
Date: 2024-09-20 Codee version: 2024.3.2 License type: Full
Compiler invocation: gfortran matmul.f90

[1/1] matmul.f90 ... Done

Loop                            Variable Kind   Read/Write Temporary Compute Pattern Memory Pattern OpenMP (multi)   OpenACC (offload)
------------------------------- -------- ------ ---------- --------- --------------- -------------- ---------------- -----------------
matmul.f90
|- calculate_matmul:11:7
|  |->                          C()()    array  wo                   forall          col-major      shared(C)        copyout(C)
|  |->                          i        scalar rw                                                  private(i)
|  |->                          j        scalar rw         x                                        private(j)
|  `- calculate_matmul:12:10
|     |->                       C()()    array  wo                   forall          col-major      shared(C)        copyout(C)
|     |->                       i        scalar ro                                                  shared(i)        copyin(i)
|     `->                       j        scalar rw                                                  private(j)
`- calculate_matmul:17:7
   |->                          A()()    array  ro                                   n/a            shared(A)        copyin(A)
   |->                          B()()    array  ro                                   n/a            shared(B)        copyin(B)
   |->                          C()()    array  rw                   forall          n/a            shared(C)        copy(C)
   |->                          i        scalar rw                                                  private(i)
   |->                          j        scalar rw         x                                        private(j)
   |->                          k        scalar rw         x                                        private(k)
   `- calculate_matmul:18:10
      |->                       A()()    array  ro                                   col-major      shared(A)        copyin(A)
      |->                       B()()    array  ro                                   row-major      shared(B)        copyin(B)
      |->                       C()()    array  rw                   forall          n/a            shared(C)        copy(C)
      |->                       i        scalar ro                                                  shared(i)        copyin(i)
      |->                       j        scalar rw                                                  private(j)
      |->                       k        scalar rw         x                                        private(k)
      `- calculate_matmul:19:13
         |->                    A()()    array  ro                                   col-major      shared(A)        copyin(A)
         |->                    B()()    array  ro                                   row-major      shared(B)        copyin(B)
         |->                    C()()    array  rw                   reduction       n/a            shared/atomic(C) shared/atomic(C)
         |->                    i        scalar ro                                                  shared(i)        copyin(i)
         |->                    j        scalar ro                                                  shared(j)        copyin(j)
         `->                    k        scalar rw                                                  private(k)

Loop : loop name following the syntax <file>:<function>:<line>:<column>
Variable : name of the variable
Kind : variable datatype kind (scalar, pointer, array, dynarray, derived, other)
Read/Write : whether the variable is read-only ("ro"), write-only ("wo"), read-write ("rw") within the loop
Temporary : specifies whether the variable is internal to the loop (i.e. in C/C++ it is declared within the loop body)
Compute Pattern : parallel pattern (read-only, forall, scalar reduction, sparse reduction)
Memory Pattern : access pattern (linear, row-major, column-major, other)
OpenMP (multi) : OpenMP scoping (valid values: shared, private, reduction; special value: shared/atomic when atomic directive is also required)
OpenACC (offload) : OpenACC scoping (valid values: copy, copyin, copyout; special values: copy/atomic, copyout/atomic when atomic directive is also required)

1 file, 1 function, 5 loops successfully analyzed (0 checkers) and 0 non-analyzed files in 19 ms

With the codee diagnose --datascoping report, you can see how each variable behaves within each loop scope (the Read/Write, Temporary, Compute Pattern and Memory Pattern columns). The OpenMP (multi) and OpenACC (offload) columns additionally suggest the data scoping of each variable, which helps you write the correct clauses by hand.
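
As a sketch of how the OpenMP (multi) column can be turned into a hand-written pragma, the following C function applies the scoping reported for the outermost loop calculate_matmul:17:7: shared(A), shared(B), shared(C) and private(i), private(j), private(k). This is a hypothetical C analogue, not the matmul.f90 source. The name matmul_omp, the flat row-major storage and the size parameter n are assumptions made for illustration, so the Memory Pattern column of a real report on this code would differ from the Fortran one shown above.

```c
/* Hypothetical C analogue of the loop nest reported at calculate_matmul:17:7.
   Matrices are assumed to be square (n x n) and stored as flat arrays. */
void matmul_omp(int n, const double *A, const double *B, double *C) {
    int i, j, k;
    /* Scoping taken from the OpenMP (multi) column for the outermost loop:
       the arrays are shared and the loop indices are private. The size n is
       an addition of this sketch; it is read-only, so it is shared as well.
       default(none) forces every variable to be scoped explicitly. */
    #pragma omp parallel for default(none) shared(A, B, C, n) private(i, j, k)
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            for (k = 0; k < n; k++) {
                C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
        }
    }
}
```

Parallelizing the outermost loop works with a plain shared(C) because the report lists C as forall at that level: each value of i updates a different part of C. At the innermost loop (calculate_matmul:19:13), the report lists C as a reduction with shared/atomic(C), so parallelizing that loop instead would also require an atomic directive on the update, as the legend notes.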