Codee Advanced Reports
This guide walks you through some of Codee's advanced reports, which provide insights to help you manually optimize and parallelize your code.
Overview
Codee provides reports (codee diagnose) aimed at developers with expertise in performance optimization. These codee diagnose reports show hints and insights that help determine whether a loop is parallelizable and choose the right OpenMP/OpenACC pragmas.
These reports are most valuable when Codee cannot generate the pragmas automatically for a given loop, so the user has to write them by hand. The codee diagnose advanced reports support this manual process.
Codee Diagnose Summary Report
The codee diagnose --summary report shows insights at the loop level. Here is an example with MATMUL:
codee diagnose --summary --target-arch cpu -- gfortran matmul.f90
Date: 2024-09-20 Codee version: 2024.3.2 License type: Full
Compiler invocation: gfortran matmul.f90
[1/1] matmul.f90 ... Done
LOOP SUMMARY REPORT
Loop                            Analyzable Compute patterns Has AutoFix Checks
------------------------------- ---------- ---------------- ----------- --------------
matmul.f90
|- calculate_matmul:11:7        x          forall                       PWR035
|  `- calculate_matmul:12:10    x          forall                       RMK010
`- calculate_matmul:17:7        x          forall           Available   PWR035, PWR050
   `- calculate_matmul:18:10    x          forall
      `- calculate_matmul:19:13 x          reduction                    RMK010
Loop : loop name following the syntax <file>:<function>:<line>:<column>
Analyzable : all C/C++/Fortran language features present in the loop are supported by Codee
Compute patterns : compute patterns found in the loop ('forall', 'scalar' or 'sparse' reduction, 'recurrence', 'dep(endency)')
Has AutoFix : loop can be optimized by Codee
Checks : list of checks reported for the loop
SUGGESTIONS
Get more details about the data scoping of each variable within a loop, e.g.:
codee diagnose --datascoping matmul.f90:11:7 --target-arch cpu -- gfortran matmul.f90
1 file, 1 function, 5 loops successfully analyzed (9 checkers) and 0 non-analyzed files in 20 ms
The information shown in the different columns of the table is explained in the table legend. The most important part of this report is the Compute patterns column, which shows relevant insights about the dependencies among loop iterations. For example, the outermost loop of the second loop nest, listed as calculate_matmul:17:7 (the loop at line 17, column 7, inside the function calculate_matmul), has a forall compute pattern. This means that there are no dependencies among its iterations, which is crucial when parallelizing a loop.
For more information about the compute patterns that Codee reports, you can take a look at the Open Catalog: Patterns for Performance Optimization.
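To make the compute patterns more concrete, here is a minimal C sketch contrasting a loop with a forall pattern and a loop with a scalar reduction pattern. It is not taken from matmul.f90, whose source is not shown here; the function names scale_forall and sum_reduction are hypothetical, and the pragmas are one plausible way to parallelize each pattern with OpenMP.

```c
/* Illustrative only: hypothetical functions, not part of the MATMUL example. */

/* forall: iteration i writes only out[i] and reads only in[i], so no
 * iteration depends on another and they can run in any order. */
void scale_forall(double *out, const double *in, double factor, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        out[i] = factor * in[i];
    }
}

/* scalar reduction: every iteration updates the same variable `sum`.
 * A plain parallel loop would race on it; the reduction clause gives each
 * thread a private partial sum and combines the partial sums at the end. */
double sum_reduction(const double *in, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        sum += in[i];
    }
    return sum;
}
```

Built without OpenMP support, the pragmas are ignored and both functions run sequentially with the same results, which makes it easy to compare the serial and parallel versions.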
Codee Diagnose Datascoping Report
The codee diagnose --datascoping report is another of the advanced reports that Codee offers:
codee diagnose --datascoping --target-arch cpu -- gfortran matmul.f90
Date: 2024-09-20 Codee version: 2024.3.2 License type: Full
Compiler invocation: gfortran matmul.f90
[1/1] matmul.f90 ... Done
Loop                            Variable Kind   Read/Write Temporary Compute Pattern Memory Pattern OpenMP (multi)   OpenACC (offload)
------------------------------- -------- ------ ---------- --------- --------------- -------------- ---------------- -----------------
matmul.f90
|- calculate_matmul:11:7
|  |->                          C()()    array  wo                   forall          col-major      shared(C)        copyout(C)
|  |->                          i        scalar rw                                                  private(i)
|  |->                          j        scalar rw         x                                        private(j)
|  `- calculate_matmul:12:10
|     |->                       C()()    array  wo                   forall          col-major      shared(C)        copyout(C)
|     |->                       i        scalar ro                                                  shared(i)        copyin(i)
|     `->                       j        scalar rw                                                  private(j)
`- calculate_matmul:17:7
   |->                          A()()    array  ro                                   n/a            shared(A)        copyin(A)
   |->                          B()()    array  ro                                   n/a            shared(B)        copyin(B)
   |->                          C()()    array  rw                   forall          n/a            shared(C)        copy(C)
   |->                          i        scalar rw                                                  private(i)
   |->                          j        scalar rw         x                                        private(j)
   |->                          k        scalar rw         x                                        private(k)
   `- calculate_matmul:18:10
      |->                       A()()    array  ro                                   col-major      shared(A)        copyin(A)
      |->                       B()()    array  ro                                   row-major      shared(B)        copyin(B)
      |->                       C()()    array  rw                   forall          n/a            shared(C)        copy(C)
      |->                       i        scalar ro                                                  shared(i)        copyin(i)
      |->                       j        scalar rw                                                  private(j)
      |->                       k        scalar rw         x                                        private(k)
      `- calculate_matmul:19:13
         |->                    A()()    array  ro                                   col-major      shared(A)        copyin(A)
         |->                    B()()    array  ro                                   row-major      shared(B)        copyin(B)
         |->                    C()()    array  rw                   reduction       n/a            shared/atomic(C) shared/atomic(C)
         |->                    i        scalar ro                                                  shared(i)        copyin(i)
         |->                    j        scalar ro                                                  shared(j)        copyin(j)
         `->                    k        scalar rw                                                  private(k)
Loop : loop name following the syntax <file>:<function>:<line>:<column>
Variable : name of the variable
Kind : variable datatype kind (scalar, pointer, array, dynarray, derived, other)
Read/Write : whether the variable is read-only ("ro"), write-only ("wo"), read-write ("rw") within the loop
Temporary : specifies whether the variable is internal to the loop (i.e. in C/C++ it is declared within the loop body)
Compute Pattern : parallel pattern (read-only, forall, scalar reduction, sparse reduction)
Memory Pattern : access pattern (linear, row-major, column-major, other)
OpenMP (multi) : OpenMP scoping (valid values: shared, private, reduction; special value: shared/atomic when atomic directive is also required)
OpenACC (offload) : OpenACC scoping (valid values: copy, copyin, copyout; special values: copy/atomic, copyout/atomic when atomic directive is also required)
1 file, 1 function, 5 loops successfully analyzed (0 checkers) and 0 non-analyzed files in 19 ms
With this codee diagnose --datascoping report, the user can see how the variables behave across the different loop scopes (the Read/Write, Temporary, Compute Pattern and Memory Pattern columns). Codee also helps identify the correct data-sharing attributes of each variable for both OpenMP and OpenACC (the OpenMP (multi) and OpenACC (offload) columns).
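As an illustration of how the OpenMP (multi) column could be turned into hand-written pragmas, the following sketch applies the clauses from the report to a C analogue of the kernel. Several things here are assumptions: the real source is Fortran (matmul.f90) and is not shown in this guide, so the function names, the square n x n shape and the flat row-major storage are hypothetical, and the memory patterns in the report do not carry over to this C layout.

```c
/* Hypothetical C analogue of the MATMUL loops; matrices are n x n and
 * stored as flat row-major arrays (an assumption, not the Fortran layout). */

/* Analogue of calculate_matmul:11:7, where the report lists
 * shared(C), private(i), private(j). */
void matrix_zero(double *C, int n) {
    int i, j;
    #pragma omp parallel for shared(C) private(i, j)
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            C[i * n + j] = 0.0;
        }
    }
}

/* Analogue of calculate_matmul:17:7 (forall on C), where the report lists
 * shared(A), shared(B), shared(C), private(i), private(j), private(k).
 * C must be zeroed by the caller because the loop accumulates into it. */
void matmul_outer_parallel(const double *A, const double *B, double *C, int n) {
    int i, j, k;
    #pragma omp parallel for shared(A, B, C) private(i, j, k)
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            for (k = 0; k < n; ++k) {
                C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
        }
    }
}

/* Analogue of calculate_matmul:19:13 (reduction on C), where the report
 * lists shared(i), shared(j), private(k) and shared/atomic(C): all
 * iterations of the k loop update the same element of C, so the update
 * needs an atomic directive. Shown only to illustrate that entry. */
void matmul_inner_atomic(const double *A, const double *B, double *C, int n) {
    int i, j, k;
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            #pragma omp parallel for shared(A, B, C, i, j) private(k)
            for (k = 0; k < n; ++k) {
                #pragma omp atomic
                C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
        }
    }
}
```

Parallelizing the outermost loop is usually the better choice here, since it needs no atomic updates; the innermost variant is included only to show what the shared/atomic value in the report stands for.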