I need a C function translated into CUDA and OpenMP. Both versions need good documentation explaining why and how the code was translated, for example which part of the function takes the most time in the C code.
- 2BSM & 2BXD benchmarks
- Computational performance (GFLOPS)
- Tests with different numbers of threads
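To make the GFLOPS and thread-count items concrete, here is a minimal sketch of how such a measurement could look in C with OpenMP. The C function to be translated is not shown here, so `axpy` is only a hypothetical stand-in kernel (2 floating-point operations per element), and `wall_seconds`, `gflops`, `measure_axpy_gflops` and `sweep_threads` are illustrative names, not part of the actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Elapsed time in seconds. Without OpenMP the code runs on one thread,
 * so CPU time from clock() is used as an approximation of wall time. */
static double wall_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / (double)CLOCKS_PER_SEC;
#endif
}

/* GFLOPS = floating-point operations / (seconds * 1e9).
 * Returns 0 when the elapsed time is not positive. */
static double gflops(double flop_count, double seconds) {
    return seconds > 0.0 ? flop_count / (seconds * 1e9) : 0.0;
}

/* Hypothetical stand-in kernel: y = a*x + y, 2 FLOPs per element. */
static void axpy(long n, double a, const double *x, double *y, int threads) {
    assert(threads >= 1);
#ifdef _OPENMP
    #pragma omp parallel for num_threads(threads)
#endif
    for (long i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Runs the kernel `repeats` times and reports GFLOPS for the fastest run. */
static double measure_axpy_gflops(long n, int threads, int repeats) {
    double *x = malloc((size_t)n * sizeof *x);
    double *y = malloc((size_t)n * sizeof *y);
    double best = -1.0;
    if (x && y) {
        for (long i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }
        for (int r = 0; r < repeats; ++r) {
            double t0 = wall_seconds();
            axpy(n, 0.5, x, y, threads);
            double dt = wall_seconds() - t0;
            if (best < 0.0 || dt < best) best = dt;
        }
    }
    free(x);
    free(y);
    return gflops(2.0 * (double)n, best);
}

/* Prints one line per thread count: 1, 2, 4, ... up to max_threads. */
static void sweep_threads(long n, int max_threads) {
    for (int t = 1; t <= max_threads; t *= 2)
        printf("%d threads: %.2f GFLOPS\n", t, measure_axpy_gflops(n, t, 5));
}
```

Calling `sweep_threads(1L << 24, 8)` from `main` and compiling with something like `gcc -O2 -fopenmp` would print one line per thread count. The FLOP count has to be derived by hand for the real function, and a kernel like this one is limited by memory bandwidth, so its numbers say little about the target code.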
There should also be a study of nested for loops and how they affect the performance of both the parallel and the sequential code.
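As an illustration of what the nested-loop study could compare, the sketch below sums a row-major matrix in three ways: with the inner loop over contiguous memory, with the loops interchanged, and with both loops merged by OpenMP's `collapse(2)` clause. The function names and the matrix-sum workload are assumptions chosen for the example, not taken from the function to be translated.

```c
#include <assert.h>
#include <stddef.h>

/* Inner loop walks contiguous memory: consecutive j values are adjacent
 * in a row-major array, which is usually the cache-friendly order. */
static double sum_ij(const double *a, long rows, long cols) {
    assert(a != NULL);
    double s = 0.0;
    for (long i = 0; i < rows; ++i)
        for (long j = 0; j < cols; ++j)
            s += a[i * cols + j];
    return s;
}

/* Same arithmetic with the loops interchanged: consecutive accesses are
 * `cols` elements apart, which tends to cause more cache misses. */
static double sum_ji(const double *a, long rows, long cols) {
    assert(a != NULL);
    double s = 0.0;
    for (long j = 0; j < cols; ++j)
        for (long i = 0; i < rows; ++i)
            s += a[i * cols + j];
    return s;
}

/* Parallel variant: collapse(2) merges both loops into one iteration
 * space of rows*cols, and reduction(+:s) combines per-thread sums.
 * Without OpenMP support this compiles to the sequential loop nest. */
static double sum_collapsed(const double *a, long rows, long cols) {
    assert(a != NULL);
    double s = 0.0;
#ifdef _OPENMP
    #pragma omp parallel for collapse(2) reduction(+:s)
#endif
    for (long i = 0; i < rows; ++i)
        for (long j = 0; j < cols; ++j)
            s += a[i * cols + j];
    return s;
}
```

All three functions return the same value for inputs where the additions are exact. Timing them on a large matrix is one way to separate the effect of loop order from the effect of parallelization. `collapse(2)` is most likely to help when the outer loop alone has too few iterations to keep all threads busy.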