Hi, I have some code that does matrix multiplication using CUDA and OpenMP. I want to run three copies of the matrix multiplication (same inputs) in parallel on three kernels, then compare the results of the three kernels.
The three versions of the matrix multiplication are as follows:
one kernel uses shared memory, another does not use shared memory, and the last one works with the transpose of B, like this equation:
A * B "transpose of B" = C.
- Review the existing code.
- Run the three copies in parallel using OpenMP: run each kernel on a separate thread, then compare the three results and show them in the command window. Once these preliminary results have been verified, continue with the error-injection steps that follow.
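As a rough illustration of the run-in-parallel-and-compare step, here is a minimal CPU-only sketch in C++ with OpenMP sections. It is not the CUDA code from the project: the three functions only stand in for the three kernels, and the names `matmulNaive`, `matmulTiled`, `matmulTransposedB`, `runAllAndCompare`, the arrays `A` and `B`, and the tolerance are all assumptions made for the example. The third variant assumes that "transpose of B" means B is transposed first and then read row by row to form A * B; if the intended result is A times the transpose of B, that function would need to change.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<float>;  // square matrix, row-major, width x width

// Stand-in for the kernel without shared memory: plain row-by-column product.
Matrix matmulNaive(const Matrix& A, const Matrix& B, int width) {
    Matrix C(width * width, 0.0f);
    for (int row = 0; row < width; ++row)
        for (int col = 0; col < width; ++col) {
            float Cvalue = 0.0f;
            for (int e = 0; e < width; ++e)
                Cvalue += A[row * width + e] * B[e * width + col];
            C[row * width + col] = Cvalue;
        }
    return C;
}

// Stand-in for the shared-memory kernel: the same product computed tile by tile.
Matrix matmulTiled(const Matrix& A, const Matrix& B, int width, int tile = 2) {
    Matrix C(width * width, 0.0f);
    for (int r0 = 0; r0 < width; r0 += tile)
        for (int c0 = 0; c0 < width; c0 += tile)
            for (int e0 = 0; e0 < width; e0 += tile)
                for (int row = r0; row < std::min(r0 + tile, width); ++row)
                    for (int col = c0; col < std::min(c0 + tile, width); ++col)
                        for (int e = e0; e < std::min(e0 + tile, width); ++e)
                            C[row * width + col] += A[row * width + e] * B[e * width + col];
    return C;
}

// Stand-in for the transpose variant (assumed meaning): transpose B, then read rows of both.
Matrix matmulTransposedB(const Matrix& A, const Matrix& B, int width) {
    Matrix Bt(width * width), C(width * width, 0.0f);
    for (int r = 0; r < width; ++r)
        for (int c = 0; c < width; ++c)
            Bt[c * width + r] = B[r * width + c];
    for (int row = 0; row < width; ++row)
        for (int col = 0; col < width; ++col)
            for (int e = 0; e < width; ++e)
                C[row * width + col] += A[row * width + e] * Bt[col * width + e];
    return C;
}

// Element-wise comparison with a small tolerance for floating-point rounding.
bool nearlyEqual(const Matrix& X, const Matrix& Y, float tol = 1e-4f) {
    if (X.size() != Y.size()) return false;
    for (std::size_t i = 0; i < X.size(); ++i)
        if (std::fabs(X[i] - Y[i]) > tol) return false;
    return true;
}

struct Comparison { bool naiveVsTiled, naiveVsTransposed, tiledVsTransposed; };

// Runs the three variants on the same inputs, one per OpenMP section, then compares them.
Comparison runAllAndCompare(const Matrix& A, const Matrix& B, int width) {
    assert(A.size() == static_cast<std::size_t>(width) * width);
    assert(B.size() == static_cast<std::size_t>(width) * width);
    Matrix c1, c2, c3;  // each section writes only its own result
    #pragma omp parallel sections
    {
        #pragma omp section
        c1 = matmulNaive(A, B, width);
        #pragma omp section
        c2 = matmulTiled(A, B, width);
        #pragma omp section
        c3 = matmulTransposedB(A, B, width);
    }
    return {nearlyEqual(c1, c2), nearlyEqual(c1, c3), nearlyEqual(c2, c3)};
}
```

Calling `runAllAndCompare(A, B, width)` from `main` and printing the three flags would give the command-window output asked for above. Built without OpenMP support, the pragmas are ignored and the three variants simply run one after another. In the real program each section would launch its own CUDA kernel and copy the result back before the comparison.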
Create an error in one of the kernels like this:
Change the addition to a subtraction. The original line is:
Cvalue += ([url removed, login to view] [row * width + e]) * ([url removed, login to view] [e * width + col]);
To check that the program can detect the error, change the addition sign to a subtraction sign:
Cvalue -= ([url removed, login to view] [row * width + e]) * ([url removed, login to view] [e * width + col]);
Then determine in which kernel the error occurred: kernel one, two, or three.
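One possible way to locate the faulty kernel is a two-out-of-three vote: if two results agree and the third differs, the third is the suspect. The C++ sketch below illustrates this on the CPU only. The function names `matmul` and `locateFaultyResult`, the `injectFault` flag, the arrays `A` and `B`, and the tolerance are assumptions for the example and are not taken from the project code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<float>;  // square matrix, row-major, width x width

// Row-by-column product; with injectFault set, "+=" becomes "-=" as described in the post.
Matrix matmul(const Matrix& A, const Matrix& B, int width, bool injectFault) {
    Matrix C(width * width, 0.0f);
    for (int row = 0; row < width; ++row)
        for (int col = 0; col < width; ++col) {
            float Cvalue = 0.0f;
            for (int e = 0; e < width; ++e) {
                if (injectFault)
                    Cvalue -= A[row * width + e] * B[e * width + col];  // injected error
                else
                    Cvalue += A[row * width + e] * B[e * width + col];
            }
            C[row * width + col] = Cvalue;
        }
    return C;
}

// Element-wise comparison with a small tolerance for floating-point rounding.
bool nearlyEqual(const Matrix& X, const Matrix& Y, float tol = 1e-4f) {
    if (X.size() != Y.size()) return false;
    for (std::size_t i = 0; i < X.size(); ++i)
        if (std::fabs(X[i] - Y[i]) > tol) return false;
    return true;
}

// Returns 0 if all three results agree, 1, 2 or 3 for the single result that
// disagrees with the other two, and -1 if the vote is inconclusive.
int locateFaultyResult(const Matrix& c1, const Matrix& c2, const Matrix& c3) {
    const bool a12 = nearlyEqual(c1, c2);
    const bool a13 = nearlyEqual(c1, c3);
    const bool a23 = nearlyEqual(c2, c3);
    if (a12 && a13 && a23) return 0;
    if (a23 && !a12 && !a13) return 1;
    if (a13 && !a12 && !a23) return 2;
    if (a12 && !a13 && !a23) return 3;
    return -1;
}
```

For example, computing `c2` with `injectFault` set to `true` and the other two with `false` makes `locateFaultyResult(c1, c2, c3)` report kernel two. The vote only works while at most one of the three results is wrong; if two kernels fail in different ways, the function returns -1.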
Create multiple threads from a correct kernel and run them in place of the kernel where the error is located, then compare the results with the previous results for the same kernel, so that we can determine the type of error.