Hello.
I wrote a simple kernel that multiplies a 3x3 matrix to vectors (x,y,1).
The kernel works good if I set the matrix values with simple ones, like
[1.0 2.0 3.0]
[4.0 5.0 6.0]
[7.0 8.0 9.0]
My kernel also works well when I tested it on the CPU.
However, when I set the matrix values like below, I get wrong results.
[0.000000 0.109586 1068.300049]
[41760.031250 0.438342 2670.750000]
[83520.062500 0.767098 4273.200195]
For example,
For a vector: (15, 0, 1)
GPU: 1068.300049, 629071.250000, 1257074.125000
True value: 1068.300049, 629071.250000, 1257074.250000
Diff: 0.000000, 0.000000, -0.125000
For: (124, 0, 1)
GPU: 1068.300049, 5180914.500000, 10360761.000000
True value: 1068.300049, 5180915.000000, 10360761.000000
Diff: 0.000000, -0.500000, 0.000000 --> Large errors.
The errors are not consistent and it is unpredictable.
Anybody knows why this happens ?? Please give me a clue for this.
I attached my kernel here.
struct my_vec4 {
float x;
float y;
float z;
float w;
};
typedef struct my_vec4 MyVec4;
//----------------------------------------------------
__kernel void compute_ep_lines(
__global MyVec4 *g_dst,
__constant float *c_fmat, //--> 3x3 matrix
int N)
{
// just get a global id and use them as a vector
int x = get_global_id(0) ;
int y = get_global_id(1) ;
int index = y * N + x ;
float e1 = c_fmat[0] * (float)x + c_fmat[1] * (float)y + c_fmat[2] ;
float e2 = c_fmat[3] * (float)x + c_fmat[4] * (float)y + c_fmat[5] ;
float e3 = c_fmat[6] * (float)x + c_fmat[7] * (float)y + c_fmat[8] ;
// assign result to global mem
g_dst[index].x = e1 ;
g_dst[index].y = e2 ;
g_dst[index].z = e3 ;
}