Different results CPU & GPU

I’m new to OpenCL and have a weird problem. I hope someone will be able to help me.

I have a kernel that finds intersections between polylines and a single polygon.
The code runs. I’m not using local memory, only global buffers that are passed from function to function inside my OpenCL code.

The problem is that the CPU and the GPU produce different results.

I investigated the code and found that at some point I have this code:

	int length = convert_int(polygon[4]);

	int s = 5;
	int j = length - 1;
	for (uint i = 0; i < length; i++) {
		uint realIdx = s + i * 2;
		uint realInvIdx = s + j * 2;
		j = i;
		// ...
	}

The length variable equals 4 in my case.

So j is initially set to 3 (4 - 1), but inside the loop, for some reason, it never changes.
If I set length to 4 explicitly instead of reading it from my buffer, it works. Yet when it is read from the buffer, the loop still runs 4 times and printf shows the value 4 as well.

In addition, this happens only on my CPU, and only when the local size is not 1, which suggests some local memory issue, I think… but not one caused by my code.

So I put barrier(CLK_LOCAL_MEM_FENCE); somewhere in my code (it doesn’t matter where, even on the first line of the kernel) and it works better: there are fewer differences, but some remain, even though there should be no need for a barrier at all.

I really thought I was getting the hang of OpenCL… but this is freaking me out…
Thanks in advance.

Can you post more code, please? This piece doesn’t really show what exactly you are trying to parallelize here. Unless you’ve hit a really obscure compiler bug, j goes like 3 -> 0 -> 1 -> 2 -> 3 (unused) as it is supposed to.
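
The expected sequence can be checked on the host with a few lines of plain C. This is only a sketch that mirrors the loop shape from the post; the function name `trace_prev_index` and the `seen` array are made up for illustration and are not part of the original kernel.

```c
/* Records the value of j seen in each iteration of the
   "previous vertex" loop and returns the iteration count.
   Hypothetical helper, written only to trace the indices. */
static unsigned trace_prev_index(unsigned length, unsigned *seen)
{
    unsigned j = length - 1;        /* the last vertex precedes vertex 0 */
    unsigned count = 0;

    for (unsigned i = 0; i < length; j = i++) {
        seen[count++] = j;          /* j is the vertex before vertex i */
    }
    return count;
}
```

With length equal to 4, the recorded values of j are 3, 0, 1, 2, so any device that reports 3 in every iteration is not executing the loop as written.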

This is exactly what I expected, but unfortunately I get 3->3->3->3->3…

Here is all the code… check out the first function, isPointInPolygon.

bool isPointInPolygon(float x, float y, __global float* polygon) {
    bool blnInside = false;
    uint length = convert_uint(polygon[4]);
    int s = 5;
    uint j = length - 1;
    for (uint i = 0; i < length; j = i++) {
        uint realIdx = s + i * 2;
        uint realInvIdx = s + j * 2;
        if (((polygon[realIdx + 1] > y) != (polygon[realInvIdx + 1] > y)) &&
            (x < (polygon[realInvIdx] - polygon[realIdx]) * (y - polygon[realIdx + 1]) / (polygon[realInvIdx + 1] - polygon[realIdx + 1]) + polygon[realIdx]))
            blnInside = !blnInside;
    }
    return blnInside;
}

bool isRectanglesIntersected(float p_dblMinX1, float p_dblMinY1,
    float p_dblMaxX1, float p_dblMaxY1,
    float p_dblMinX2, float p_dblMinY2,
    float p_dblMaxX2, float p_dblMaxY2) {
    bool blnResult = true;

    if (p_dblMinX1 > p_dblMaxX2 ||
        p_dblMaxX1 < p_dblMinX2 ||
        p_dblMinY1 > p_dblMaxY2 ||
        p_dblMaxY1 < p_dblMinY2) {
        blnResult = false;
    }

    return blnResult;
}

bool isLinesIntersects(
    double Ax, double Ay,
    double Bx, double By,
    double Cx, double Cy,
    double Dx, double Dy) {

    double  distAB, theCos, theSin, newX, ABpos;

    //  Fail if either line is undefined.
    if ((Ax == Bx && Ay == By) || (Cx == Dx && Cy == Dy))
        return false;

    //  (1) Translate the system so that point A is on the origin.
    Bx -= Ax; By -= Ay;
    Cx -= Ax; Cy -= Ay;
    Dx -= Ax; Dy -= Ay;

    //  Discover the length of segment A-B.
    distAB = sqrt(Bx*Bx + By*By);

    //  (2) Rotate the system so that point B is on the positive X axis.
    theCos = Bx / distAB;
    theSin = By / distAB;
    newX = Cx*theCos + Cy*theSin;
    Cy = Cy*theCos - Cx*theSin; Cx = newX;
    newX = Dx*theCos + Dy*theSin;
    Dy = Dy*theCos - Dx*theSin; Dx = newX;

    //  Fail if the lines are parallel.
    return (Cy != Dy);
}

bool isPolygonInersectsPolyline(__global float* polygon, __global float* polylines, uint startIdx) {

    uint polylineLength = convert_uint(polylines[startIdx]);
    uint start = startIdx + 1;

    float x1 = polylines[start];
    float y1 = polylines[start + 1];
    float x2;
    float y2;

    int polygonLength = convert_uint(polygon[4]);
    int polygonLength2 = polygonLength * 2;
    int startPolygonIdx = 5;

    for (int currPolyineIdx = 0; currPolyineIdx < polylineLength - 1; currPolyineIdx++) {
        x2 = polylines[start + (currPolyineIdx*2) + 2];
        y2 = polylines[start + (currPolyineIdx*2) + 3];

        float polyX1 = polygon[0];
        float polyY1 = polygon[1];
        for (int currPolygonIdx = 0; currPolygonIdx < polygonLength; ++currPolygonIdx) {
            float polyX2 = polygon[startPolygonIdx + (currPolygonIdx * 2 + 2) % polygonLength2];
            float polyY2 = polygon[startPolygonIdx + (currPolygonIdx * 2 + 3) % polygonLength2];

            if (isLinesIntersects(x1, y1, x2, y2, polyX1, polyY1, polyX2, polyY2)) {
                return true;
            }

            polyX1 = polyX2;
            polyY1 = polyY2;
        }

        x1 = x2;
        y1 = y2;
    }

    // No intersection found so far, so we check containment
    return isPointInPolygon(x1, y1, polygon);
}

__kernel void calcIntersections(__global float* polylines, // My flat points array - [pntCount, x,y,x,y,...., pntCount, x,y,... ]
                    __global float* pBounds, // The rectangle bounds of each polyline - set of 4 values [top, left, bottom, right....]
                    __global uint* pStarts, // The start index of each polyline in the polylines array
                    __global float* polygon, // The polygon i want to intersect with - first 4 items are the rectangle bounds [top, left, bottom, right, pntCount, x,y,x,y,x,y....]
                    __global float* output, // Result array for saving the intersections polylines indices
                    __global uint* resCount) // The result count
{
    int i = get_global_id(0);
    uint start = convert_uint(pStarts[i]);

    if (isRectanglesIntersected(pBounds[i * 4], pBounds[i * 4 + 1], pBounds[i * 4 + 2], pBounds[i * 4 + 3],
        polygon[0], polygon[1], polygon[2], polygon[3])) {

        if (isPolygonInersectsPolyline(polygon, polylines, start)){
            int oldVal = atomic_inc(resCount);
            output[oldVal] = i;
        }
    }
}

Whose OpenCL driver is this? This is either a really non-obvious error in the code or a compiler bug. Frankly, I’d suggest you break your polygon into a set of triangles (a triangle strip) and solve an easier problem: if your polyline intersects one of the triangles, it intersects the whole polygon. That can be parallelized more effectively, and it is faster to accomplish than trying to solve this mystery.
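
As a rough illustration of the triangle idea, here is a plain C sketch of the containment part only (it does not test segment crossings). It assumes a convex polygon split into a fan of triangles around vertex 0, which is a simplification of the triangle strip mentioned above; the names `cross2`, `point_in_triangle` and `point_in_convex_polygon` are hypothetical.

```c
#include <stdbool.h>

/* Twice the signed area of triangle (a, b, c); the sign gives the orientation. */
static float cross2(float ax, float ay, float bx, float by, float cx, float cy)
{
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}

/* True if (px, py) lies inside or on the edge of triangle (a, b, c). */
static bool point_in_triangle(float px, float py,
                              float ax, float ay,
                              float bx, float by,
                              float cx, float cy)
{
    float d1 = cross2(ax, ay, bx, by, px, py);
    float d2 = cross2(bx, by, cx, cy, px, py);
    float d3 = cross2(cx, cy, ax, ay, px, py);
    bool has_neg = (d1 < 0) || (d2 < 0) || (d3 < 0);
    bool has_pos = (d1 > 0) || (d2 > 0) || (d3 > 0);
    return !(has_neg && has_pos);   /* all signs agree: the point is inside */
}

/* Assumption: xy = [x0, y0, x1, y1, ...] describes a CONVEX polygon with n
   vertices, so a fan of triangles around vertex 0 covers it exactly. */
static bool point_in_convex_polygon(float px, float py, const float *xy, int n)
{
    for (int k = 1; k + 1 < n; k++) {
        if (point_in_triangle(px, py,
                              xy[0], xy[1],
                              xy[2 * k], xy[2 * k + 1],
                              xy[2 * k + 2], xy[2 * k + 3]))
            return true;
    }
    return false;
}
```

Each triangle test is independent of the others, which is what makes this formulation easy to spread over work-items. A concave polygon would need a real triangulation instead of the fan used here.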

“Whose OpenCL driver is this?”
You mean which device I am using?

My CPU is an Intel® Core™ i7-4790 CPU @ 3.6GHz (8 CPUs), ~3.6GHz.

I am stubborn by nature, and when I learn something, I want to be sure that I understand it fully, without cutting corners.

I will take your optimization suggestion under consideration, but in order to “trust” my work, I would like to understand what I am doing wrong.

This is simple code. Going forward with more complicated algorithms will only leave me with doubts that something weird and wrong may happen again.

The thing is that this works with a single work-group but not with more… So my guess is that there is some memory issue here…

I will keep checking till I bleed from my ears :)

That’s the thing with GPU programming. There are no “complicated” algorithms; they don’t map well to parallel architectures. Make the task easier and it will run faster. Regardless, giving your program more structure would be a nice start. Replace your “all in one buffer” approach to describing a polygon. Create structures:

typedef struct {
  int num_of_points;
  float4 pBounds;
} Polygon;

typedef struct {
  float4 pBounds;
  int num_of_points;
  int offset; // index of the first point in the array
} Polyline;

Use vector data types for coordinates. Use the __constant qualifier for input data.
Your kernel will look something like this:

__kernel void calcIntersections(__constant Polyline* polylines_desc, __constant float2* polylines_data,
                    __constant Polygon* polygon, __constant float2* polygon_data,
                    __global int* output) // It will be much easier to write 1 or 0 depending on whether the i-th polyline intersects the polygon

Once you restructure your code to get more help from the compiler, you’ll find there are substantially fewer ways for you to mess up. Good luck!
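
To make the “1 or 0 per polyline” output concrete, here is a host-side plain C sketch of what a single work-item would do under that layout. It is not the kernel itself: `Bounds` stands in for float4, only the bounding-box prefilter is shown, and the names `PolylineDesc`, `bounds_overlap` and `mark_candidate` are hypothetical.

```c
/* Stand-in for float4 bounds; the field order is an assumption. */
typedef struct { float min_x, min_y, max_x, max_y; } Bounds;

/* Per-polyline descriptor, mirroring the Polyline structure above. */
typedef struct {
    Bounds bounds;
    int num_of_points;
    int offset;             /* index of the first point in the data array */
} PolylineDesc;

/* Returns 1 if the two rectangles overlap, 0 otherwise. */
static int bounds_overlap(Bounds a, Bounds b)
{
    return !(a.min_x > b.max_x || a.max_x < b.min_x ||
             a.min_y > b.max_y || a.max_y < b.min_y);
}

/* Body of one "work-item" i: it writes only to its own output slot,
   so no atomic counter is needed and the result order is fixed. */
static void mark_candidate(int i, const PolylineDesc *descs,
                           Bounds polygon_bounds, int *output)
{
    output[i] = bounds_overlap(descs[i].bounds, polygon_bounds);
}
```

Because slot i always belongs to polyline i, two runs on different devices can be compared element by element, which is harder with an output array filled through atomic_inc, where the order of entries may differ from run to run.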

Yes, I guess… I’m coming from the OpenGL/WebGL shader world… so I’m still stuck with flat arrays and many workarounds to get my GPGPU work done :)

I will try to take your advice and structure my code better, although my curiosity about this bug is eating me.

Thanks a lot.