OpenGL pipeline stall with CUDA

I’m doing morph animations on the GPU using CUDA. Each frame, I update the vertex buffer before rendering:

cudaGraphicsResourceSetMapFlags(cudaResource, cudaGraphicsMapFlagsWriteDiscard);
cudaError_t err = cudaGraphicsMapResources(1, &cudaResource, 0);
ASSERT(err == cudaSuccess);

// Update the vertex buffer

err = cudaGraphicsUnmapResources(1, &cudaResource, 0);
ASSERT(err == cudaSuccess);
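The "Update the vertex buffer" step is elided in the snippet. For readers unfamiliar with morph animation, here is a hypothetical CPU reference of a plain linear morph blend, out[i] = base[i] + sum_k weights[k] * deltas[k][i]. The names `Vec3` and `blendMorphTargets` and the position-only vertex layout are assumptions, not the poster's code. Because every vertex is computed independently, the outer loop is the part that would typically become one CUDA thread per vertex writing through the mapped buffer pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: position only, three floats.
struct Vec3 {
    float x, y, z;
};

// CPU reference for a linear morph blend:
//   out[i] = base[i] + sum_k weights[k] * deltas[k][i]
// Each vertex i depends only on its own inputs, so this loop is
// embarrassingly parallel (one GPU thread per vertex).
std::vector<Vec3> blendMorphTargets(const std::vector<Vec3>& base,
                                    const std::vector<std::vector<Vec3>>& deltas,
                                    const std::vector<float>& weights) {
    // One weight per morph target, and every target covers every vertex.
    assert(weights.size() == deltas.size());
    for (const auto& d : deltas) {
        assert(d.size() == base.size());
    }

    std::vector<Vec3> out(base.size());
    for (std::size_t i = 0; i < base.size(); ++i) {
        Vec3 v = base[i];
        for (std::size_t k = 0; k < deltas.size(); ++k) {
            v.x += weights[k] * deltas[k][i].x;
            v.y += weights[k] * deltas[k][i].y;
            v.z += weights[k] * deltas[k][i].z;
        }
        out[i] = v;
    }
    return out;
}
```

A CPU version like this can also be handy for checking the GPU output on a small mesh.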

Afterwards, I render the mesh with glDrawRangeElements.
Nsight shows that the glDrawRangeElements call blocks until the GPU actually begins drawing that same mesh.

The lag is independent of the computation I’m doing: whenever the resource is mapped and unmapped, the lag is present.
I added cudaStreamSynchronize and cudaDeviceSynchronize calls to make sure the GPU had finished, and I also tried double- and triple-buffering the vertex buffer, but neither changed anything.
I only get the lag when I map the resource with CUDA; otherwise everything runs well.
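For reference, double or triple buffering of the kind described usually means rotating through several vertex buffers so that the one CUDA writes in a frame is not the one OpenGL draws in that frame. Below is a hypothetical sketch of that index rotation only. The class `BufferRing` is an assumption, not the poster's code; in a real program each index would refer to a separate GL buffer registered once with `cudaGraphicsGLRegisterBuffer`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: rotates through `count` vertex buffers.
// Each frame, CUDA writes into writeIndex() while OpenGL draws
// drawIndex(), the buffer written one frame earlier.
class BufferRing {
public:
    explicit BufferRing(std::size_t count) : count_(count), frame_(0) {
        assert(count >= 2);  // at least double buffering
    }

    // Buffer CUDA maps and fills this frame.
    std::size_t writeIndex() const { return frame_ % count_; }

    // Buffer OpenGL draws this frame (the previous frame's write target).
    std::size_t drawIndex() const { return (frame_ + count_ - 1) % count_; }

    // Call once at the end of each frame.
    void advance() { ++frame_; }

private:
    std::size_t count_;
    std::size_t frame_;
};
```

On the very first frame the draw buffer has not been written yet, so it either needs to be prefilled or the draw skipped for that frame.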

I’m on Windows 7 with an NVIDIA GTX 480.
I’ve tried updating the drivers, switching CUDA versions (5.5 and 6.0), and swapping the GPU for a GTX 680, all to no avail.

Any ideas or pointers would be greatly appreciated.