Executing OpenCL kernel and OpenGL compute shader in the same program

hterrolle · December 22, 2020, 8:58am

Hi,

I was wondering whether running OpenCL kernels and OpenGL compute shaders in the same program would cause any problems accessing the GPU.

I could use different EGL contexts, but it looks like I cannot share an EGL context between the C++ JNI code and the Java OpenGL renderer thread.

So the question is: do I need to create an EGL context for the OpenCL kernels, or is there no problem sending kernel commands and OpenGL commands asynchronously? Or could there be problems if kernel commands and OpenGL commands overlap?

Let’s say I have 10 kernels that will be executed one by one in C. The result will be sent to OpenGL, and then I would execute, say, 5 compute shaders and 5 fragment shaders in Java to display the result.

So at some point kernel commands and OpenGL commands could be interleaved ("entrelacer" in French). Could that cause problems? Or will I need to do a flush after each kernel command and OpenGL command?

Sorry for my English. But I hope the question is understandable ;))

Dark_Photon · December 22, 2020, 3:59pm

I think I understand you.

To interleave (interlace, entrelacer, etc.), you’re going to want to look at which OpenGL / OpenCL interop synchronization mechanisms your target GPUs/GPU drivers support.

Method #1: The absolute most inefficient method is when the drivers don’t support any efficient synchronization and you basically have to flush all of the work out of the GPU pipeline (via clFinish() / glFinish()) before you can flip to queuing commands with the other API.

To use the GPU more efficiently, you do not want to have to do these flushes. In fact, many modern GPUs support parallel graphics and compute queues on the GPU, and this sledgehammer approach to synchronization I believe precludes use of this capability.

Method #2: A more efficient alternative to this is to use pipeline events to synchronize work between CL and GL via:

ARB_cl_event, and
cl_khr_gl_event

So check and see if the GPU driver(s) you’re targeting support these event sharing extensions in both their OpenCL and OpenGL implementations.
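
A minimal sketch in C of such a check, assuming the extension lists have already been fetched as space-separated strings (for example via clGetDeviceInfo with CL_DEVICE_EXTENSIONS on the OpenCL side, and via glGetString(GL_EXTENSIONS) on the OpenGL side, where the extension shows up as GL_ARB_cl_event). The helper name has_extension is hypothetical and not part of either API. It matches whole tokens only, so cl_khr_gl_event is not confused with a longer name that merely contains it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` appears as a whole,
 * space-separated token in `ext_list`. */
bool has_extension(const char *ext_list, const char *name)
{
    if (ext_list == NULL || name == NULL || *name == '\0')
        return false;

    size_t len = strlen(name);
    const char *p = ext_list;

    while ((p = strstr(p, name)) != NULL) {
        /* A real match starts at the list start or right after a space... */
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ...and ends at a space or at the end of the list. */
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += 1; /* keep searching after a partial match */
    }
    return false;
}
```

You would call it once per API, e.g. has_extension(cl_extensions, "cl_khr_gl_event") and has_extension(gl_extensions, "GL_ARB_cl_event"), where cl_extensions and gl_extensions are the strings returned by the driver.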

Years ago when I was first trying to interleave OpenCL and OpenGL use of the GPU in a single program, I hit this problem. On NVIDIA GeForce GPUs/drivers, I had to (and would still have to AFAIK!) use the first, ugly sledgehammer flush method (clFinish() / glFinish()) of synchronization. This is because NVIDIA did not (and still does not AFAICT) support the above OpenCL/OpenGL event sharing extensions in their drivers for most GeForce GPUs.

Apparently these extensions are only supported in their drivers for high-end Quadro and Titan GPUs. There is more confirmation of that in the OpenGL Hardware Database (gpuinfo.org) (type in Quadro and GeForce in the first column).

It would seem NVIDIA really does not want you to use OpenCL in an OpenGL program on their consumer GPUs, providing their proprietary CUDA compute language/libs as an option instead.

As for OpenGL compute shaders, I don’t know whether they (by contrast) are supported efficiently on NVIDIA GeForce GPUs/drivers, or whether NVIDIA has tied one or both of their hands behind their back there too. If anyone does know more about this though, I’d love to hear from you!

hterrolle · December 22, 2020, 4:40pm

Thanks for the answer,

I am working with an Android phone with an ARM Mali-G72 GPU, and it supports neither

ARB_cl_event, nor
cl_khr_gl_event

So I need to find some other trick ;))

If I create an EGL context on the C side, maybe the synchronization could be done automatically, like when I am using WebGL and OpenGL at the same time. Both have their own EGL context, and they seem to be able to work together. I have already tried it when displaying in OpenGL the output frame of a WebGL, jThree, and OpenGL URL.
In fact I can use an OpenGL URL inside my OpenGL app, and can even perform actions on the OpenGL URL, like rotation, if the OpenGL URL has implemented it.

It is just an idea. I haven’t tried it yet.

Does that sound possible? Because I am sure that 2 EGL contexts will be synchronized. And then I could use the output buffer from the compute shader and send it to the OpenCL code in C.

It is a trick for sure. But if it can work, why not ;))

hterrolle · December 23, 2020, 3:10pm

Hi,

So I just finished my testing of the interleaving.

I run at 60 frames per second for the OpenGL display and I did not encounter any problems.

Here is part of my log (it may be interesting):

2020-12-23 15:55:10.141 E/CameraPreview: onPreviewFrame 96
2020-12-23 15:55:10.149 E/ComputeShaderImage: glDispatchCompute
2020-12-23 15:55:10.149 E/ComputeShaderImage: glMemoryBarrier
2020-12-23 15:55:10.150 E/CameraPreview: onDraw 96 flipframe 1
2020-12-23 15:55:10.150 E/CameraPreview: !! dispatchDraw 96 flipframe 0
2020-12-23 15:55:10.152 E/JNIProcessor: 0 gNV21Kernel finished in 10 ms ligth: 18

2020-12-23 15:55:10.155 E/MyGLRenderer: onDrawFrame

2020-12-23 15:55:10.159 E/JNIProcessor: 2 gSuperPixel 1 finished in 16 ms ligth: 18

2020-12-23 15:55:10.165 E/ComputeShaderImage: glDispatchCompute
2020-12-23 15:55:10.165 E/ComputeShaderImage: glMemoryBarrier
2020-12-23 15:55:10.166 E/JNIProcessor: 2 gVision 1 finished in 23 ms ligth: 18

2020-12-23 15:55:10.167 E/MyGLRenderer: onDrawFrame

2020-12-23 15:55:10.172 E/JNIProcessor: 2 gSuperPixel 2 finished in 29 ms ligth: 18
2020-12-23 15:55:10.178 E/JNIProcessor: 2 gVision 2 finished in 36 ms ligth: 18

2020-12-23 15:55:10.181 E/ComputeShaderImage: glDispatchCompute
2020-12-23 15:55:10.182 E/ComputeShaderImage: glMemoryBarrier

2020-12-23 15:55:10.183 E/JNIProcessor: 3 gCompression finished in 40 ms ligth: 18

2020-12-23 15:55:10.184 E/MyGLRenderer: onDrawFrame

2020-12-23 15:55:10.185 E/JNIProcessor: 3 gLignes before enqueueReadBuffer finished in 42 ms ligth: 18
2020-12-23 15:55:10.186 E/JNIProcessor: 4 traitement enqueueReadBuffer finished in 44 ms
2020-12-23 15:55:10.190 E/JNIProcessor: void Extraction_Point: buf.bufligne bleuY: 6289
2020-12-23 15:55:10.190 E/JNIProcessor: void Extraction_Point: buf.bufligne vertY: 4909
2020-12-23 15:55:10.190 E/JNIProcessor: Trait_Raw_Col Rouge 0 indrectRV: 300
2020-12-23 15:55:10.190 E/JNIProcessor: void Extraction_Point: buf.bufligne rougeX: 3052
2020-12-23 15:55:10.192 E/JNIProcessor: Trait_Raw_Col Blanc 2 indrectBV: 501
2020-12-23 15:55:10.193 E/JNIProcessor: Trait_Raw_Col Vert 1 indrectVH: 419
2020-12-23 15:55:10.195 E/JNIProcessor: Trait_Raw_Col Bleue 3 indrectBH: 464
2020-12-23 15:55:10.196 E/JNIProcessor: void Extraction_Point: buf.bufligne blancX: 3669
2020-12-23 15:55:10.197 E/JNIProcessor: 9 traitement forme finished in 54 ms
2020-12-23 15:55:10.199 E/JNIProcessor: 10 traitement enqueueWriteBuffer finished in 56 ms
2020-12-23 15:55:10.199 E/JNIProcessor: 11 traitement enqueueNDRangeKernel finished in 57 ms
2020-12-23 15:55:10.200 E/ComputeShaderImage: glDispatchCompute
2020-12-23 15:55:10.202 E/ComputeShaderImage: glMemoryBarrier

2020-12-23 15:55:10.206 E/MyGLRenderer: onDrawFrame

2020-12-23 15:55:10.207 E/JNIProcessor: 12 traitement bufligne finished in 64 ms
2020-12-23 15:55:10.207 E/JNIProcessor: 13 END JNICALL CameraPreview_runfilter

No conflict and no crash. Happy day ;))

I think I just have trouble with my kernel time calculation ;))

PS: Is it possible to insert color on posted lines?
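
One possible reading of the log above, offered as an assumption rather than a diagnosis: the "finished in N ms" values look cumulative since the start of the JNI call (10, 16, 23, 29, 36, 40, ...), not per-stage times. A minimal C sketch that turns such cumulative values into per-stage durations is shown below. The function name stage_durations is hypothetical. Note also that OpenCL enqueue calls are asynchronous, so host-side timestamps taken right after an enqueue do not necessarily measure kernel execution time.

```c
#include <stddef.h>

/* Hypothetical helper: converts cumulative elapsed times (ms since the
 * start of the call) into per-stage durations.
 * `out` must have room for `n` values. */
void stage_durations(const int *cumulative_ms, size_t n, int *out)
{
    int previous = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = cumulative_ms[i] - previous;
        previous = cumulative_ms[i];
    }
}
```

Applied to the first six values from the log (10, 16, 23, 29, 36, 40), this gives per-stage durations of 10, 6, 7, 6, 7 and 4 ms.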

Dark_Photon · December 23, 2020, 5:11pm

Great!

I hadn’t really gone looking for it. But yes, it appears it is. Try this markup:

[color=blue]Blue text[/color]
[color=#0000ff]Blue text[/color]
[bgcolor=yellow]Yellow background[/bgcolor]
[color=blue][bgcolor=yellow]Blue text on yellow background[/bgcolor][/color]

Blue text
Blue text
Yellow background color
Blue text on yellow background

As you can see, it isn’t honored in multiline code block markup (``` … ``` or “Preformatted text” toolbar button). But it does work in block quote markup.

MathiasMagnus · December 28, 2020, 9:44am

There is a method #1.5:

If cl_khr_gl_event is supported, there is an implicit synchronization between your runtimes, provided that the OpenGL context which affects the shared resources’ state is current on the calling thread when you acquire/release the resources. You can read more about this in section 12.6 of the extension spec HTML (sorry, can’t post links). Explicitly synchronizing via events is only required when your engine is structured in such a way that your OpenGL context is current on another thread (where render work is being enqueued) or the symbol isn’t in scope in your compute module.

Dark_Photon · December 30, 2020, 2:41am

I think this is what you were referring to.

OpenCL 3.0 Extension Specification (HTML)
OpenCL 3.0 Extension Specification (PDF)

OpenCL 3.0 Extension Spec:

Chapter 12. Creating OpenCL Event Objects from OpenGL Sync Objects

12.1. Overview

This section describes the cl_khr_gl_event extension. … In addition, this extension modifies the behavior of clEnqueueAcquireGLObjects and clEnqueueReleaseGLObjects to implicitly guarantee synchronization with an OpenGL context bound in the same thread as the OpenCL context …

12.6. Additions to the OpenCL Extension Specification

Add following the paragraph describing parameter event to clEnqueueAcquireGLObjects:

"If an OpenGL context is bound to the current thread, then any OpenGL commands which affect or access the contents of a memory object listed in the mem_objects list, and were issued on that OpenGL context prior to the call to clEnqueueAcquireGLObjects will complete before execution of any OpenCL commands following the clEnqueueAcquireGLObjects which affect or access any of those memory objects. If a non-NULL event object is returned, it will report completion only after completion of such OpenGL commands."

Add following the paragraph describing parameter event to clEnqueueReleaseGLObjects:

"If an OpenGL context is bound to the current thread, then any OpenGL commands which affect or access the contents of the memory objects listed in the mem_objects list, and are issued on that context after the call to clEnqueueReleaseGLObjects will not execute until after execution of any OpenCL commands preceding the clEnqueueReleaseGLObjects which affect or access any of those memory objects. If a non-NULL event object is returned, it will report completion before execution of such OpenGL commands."

Replace the second paragraph of Synchronizing OpenCL and OpenGL Access to Shared Objects with:

“Prior to calling clEnqueueAcquireGLObjects, the application must ensure that any pending OpenGL operations which access the objects specified in mem_objects have completed. If the cl_khr_gl_event extension is supported, then the OpenCL implementation will ensure that any such pending OpenGL operations are complete for an OpenGL context bound to the same thread as the OpenCL context. This is referred to as implicit synchronization. …”
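
Putting the three methods from this thread together, the choice could be sketched in C roughly as follows. This is a simplified decision helper under stated assumptions, not a definitive rule: the names sync_method and choose_sync_method are hypothetical, the two extension flags are assumed to come from querying the driver's extension strings, and the third flag is something only the application knows (whether the OpenGL context is current on the thread that calls clEnqueueAcquireGLObjects / clEnqueueReleaseGLObjects).

```c
#include <stdbool.h>

/* Hypothetical names for the three approaches discussed in the thread. */
typedef enum {
    SYNC_FULL_FINISH, /* method #1: clFinish() / glFinish() between APIs */
    SYNC_IMPLICIT,    /* method #1.5: implicit sync on acquire/release */
    SYNC_EVENTS       /* method #2: shared CL events / GL sync objects */
} sync_method;

/* Simplified sketch: picks the cheapest method the driver and the
 * application's threading layout allow. */
sync_method choose_sync_method(bool has_cl_khr_gl_event,
                               bool has_gl_arb_cl_event,
                               bool gl_context_current_on_cl_thread)
{
    /* Implicit synchronization needs cl_khr_gl_event and a GL context
     * bound to the same thread that issues the acquire/release calls. */
    if (has_cl_khr_gl_event && gl_context_current_on_cl_thread)
        return SYNC_IMPLICIT;

    /* Explicit event sharing in both directions needs both extensions. */
    if (has_cl_khr_gl_event && has_gl_arb_cl_event)
        return SYNC_EVENTS;

    /* Assumption: anything else falls back to the conservative
     * full-flush approach. */
    return SYNC_FULL_FINISH;
}
```

With neither extension reported, as described above for the Mali-G72, this sketch returns SYNC_FULL_FINISH.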

hterrolle · January 3, 2021, 9:58am

It looks like the ARM Mali-G72 has more than one GPU core, so it may not need any synchronization. It is like double buffering: it may switch from one core to another. And I think that OpenCL does not need to use a context the way OpenGL does. I am not sure how it works, but it looks like it is working well.

system · October 19, 2021, 6:06pm

This topic was automatically closed 183 days after the last reply. New replies are no longer allowed.