How to implement parallel BFS for a DAG (all edge weights are 1) in an OpenGL compute shader?

How can I implement a parallel BFS for an unweighted directed acyclic graph (every edge has weight 1) in an OpenGL compute shader? For various reasons I can't use something like CUDA or D3D12 Vulkan Work Graph, only OpenGL 4.6. I'm looking for a way to traverse the DAG in an OpenGL compute shader to speed up my program. Is there a method for this? Thank you.

Presumably:

Traversing a DAG in an OpenGL GLSL compute shader is one thing.

But your mention of “CUDA or D3D12 Vulkan Work Graph” implies another (GPU generated execution from task graphs).

Perhaps if you explained in more detail what you’re trying to accomplish and specifically what bottleneck you’re hitting, someone here smarter than I can give you some good suggestions.

Your performance problem might be alleviated by mesh shaders, NV_command_list, or a garden-variety compute shader operating on a tree as data. Another option is Vulkan interop, which opens up access to things like VK_NV_device_generated_commands and/or AMD Vulkan Work Graphs, which provide GPU-generated work beyond basic DrawIndirect capability.
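If it really is just the first case (plain traversal with the graph as data), the usual starting point is a level-synchronous BFS: one dispatch per BFS level, with one invocation per vertex. Here is a rough CPU-side sketch in C++ of that scheme, only to show the data layout and the per-level step. The CSR layout and the names `CsrGraph`, `bfsLevel`, `bfs`, and `kUnvisited` are my own assumptions for illustration, not anything from your code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CSR layout; on the GPU these two arrays would live in SSBOs.
struct CsrGraph {
    std::vector<uint32_t> rowOffsets; // size = vertexCount + 1
    std::vector<uint32_t> colIndices; // edge targets, grouped by source vertex
};

constexpr uint32_t kUnvisited = 0xFFFFFFFFu;

// One BFS level. Each iteration of the outer loop stands in for one
// compute-shader invocation (v would be gl_GlobalInvocationID.x).
// Returns true if any vertex was reached for the first time.
bool bfsLevel(const CsrGraph& g, std::vector<uint32_t>& dist, uint32_t level) {
    bool changed = false;
    const uint32_t n = static_cast<uint32_t>(dist.size());
    for (uint32_t v = 0; v < n; ++v) {
        if (dist[v] != level) continue; // only the current frontier does work
        for (uint32_t e = g.rowOffsets[v]; e < g.rowOffsets[v + 1]; ++e) {
            const uint32_t target = g.colIndices[e];
            if (dist[target] == kUnvisited) {
                dist[target] = level + 1; // every writer stores the same value
                changed = true;
            }
        }
    }
    return changed;
}

// Runs levels until the frontier is empty; unreachable vertices stay kUnvisited.
std::vector<uint32_t> bfs(const CsrGraph& g, uint32_t source) {
    assert(!g.rowOffsets.empty() && source + 1 < g.rowOffsets.size());
    std::vector<uint32_t> dist(g.rowOffsets.size() - 1, kUnvisited);
    dist[source] = 0;
    for (uint32_t level = 0; bfsLevel(g, dist, level); ++level) {
    }
    return dist;
}
```

In an actual OpenGL 4.6 port, I would expect each call to `bfsLevel` to become a `glDispatchCompute` followed by `glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT)`, with `dist` in an SSBO and the `changed` flag in a small buffer that the host reads back (or uses to stop after a fixed number of levels). Since all invocations in a level write the same `level + 1`, the race on `dist[target]` should be benign, but `atomicMin` on the distance is a safer choice if in doubt. This scans every vertex each level, so it is simple rather than work-efficient; a compacted frontier queue built with an atomic counter is the usual next step.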