Compute shader shared memory not faster, plus usampler2D and SSBO questions

  1. Using shared memory takes about the same time as not using it. I measured the time as shown below.
t0=getTime();
glDispatchCompute();
glFinish();
printf("DispatchCompute:%.3f ms\n",getTime()-t0);
  2. Does an OpenGL ES 3.2 compute shader support usampler2D? I use texelFetch and the output is all zeros, as shown below.
static const char gComputeShader[] =
    "#version 320 es\n"
    "layout(local_size_x = 12, local_size_y = 12, local_size_z = 1) in;\n"
    "layout(location = 0) uniform highp usampler2D input0;\n"
    "layout(binding = 2, rgba32ui) writeonly uniform highp uimage2D output0;\n"
    "void main()\n"
    "{\n"
    "    ivec2 loc = ivec2(gl_GlobalInvocationID.xy);\n"
    "    uvec4 input0_data = texelFetch(input0, loc, 0);\n"
    "    imageStore(output0, loc, input0_data);\n"
    "}\n";

glGenTextures(1, &input0_id);
glBindTexture(GL_TEXTURE_2D, input0_id);
glTexImage2D(GL_TEXTURE_2D, 0, GL_R8UI, width, height, 0, GL_RED_INTEGER, GL_UNSIGNED_BYTE, f0);
int texId = 0;
glActiveTexture(GL_TEXTURE0 + texId);
glUniform1i(0, texId);

  3. Can I create an SSBO (Shader Storage Buffer Object) for a compute shader that is backed by an EGLImage, so that the data never has to be copied?

Please advise.

Thanks.

That is a terrible way to profile strictly on-GPU operations. You’re only profiling API overhead on the CPU, not the actual GPU execution time.

I don’t know if ES 3.2 has timer queries, but if it does, use them.
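
For reference, core ES 3.2 does not define a timer query target; where the driver exposes the EXT_disjoint_timer_query extension, GL_TIME_ELAPSED_EXT is available through it. Below is a minimal sketch, assuming that extension is present and that its entry points (glGenQueriesEXT, glBeginQueryEXT and so on) have already been loaded, for example with eglGetProcAddress. TimerQueryApi, gpu_time_ms and ns_to_ms are hypothetical names used only for illustration, and the local typedefs merely stand in for the real GLES headers so the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs and enum values so the sketch compiles on its own.
   In a real build, take them from <GLES3/gl32.h> and <GLES2/gl2ext.h>. */
typedef unsigned int GLuint;
typedef unsigned int GLenum;
typedef int          GLsizei;
typedef uint64_t     GLuint64;
#define GL_TIME_ELAPSED_EXT           0x88BF
#define GL_QUERY_RESULT_EXT           0x8866
#define GL_QUERY_RESULT_AVAILABLE_EXT 0x8867

/* Hypothetical holder for the extension entry points
   (glGenQueriesEXT, glBeginQueryEXT, ...), filled in by the caller. */
typedef struct {
    void (*GenQueries)(GLsizei n, GLuint *ids);
    void (*DeleteQueries)(GLsizei n, const GLuint *ids);
    void (*BeginQuery)(GLenum target, GLuint id);
    void (*EndQuery)(GLenum target);
    void (*GetQueryObjectuiv)(GLuint id, GLenum pname, GLuint *params);
    void (*GetQueryObjectui64v)(GLuint id, GLenum pname, GLuint64 *params);
} TimerQueryApi;

/* Timer query results are reported in nanoseconds. */
double ns_to_ms(GLuint64 ns)
{
    return (double)ns / 1.0e6;
}

/* Returns the GPU time, in milliseconds, of the commands issued by work(). */
double gpu_time_ms(const TimerQueryApi *gl, void (*work)(void))
{
    GLuint query = 0, ready = 0;
    GLuint64 ns = 0;

    assert(gl != NULL && work != NULL);
    gl->GenQueries(1, &query);
    gl->BeginQuery(GL_TIME_ELAPSED_EXT, query);
    work();                          /* e.g. a function calling glDispatchCompute */
    gl->EndQuery(GL_TIME_ELAPSED_EXT);

    /* Busy-wait until the result is available. A real application would
       poll later instead, and would also check GL_GPU_DISJOINT_EXT. */
    while (!ready)
        gl->GetQueryObjectuiv(query, GL_QUERY_RESULT_AVAILABLE_EXT, &ready);

    gl->GetQueryObjectui64v(query, GL_QUERY_RESULT_EXT, &ns);
    gl->DeleteQueries(1, &query);
    return ns_to_ms(ns);
}
```

A call such as gpu_time_ms(&api, run_dispatch) would then take the place of the getTime()/glFinish() pair, where run_dispatch stands for whatever function issues the dispatch.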

SSBOs are buffers, not images. So no.

When using an SSBO, is copying the data from a CPU buffer into the SSBO the only option? Is there a way to avoid the copy?

Thanks.