Array shader output variables

Hi all,

I’m attempting to implement a brute force K-Nearest-Neighbors algorithm to help me compute normal vectors of a point cloud.

I compute the distances between points in the vertex shader and then want to sort them in another shader program. So I want to pack them into a texture buffer object (using the Transform Feedback extension).

I didn’t find any tutorial/post on how to accomplish this, so i need your help.

The “GLSL Language Specification 1.5” says this about vertex shader output variables:

They can only be float, floating-point vectors, matrices, signed or unsigned integers or integer vectors, or arrays or structures of any these.

Until now I haven’t managed to output an array of integers. Some parts of my testing code (it compiles without errors):

OPENGL SETUP CODE:


int att_computeNormals[] = 
{glGetVaryingLocationNV(program, "neighbors"),};

glTransformFeedbackVaryingsNV(
program, 1, att_computeNormals, GL_SEPARATE_ATTRIBS_NV);

.
.
.
glGenTextures(1,&tex_Neighbors_computed);
glGenBuffers (1,&tbo_Neighbors_computed);
glBindBuffer (GL_TEXTURE_BUFFER_EXT, tbo_Neighbors_computed);
glBufferData (GL_TEXTURE_BUFFER_EXT, sizeof(GLint)*mesh->vertices.size()*N, 0, GL_STATIC_DRAW);


OPENGL NORMAL ESTIMATION CODE:


glUseProgram(program);

glBindBufferBaseNV(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tbo_Neighbors_computed);

.
.
.
glBindTexture (GL_TEXTURE_BUFFER_EXT, tex_Neighbors_computed);
glTexBufferEXT(GL_TEXTURE_BUFFER_EXT, GL_LUMINANCE32I_EXT, tbo_Neighbors_computed);


GLSL VERTEX SHADER


#version 140
#extension GL_EXT_gpu_shader4 : enable
#define N 16

out int neighbors[N];

void main(void)
{	
for(int i=0; i<N; i++) neighbors[i] = i;
}

thanks for your time…

So I want to pack them into a texture buffer object (using the Transform Feedback extension).

TF doesn’t deal with textures. You are using it together with TBO extension. Just want to make sure you understand what you are doing.

1) Be sure you call glTransformFeedbackVaryingsNV before the program is linked.


int att_computeNormals[] = 
{glGetVaryingLocationNV(program, "neighbors"),};

You are trying to get only the first output value here, are you? Then try “neighbors[0]”.

3) Make sure each command returns without GL error.

4) Can you post more initialization code?

And check your limits on separate attribs and components (see MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS, MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS and MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS).

E.g. a G80 under 3.2 core reports max separate components is
4 = 4 attribs x 1 components,
whereas with interleaved the max is
64 = 4 attribs x 4 vec4 x 4 components.
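
To make the limit check concrete, here is a minimal sketch in plain C of how the three queried limits decide which capture mode can hold a set of outputs. The struct `TfLimits` and both function names are made up for illustration; in a real program the three numbers would come from `glGetIntegerv`, and the figures in the usage note are simply the G80 numbers as I read them from the post above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the three transform feedback limits. */
typedef struct {
    int max_separate_attribs;        /* MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS */
    int max_separate_components;     /* ..._SEPARATE_COMPONENTS (per attrib) */
    int max_interleaved_components;  /* ..._INTERLEAVED_COMPONENTS (total) */
} TfLimits;

/* Separate mode: every captured varying goes to its own buffer, so both the
   number of varyings and the components of each one are limited. */
static bool fits_separate(TfLimits lim, int num_varyings, int comps_each) {
    assert(num_varyings > 0 && comps_each > 0);
    return num_varyings <= lim.max_separate_attribs &&
           comps_each <= lim.max_separate_components;
}

/* Interleaved mode: all captured components share one buffer, so only the
   total component count is limited. */
static bool fits_interleaved(TfLimits lim, int num_varyings, int comps_each) {
    assert(num_varyings > 0 && comps_each > 0);
    return num_varyings * comps_each <= lim.max_interleaved_components;
}
```

With `TfLimits g80 = {4, 1, 64}`, capturing 16 scalar ints gives `fits_separate(g80, 16, 1) == false` and `fits_interleaved(g80, 16, 1) == true`, which suggests interleaved capture is the only option for an array of that size on such hardware.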

Thanx for the responses…

0) Yep, that’s correct…

1) I have made several implementations calling glTransformFeedbackVaryingsNV after the program is linked.

From Transform_feedback Spec :

It is not necessary to (re-)link <program> after calling TransformFeedbackVaryingsNV()

I think that’s the linking problem of ‘glTransformFeedbackVaryingsEXT’.

2) No, I’m trying to get all the output values. The output TBO will store the N neighbor indices for each vertex.
So how must I define it?

3) Each command returns without GL error…

4) More? :) The only code that is missing is this:


glGenTextures(1,&tex_Neighbors_computed);
glGenBuffers (1,&tbo_Neighbors_computed);
glBindBuffer (GL_TEXTURE_BUFFER_EXT, tbo_Neighbors_computed);
glBufferData (GL_TEXTURE_BUFFER_EXT, sizeof(GLint)*mesh->vertices.size()*N, 0, GL_STATIC_DRAW);

I have made several implementations calling glTransformFeedbackVaryingsNV after the program is linked.

That’s weird. I looked at the 3 forms of transform feedback: NV, EXT, and the core functionality. Only in the NV one does this function affect a program post-link stage (and it’s illegal to call it pre-link). The other two both put the feedback varyings on the same level as attribute index bindings.

Maybe it’s a driver bug and someone at NVIDIA hooked the function up wrong.

More ?? :), the only code is missing is this:

You may want to consider using GL_DYNAMIC_COPY rather than GL_STATIC_DRAW. The COPY means that you don’t intend to directly access the buffer (you won’t call BufferSubData, GetBufferSubData, map it, or call BufferData with a non-NULL pointer). This can let GL do some interesting stuff. The DYNAMIC is because STATIC means that you will only fill the buffer once. Since you’re probably going to do transform feedback multiple times on this buffer, DYNAMIC makes more sense.

That’s weird. I looked at the 3 forms of transform feedback: NV, EXT, and the core functionality. Only in the NV one does this function affect a program post-link stage (and it’s illegal to call it pre-link). The other two both put the feedback varyings on the same level as attribute index bindings.
Maybe it’s a driver bug and someone at NVIDIA hooked the function up wrong.

Yeah, that’s weird… I had tried to use transform_feedback_EXT, but linking problems arose. I think there is a discussion about this error in this forum.

You may want to consider using GL_DYNAMIC_COPY rather than GL_STATIC_DRAW. The COPY means that you don’t intend to directly access the buffer (you won’t call BufferSubData, GetBufferSubData, map it, or call BufferData with a non-NULL pointer). This can let GL do some interesting stuff. The DYNAMIC is because STATIC means that you will only fill the buffer once. Since you’re probably going to do transform feedback multiple times on this buffer, DYNAMIC makes more sense.

Maybe I’ll change the usage pattern of the data store in the future. However, it’s not my biggest concern right now.

I finally managed to output an array of integers by changing the glTransformFeedbackVaryingsNV parameters:


int att_computeNormals[] ={
glGetVaryingLocationNV(program, "neighbors[0]"),
glGetVaryingLocationNV(program, "neighbors[1]"),
glGetVaryingLocationNV(program, "neighbors[2]"),
glGetVaryingLocationNV(program, "neighbors[3]"),
glGetVaryingLocationNV(program, "neighbors[4]"),
.
.
.
};

glTransformFeedbackVaryingsNV(program, N, att_computeNormals, GL_INTERLEAVED_ATTRIBS_NV);

With interleaved attribs the maximum is N = 64 (on my GeForce 8800 Ultra).
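
As a side note, the list of element names does not have to be written out by hand. A minimal sketch in plain C, assuming N = 16 as in the shader: `build_varying_names` and `neighbor_index` are hypothetical helpers, not part of any GL API. Each generated name would then be passed to `glGetVaryingLocationNV` as in the snippet above. The second helper assumes the interleaved capture writes the N ints of one vertex back to back, one vertex after another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define N 16         /* neighbors per vertex, as in the shader */
#define NAME_LEN 32  /* room for "neighbors[NN]" plus the terminator */

/* Fill names[i] with the string "neighbors[i]" for i in [0, N). */
static void build_varying_names(char names[N][NAME_LEN]) {
    for (int i = 0; i < N; i++) {
        int written = snprintf(names[i], NAME_LEN, "neighbors[%d]", i);
        assert(written > 0 && written < NAME_LEN);
        (void)written;  /* silence unused warning when asserts are disabled */
    }
}

/* Position (in ints) of neighbor j of vertex v inside the interleaved
   output buffer, assuming N consecutive ints per vertex. */
static size_t neighbor_index(size_t v, size_t j) {
    assert(j < N);
    return v * N + j;
}
```

Under that layout assumption, the shader that later reads the TBO would fetch neighbor j of vertex v at texel `v * N + j`.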

BUT,

I want to compute the ‘undirected’ normal vector in the same shader, so I need one VBO to store these values. How can I achieve this, given that I have chosen the varyings to be written, interleaved, into a single buffer object?

Conclusion:
I want to output 2 separate buffers:
1 VBO : stores the normal vectors,
1 TBO : stores the N neighbor indices (interleaved)

You are trying to get only the first output value here, are you? Then try “neighbors[0]”.

I finally managed to output an array of integers by changing the glTransformFeedbackVaryingsNV parameters

That’s what I meant to do :)

I want to output 2 separate buffers:
1 VBO : stores the normal vectors,
1 TBO : stores the N neighbor indices (interleaved)

I don’t think you can mix interleaving with separating in TF.

Yeah, that’s weird… I had tried to use transform_feedback_EXT, but linking problems arose. I think there is a discussion about this error in this forum.

The EXT and core versions now work correctly at least on Catalyst 9.9+.

I don’t think you can mix interleaving with separating in TF.

That’s bad !!!

Maybe I’ll split the shader…

Thanks all for your help :)

I discovered a new problem… :(

When I use a uniform variable in the loop condition, the body of the loop is not executed (but it works for values < 5000… weird, why does it depend on the value?).
Here is the example code:


out int kNN[N];
uniform int numOfVertices;

void main (){
  int j=0;
  for(int i=0; i<numOfVertices; i++){
    if(something) kNN[i] = j++;
  }
}


I found why this is happening, but how am I going to solve it, since I need to pass this value (or compute it inside the shader with textureSizeBuffer()…)?

Depending on the hardware, > 5000 unrolled loops will make a program too big for the card.

Depending on the hardware, > 5000 unrolled loops will make a program too big for the card.

So what do you suggest if I have a model with more than 4096 vertices (I think this is the correct limit)?

Then you will probably have to switch from “brute force” to “somewhat more intelligent force”.
Can you think of a more parallel way to compute kNN?
Split the model up into cubes (hierarchical or not), use ping-ponging to process the data in a pyramid, etc.?
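
To illustrate just the “split up in cubes” part, here is a rough sketch in plain C of binning a point into a uniform grid, so that a neighbor search only needs to look at a point's own cell and the cells around it. This is not something from the thread: `Vec3`, `Grid`, `cell_coord` and `cell_index` are hypothetical names, and a non-hierarchical grid is only one of the options mentioned.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* Uniform grid: origin is the minimum corner of the bounding box, every
   cell is a cube of edge cell_size, and there are nx*ny*nz cells. */
typedef struct {
    Vec3 origin;
    float cell_size;
    int nx, ny, nz;
} Grid;

/* Cell coordinate along one axis, clamped to [0, count - 1] so points on
   or outside the bounding box still land in a valid cell. */
static int cell_coord(float p, float origin, float cell_size, int count) {
    assert(cell_size > 0.0f && count > 0);
    float t = (p - origin) / cell_size;
    if (t <= 0.0f) return 0;
    if (t >= (float)count) return count - 1;
    return (int)t;  /* truncation equals floor here because t > 0 */
}

/* Flattened cell index with x varying fastest, then y, then z. */
static int cell_index(const Grid *g, Vec3 p) {
    int cx = cell_coord(p.x, g->origin.x, g->cell_size, g->nx);
    int cy = cell_coord(p.y, g->origin.y, g->cell_size, g->ny);
    int cz = cell_coord(p.z, g->origin.z, g->cell_size, g->nz);
    return (cz * g->ny + cy) * g->nx + cx;
}
```

Sorting the points by this index groups each cell's points together, which is one possible starting point for a grid-based neighbor search.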

Thanx,

I will probably switch to a k-d tree approach, since I want to handle large models…

One final question, not exactly relevant to this post…

Are there any significant differences (speed, memory, limits) between CUDA/OpenCL and shader implementations for a GPGPU problem?
(because I am trying to solve all my problems with a GLSL vertex shader in conjunction with TBOs… :) !!!)

Are there any significant differences (speed,memory,limits) between CUDA/OpenCL and Shader implementations for a GPGPU problem?

It’s far too early to tell for performance, as OpenCL implementations are just getting started.

However, one thing is certain: you can do things more easily from OpenCL than you can from GLSL. You get things like pointers and scratch (local) memory, though not recursion, which OpenCL C does not allow. It’s like programming an actual computer, rather than a, you know, shader.

Depending on the hardware, > 5000 unrolled loops will make a program too big for the card.

You can cheat the card by splitting the for loop into 2 nested for loops (but again there is a limit: it does not work for values > 40000):


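const int LIMIT = 4096;             // max iterations of one inner loop
int K = numOfVertices/LIMIT+1;      // number of blocks (the last one may be empty)
int M = numOfVertices%LIMIT;        // size of the last, partial block
int i = 0;                          // running vertex index across all blocks
for(int m=0; m<K; m++){
   int L = (m==K-1) ? M : LIMIT;    // only the last block is shorter
   for(int k=0; k<L; k++,i++){
     ...
   }
}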
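
As a quick sanity check of the index arithmetic, the same two-level loop can be run on the CPU. This is only a sketch in plain C: `count_split_iterations` is a made-up helper that mirrors the snippet above and counts how often the inner body runs. It says nothing about why the GPU version stops working above 40000.

```c
#include <assert.h>

#define LIMIT 4096  /* max iterations of one inner loop, as in the post */

/* Runs the split loop and returns how many times the inner body executed.
   The assert inside checks that the running index i is contiguous, i.e.
   that no vertex index is skipped or visited twice. */
static int count_split_iterations(int numOfVertices) {
    assert(numOfVertices >= 0);
    int K = numOfVertices / LIMIT + 1;  /* number of blocks */
    int M = numOfVertices % LIMIT;      /* size of the last block */
    int i = 0;
    for (int m = 0; m < K; m++) {
        int L = (m == K - 1) ? M : LIMIT;
        for (int k = 0; k < L; k++, i++) {
            assert(i == m * LIMIT + k);
        }
    }
    return i;
}
```

The return value should equal `numOfVertices` for any non-negative input, including exact multiples of LIMIT, where the last block simply has zero iterations.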

1. I finally implemented the point-based normal vector estimation on the GPU using the brute-force K-Nearest-Neighbors algorithm (vertex shaders and transform feedback only).

BUT… a delay arose after normal estimation and before rendering. The waiting time depends on the model’s size. For example, for a model with 2000 vertices I wait 2-3 sec; for a model with 30000 vertices, 2-3 minutes or more!

Since something seems to be going on with GPU memory, I tried deleting the vertex/texture buffers after their use, but that made no difference. Here is the pseudocode…



{
.
.
.
  init_Buffers_Textures();
  computeNormals();
  glutPostRedisplay ();
  return ;
}
// GPU Normal estimation
computeNormals(){
  computeKNN(p1); 
  computeNormalValue(p2);
  computeNormalWeights(p3);

delete buffers;
  // Prim's algorithm
  for(int i=0;i<numOfVertices;i++){
    computeMinimumWeight(p4);
    computeMinimumEdge(p5);
    storeMinimum(p6);
  }
delete buffers;
}
// p1,...,p6 : vertex shader programs

Any ideas ???

“a delay arose after normal estimation and before rendering”
Be sure to read: http://www.opengl.org/wiki/Performance
It does not sound like you are measuring time correctly.

You didn’t understand me…

I have returned from computeNormals() and then nothing is happening…(just waiting).

The program doesn’t enter into the render/display function !!!
The time was measured by me…(How can i measure ‘nothing’? :))

As far as I understood from your pseudo code, computeNormals() uses GL calls to compute normals on the GPU.
OpenGL is pipelined, so most GL calls return immediately, without waiting for actual completion of the command.

You have to call glFinish(); if you want to wait for previous commands to complete, before measuring time. Otherwise you measure nothing.

Did you actually read the link I provided above ?

If I really did not understand you, then you should be clearer in the description of your problem(s).

Good luck.

Thanx for your help so far…

Did you actually read the link I provided above ?

Yes i did…

You have to call glFinish(); if you want to wait for previous commands to complete, before measuring time. Otherwise you measure nothing.

My problem is not measuring time, but why this big delay before rendering exists.

(I changed the position of the function call and now the program gets stuck in a glGenTextures() call!?)