Does the use of geometry shaders significantly reduce performance?

huanshen · September 17, 2020, 11:56am

Instead of doing the computation in the vertex shader, I used a geometry shader to calculate the transformed positions of the three vertices of each triangle. This caused a significant drop in performance. What is the cause?

gl_Position = projection * view * model * gl_in[0].gl_Position;
EmitVertex();
gl_Position = projection * view * model * gl_in[1].gl_Position;
EmitVertex();
gl_Position = projection * view * model * gl_in[2].gl_Position;
EmitVertex();

Dark_Photon · September 17, 2020, 1:28pm

Talk more about your use case: GPU, number of primitives being processed, rasterizing or capturing via transform feedback, etc.

General consensus is that geometry shaders aren’t that fast. Don’t apply them to hundreds of thousands or millions of primitives and expect a big performance win. There are good architectural reasons why this is.

Alfonse_Reinheart · September 17, 2020, 2:14pm

The general rule of thumb for GS’s is this: if you’re using them to optimize the performance of a rendering operation, you’re using them wrong. GS’s are not free; they’re not cheap. You’ve lost performance simply by activating the stage. So you shouldn’t use GS’s for something you could have done in a VS.

You should only use a GS if you have something that specifically cannot be done unless you use the specific features of a GS. In your case, that code could easily have been in the VS, so that’s where it ought to go.

GClements · September 17, 2020, 8:36pm

For a start, the geometry shader will be invoked for each primitive. If a vertex is shared between multiple primitives the transformation will be performed once for each primitive containing that vertex. Whereas if you perform the transformation in the vertex shader, the result will be cached and the transformation will only be repeated if the vertex isn’t in the cache.
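To put rough numbers on this, here is a minimal Python sketch (an illustration, not a measurement of any GPU). It assumes a regular grid of quads split into two triangles each, and it compares the number of transformations when every triangle corner is transformed (as in the geometry shader above) with the best case for a vertex shader, where each distinct vertex is transformed once. The function names are made up for this example.

```python
def grid_triangles(cols, rows):
    """Index triples for a grid of cols x rows quads, two triangles per quad."""
    stride = cols + 1  # vertices per row
    tris = []
    for y in range(rows):
        for x in range(cols):
            a = y * stride + x   # top-left
            b = a + 1            # top-right
            c = a + stride       # bottom-left
            d = c + 1            # bottom-right
            tris.append((a, b, c))
            tris.append((b, d, c))
    return tris


def transform_counts(tris):
    """Return (per-primitive transforms, distinct vertices)."""
    per_primitive = 3 * len(tris)                      # every corner of every triangle
    distinct = len({index for tri in tris for index in tri})  # ideal vertex cache
    return per_primitive, distinct


gs_count, vs_best_case = transform_counts(grid_triangles(100, 100))
print(gs_count, vs_best_case, round(gs_count / vs_best_case, 2))
```

On a grid like this, each interior vertex is shared by six triangles, so the per-primitive count approaches six times the number of distinct vertices. A real cache will land somewhere between the two figures.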

Incidentally, one of the main reasons geometry shaders incur a performance cost is that each vertex emitted by a geometry shader is distinct, even if the “copies” of a vertex are guaranteed to be identical. So any vertex processing which is performed in the geometry shader will have a significantly greater cost than if it’s performed in the vertex shader.

As Alfonse says: only use a geometry shader for processes which need to use a geometry shader.

huanshen · September 18, 2020, 1:17am

If I want to compute the normals of a triangle in real time, what else can I do?

Alfonse_Reinheart · September 18, 2020, 3:40am

I can’t see why you would need to compute a triangle normal unless you’re generating mesh data. That is, you’re not loading pre-generated data. After all, most transforms on mesh data can transform normals just as well as positions (though you may need to provide a special matrix for doing so). So if your mesh was built off-line, then whatever tool produced that mesh can also produce normals for it.

So let’s assume you have some process executing during your program that is producing position data for a mesh out of whole cloth.

Well, lots of such processes can generate vertex normals along with vertex positions. Vertex normals can be generated for bicubic patches for example with some creative mathematics in the generation process. Subdivision surface and other mechanisms also tend to be able to generate normals alongside positions.

However, if your mechanism for building such positions does not allow for generating normals along with positions, then you’re going to have to compute those normals from the mesh itself. This requires iterating over the triangles around a position and taking the average of the normals for the adjacent triangles (an average weighted by the surface area of each triangle).
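A minimal CPU-side sketch of that averaging, in Python, assuming an indexed triangle list with counter-clockwise winding (the function names are invented for this example). It relies on the fact that the unnormalized cross product of two triangle edges has a length of twice the triangle's area, so summing those cross products per vertex gives the area weighting without a separate weight.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if length == 0.0:
        return v  # vertex touched by no triangle, or only by degenerate ones
    return (v[0] / length, v[1] / length, v[2] / length)


def vertex_normals(positions, triangles):
    """Area-weighted vertex normals for an indexed triangle list."""
    sums = [[0.0, 0.0, 0.0] for _ in positions]
    for i0, i1, i2 in triangles:
        # Unnormalized face normal; its length is twice the triangle area.
        n = cross(sub(positions[i1], positions[i0]),
                  sub(positions[i2], positions[i0]))
        for i in (i0, i1, i2):
            for k in range(3):
                sums[i][k] += n[k]
    return [normalize(tuple(s)) for s in sums]


quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(vertex_normals(quad, [(0, 1, 2), (0, 2, 3)]))
```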

You can’t do that in a geometry shader, as there can be quite a few triangles adjacent to a single vertex. Indeed, it’s probably not reasonable to do it in a compute shader either. So if you used an on-GPU process to generate the positions, you’re going to have to read that back on the CPU (including topology data) to compute vertex normals.

If all you want is a per-face normal, you don’t need to compute that in the GS at all. Just compute it in the fragment shader.

Pass the position of the vertex from the VS to the FS. Note that the space of the vertex is important; transform the position into whatever space you want to use for lighting. Then use the FS functions dFdx and dFdy on this position. These functions compute the rate of change of the position in the window-space X and Y directions. Each rate of change is a vector lying in the plane of the triangle: the direction in which the position moves as you step one pixel in window-space X or Y. If you take the cross product X cross Y, and normalize the result, you get the triangle’s normal. And you’ll get the same value at every point on the triangle, since the rate of change of a value being linearly interpolated is fixed.
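The following Python sketch mimics this on the CPU to show why the result is constant across the triangle. It is only an illustration: the position is modelled as an affine function of the window coordinates, one-pixel finite differences stand in for dFdx and dFdy, and the constants ORIGIN, PER_PIXEL_X and PER_PIXEL_Y are arbitrary values chosen for the example.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


# Hypothetical triangle plane: how the interpolated position changes per pixel.
ORIGIN = (2.0, 3.0, 4.0)
PER_PIXEL_X = (1.0, 0.0, 0.5)
PER_PIXEL_Y = (0.0, 1.0, 0.25)


def position_at(px, py):
    """Interpolated position at window coordinates (px, py)."""
    return tuple(o + px * u + py * v
                 for o, u, v in zip(ORIGIN, PER_PIXEL_X, PER_PIXEL_Y))


def face_normal(px, py):
    p = position_at(px, py)
    ddx = sub(position_at(px + 1, py), p)  # stands in for dFdx(pos)
    ddy = sub(position_at(px, py + 1), p)  # stands in for dFdy(pos)
    return normalize(cross(ddx, ddy))


print(face_normal(0, 0))
print(face_normal(7, 3))  # same direction at a different pixel
```

Whether the resulting vector points towards or away from the viewer depends on the coordinate conventions in use, so the sign may need flipping in a real shader.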

huanshen · September 18, 2020, 3:55am

I have the mesh data and only want to compute the face normals. The reason I do this is to save memory. Can the fragment shader compute the normals by retrieving the other vertices?

Alfonse_Reinheart · September 18, 2020, 4:43am

Each face has one normal. If you’re trying to compute the one face normal (and thus create a faceted appearance), then I already explained how to do that.

You need to save 4 bytes per vertex this badly?

huanshen · September 18, 2020, 5:45am

I have 14 million triangular meshes and 7 million point clouds. If single-precision arrays are used to store the points and the normals directly, with one normal per triangle, each triangle’s normal is copied three times, which requires approximately 1 GB of GPU memory. If the data is stored with an index buffer, with one normal per vertex, it takes about 0.33 GB. But for the display effect I want, I need one normal per triangle. So that’s the only way I can do it.
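These figures can be checked with a little arithmetic. The Python sketch below assumes 14 million triangles, 7 million vertices, three 4-byte floats each for a position and a normal, and 32-bit indices; the index size and the function names are assumptions made for this example.

```python
FLOAT_BYTES = 4
INDEX_BYTES = 4  # assumption: 32-bit indices

TRIANGLES = 14_000_000
VERTICES = 7_000_000


def unindexed_bytes(triangles):
    """Three separate vertices per triangle, each with position + normal."""
    return triangles * 3 * (3 + 3) * FLOAT_BYTES


def indexed_bytes(vertices, triangles, floats_per_vertex):
    """Shared vertices plus an index buffer."""
    return vertices * floats_per_vertex * FLOAT_BYTES + triangles * 3 * INDEX_BYTES


print(unindexed_bytes(TRIANGLES) / 1e9)             # position + normal, no sharing
print(indexed_bytes(VERTICES, TRIANGLES, 6) / 1e9)  # position + normal, indexed
print(indexed_bytes(VERTICES, TRIANGLES, 3) / 1e9)  # positions only, indexed
```

The last line corresponds to storing no normals at all and deriving the face normal in the fragment shader.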

GClements · September 18, 2020, 10:16am

There’s no need to use floats for normals. For lighting, bytes are usually sufficient, shorts or half-float are definitely sufficient. As you only need direction (and not magnitude), you can use two components at the expense of some computation (e.g. lat/lon representation).
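As a rough sketch of the two-component idea, the Python below packs a unit normal into two bytes as latitude and longitude and unpacks it again. The exact mapping (8 bits per angle, latitude measured from the +Z axis) is an assumption made for this example, not a standard format.

```python
import math


def encode_latlon(n):
    """Pack a unit normal into two integers in 0..255."""
    x, y, z = n
    lat = math.acos(max(-1.0, min(1.0, z)))  # 0..pi, measured from +Z
    lon = math.atan2(y, x)                   # -pi..pi
    return (round(lat / math.pi * 255),
            round((lon + math.pi) / (2.0 * math.pi) * 255))


def decode_latlon(q):
    """Rebuild an approximate unit normal from the two packed values."""
    lat = q[0] / 255 * math.pi
    lon = q[1] / 255 * 2.0 * math.pi - math.pi
    s = math.sin(lat)
    return (s * math.cos(lon), s * math.sin(lon), math.cos(lat))


def angle_error_degrees(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


length = math.sqrt(14.0)
normal = (1.0 / length, 2.0 / length, 3.0 / length)
packed = encode_latlon(normal)
print(packed, angle_error_degrees(normal, decode_latlon(packed)))
```

The decode step costs a few trigonometric operations per use, which is the computation traded for the smaller storage.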

For per-face values, only the provoking vertex (by default, the last vertex) needs to have the correct values. If a fragment shader input has the flat qualifier, the value from the provoking vertex will be used for all fragments, rather than the value being interpolated.

However, that does mean that you have to order the indices so that each vertex is the provoking vertex in at most one triangle. It also means that you must have at least as many vertices as triangles; a smooth triangle mesh typically has around twice as many triangles as vertices, so having per-face attributes tends to roughly double the number of vertices required.
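One way to arrange this is sketched below in Python, assuming the default last-vertex convention. It is a simple greedy pass, not an optimal assignment: each triangle is rotated (which preserves its winding) until its last index is a vertex that no earlier triangle has claimed, and a duplicate vertex is requested when all three are taken. The function name and the shape of the return value are invented for this example.

```python
def assign_provoking_vertices(triangles, vertex_count):
    """Reorder indices so every triangle ends in a unique provoking vertex.

    Returns (new_triangles, duplicates), where duplicates is a list of
    (new_index, source_index) pairs for vertices that must be copied.
    """
    claimed = set()
    new_triangles = []
    duplicates = []
    next_index = vertex_count
    for tri in triangles:
        for r in range(3):
            rotated = tri[r:] + tri[:r]  # cyclic rotation keeps the winding
            if rotated[2] not in claimed:
                claimed.add(rotated[2])
                new_triangles.append(rotated)
                break
        else:
            # All three vertices already provoke another triangle:
            # copy the last vertex and use the copy instead.
            duplicates.append((next_index, tri[2]))
            new_triangles.append((tri[0], tri[1], next_index))
            claimed.add(next_index)
            next_index += 1
    return new_triangles, duplicates


tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(assign_provoking_vertices(tetrahedron, 4))
```

The per-face value for each triangle is then stored at the index in its last position, and the fragment shader input is declared flat.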

But for face normals, the usual solution is to do what Alfonse suggests and compute it in the fragment shader:

vec3 normal = normalize(cross(dFdx(pos), dFdy(pos)));

This avoids the need to use a geometry shader or to have per-face attributes.

huanshen · September 21, 2020, 3:31am

Thanks for your help, I have solved this problem by calculating normals with fragment shaders.

huanshen · September 21, 2020, 8:25am

I found a strange phenomenon: whether I assign the color directly in the fragment shader or calculate a lighting model from the normal, the rendering time is the same. Why does this happen? What are some ways to improve rendering efficiency?

Dark_Photon · September 21, 2020, 1:13pm

@huanshen, I merged your latest post with this thread because it made no sense as a thread starter. You need to provide considerably more context for posts that start a new thread. They should stand alone.

Please see The Forum Posting Guidelines for tips in composing these posts.

Alfonse_Reinheart · September 21, 2020, 1:22pm

That’s not particularly surprising. Basic lighting computations are not exactly expensive. So if your shader is doing something that is actually expensive (like fetching from a texture), your lighting computation can probably mostly hide in the latency caused by the expensive operation.

As for ways to improve rendering efficiency, that depends greatly on exactly what you’re rendering and how you’re trying to render it.

GClements · September 21, 2020, 1:59pm

Earlier, you said that you have 14 million triangular meshes and 7 million point clouds.

If you mean 14 million triangles, 7 million vertices, then I suspect that those triangles are very small. In which case, vertex processing and triangle setup could dominate, making rasterisation essentially irrelevant.

Can you simplify the vertex shader? In your initial post, you wrote:

gl_Position = projection * view * model * gl_in[0].gl_Position;

You shouldn’t be composing the model, view and projection matrices in the shader. Typically, you’d calculate either view * model or projection * view * model in the client and pass that as a uniform. If the projection transformation is a perspective projection you need eye-space coordinates for lighting, so you have to keep the projection separate. But you might be able to use the fact that the projection matrix is sparse (most elements are zero) to simplify the calculation (or it might not matter).
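A small Python sketch of the client-side composition, with plain nested lists standing in for whatever matrix library the application uses. The matrices are arbitrary example values, and the helper names (mat_mul, mat_vec, translation, perspective, shade_vertex) are made up for the illustration. The products are formed once, and each vertex then needs only matrix-vector multiplications: one for the eye-space position used in lighting and one for the clip-space position.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def perspective(fovy_degrees, aspect, near, far):
    """OpenGL-style perspective matrix; note how many entries are zero."""
    f = 1.0 / math.tan(math.radians(fovy_degrees) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]


model = translation(1.0, 2.0, 3.0)
view = translation(0.0, 0.0, -10.0)
projection = perspective(60.0, 16.0 / 9.0, 0.1, 100.0)

# Done once per draw call on the client, then uploaded as uniforms.
model_view = mat_mul(view, model)
mvp = mat_mul(projection, model_view)


def shade_vertex(a_pos):
    """Per-vertex work that remains: two matrix-vector products."""
    eye_pos = mat_vec(model_view, a_pos)  # for lighting
    clip_pos = mat_vec(mvp, a_pos)        # for gl_Position
    return eye_pos, clip_pos


print(shade_vertex([0.5, -1.0, 2.0, 1.0]))
```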

Also, ordering the vertices to maximise cache utilisation will reduce the number of vertex shader invocations. E.g. for a large regular grid, don’t store the vertices in “raster scan” order but use “strip mining” (where the minor dimension is small enough to fit an entire row/column of vertices in the cache). For irregular grids optimisation is more complex, but there are libraries to do this.
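The effect of ordering can be seen with a toy model. The Python sketch below assumes a simple FIFO cache of 16 entries, which is not how any particular GPU is documented to behave, and counts how many vertices of a 64 x 64 quad grid have to be transformed when the triangles are submitted row by row versus in narrow vertical strips. All names and sizes are choices made for this example.

```python
from collections import deque


def quad_triangles(x, y, stride):
    a = y * stride + x   # top-left
    b = a + 1            # top-right
    c = a + stride       # bottom-left
    d = c + 1            # bottom-right
    return [(a, b, c), (b, d, c)]


def raster_order(cols, rows):
    """Whole rows of quads, one after another."""
    stride = cols + 1
    return [t for y in range(rows) for x in range(cols)
            for t in quad_triangles(x, y, stride)]


def strip_order(cols, rows, strip_width):
    """Vertical strips of strip_width quads, each walked top to bottom."""
    stride = cols + 1
    tris = []
    for x0 in range(0, cols, strip_width):
        for y in range(rows):
            for x in range(x0, min(x0 + strip_width, cols)):
                tris.extend(quad_triangles(x, y, stride))
    return tris


def count_misses(tris, cache_size):
    """Vertices that must be transformed, given a FIFO cache of cache_size."""
    cache = deque(maxlen=cache_size)  # oldest entry drops out automatically
    misses = 0
    for tri in tris:
        for index in tri:
            if index not in cache:
                misses += 1
                cache.append(index)
    return misses


print("raster:", count_misses(raster_order(64, 64), 16))
print("strips:", count_misses(strip_order(64, 64, 4), 16))
print("distinct vertices:", 65 * 65)
```

In this model, a full row is far too long for the cache, so the vertices shared with the next row have been evicted by the time they are needed again. A strip only a few quads wide keeps them resident.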

huanshen · September 22, 2020, 1:32am

My Shader is very simple.

vertShader:

#version 450 core

layout (location = 0) in vec4 aPos;

uniform mat4 model;
uniform mat4 projection;
uniform mat4 view;

out vec3 FragPos;  
void main()
{  
    FragPos = vec3(model * aPos);
    gl_Position = projection * view  * vec4(FragPos,1.0f);
}

fragShader:

#version 450 core

out vec4 fColor;
in vec3 pos;

struct Material {
    vec3 ambient;
    vec3 diffuse;
    vec3 specular;    
    float shininess;
}; 

struct Light {
    vec3 position;

    vec3 ambient;
    vec3 diffuse;
    vec3 specular;
};

in vec3 FragPos;    

uniform vec3 viewPos;
uniform Material material;
uniform Light light;

uniform mat4 model;
uniform mat4 projection;
uniform mat4 view; 

void main()
{   
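    // FragPos is already in world space (the vertex shader applied model), so the
    // cross product of its derivatives is already a world-space face normal.
    // Multiplying by transpose(inverse(model)) here would transform it a second time.
    vec3 norm = normalize(cross(dFdx(FragPos), dFdy(FragPos)));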

	 // ambient
    vec3 ambient = light.ambient * material.ambient;
	 // diffuse 
    vec3 lightDir = normalize(light.position - FragPos);
    float diff = max(dot(norm, lightDir), 0.0);
    vec3 diffuse = light.diffuse * (diff * material.diffuse);
	 // specular
    vec3 viewDir = normalize(viewPos - FragPos);
    vec3 reflectDir = reflect(-lightDir, norm);  
    float spec = pow(max(dot(viewDir, reflectDir), 0.0), material.shininess);
    vec3 specular = light.specular * (spec * material.specular);  
        
    vec3 result = ambient + diffuse + specular;
    fColor = vec4(result, 1.0);
}

huanshen · September 22, 2020, 1:38am

I have tried calculating projection * view * model on the CPU and passing it in as a uniform, but the rendering time is the same.

So let me try the vertex ordering. I understand what you’re saying, which is to try to maximise cache hits. I have studied CUDA before.
