FPS drops when passing a multiplied vector to gl_FragColor

Hey! First of all, sorry for the title of the topic; I have no idea what to call the problem I am having.

First off, here are my shaders (vertex shader first, then fragment shader), because there’s no way you’ll understand without seeing them.

attribute vec3 a_position; //attributes of vertex
attribute vec3 a_normal; 
attribute vec2 a_texCoord0;

uniform mat4 u_worldTrans;
uniform mat4 u_projViewTrans; //some uniforms like transform of model, and camera matrix
uniform vec3 lightPos; //position of light
varying vec3 toCameraVector; //sending a toCameraVector to fragment shader (vector that points to camera)

varying vec2 v_texCoord0; //sending texCoords
varying vec3 normal; // sending Normal of vertex
varying vec3 toLightNormal; //sending toLightVector (vector that points to light)

void main() {
    v_texCoord0 = a_texCoord0; //setting texCoords for fragment shader
    vec4 worldTrans = u_worldTrans * vec4(a_position, 1.0); //calculate worldTransform of vertex
    gl_Position = u_projViewTrans * worldTrans; //setting the position of vertex
    normal = (u_worldTrans * vec4(a_normal, 0.0)).xyz; //setting the normal in worldCoords
    toLightNormal = lightPos - worldTrans.xyz; //setting toLightVector
    toCameraVector = (inverse(u_projViewTrans) * vec4(0.0, 0.0, 0.0, 1.0)).xyz - worldTrans.xyz; //setting toCameraVector
}


#ifdef GL_ES
precision mediump float; //I have no idea what this does, but I think it is needed
#endif

varying vec2 v_texCoord0; //getting all the info from vertexShader like texCoords, normal, and those 2 directional vectors
varying vec3 normal;
varying vec3 toLightNormal;
varying vec3 toCameraVector;

uniform sampler2D grass; //my textures. grass, dirt, path, flowers are then placed onto the mesh depending on the blendMap colour at the corresponding pixel
uniform sampler2D dirt;
uniform sampler2D path;
uniform sampler2D flowers;
uniform sampler2D blendMap;
uniform sampler2D pathSpec; // specular map for path texture

uniform vec3 lightColour; //Colour of the light

float damper = 20.0; // some settings for customizable specular lighting
float reflectivity = 1.0;

void main() {
    vec4 blendMapColour = texture(blendMap, v_texCoord0); //get the colour of the blendMap at the corresponding pixel
    //I want to place grass where the blendMap is black, so I create a float that indicates that and multiply the grass texture by it.
    float grassColour = 1.0 - (blendMapColour.r + blendMapColour.g + blendMapColour.b);
    vec2 texCoord = v_texCoord0 * 40.0; //multiply texCoords for tiling the texture
    //Get all the textures at the corresponding pixel and multiply each by a channel of the blendMap:
    //black means grass, red means dirt and so on. If there is no red at that pixel of the blendMap,
    //dirtC.rgba is 0, so there is no dirt texture at all.
    vec4 grassC = texture(grass, texCoord) * grassColour;
    vec4 dirtC = texture(dirt, texCoord) * blendMapColour.r;
    vec4 pathC = texture(path, texCoord) * blendMapColour.b;
    vec4 flowersC = texture(flowers, texCoord) * blendMapColour.g;
    vec4 pathS = texture(pathSpec, texCoord) * blendMapColour.b;
    vec4 finalColor = grassC + dirtC + pathC + flowersC; //mix all the textures together

    vec3 UniteNormal = normalize(normal); //the diffuse lighting calculations start here
    vec3 UnitLightNormal = normalize(toLightNormal);
    float nDotl = dot(UniteNormal, UnitLightNormal);
    float brightness = max(nDotl, 0.2);
    vec3 diffuse = lightColour * brightness; //the diffuse lighting calculations end here
    if (pathC.r > 0.0) { //if there is path texture at this pixel, calculate its specular light
        vec3 UnitCameraVector = normalize(toCameraVector);
        vec3 fromLightNormal = -UnitLightNormal;
        vec3 reflectedVector = reflect(fromLightNormal, UniteNormal);
        float SpecFactor = dot(reflectedVector, UnitCameraVector);
        SpecFactor = max(SpecFactor, 0.0);
        float Dumped = pow(SpecFactor, damper);
        vec3 finalSpec = Dumped * reflectivity * lightColour;
        //apply the diffuse light, finalColor (which is basically the path texture here) and the specular light to the pixel
        gl_FragColor = vec4(diffuse, 1.0) * finalColor + vec4(finalSpec, 1.0) * pathS.r;
    } else {
        //no path texture at this pixel (grass or whatever else), so no specular light
        gl_FragColor = vec4(diffuse, 1.0) * finalColor;
    }
}

Basically, my shaders are pretty advanced (they do diffuse lighting, specular lighting, and multitexturing), and I noticed quite bad performance with them. The main problem occurred when I wanted to optimize them. I tried to “profile” what slows everything down by modifying the shader piece by piece. I found that if I pass red (vec4(1, 0, 0, 1)) or any other constant colour to gl_FragColor (the colour of the pixel), so that the lighting and textures are not drawn, everything runs well.

That seems weird: I still do all the lighting and multitexturing calculations in the fragment shader, I just don’t draw them, and the FPS is good. To me this means the problem is not in the calculations but in displaying the result. Then I played around with it a little more and found that if I just add all the textures together (without multiplying them by the blendMap colour) and pass that to the pixel colour, the performance is still good. But if I then multiply the result by the diffuse vector, it gets worse and worse.

My question is: is this how it should work? I mean, I could understand that displaying the textures might slow down the game, but as I mentioned, it doesn’t…

I really appreciate any advice. Thanks in advance!

It’s called dead code elimination. If calculations have no effect on the final output the compiler simply removes them.

Thanks for the information :)

But by “profile” I mean that I just tried different variations of the fragment shader and then looked at the FPS I was getting. I didn’t use any profiling tool.

It doesn’t matter how you measure performance.

If a calculation has no effect upon the shader’s outputs (which can include atomic counters, images, and buffer variables, as well as the shader’s “out” variables), the compiler will simply remove it.

I don’t know what you mean. The compiler has to do everything written in the code, doesn’t it? You probably mean some profiling compiler that profiles the game and then shows which functions are expensive, but I just used the normal “run” function in the IDE and tried different variations of the shader code to see what works best.

What GClements means is that the output of your fragment shader is whatever you write to gl_FragColor. After all, that’s what gets rendered for that fragment. If you do all your lighting calculations and texturing but none of it affects the final colour you write to gl_FragColor, it’s as if you were not doing anything useful. Compilers are very smart and can see that; since the code doesn’t contribute in any way to the final result, the compiler just removes it from the final executable. If you have ever looked at the assembly generated from C++ code, you would be surprised by how much the compiler does and how different the final code can look.

Want to do a quick test? Declare an attribute in your shader, something like attribute vec3 a_position2;, and do nothing else with it. After you compile and link the program, try to query the location of that attribute (glGetAttribLocation): OpenGL will return -1, because it cannot find that attribute. It was not used for anything, so the compiler just took it away.

I don’t know what you mean. The compiler has to do everything written in the code, doesn’t it?

No, it does not.

If you have this code (in C):

int Foo(int x, int y) {return x + y;}

All the outside world knows about this function is that it adds x and y. If you changed the code to this:

int Foo2(int x, int y)
{
  int temp = x * y;
  float temp2 = (float)(temp * temp);
  double temp3 = temp2 + temp;
  return x + y;
}

Could anyone in the outside world notice the change? No. All of the changes are local, and nothing outside of this function can get at any of the local variables. Therefore, the difference is purely academic to the caller of the function.

So if you’re writing an optimizing compiler, it is perfectly legal for your compiler to detect that temp3 is not used, and thus cull out the expression that generates it. Then, you detect that temp2 is not used, so you cull out that expression. And then you detect that temp is not used, so you cull it out. Thus, you have reduced Foo2 to Foo.

Your optimizing compiler has done its job by making this code faster. And this is just as true of GLSL as C.

The only difference with GLSL is that the “return value” of a shader instantiation is the set of shader stage output variables. The principle is the same: any expressions which do not affect the generation of output variables will be removed by the compiler.
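As a rough C analogue of the experiment from the original post, the sketch below shows two functions that do the same arithmetic, where only one of them lets the result reach its return value. The `Color` struct and both function names are hypothetical stand-ins (the returned `Color` plays the role of gl_FragColor); this is not what any particular GLSL compiler does internally, only an illustration of the principle.

```c
/* Hypothetical stand-in for a fragment shader: the returned Color plays the
   role of gl_FragColor. All names here are illustrative assumptions. */
typedef struct { float r, g, b, a; } Color;

/* Does the lighting arithmetic, then ignores it and returns plain red.
   Nothing computed here can influence the result, so an optimizing
   compiler is free to drop both local calculations. */
Color shade_constant(float n_dot_l, float spec) {
    float brightness = n_dot_l > 0.2f ? n_dot_l : 0.2f; /* never read again */
    float shine = spec * spec * spec;                   /* never read again */
    (void)brightness; /* only silences unused-variable warnings */
    (void)shine;
    Color red = {1.0f, 0.0f, 0.0f, 1.0f};
    return red;
}

/* Same arithmetic, but the result feeds the output, so it has to be kept. */
Color shade_lit(float n_dot_l, float spec) {
    float brightness = n_dot_l > 0.2f ? n_dot_l : 0.2f;
    float shine = spec * spec * spec;
    Color lit = {brightness + shine, brightness, brightness, 1.0f};
    return lit;
}
```

Timing shade_constant therefore says nothing about the cost of the lighting maths, which is the same reason the constant-red shader appeared to be fast.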

Oh, I get it now. It seems that sampling more than one texture is what causes the slowdown after all. Or, to be honest, sampling the blendMap texture causes the biggest slowdown. There’s probably something wrong with the image, or it’s just too big (1024 x 1024 px).

Thank you all!

Ensure that all of the textures are mipmapped (have all mipmap levels and use a minification filter with mipmapping).

Minifying a texture without mipmapping can significantly increase texture bandwidth.

To gauge the effect texture bandwidth has on performance, try varying the level-of-detail (LoD) bias for the textures (with glTexParameter(GL_TEXTURE_LOD_BIAS), or with the bias parameter to texture() in the fragment shader). If performance varies significantly with bias, texture bandwidth is a limiting factor; consider reducing the resolution or the bit depth or using a compressed format.
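To get a feel for what a full mipmap chain means for a texture like the 1024 x 1024 blendMap mentioned above, here is a small C sketch that counts the levels and the total number of texels stored. Both function names are made up for illustration, and the sketch assumes the usual rule that each level halves both sides (rounding down) until 1 x 1 is reached.

```c
/* Number of levels in a full mipmap chain: halve the larger side until it
   reaches 1. Hypothetical helper; returns 0 for an empty texture. */
unsigned mip_level_count(unsigned width, unsigned height) {
    if (width == 0 || height == 0) return 0;
    unsigned largest = width > height ? width : height;
    unsigned levels = 1;
    while (largest > 1) {
        largest /= 2;
        levels++;
    }
    return levels;
}

/* Total texels stored across the full chain, with each side clamped to at
   least 1 as it is halved. Hypothetical helper. */
unsigned long mip_chain_texels(unsigned width, unsigned height) {
    if (width == 0 || height == 0) return 0;
    unsigned long total = 0;
    for (;;) {
        total += (unsigned long)width * height;
        if (width == 1 && height == 1) break;
        if (width > 1) width /= 2;
        if (height > 1) height /= 2;
    }
    return total;
}
```

For 1024 x 1024 this gives 11 levels and 1398101 texels, about one third more than the 1048576 texels of the base level alone. Each step of positive LoD bias moves sampling to a level with roughly a quarter of the texels, which is why varying the bias is a quick way to probe bandwidth.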

All the images already had mipmaps. I have now added mipmaps to the blendMap texture as well and reduced the bit depth of all the images, but the performance is pretty much the same, although the FPS improved a little. I get about 2000 FPS when looking at the terrain from a large distance; the closer I get, the worse it is, and up close I generally get 1000 - 1700 FPS. These aren’t good results, are they? What I am rendering is just a terrain (split into chunks), so I think adding buildings, a character, a skybox, and NPCs would destroy the frame rate completely. I’m testing on a mid-range computer.

It’s not really meaningful to try to extrapolate from such simple scenes, as there may be overheads which are significant for the test case but insignificant when dealing with more complex scenes.

Also, you can’t just take the rendering times for individual elements rendered in isolation and add them to get an overall result. E.g. the early depth test optimisation means that adding elements to the scene will typically result in other elements being occluded, reducing the rendering time for the occluded elements.
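One further reason very high FPS figures are hard to reason about is that FPS is the reciprocal of frame time, so equal FPS differences do not correspond to equal amounts of work. The C sketch below converts the figures quoted above into milliseconds per frame; the function names are hypothetical and the 60 FPS comparison is only an assumed target, not something stated in the thread.

```c
/* Milliseconds spent on one frame at a given frame rate (fps must be > 0).
   Hypothetical helper for illustration. */
double frame_time_ms(double fps) {
    return 1000.0 / fps;
}

/* Extra milliseconds per frame when the rate drops from fps_before to
   fps_after. Hypothetical helper for illustration. */
double frame_cost_ms(double fps_before, double fps_after) {
    return frame_time_ms(fps_after) - frame_time_ms(fps_before);
}
```

With these, frame_time_ms(2000.0) is 0.5 ms and frame_time_ms(1000.0) is 1 ms, so the drop from 2000 to 1000 FPS costs 0.5 ms per frame, while a 60 FPS target leaves roughly 16.7 ms per frame.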
