Need more than max possible size of uniform value arrays in fragment shader

jpmarceaux · November 9, 2019, 6:04am

Hi people, I’m new to the world of OpenGL shaders, and I’m looking for help with a problem I’ve come across.

I’m developing an application with OpenGL shaders and GLSL. Basically, I have one large loop that needs to be executed on each pixel. Because this loop should be executed on the pixels, I have put it in the fragment shader. The problem is that when I go to compile the fragment shader, I get an error because the number of elements in my loop exceeds the maximum possible size for a fragment shader.

Here’s the current code for my fragment shader, which compiles and outputs the right image:

#version 430 core
#define M_PI 3.1415926535897932384626433832795

in vec4 gl_FragCoord;

uniform int DIM_N;
uniform int DIM_M;
uniform vec4 redlogs[256]; // red logons
uniform vec4 greenlogs[256]; // green logons
uniform vec4 bluelogs[256]; // blue logons
uniform vec2 dims;

out vec4 fColor;
void main()
{
    float x = gl_FragCoord.x/dims.x;
    float y = gl_FragCoord.y/dims.y;
    fColor = vec4(0.0, 0.0, 0.0, 1.0);
    for(int m = 0; m < DIM_M; m++){
        for(int n = 0; n < DIM_N; n++){

fColor[0] += redlogs[n+m*DIM_N][0] * sin(n*M_PI*x) * sin(m*M_PI*y);
fColor[0] += redlogs[n+m*DIM_N][1] * sin(n*M_PI*x) * cos(m*M_PI*y);
fColor[0] += redlogs[n+m*DIM_N][2] * cos(n*M_PI*x) * sin(m*M_PI*y);
fColor[0] += redlogs[n+m*DIM_N][3] * cos(n*M_PI*x) * cos(m*M_PI*y);

fColor[1] += greenlogs[n+m*DIM_N][0] * sin(n*M_PI*x) * sin(m*M_PI*y);
fColor[1] += greenlogs[n+m*DIM_N][1] * sin(n*M_PI*x) * cos(m*M_PI*y);
fColor[1] += greenlogs[n+m*DIM_N][2] * cos(n*M_PI*x) * sin(m*M_PI*y);
fColor[1] += greenlogs[n+m*DIM_N][3] * cos(n*M_PI*x) * cos(m*M_PI*y);

fColor[2] += bluelogs[n+m*DIM_N][0] * sin(n*M_PI*x) * sin(m*M_PI*y);
fColor[2] += bluelogs[n+m*DIM_N][1] * sin(n*M_PI*x) * cos(m*M_PI*y);
fColor[2] += bluelogs[n+m*DIM_N][2] * cos(n*M_PI*x) * sin(m*M_PI*y);
fColor[2] += bluelogs[n+m*DIM_N][3] * cos(n*M_PI*x) * cos(m*M_PI*y);


        }
    }
}

So that compiles, but when I change the dimensions of redlogs[256] to something like [1024], the shader will not compile… I’m wondering what I should do instead? Is there a way to use larger arrays on the GPU?

Thanks!

Dark_Photon · November 9, 2019, 5:14pm

You can move the data into textures such as TBOs or ordinary 2D textures, or into buffer objects accessed as SSBOs or UBOs.

With UBOs though, you might run into space constraints as IIRC the limit is ~64k on some GPUs. Whereas SSBOs, TBOs, and textures in general can be quite large. More on accessing those in shaders here: Interface Block (GLSL)

Given that you’re doing 2D access, you may just want to consider storing all this data in a 2D texture. Then you don’t need to do the array indexing in your shader and you can benefit from the texture cache.

All that said, from an efficiency standpoint, you probably should look at other formulations of your algorithm that don’t involve O(n^2) looping over all large images within each fragment. If your shaders take too long, you can trigger GPU driver resets, which will basically kill your program.

jpmarceaux · November 9, 2019, 7:49pm

Thanks Dark_Photon, that sounds like the right direction. I’ll give it a try soon. I agree that it would make sense to reformulate the algorithm, but I’ve had some trouble understanding how I would do that…

In terms of OpenCL and kernels, the other option I see is having a 3D space of shape (Lx, Ly, DimN*DimM), where the third dimension spans those `logon’ images. In this formulation, a first kernel would operate on each point in the 3D space, calculating the the red, green, and blue component compressed from those 4 basis functions with associated amplitudes. After the first kernel finishes, a second kernel would come in and sum values over the 3rd dimension, creating the output 2d texture of shape (Lx, Ly) that could be passed to the fragment shader.

In your opinion, would the compute shader be a proper choice for this implementation? The pipeline would then be something like:

Compute shader 1: calculate sample values in 3d space (stored in an SSBO?) ->
Compute shader 2: compress the 3d buffers into 2d textures ->
Vertex shader: draw quad over viewport ->
Fragment shader: overlay 2d texture on the quad

Does this sound reasonable?

Thanks again, while this GLSL stuff can be challenging, it’s also very rewarding!

system · October 19, 2021, 7:09pm

This topic was automatically closed 183 days after the last reply. New replies are no longer allowed.