Access texture in compute shader frequently

Hello, I want to use compute shader to speed up calculation of every voxel in the volume, which is stored as a 3d-texture. But something weird happened.

  • VS 2019
  • GLFW 3.3.1
  • OpenGL Core 4.6

I’ve created some functions in the compute shader as follow:

vec2 evaluateValuePerTex(layout(rgba32f) readonly image3D kernelTex, ivec3 texCoord, uvec3 p)
{
	ivec3 coordMean = texCoord;
	ivec3 coordVar = texCoord + ivec3(0, 0, 2 * 3 * 4);

	vec4 means = imageLoad(kernelTex, coordMean);
	vec4 vars = imageLoad(kernelTex, coordVar);

	float binWeight = means.w;
	float kernelValue = vars.w * gaussian3D(means.xyz, vars.xyz, p);

	return vec2(binWeight, kernelValue);
}

float evaluateProb(uvec3 p, ivec3 texCoord)
{
	vec2 V1 = evaluateValuePerTex(tex_GMMKernel1, texCoord, p);
	vec2 V2 = evaluateValuePerTex(tex_GMMKernel2, texCoord, p);
	vec2 V3 = evaluateValuePerTex(tex_GMMKernel3, texCoord, p);
	vec2 V4 = evaluateValuePerTex(tex_GMMKernel4, texCoord, p);

	return V1.x * V1.y + V2.x * V2.y + V3.x * V3.y + V4.x * V4.y;
}

It returns the value reconstructed by four texture images tex_GMMKernel1, tex_GMMKernel2, tex_GMMKernel3 and tex_GMMKernel4, and I’m sure that single texture access value is correct, and nothing wrong with image binding.

And I call glDispatchCompute as follow:

	CompShader reconCompShader("../resources/shaders/reconstruct_iter_comp.glsl");

	for (int i = 0; i < KernelNum; ++i)
		glBindImageTexture(i, texSet[i]->getID(), 0, GL_TRUE, 0, GL_READ_ONLY, GL_RGBA32F);

	glBindImageTexture(KernelNum + 0, texVolume.getID(), 0, GL_TRUE, 0, GL_READ_WRITE, GL_RG32F);
	glBindImageTexture(KernelNum + 1, texCount.getID(), 0, GL_FALSE, 0, GL_READ_WRITE, GL_R16UI);

	reconCompShader.use();

	glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT | GL_SHADER_STORAGE_BARRIER_BIT);

	for (int i = 0; i < 256; ++i)
	{
		std::cout << "channel " << i << "\r";
		glDispatchCompute(resolution.x, resolution.y, resolution.z);
	}

	std::cout << std::endl << "Calculate finished..." << std::endl;

but the result doesn’t match what the code should get using pure C++. Please notice the function

float evaluateProb(uvec3 p, ivec3 texCoord)

the four similar lines of function call get V1, V2, V3 and V4, but when the function returns

V1.x*(V1.y + V2.y + V3.y + V4.y)

it acts like return the value V1.x * V4.y, and also following results happened but I don’t know why:

V1.x*(V1.y + V2.y + V3.y + V4.y)    // similar to V1.x * V4.y
V1.x*(V1.y + V2.y + V3.y)    // similar to V1.x * V3.y
V1.x*(V1.y + V2.y)    // similar to V1.x * V2.y

so I want to know why, I am certainly sure there’re something wrong when I access textures frequently in function evaluateProb. Maybe there’re something else while accessing textures? I’d be honored to get your help, thanks!

It’s not 100% clear what your code is doing or how you verify what it’s doing. But I find this to be somewhat dubious:

glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT | GL_SHADER_STORAGE_BARRIER_BIT);

Not so much because of what it is, but because of where it is. At that point in the code, you just bound some images and the program. You haven’t done anything yet to require a barrier. Also, I don’t see any usage of SSBOs in the code you’ve shown, so I don’t know why that barrier is there.

Plus, I see no barrier after the dispatch calls. So how do you ensure that whatever process you’re using to read the data can read it?

Furthermore, it’s not clear how your dispatch loop is supposed to work. You make 256 dispatch calls, all with the same program and number of work groups, with zero uniform or other changes. As such, each of these 256 calls should do the exact same thing, reading the exact same data and producing the exact same results.

The only way this would not be the case is if a dispatch call were expecting to read data written by a prior dispatch call. But it can’t do that without issuing a barrier between the dispatches.

Thanks for replying, It’s my fault not to paste full code here. Here’s my compute shader code.

// reconstruct using 4 3d textures, update each channel

#version 460 core

/// Working group set
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

/// image-binding set
layout(binding = 0, rgba32f)	uniform readonly image3D tex_GMMKernel1;
layout(binding = 1, rgba32f)	uniform readonly image3D tex_GMMKernel2;
layout(binding = 2, rgba32f)	uniform readonly image3D tex_GMMKernel3;
layout(binding = 3, rgba32f)	uniform readonly image3D tex_GMMKernel4;

layout(binding = 4, rg32f)		uniform coherent image3D tex_Volume;
layout(binding = 5, r16ui)		uniform coherent uimage1D tex_Count;


uint XGap[5], YGap[4], ZGap[3];
float PI = 3.141592653589793238462643383279;

/// aux-function set
void Init(uvec3 resolution)
{
	uint dx = resolution.x / 4;
	uint dy = resolution.y / 3;
	uint dz = resolution.z / 2;

	XGap[0] = 0;
	XGap[1] = dx;
	XGap[2] = dx * 2;
	XGap[3] = dx * 3;
	XGap[4] = resolution.x;

	YGap[0] = 0;
	YGap[1] = dy;
	YGap[2] = dy * 2;
	YGap[3] = resolution.y;

	ZGap[0] = 0;
	ZGap[1] = dz;
	ZGap[2] = resolution.z;
}

uvec3 calBrickRes(uint index)
{
	uvec3 resolution = uvec3(0);

	uint I = index;
	uint iz = I / 12;
	I -= iz * 12;
	resolution.z = ZGap[iz + 1] - ZGap[iz];

	uint iy = I / 4;
	I -= iy * 4;
	resolution.y = YGap[iy + 1] - YGap[iy];

	uint ix = I / 1;
	I -= ix * 1;
	resolution.x = XGap[ix + 1] - XGap[ix];

	return resolution;
}

uvec4 calBrickOffsetNIndex(uvec3 p)
{
	uvec4 offsetNIdx = uvec4(0);
	offsetNIdx.xyz = p;

	uint nx = 0, ny = 0, nz = 0;
	if (p.z > ZGap[1])	nz++;
	if (p.y > YGap[1])	ny++;
	if (p.y > YGap[2])	ny++;
	if (p.x > XGap[1])	nx++;
	if (p.x > XGap[2])	nx++;
	if (p.x > XGap[3])	nx++;

	offsetNIdx.w = nz * 12 + ny * 4 + nx;

	offsetNIdx.xyz = offsetNIdx.xyz - 
		uvec3(XGap[nx], YGap[ny], ZGap[nz]);

	return offsetNIdx;
}

float gaussian1D(float mean, float var, float x)
{
	float v = 1.0f / sqrt(2.0f * PI * var);
	return v * exp(-0.5 * (x - mean) * (x - mean) / var);
}

float gaussian3D(vec3 means, vec3 vars, uvec3 p)
{
	float vx = gaussian1D(means.x, vars.x, float(p.x));
	float vy = gaussian1D(means.y, vars.y, float(p.y));
	float vz = gaussian1D(means.z, vars.z, float(p.z));
	return vx * vy * vz;
}

vec2 evaluateValuePerTex(layout(rgba32f) readonly image3D kernelTex, ivec3 texCoord, uvec3 p)
{
	ivec3 coordMean = texCoord;
	ivec3 coordVar = texCoord + ivec3(0, 0, 2 * 3 * 4);

	vec4 means = imageLoad(kernelTex, coordMean);
	vec4 vars = imageLoad(kernelTex, coordVar);

	float binWeight = means.w;
	float kernelValue = vars.w * gaussian3D(means.xyz, vars.xyz, p);

	return vec2(binWeight, kernelValue);
}

float evaluateProb(uvec3 p, ivec3 texCoord)
{
	vec2 V1 = evaluateValuePerTex(tex_GMMKernel1, texCoord, p);
	vec2 V2 = evaluateValuePerTex(tex_GMMKernel2, texCoord, p);
	vec2 V3 = evaluateValuePerTex(tex_GMMKernel3, texCoord, p);
	vec2 V4 = evaluateValuePerTex(tex_GMMKernel4, texCoord, p);

	return V1.x * V1.y + V2.x * V2.y + V3.x * V3.y + V4.x * V4.y;
}

/// Entry Point, main function
void main()
{
	// Initialization
	uint B = 8;

	ivec3 resolution = ivec3(gl_NumWorkGroups.xyz);
	ivec3 samplePoint = ivec3(gl_GlobalInvocationID.xyz);

	Init(resolution);

	uint channelIdx = imageLoad(tex_Count, 0).x;	// iter

	// step 1. find which brick the sample point locates at
	uvec4 brickInfo = calBrickOffsetNIndex(samplePoint);
	uint brickIdx = brickInfo.w;

	// step 2. find which block the sample point belongs to
	uvec3 brickOffsetPos = brickInfo.xyz;
	uvec3 sampP = brickOffsetPos % uvec3(B);
	uvec3 brickResolution = calBrickRes(brickIdx);

	uvec3 brickBlockRes = brickResolution / uvec3(B);
	uvec3 brickBlockPos = brickOffsetPos / uvec3(B);

	if (brickOffsetPos.x >= brickBlockRes.x * B)
	{
		sampP.x += B;
		brickBlockPos.x--;
	}

	if (brickOffsetPos.y >= brickBlockRes.y * B)
	{
		sampP.y += B;
		brickBlockPos.y--;
	}

	if (brickOffsetPos.z >= brickBlockRes.z * B)
	{
		sampP.z += B;
		brickBlockPos.z--;
	}

	uint blockIdx = brickBlockPos.z * brickBlockRes.y * brickBlockRes.x +
		brickBlockPos.y * brickBlockRes.x +
		brickBlockPos.x;

	// step 3. calculate value
	ivec3 texCoord = ivec3(uvec3(blockIdx, channelIdx, brickIdx));

	float P = evaluateProb(sampP, texCoord);
	float volumeProb = imageLoad(tex_Volume, samplePoint).y;

	barrier();

	// step 4. update volume
	if (P > volumeProb)
		imageStore(tex_Volume, samplePoint, vec4(float(channelIdx), P, 0.0f, 0.0f));
	
	// step 5. update count
	if (samplePoint == ivec3(0))
		imageStore(tex_Count, 0, uvec4(channelIdx + 1));
}

I looped for 256 times, and increase texture tex_Count in each time. Because it’s time-consuming if I use loop in shader, I call glDispatchCompute in the C++ function instead of using for in glsl function.

I used to add glMemoryBarrier right after the call glDispatchCompute, but the results remain the same. I don’t know where it should be so I left it there.

Note that other invocations within the global work group could get either the original value or the incremented value. I suggest having the application code store the iteration number in a uniform variable.

Well it seems that I can’t use user-defined variables as uniform input values in compute shaders.

Compute shaders cannot have input variables (with the “in” qualifier). They can still access uniform variables (not limited to textures and images), uniform blocks and buffer variables (in SSBOs).

Thanks. I use uniform uint instead of tex_Count, but the result remains the same.

Well finally I found out where the bugs were. I forgot to initialize data in four textures, and for every single work thread in compute shader, functions gaussian() would return ‘nan’ or '-inf`. Old values are deprecated while calculate with subsequent textures. Luckily it’s not a problem about OpenGL.

There’s no such thing as a “uniform input value”. There are uniforms and there are inputs, but a variable can’t be both.