Clearing depth only inside scissor area on Intel GPU fails

LendoK · June 12, 2020, 11:48am

Hi, I’m rendering multiple shadow passes into one depth attachment, side by side. Each pass is rendered with a different viewport and scissor area. On my Nvidia GPU everything works fine, but on my Intel machine the shadowmap atlas is partially broken.

It looks like only the last pass is correct. There are no validation errors at all.
The attachment description looks like this:

attachment.description.samples = SampleCount;
attachment.description.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
attachment.description.storeOp = (createinfo.usage & VK_IMAGE_USAGE_SAMPLED_BIT) ? VK_ATTACHMENT_STORE_OP_STORE : VK_ATTACHMENT_STORE_OP_DONT_CARE;
attachment.description.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
attachment.description.stencilStoreOp = attachment.description.storeOp;
attachment.description.format = createinfo.format;
attachment.description.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
attachment.description.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL;

If I also change the finalLayout to VK_IMAGE_LAYOUT_UNDEFINED, the shadowmap looks good!

But then I get a validation error for updating the descriptor set with the wrong layout.

…vkUpdateDescriptorSets() failed write update validation for VkDescriptorSet 0xa600000000a6 with error: Write update to VkDescriptorSet VkDescriptorSet 0xa600000000a6 allocated with VkDescriptorSetLayout VkDescriptorSetLayout 0xa300000000a3 binding #3 failed with error message: Attempted write update to combined image sampler descriptor failed due to: Descriptor update with descriptorType VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER is being updated with invalid imageLayout VK_IMAGE_LAYOUT_UNDEFINED for image VkImage …

johannesugb · June 12, 2020, 12:13pm

Two questions:

1. createinfo.usage is the same in both cases, right? I.e. (createinfo.usage & VK_IMAGE_USAGE_SAMPLED_BIT) evaluates to the same value, and the only thing that changes is the image layout?
2. What happens if you use the VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL layout?

johannesugb · June 12, 2020, 12:21pm

Another question:

Which layout have you specified for the subpass where the attachment is used (or for the subpasses, if you have more than one)? I.e. which layout have you specified in the VkAttachmentReference for that image? (It should probably be VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL.)

LendoK · June 12, 2020, 12:43pm

The only change is the image layout, to VK_IMAGE_LAYOUT_UNDEFINED.
VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL is also not working; I get the same result as with VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL. None of the layouts that can be used for a combined image sampler descriptor work. VK_IMAGE_LAYOUT_PREINITIALIZED, however, does work.
Yes, the depthReference in the subpass uses VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL.

johannesugb · June 12, 2020, 1:24pm

What about VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL for the finalLayout?

We might need a bit more information about the rest of your application.
How do you sample from the depth buffer after having rendered it? Do you use a sampler? Maybe provide some more code…

johannesugb · June 12, 2020, 1:33pm

If you change the order of the passes, is the one which is drawn last always correct, but not the former ones?

I don’t know how you are rendering the passes. Is there one single vkCmdBeginRenderPass/vkCmdEndRenderPass for ALL of the passes or is there one vkCmdBeginRenderPass/vkCmdEndRenderPass PER pass?

In the former case, you might need some barriers between the calls (for testing, you could use a coarse barrier from VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT -> VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT with memory access VK_ACCESS_MEMORY_WRITE_BIT -> VK_ACCESS_MEMORY_WRITE_BIT).

In the latter case, the clear operations would probably be the reason why it fails. You’ll need VK_ATTACHMENT_LOAD_OP_LOAD for all passes after the first one.

LendoK · June 12, 2020, 2:43pm

Same thing for VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.

I’m using a sampler with depth compare enabled. The code is based on the Vulkan examples by Sascha Willems (GitHub: SaschaWillems/Vulkan).

VK_CHECK_RESULT(shadowAtlas.framebuffer->createSampler(VK_FILTER_LINEAR, VK_FILTER_LINEAR, VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE, true))

...

VkResult createSampler(VkFilter magFilter, VkFilter minFilter, VkSamplerAddressMode adressMode, bool depthcompare = false)
{
    VkSamplerCreateInfo samplerInfo = initializers::samplerCreateInfo();
    samplerInfo.magFilter = magFilter;
    samplerInfo.minFilter = minFilter;
    samplerInfo.mipmapMode = VK_SAMPLER_MIPMAP_MODE_LINEAR;
    samplerInfo.addressModeU = adressMode;
    samplerInfo.addressModeV = adressMode;
    samplerInfo.addressModeW = adressMode;
    samplerInfo.mipLodBias = 0.0f;
    samplerInfo.maxAnisotropy = 1.0f;
    samplerInfo.minLod = 0.0f;
    samplerInfo.maxLod = 1.0f;
    samplerInfo.borderColor = VK_BORDER_COLOR_FLOAT_OPAQUE_WHITE;
    if(depthcompare)
    {
        samplerInfo.compareEnable = VK_TRUE;
        samplerInfo.compareOp = VK_COMPARE_OP_LESS;
    }
    return vkCreateSampler(vulkanDevice->logicalDevice, &samplerInfo, nullptr, &sampler);
}

...

VPM_Light_Shadow_descriptorSet.writeDescriptorSets.push_back(initializers::writeDescriptorSet(
                    VPM_Light_Shadow_descriptorSet.descriptorSet,
                    VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
                    3,
                    scene->shadowAtlas.framebuffer->getDescriptor(0)));

...

inline VkDescriptorImageInfo *getDescriptor(uint32_t attachment)
{
    VkDescriptorImageInfo* descriptorImageInfo = new VkDescriptorImageInfo();
    descriptorImageInfo->sampler = sampler;
    descriptorImageInfo->imageView = attachments[attachment].view;
    descriptorImageInfo->imageLayout = attachments[attachment].description.finalLayout;
    return descriptorImageInfo;
}

The order of the passes does not matter. The last one wins.
I have vkCmdBeginRenderPass / vkCmdEndRenderPass for each pass.

johannesugb · June 12, 2020, 3:15pm

The logs say DS=Clear for every “Depth-only Pass”. It should be DS=Clear for the FIRST pass only and DS=Load for all subsequent passes. Because the depth buffer is cleared at the start of every pass, only the results of the last pass are visible.

I suspect that using an invalid layout (VK_IMAGE_LAYOUT_UNDEFINED) has the effect that the GPU does not execute the clear operation and hence, the results of all passes are included in the image.

LendoK · June 12, 2020, 4:38pm

Clearing the whole buffer at the beginning is not really an option for me. If nothing was updated, I don’t want to render every pass again in every frame; I want to be able to use the atlas as a cache. This is why I’m using the scissor areas. Only the scissor area should be cleared, and the rest of the buffer should stay as it is. And it works perfectly on an Nvidia chip, with VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL as the final layout, so there is no invalid layout there.
It’s weird: as you can see in the image of the atlas, the first two passes are not properly cleared and there are visible artifacts. So the scissor is somehow working, but not fully.

Alfonse_Reinheart · June 12, 2020, 6:02pm

First, you’re not using “scissor” areas. The scissor box is a part of the graphics pipeline state, and during render pass “load op” operations, there is no graphics pipeline state. Load op clearing will clear the renderable portion of the image, as defined by the framebuffer and render pass instance begin information. Now, you can make the renderable area equal to the scissor box you’re going to use in your pipeline, but that’s up to you.

Second, if you want to preserve the contents of the image outside of the renderable area then you must specify the right layout for the image. Any layout transition where the initial layout is UNDEFINED will always make the image’s contents undefined. Entirely. It doesn’t matter if your render pass instance only specifies a subset of the image to render to; it’s the layout transition at the start of the render pass that throws the data away, not the render pass itself.

You got away with it on NVIDIA because NVIDIA hardware doesn’t care about layouts. Intel is more respectful of layouts, so you have to use them correctly.

LendoK · June 12, 2020, 7:08pm

Thanks for clearing things up.
I’m already using the scissor rect for the render area.

renderPassBeginInfo.renderArea.offset.x = scissor.offset.x;
renderPassBeginInfo.renderArea.offset.y = scissor.offset.y;
renderPassBeginInfo.renderArea.extent.width = scissor.extent.width;
renderPassBeginInfo.renderArea.extent.height = scissor.extent.height;

It’s working with:

attachment.description.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL;
attachment.description.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL;

@johannesugb and @Alfonse_Reinheart, thank you very much.
