Compute pipeline creation fails on one specific device

Good afternoon,

I’m turning to these forums for some much-needed help.
I thought of posting on Stack Overflow, as I did in the past for a Vulkan-related issue, but decided to go straight to the source this time!

We’ve put together some Vulkan code for edge detection on mobile devices. This was done after the first warnings of RenderScript deprecation by Android/Google. Our pipeline is loosely based on their example migration app.

Of the mobile devices we have that support Vulkan, our code works fine on all but one: a Huawei P20 lite (HiSilicon Kirin 659 chipset with a Mali-T830 GPU) running Android 8.0.0. Besides the checks our code performs when setting up Vulkan, AIDA64 also confirms that the device supports Vulkan and exposes a Vulkan device.

We have a bunch of shaders doing the work (4 shader modules/pipelines), but I’ve noticed that on this device vkCreateComputePipelines returns VK_ERROR_INITIALIZATION_FAILED for 3 of them. Based on the one that does not fail, I managed to fix 2 of the 3 failing ones by changing the format of the VkImages used as sampler input/output. I still haven’t managed to get the remaining one working and I have no clue what might be wrong.

The .comp shaders are compiled to SPIR-V with the glslc bundled with the Android NDK (version 22.0.7026061).

I will try to limit the code shown to what matters, but I’m happy to share more if it helps!

shader_1x5.comp / shader_5x1.comp - fails to create pipeline

#version 450 core
#pragma shader_stage(compute)

layout (local_size_x_id = 0, local_size_y_id = 1) in;

layout (binding=0) uniform usampler2D inputImage;
layout (binding=1, r8ui) uniform writeonly uimage2D outputImage;

const uint kernel[5] = uint[5](56,250,412,250,56);

void main()
{
...
}

shader_sobel.comp - fails to create pipeline

#version 450 core
#pragma shader_stage(compute)

layout (local_size_x_id = 0, local_size_y_id = 1) in;

layout (binding=0) uniform usampler2D inputImage;
layout (binding=1, rg8ui) uniform writeonly uimage2D outputImage;
layout (binding=2, r32f) uniform writeonly image2D outGx;
layout (binding=3, r32f) uniform writeonly image2D outGy;

layout (push_constant, std140) uniform PushConstant {
	ivec2 thresh;
} constant;

void main()
{
...
}

shader_nms.comp - succeeds in creating pipeline

#version 450 core
#pragma shader_stage(compute)

layout (local_size_x_id = 0, local_size_y_id = 1) in;

layout (binding=0) uniform usampler2D inputImage;
layout (binding=1, r32i) uniform writeonly iimage2D outputImage;

void main()
{
...
}

Pipeline definitions

mBlurHorizontalPipeline = Pipeline::create(mContext.get(), manager, "shaders/shader_5x1.comp.spv", sizeof(uint32_t));
mBlurVerticalPipeline = Pipeline::create(mContext.get(), manager, "shaders/shader_1x5.comp.spv", sizeof(uint32_t));
mSobelPipeline = Pipeline::create(mContext.get(), manager, "shaders/shader_sobel.comp.spv", sizeof(uint32_t)*2, true);
mNmsPipeline = Pipeline::create(mContext.get(), manager, "shaders/shader_nms.comp.spv", sizeof(uint32_t));

Pipeline creation code

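std::unique_ptr<Pipeline> Pipeline::create(const VulkanContext* context, AAssetManager* manager, const char* shader, uint32_t pushConstantSize, bool sobel) {
    auto pipeline = std::make_unique<Pipeline>(context, pushConstantSize, sobel);
    pipeline->createShaderModule(manager, shader);
    // createPipeline() reports failure through its return value; hand that to
    // the caller as a null pointer instead of returning a half-built object.
    if (!pipeline->createPipeline()) {
        return nullptr;
    }
    // A local unique_ptr is moved implicitly on return; std::move is not needed.
    return pipeline;
}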

bool Pipeline::createPipeline() {

    std::vector<VkDescriptorSetLayoutBinding> descriptorSetLayoutBinding = {};
    VkDescriptorSetLayoutBinding input = {};
    input.binding = 0;
    input.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    input.descriptorCount = 1;
    input.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
    descriptorSetLayoutBinding.push_back(input);
    VkDescriptorSetLayoutBinding output = {};
    output.binding = 1;
    output.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
    output.descriptorCount = 1;
    output.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
    descriptorSetLayoutBinding.push_back(output);
    if (mSobel) {
        output = {};
        output.binding = 2;
        output.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
        output.descriptorCount = 1;
        output.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
        descriptorSetLayoutBinding.push_back(output);
        output = {};
        output.binding = 3;
        output.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
        output.descriptorCount = 1;
        output.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
        descriptorSetLayoutBinding.push_back(output);
    }

    VkDescriptorSetLayoutCreateInfo descriptorSetLayoutDesc = {};
    descriptorSetLayoutDesc.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
    descriptorSetLayoutDesc.bindingCount = static_cast<uint32_t>(descriptorSetLayoutBinding.size());
    descriptorSetLayoutDesc.pBindings = descriptorSetLayoutBinding.data();

    if (VK_SUCCESS != vkCreateDescriptorSetLayout(mContext->device(), &descriptorSetLayoutDesc, nullptr, &mDescriptorSetLayout))
    {
        __android_log_print(ANDROID_LOG_ERROR, "pipeline.cpp", "Descriptor set layout creation failed\n");
        return false;
    }

    // Allocate descriptor set
    VkDescriptorSetAllocateInfo descriptorSetAllocateInfo = {};
    descriptorSetAllocateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
    descriptorSetAllocateInfo.descriptorPool = mContext->descriptorPool();
    descriptorSetAllocateInfo.descriptorSetCount = 1;
    descriptorSetAllocateInfo.pSetLayouts = &mDescriptorSetLayout;

    if (VK_SUCCESS != vkAllocateDescriptorSets(mContext->device(), &descriptorSetAllocateInfo, &mDescriptorSet))
    {
        __android_log_print(ANDROID_LOG_ERROR, "pipeline.cpp", "Descriptor set allocation failed\n");
        return false;
    }

    bool hasPushConstant = mPushConstantSize > 0;
    VkPushConstantRange pushConstantRange = {};
    pushConstantRange.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
    pushConstantRange.offset = 0;
    pushConstantRange.size = mPushConstantSize;

    VkPipelineLayoutCreateInfo layoutDesc = {};
    layoutDesc.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
    layoutDesc.setLayoutCount = 1;
    layoutDesc.pSetLayouts = &mDescriptorSetLayout;
    layoutDesc.pushConstantRangeCount = hasPushConstant ? 1u : 0u;
    layoutDesc.pPushConstantRanges = hasPushConstant ? &pushConstantRange : nullptr;

    if (VK_SUCCESS != vkCreatePipelineLayout(mContext->device(), &layoutDesc, nullptr, &mPipelineLayout))
    {
        __android_log_print(ANDROID_LOG_ERROR, "pipeline.cpp", "Pipeline layout creation failed\n");
        return false;
    }

    // Create compute pipeline
    const auto workGroupSize = mContext->getWorkGroupSize();
    const uint32_t specializationData[] = { workGroupSize, workGroupSize };
    const std::vector<VkSpecializationMapEntry> specializationMap = {
            // clang-format off
            // constantID, offset,               size
            {0, 0 * sizeof(uint32_t), sizeof(uint32_t)},
            {1, 1 * sizeof(uint32_t), sizeof(uint32_t)},
            // clang-format on
    };
    VkSpecializationInfo specializationInfo = {};
    specializationInfo.mapEntryCount = static_cast<uint32_t>(specializationMap.size());
    specializationInfo.pMapEntries = specializationMap.data();
    specializationInfo.dataSize = sizeof(specializationData);
    specializationInfo.pData = specializationData;

    VkComputePipelineCreateInfo pipelineDesc = {};
    pipelineDesc.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
    pipelineDesc.stage.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    pipelineDesc.stage.stage = VK_SHADER_STAGE_COMPUTE_BIT;
    pipelineDesc.stage.module = mShader;
    pipelineDesc.stage.pName = "main";
    pipelineDesc.stage.pSpecializationInfo = &specializationInfo;
    pipelineDesc.layout = mPipelineLayout;

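    const VkResult result = vkCreateComputePipelines(mContext->device(), VK_NULL_HANDLE, 1, &pipelineDesc, nullptr, &mPipeline);
    if (result != VK_SUCCESS)
    {
        // Log the numeric VkResult so the exact failure (e.g. VK_ERROR_INITIALIZATION_FAILED) shows up in logcat.
        __android_log_print(ANDROID_LOG_ERROR, "pipeline.cpp", "Compute pipeline creation failed (VkResult %d)\n", static_cast<int>(result));
        return false;
    }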

    return true;
}
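
When one of the Vulkan calls above fails, logging the name of the returned VkResult makes results such as the VK_ERROR_INITIALIZATION_FAILED mentioned earlier easier to spot in logcat. The following is only a sketch: `vkResultName` is a hypothetical helper, not part of the code in this thread, and the numeric values mirror a handful of core VkResult codes so that the snippet compiles without the Vulkan headers.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper. The numeric values mirror a few core VkResult codes
// from the Vulkan headers; real code would switch on the VkResult
// enumerators directly instead of on raw integers.
inline std::string vkResultName(int32_t result) {
    switch (result) {
        case 0:  return "VK_SUCCESS";
        case -1: return "VK_ERROR_OUT_OF_HOST_MEMORY";
        case -2: return "VK_ERROR_OUT_OF_DEVICE_MEMORY";
        case -3: return "VK_ERROR_INITIALIZATION_FAILED";
        case -4: return "VK_ERROR_DEVICE_LOST";
        // Anything not listed is reported by number rather than guessed.
        default: return "VkResult " + std::to_string(result);
    }
}
```

The returned string can be passed to `__android_log_print` with a `%s` specifier via `.c_str()`.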

As I mentioned above, I managed to create the shader_1x5.comp and shader_5x1.comp pipelines by changing the shader outputs from r8ui / uimage2D to r32i / iimage2D (based on the working shader, shader_nms.comp). The VkImage formats were changed to match when the shaders changed. Why would the Vulkan device be OK with r32i but not r8ui?

shader_sobel.comp uses push constants and actually writes to 3 output VkImages, but I’ve tried reducing it to a single output (with a changed format) and dropping the push constants, with no success.
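
One way to rule the push constants in or out is to check the range against the rules Vulkan places on a VkPushConstantRange. The sketch below is an illustration under assumptions, not the code from this thread: `SobelPushConstant` is a hypothetical host-side mirror of the shader's `ivec2 thresh` block, `isValidPushConstantRange` is a hypothetical helper, and the limit is passed in by the caller (in real code it would come from `VkPhysicalDeviceLimits::maxPushConstantsSize`).

```cpp
#include <cstdint>

// Hypothetical host-side mirror of the shader's push-constant block
// (`ivec2 thresh`): two 32-bit signed integers, 8 bytes in total.
struct SobelPushConstant {
    int32_t thresh[2];
};
static_assert(sizeof(SobelPushConstant) == 2 * sizeof(uint32_t),
              "host struct must match the 8-byte ivec2 in the shader");

// Checks the rules for a VkPushConstantRange: offset and size are multiples
// of 4, size is non-zero, and the range fits inside the device limit
// maxPushConstantsSize (supplied by the caller).
constexpr bool isValidPushConstantRange(uint32_t offset, uint32_t size,
                                        uint32_t maxPushConstantsSize) {
    return offset % 4u == 0u && size % 4u == 0u && size > 0u &&
           offset < maxPushConstantsSize &&
           size <= maxPushConstantsSize - offset;
}
```

With the 128 bytes that the specification guarantees as the minimum for maxPushConstantsSize, `isValidPushConstantRange(0, sizeof(SobelPushConstant), 128)` holds, which suggests the 8-byte range used for the Sobel pipeline is unlikely to be the problem on its own.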

I just want some pointers as to why this particular device is so problematic (or whether our Vulkan code is in fact the problem!).

The validation layers do highlight some issues, but in code that runs after the pipelines are created, so I’m fairly certain they are unrelated to this. The validation layers do not report any problems on the other Mali devices tested (e.g. Mali-G710) or on Shader Playground.

I’m happy to provide a test case if needed.

Looking forward to hearing from the community and appreciate your kind help.

Regards

I can only say something about this part: according to a record on vulkan.gpuinfo.org for your device, the R8_UINT format does not support STORAGE_IMAGE usage, but R32_SINT and R32_UINT both do.
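
The same information can be checked at run time instead of being looked up on vulkan.gpuinfo.org. The following is a minimal sketch rather than code from this thread: `canBeStorageImage` is a hypothetical helper, and the constant mirrors the value of `VK_FORMAT_FEATURE_STORAGE_IMAGE_BIT` from the Vulkan headers so that the snippet compiles without the SDK. In real code the mask would be the `optimalTilingFeatures` member of the `VkFormatProperties` filled in by `vkGetPhysicalDeviceFormatProperties`.

```cpp
#include <cstdint>

// Assumption: local mirror of the Vulkan header value, so the sketch builds
// without <vulkan/vulkan.h>. Real code should use the header's enum.
constexpr uint32_t kFormatFeatureStorageImageBit = 0x00000002u; // VK_FORMAT_FEATURE_STORAGE_IMAGE_BIT

// Hypothetical helper: true when a format's feature mask allows binding it as
// a storage image (image2D / uimage2D / iimage2D in GLSL).
//
// With the real API the mask would be obtained like this:
//   VkFormatProperties props = {};
//   vkGetPhysicalDeviceFormatProperties(physicalDevice, VK_FORMAT_R8_UINT, &props);
//   bool ok = canBeStorageImage(props.optimalTilingFeatures);
constexpr bool canBeStorageImage(uint32_t optimalTilingFeatures) {
    return (optimalTilingFeatures & kFormatFeatureStorageImageBit) != 0u;
}
```

A format that fails this check, as R8_UINT is reported to on this device, would have to be swapped for one that passes before the image and the pipeline are created.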

Thank you for pointing that out, Carsten! I completely missed that on vulkan.gpuinfo.org.

That clearly explains why the 2 shaders (shader_5x1.comp and shader_1x5.comp) started working, but I still can’t figure out why shader_sobel.comp still doesn’t work.

I used rg32ui and rg32i, i.e. VK_FORMAT_R32G32_UINT / VK_FORMAT_R32G32_SINT, for the output image and still no luck. Those formats support VK_IMAGE_USAGE_STORAGE_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, which is what I need for that particular output (layout binding=1) in shader_sobel.comp.
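
The usage flags named above are properties of the image; whether a format can carry them depends on its format feature bits (storage usage relies on `VK_FORMAT_FEATURE_STORAGE_IMAGE_BIT`, sampled usage on `VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT`). As a sketch of how a format could be chosen from a list of candidates: `FormatCandidate`, `requiredFeatures` and `pickFormat` are hypothetical names, and the constants are local mirrors of the Vulkan header values so the snippet compiles without the SDK. Passing this check is necessary for creating the image, but it does not by itself explain the remaining pipeline failure.

```cpp
#include <cstdint>
#include <vector>

// Assumption: local mirrors of Vulkan header values; real code should use
// the enums from <vulkan/vulkan.h>.
constexpr uint32_t kImageUsageSampledBit   = 0x00000004u; // VK_IMAGE_USAGE_SAMPLED_BIT
constexpr uint32_t kImageUsageStorageBit   = 0x00000008u; // VK_IMAGE_USAGE_STORAGE_BIT
constexpr uint32_t kFeatureSampledImageBit = 0x00000001u; // VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT
constexpr uint32_t kFeatureStorageImageBit = 0x00000002u; // VK_FORMAT_FEATURE_STORAGE_IMAGE_BIT

// Translates image usage flags into the format feature bits they rely on.
inline uint32_t requiredFeatures(uint32_t usage) {
    uint32_t features = 0u;
    if ((usage & kImageUsageSampledBit) != 0u) features |= kFeatureSampledImageBit;
    if ((usage & kImageUsageStorageBit) != 0u) features |= kFeatureStorageImageBit;
    return features;
}

// Hypothetical record: a format value paired with the optimalTilingFeatures
// mask that vkGetPhysicalDeviceFormatProperties reported for it.
struct FormatCandidate {
    int32_t  format;
    uint32_t optimalTilingFeatures;
};

// Returns the first candidate that supports every feature implied by `usage`,
// or -1 when none of the candidates qualifies.
inline int32_t pickFormat(const std::vector<FormatCandidate>& candidates, uint32_t usage) {
    const uint32_t needed = requiredFeatures(usage);
    for (const FormatCandidate& candidate : candidates) {
        if ((candidate.optimalTilingFeatures & needed) == needed) {
            return candidate.format;
        }
    }
    return -1;
}
```

Ordering the candidate list from the preferred (smallest) format to the fallback keeps memory use low on devices that do support the narrower formats.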