S3TC BC1 compression layout / Using linear image layout for compressed images

Silverlan · March 13, 2016, 8:33am

I’ve been trying to load compressed BC1/DXT1 images, but so far I haven’t had much luck.
Here are the relevant snippets from the specification:

https://www.khronos.org/registry/vulkan/specs/1.0/xhtml/vkspec.html#resources-images:

For images created with linear tiling, rowPitch, arrayPitch and depthPitch describe the layout of the subresource in linear memory. For uncompressed formats, rowPitch is the number of bytes between texels with the same x coordinate in adjacent rows (y coordinates differ by one). arrayPitch is the number of bytes between texels with the same x and y coordinate in adjacent array layers of the image (array layer values differ by one). depthPitch is the number of bytes between texels with the same x and y coordinate in adjacent slices of a 3D image (z coordinates differ by one). Expressed as an addressing formula, the starting byte of a texel in the subresource has address:
// (x,y,z,layer) are in texel coordinates
address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*texelSize + offset
For compressed formats, the rowPitch is the number of bytes between compressed blocks in adjacent rows. arrayPitch is the number of bytes between blocks in adjacent array layers. depthPitch is the number of bytes between blocks in adjacent slices of a 3D image.
// (x,y,z,layer) are in block coordinates
address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*blockSize + offset;
arrayPitch is undefined for images that were not created as arrays. depthPitch is defined only for 3D images.

For color formats, the aspectMask member of VkImageSubresource must be VK_IMAGE_ASPECT_COLOR_BIT. For depth/stencil formats, aspect must be either VK_IMAGE_ASPECT_DEPTH_BIT or VK_IMAGE_ASPECT_STENCIL_BIT. On implementations that store depth and stencil aspects separately, querying each of these subresource layouts will return a different offset and size representing the region of memory used for that aspect. On implementations that store depth and stencil aspects interleaved, the same offset and size are returned and represent the interleaved memory allocation.
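The block-coordinate addressing formula quoted above can be sketched in code as follows. This is only an illustration: `SubresourceLayout` is a hypothetical stand-in for the matching `VkSubresourceLayout` fields, and `block_address`, `blockWidth` and `blockHeight` are names made up for the example (for BC1 the block is 4x4 texels and 8 bytes).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the VkSubresourceLayout fields used by the formula.
struct SubresourceLayout {
    std::uint64_t offset;
    std::uint64_t rowPitch;
    std::uint64_t arrayPitch;
    std::uint64_t depthPitch;
};

// Byte address of the compressed block that contains texel (texelX, texelY).
// Texel coordinates are first converted to block coordinates, then the
// block-coordinate formula from the specification is applied.
std::uint64_t block_address(const SubresourceLayout &layout,
                            std::uint32_t texelX, std::uint32_t texelY,
                            std::uint32_t z, std::uint32_t layer,
                            std::uint32_t blockWidth, std::uint32_t blockHeight,
                            std::uint32_t blockSize)
{
    assert(blockWidth > 0 && blockHeight > 0);
    const std::uint64_t bx = texelX / blockWidth;   // block column
    const std::uint64_t by = texelY / blockHeight;  // block row
    return layer * layout.arrayPitch + z * layout.depthPitch
         + by * layout.rowPitch + bx * blockSize + layout.offset;
}
```

For a 512x512 BC1 image with a tightly packed rowPitch of 1024 bytes (128 blocks of 8 bytes), the block containing texel (8, 4) would start at byte 1024 + 2 * 8 = 1040 from the start of the subresource.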

My image is a normal 2D image (1 array layer, 1 mip level), so there’s no arrayPitch or depthPitch.
The DDS data works fine with glCompressedTexImage2D, so the source data is definitely fine.
In OpenGL I’ve used GL_COMPRESSED_RGBA_S3TC_DXT1_EXT; for Vulkan I’m using VK_FORMAT_BC1_RGBA_UNORM_BLOCK, which should be equivalent.
Here’s my code for mapping the image data:


auto dds = load_dds("img.dds");
auto *srcData = static_cast<uint8_t*>(dds.data());
auto *destData = static_cast<uint8_t*>(vkImageMapPtr); // Pointer to mapped memory of VkImage
destData += layout.offset(); // layout = VkSubresourceLayout of the image subresource
assert((w %4) == 0);
assert((h %4) == 0);
assert(blockSize == 8); // S3TC BC1
auto wBlocks = w /4;
auto hBlocks = h /4;
for(auto y=decltype(hBlocks){0};y<hBlocks;++y)
{
	auto *rowDest = destData +y *layout.rowPitch();
	auto *rowSrc = srcData +y *(wBlocks *blockSize);
	for(auto x=decltype(wBlocks){0};x<wBlocks;++x)
	{
		auto *pxDest = rowDest +x *blockSize;
		auto *pxSrc = rowSrc +x *blockSize; // 4x4 image block
		memcpy(pxDest,pxSrc,blockSize); // 64 bits (8 bytes) per block
	}
}

The code runs without issues, however the resulting image is corrupt (The colors are correct, but the blocks seem to be in the wrong order):

I’m not surprised the outcome looks like this, because the image tiling is set to optimal. However, changing it to linear results in a GPU driver crash after one of the memcpy calls. What could be the reason for that?
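The copy loop above moves one block at a time, but since the blocks of a row are contiguous in both source and destination, it can also be written as one memcpy per block row. The sketch below assumes the source rows are tightly packed and that `dst` already includes the subresource offset; `copy_block_rows` and `blockDim` are hypothetical names. It rounds the block counts up, so it also covers dimensions that are not a multiple of 4. It produces the same bytes as the original loop, so it is a simplification rather than a fix for the corruption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies tightly packed rows of compressed blocks from `src` into a
// destination whose block rows start `dstRowPitch` bytes apart.
void copy_block_rows(std::uint8_t *dst, std::size_t dstRowPitch,
                     const std::uint8_t *src,
                     std::uint32_t width, std::uint32_t height,
                     std::uint32_t blockDim, std::size_t blockSize)
{
    assert(blockDim > 0);
    // Round up so partially covered blocks at the edges are included.
    const std::size_t wBlocks = (static_cast<std::size_t>(width) + blockDim - 1) / blockDim;
    const std::size_t hBlocks = (static_cast<std::size_t>(height) + blockDim - 1) / blockDim;
    const std::size_t srcRowBytes = wBlocks * blockSize;
    // The destination pitch may contain padding, but never less than one row.
    assert(dstRowPitch >= srcRowBytes);
    for (std::size_t y = 0; y < hBlocks; ++y)
        std::memcpy(dst + y * dstRowPitch, src + y * srcRowBytes, srcRowBytes);
}
```

With the variables from the snippet above, the call would look like `copy_block_rows(destData, layout.rowPitch(), srcData, w, h, 4, blockSize);`.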

Alfonse_Reinheart · March 13, 2016, 12:42pm

Where’s the actual Vulkan code responsible for allocating this memory, creating the image from it, mapping it, and later rendering with it?

Silverlan · March 13, 2016, 3:55pm

It essentially boils down to this:


vk::Device device = ...; // Initialization
vk::AllocationCallbacks allocatorCallbacks = ...; // Initialization
[...] // Load the dds data
uint32_t width = dds.width();
uint32_t height = dds.height();
auto format = dds.format(); // = vk::Format::eBc1RgbaUnormBlock;

vk::Extent3D extent(width,height,1);

vk::ImageCreateInfo imageInfo(
	vk::ImageCreateFlagBits(0),
	vk::ImageType::e2D,format,
	extent,1,1,
	vk::SampleCountFlagBits::e1,
	vk::ImageTiling::eLinear,
	vk::ImageUsageFlagBits::eSampled | vk::ImageUsageFlagBits::eColorAttachment,
	vk::SharingMode::eExclusive,
	0,nullptr,
	vk::ImageLayout::eUndefined
);

vk::Image img = nullptr;
device.createImage(&imageInfo,&allocatorCallbacks,&img);

vk::MemoryRequirements memRequirements;
device.getImageMemoryRequirements(img,&memRequirements);
uint32_t typeIndex = 0;
get_memory_type(memRequirements.memoryTypeBits(),vk::MemoryPropertyFlagBits::eHostVisible,typeIndex); // -> typeIndex is set to 1
auto szMem = memRequirements.size();
vk::MemoryAllocateInfo memAlloc(szMem,typeIndex);
vk::DeviceMemory mem;
device.allocateMemory(&memAlloc,&allocatorCallbacks,&mem); // Note: Using the default allocation (nullptr) doesn't change anything
device.bindImageMemory(img,mem,0);
	
uint32_t mipLevel = 0;
vk::ImageSubresource resource(
	vk::ImageAspectFlagBits::eColor,
	mipLevel,
	0
);
vk::SubresourceLayout layout;
device.getImageSubresourceLayout(img,&resource,&layout);

auto *vkImageMapPtr = device.mapMemory(mem,0,szMem,vk::MemoryMapFlagBits(0)); // Mapped image memory, i.e. the copy destination
[...] // Map the dds-data (See code from first post)
device.unmapMemory(mem);

(I’ve removed the error handling and some irrelevant parts in this snippet. I’m also using the Nvidia Vulkan C++ API, however the structure is the same as for the C API.)
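The `get_memory_type` helper is not shown in the snippet. A minimal sketch of what such a lookup usually does, modeled on the `memory_type_from_properties` pattern from the SDK demos, might look like this. `MemoryTypeTable` is a hypothetical stand-in for the property flags in `VkPhysicalDeviceMemoryProperties`, and `find_memory_type` is a made-up name; the actual helper may differ.

```cpp
#include <cassert>
#include <cstdint>

// Numeric value of VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, repeated here so the
// sketch compiles without the Vulkan headers.
constexpr std::uint32_t kHostVisibleBit = 0x2;

// Hypothetical stand-in for the propertyFlags of each entry in
// VkPhysicalDeviceMemoryProperties::memoryTypes.
struct MemoryTypeTable {
    std::uint32_t count;
    std::uint32_t propertyFlags[32];
};

// Finds the first memory type that is allowed by `typeBits` (from
// VkMemoryRequirements::memoryTypeBits) and has all `requiredFlags` set.
// Returns false if no such type exists.
bool find_memory_type(const MemoryTypeTable &table, std::uint32_t typeBits,
                      std::uint32_t requiredFlags, std::uint32_t &typeIndex)
{
    assert(table.count <= 32);
    for (std::uint32_t i = 0; i < table.count; ++i) {
        const bool allowed = (typeBits & (1u << i)) != 0;
        const bool suitable = (table.propertyFlags[i] & requiredFlags) == requiredFlags;
        if (allowed && suitable) {
            typeIndex = i;
            return true;
        }
    }
    return false;
}
```

On a device whose type 0 is device-local only and whose type 1 is host-visible, a request for `kHostVisibleBit` with both types allowed selects index 1.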

As for the pipeline, it uses a combined image sampler descriptor and renders directly to the current swapchain framebuffer. I highly doubt the pipeline is an issue here, considering I can render uncompressed images perfectly fine exactly the same way, and the code is basically the same (Functionality-wise) as the one from the triangle demo in the SDK.
Since I’m using several abstraction layers, it would take a while to unravel the code completely. I’ll try to write a quick adaptation tomorrow of the triangle demo instead.

Silverlan · March 14, 2016, 4:04am

Alright, I’ve modified the triangle demo from the SDK, and the same problem occurs. I haven’t made any changes to the demo other than to this function (my changes are inside the ‘// DDS’ blocks):


static void
demo_prepare_texture_image(struct demo *demo, const uint32_t *tex_colors,
                           struct texture_object *tex_obj, VkImageTiling tiling,
                           VkImageUsageFlags usage, VkFlags required_props) {
	// DDS
	tiling = VK_IMAGE_TILING_LINEAR;
	struct dds ddsData = load_dds("image.dds");

    VkFormat tex_format = ddsData.format; // tex_format = VK_FORMAT_BC1_RGBA_UNORM_BLOCK (133)
    int32_t tex_width = ddsData.width; // tex_width = 512
    int32_t tex_height = ddsData.height; // tex_height = 512
	unsigned int blockSize = ddsData.blockSize; // blockSize = 8
	//
    VkResult U_ASSERT_ONLY err;
    bool U_ASSERT_ONLY pass;

    tex_obj->tex_width = tex_width;
    tex_obj->tex_height = tex_height;

    const VkImageCreateInfo image_create_info = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
        .pNext = NULL,
        .imageType = VK_IMAGE_TYPE_2D,
        .format = tex_format,
        .extent = {tex_width, tex_height, 1},
        .mipLevels = 1,
        .arrayLayers = 1,
        .samples = VK_SAMPLE_COUNT_1_BIT,
        .tiling = tiling,
        .usage = usage,
        .flags = 0,
        .initialLayout = VK_IMAGE_LAYOUT_PREINITIALIZED
    };
    VkMemoryAllocateInfo mem_alloc = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = NULL,
        .allocationSize = 0,
        .memoryTypeIndex = 0,
    };

    VkMemoryRequirements mem_reqs;

    err =
        vkCreateImage(demo->device, &image_create_info, NULL, &tex_obj->image);
    assert(!err);

    vkGetImageMemoryRequirements(demo->device, tex_obj->image, &mem_reqs);

    mem_alloc.allocationSize = mem_reqs.size;
    pass =
        memory_type_from_properties(demo, mem_reqs.memoryTypeBits,
                                    required_props, &mem_alloc.memoryTypeIndex);
    assert(pass);

    /* allocate memory */
    err = vkAllocateMemory(demo->device, &mem_alloc, NULL, &tex_obj->mem);
    assert(!err);

    /* bind memory */
    err = vkBindImageMemory(demo->device, tex_obj->image, tex_obj->mem, 0);
    assert(!err);

    if (required_props & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) {
        const VkImageSubresource subres = {
            .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
            .mipLevel = 0,
            .arrayLayer = 0,
        };
        VkSubresourceLayout layout;
        void *data;
        int32_t x, y;

        vkGetImageSubresourceLayout(demo->device, tex_obj->image, &subres,
                                    &layout);

        err = vkMapMemory(demo->device, tex_obj->mem, 0,
                          mem_alloc.allocationSize, 0, &data);
        assert(!err);

		// DDS
		// Same code as in my first post, except in C
		int32_t w = tex_width;
		int32_t h = tex_height;
		uint8_t *srcData = (uint8_t*)(ddsData.data);
		uint8_t *destData = (uint8_t*)(data); // Pointer to mapped memory of VkImage
		destData += layout.offset; // layout = VkSubresourceLayout of the image subresource
		assert((w %4) == 0);
		assert((h %4) == 0);
		assert(blockSize == 8); // S3TC BC1
		uint32_t wBlocks = w /4;
		uint32_t hBlocks = h /4;
		for(uint32_t y=0;y<hBlocks;++y)
		{
			uint8_t *rowDest = destData +y *layout.rowPitch;
			uint8_t *rowSrc = srcData +y *(wBlocks *blockSize);
			for(uint32_t x=0;x<wBlocks;++x)
			{
				uint8_t *pxDest = rowDest +x *blockSize;
				uint8_t *pxSrc = rowSrc +x *blockSize; // 4x4 image block
				memcpy(pxDest,pxSrc,blockSize); // 64 bits (8 bytes) per block
			}
		}
		//

		// Original Triangle demo code
       /* for (y = 0; y < tex_height; y++) {
            uint32_t *row = (uint32_t *)((char *)data + layout.rowPitch * y);
            for (x = 0; x < tex_width; x++)
                row[x] = tex_colors[(x & 1) ^ (y & 1)];
        }*/
		//

        vkUnmapMemory(demo->device, tex_obj->mem);
    }

    tex_obj->imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    demo_set_image_layout(demo, tex_obj->image, VK_IMAGE_ASPECT_COLOR_BIT,
                          VK_IMAGE_LAYOUT_PREINITIALIZED, tex_obj->imageLayout,
                          VK_ACCESS_HOST_WRITE_BIT);
    /* setting the image layout does not reference the actual memory so no need
     * to add a mem ref */
}

I’ve attached the entire file to this post, minus the code for actually loading the dds-file. (Which has dependencies on several external libraries.)

Silverlan · March 18, 2016, 6:47am

Still haven’t been able to make any progress on this. The entire test-program can be downloaded here.
I’m using gli to load the dds-data, which is also included in the project. The executable is located in “x64/Debug/tri.exe”
To build it, the Vulkan SDK include directory has to be added to the “tri” project, and the path to the dds has to be changed (tri.c, Line 809).
This is a 1:1 copy of the triangle demo, with the exception of the “demo_prepare_texture_image”-function in tri.c (Lines 803 to 903) and the “dds.cpp” and “dds.h” files. “dds.cpp” contains the code for loading the dds, and mapping the image memory.

The image it’s supposed to load (“x64/Debug/test.dds”, DXT1) looks like this:

The result, however, is this:

I’ve checked the specification again, but can’t find anything I may have missed. I would appreciate it if anyone could take a look.

Salabar · March 18, 2016, 12:06pm

It glitches consistently on a Radeon, if that’s any comfort. I don’t really understand the way you copy the data to the GPU. DDS is a more or less opaque format optimized for GPU access. Shouldn’t a single memcpy be sufficient instead of the weird for(auto y=decltype(blockCount.y){0};y<blockCount.y;++y) loop?

Silverlan · March 19, 2016, 12:38am

I’m not entirely sure myself. Either way, I’ve tried changing the “map_data_dds” function to:


void map_data_dds(struct dds *r,void *imgData,VkSubresourceLayout layout)
{
	auto &tex = *static_cast<gli::texture*>(r->texture);
	gli::storage storage {tex.format(),tex.extent(),tex.layers(),tex.faces(),tex.levels()};

	auto *srcData = static_cast<uint8_t*>(tex.data(0,0,0));
	auto *destData = static_cast<uint8_t*>(imgData); // Pointer to mapped memory of VkImage
	destData += layout.offset; // layout = VkSubresourceLayout of the image subresource
	auto extents = tex.extent();
	auto w = extents.x;
	auto h = extents.y;
	auto blockSize = storage.block_size();
	auto blockCount = storage.block_count(0);
	auto blockExtent = storage.block_extent();
	memcpy(destData,srcData,storage.size());
}

The outcome is the same.
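Whether a single memcpy is equivalent to the row-by-row loop depends on the row pitch: the two only write the same bytes when `layout.rowPitch` equals the packed size of one block row. A small sanity check along these lines could confirm that, and also give the expected data size for a mip level. The helper names (`blocks_for`, `level_size_bytes`, `is_tightly_packed`) are hypothetical and not part of gli or Vulkan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of blocks needed to cover `texels` texels, rounding up.
std::size_t blocks_for(std::uint32_t texels, std::uint32_t blockDim)
{
    assert(blockDim > 0);
    return (static_cast<std::size_t>(texels) + blockDim - 1) / blockDim;
}

// Size in bytes of one tightly packed compressed mip level.
std::size_t level_size_bytes(std::uint32_t width, std::uint32_t height,
                             std::uint32_t blockDim, std::size_t blockSize)
{
    return blocks_for(width, blockDim) * blocks_for(height, blockDim) * blockSize;
}

// True if the destination rows have no padding, i.e. one memcpy of the whole
// level writes the same bytes as a per-row copy.
bool is_tightly_packed(std::size_t rowPitch, std::uint32_t width,
                       std::uint32_t blockDim, std::size_t blockSize)
{
    return rowPitch == blocks_for(width, blockDim) * blockSize;
}
```

For the 512x512 BC1 test image this gives 128 * 128 * 8 = 131072 bytes per level and a packed row size of 1024 bytes; if `layout.rowPitch` reports anything larger, the single memcpy would place every row after the first at the wrong address.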