S3TC BC1 compression layout / Using linear image layout for compressed images

I’ve been trying to load compressed BC1/DXT1 images, but so far I haven’t had much luck.
Here are the relevant snippets from the specification:

My image is a normal 2D image (1 array layer, 1 mip level), so arrayPitch and depthPitch don’t come into play.
The DDS data works fine with glCompressedTexImage2D, so the source data itself is definitely correct.
In OpenGL I’ve used GL_COMPRESSED_RGBA_S3TC_DXT1_EXT; for Vulkan I’m using VK_FORMAT_BC1_RGBA_UNORM_BLOCK, which should be equivalent.
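Both enums name the same S3TC/BC1 encoding, so each 4×4 texel block is the same 8 bytes in either API: two RGB565 endpoint colors followed by sixteen 2-bit palette indices. The following sketch unpacks those fields. It follows the publicly documented BC1 layout, the helper names are hypothetical, and it is not a full decoder. Because every block carries its own endpoint colors, moving intact blocks to the wrong positions would keep the colors right while scrambling where they appear.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 8-byte BC1 (DXT1) block, covering a 4x4 texel area.
struct Bc1Block {
    uint16_t color0;   // RGB565 endpoint (bytes 0-1, little-endian)
    uint16_t color1;   // RGB565 endpoint (bytes 2-3, little-endian)
    uint32_t indices;  // 16 x 2-bit palette indices (bytes 4-7), texel (0,0) in the lowest bits
};

// Reads the fields byte by byte so the result doesn't depend on host endianness.
Bc1Block parseBc1Block(const uint8_t *b) {
    Bc1Block blk;
    blk.color0 = static_cast<uint16_t>(b[0] | (b[1] << 8));
    blk.color1 = static_cast<uint16_t>(b[2] | (b[3] << 8));
    blk.indices = static_cast<uint32_t>(b[4]) | (static_cast<uint32_t>(b[5]) << 8) |
                  (static_cast<uint32_t>(b[6]) << 16) | (static_cast<uint32_t>(b[7]) << 24);
    return blk;
}

// Expands RGB565 to 8 bits per channel by replicating the high bits into the low bits.
std::array<uint8_t, 3> rgb565ToRgb888(uint16_t c) {
    const unsigned r5 = (c >> 11) & 0x1F, g6 = (c >> 5) & 0x3F, b5 = c & 0x1F;
    return {static_cast<uint8_t>((r5 << 3) | (r5 >> 2)),
            static_cast<uint8_t>((g6 << 2) | (g6 >> 4)),
            static_cast<uint8_t>((b5 << 3) | (b5 >> 2))};
}

// 2-bit palette index of texel (x, y) within the block; texels are stored row by row.
unsigned texelIndex(const Bc1Block &blk, unsigned x, unsigned y) {
    return (blk.indices >> (2 * (4 * y + x))) & 0x3u;
}

// color0 > color1 selects the 4-color palette; otherwise 3 colors plus (transparent) black.
bool isFourColorMode(const Bc1Block &blk) { return blk.color0 > blk.color1; }
```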
Here’s my code for mapping the image data:

auto dds = load_dds("img.dds");
auto *srcData = static_cast<uint8_t*>(dds.data());
auto *destData = static_cast<uint8_t*>(vkImageMapPtr); // Pointer to mapped memory of VkImage
destData += layout.offset(); // layout = VkSubresourceLayout of the image
assert((w %4) == 0);
assert((h %4) == 0);
assert(blockSize == 8); // S3TC BC1
auto wBlocks = w /4;
auto hBlocks = h /4;
for(auto y=decltype(hBlocks){0};y<hBlocks;++y) {
	auto *rowDest = destData +y *layout.rowPitch();
	auto *rowSrc = srcData +y *(wBlocks *blockSize);
	for(auto x=decltype(wBlocks){0};x<wBlocks;++x) {
		auto *pxDest = rowDest +x *blockSize;
		auto *pxSrc = rowSrc +x *blockSize; // 4x4 image block
		memcpy(pxDest,pxSrc,blockSize); // 64 bits per block
	}
}
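For reference, the per-block address that loop computes matches the spec's addressing formula for linear images, as I read it: for block-compressed formats the coordinates are counted in texel blocks and the element size is the block size, so rowPitch is the distance between rows of 4×4 blocks, not between rows of texels. The following minimal sketch uses a plain stand-in struct instead of VkSubresourceLayout so it compiles without the Vulkan headers. For a single-layer 2D image, layer and z are 0 and the arrayPitch/depthPitch terms drop out.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the VkSubresourceLayout fields used by the formula (VkDeviceSize is 64-bit).
struct SubresourceLayout {
    uint64_t offset;
    uint64_t rowPitch;
    uint64_t arrayPitch;
    uint64_t depthPitch;
};

// Byte address of a texel block in a linearly tiled image, relative to where the image
// starts in its memory. For block-compressed formats x, y, z are block coordinates
// (texel coordinate / 4 for BC1) and blockSize is 8 for BC1.
uint64_t blockAddress(const SubresourceLayout &l, uint64_t x, uint64_t y,
                      uint64_t z, uint64_t layer, uint64_t blockSize) {
    return layer * l.arrayPitch + z * l.depthPitch + y * l.rowPitch +
           x * blockSize + l.offset;
}
```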

The code runs without errors; however, the resulting image is corrupt (the colors are correct, but the blocks seem to be in the wrong order):

I’m not surprised the outcome looks like this, because the image tiling is set to optimal. However, changing it to linear results in a GPU driver crash after one of the memcpy calls. What could be the reason for that?
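One way to rule out an out-of-bounds write as the cause of the crash is to check the values reported by vkGetImageSubresourceLayout against the mapped size before the loop runs. That query is only defined for images created with VK_IMAGE_TILING_LINEAR, so with optimal tiling its results can't be used for this at all. The following sketch uses a hypothetical blockCopyFits helper; mappedSize would be the size passed to vkMapMemory.

```cpp
#include <cassert>
#include <cstdint>

// True if writing hBlocks rows of wBlocks blocks (blockSize bytes each), starting at
// 'offset' and advancing 'rowPitch' bytes per block row, stays within a mapping of
// 'mappedSize' bytes without one row overlapping the next.
bool blockCopyFits(uint64_t offset, uint64_t rowPitch, uint64_t wBlocks,
                   uint64_t hBlocks, uint64_t blockSize, uint64_t mappedSize) {
    if (wBlocks == 0 || hBlocks == 0) return true;         // nothing gets written
    const uint64_t rowBytes = wBlocks * blockSize;
    if (hBlocks > 1 && rowPitch < rowBytes) return false;  // rows would overlap
    // One past the last byte written by the final row.
    const uint64_t end = offset + (hBlocks - 1) * rowPitch + rowBytes;
    return end <= mappedSize;
}
```

For a 512×512 BC1 image (128×128 blocks) with tightly packed rows, the copy needs 131072 bytes past the offset.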

Where’s the actual Vulkan code responsible for allocating this memory, creating the image from it, mapping it, and later rendering with it?

It essentially boils down to this:

vk::Device device = ...; // Initialization
vk::AllocationCallbacks allocatorCallbacks = ...; // Initialization
[...] // Load the dds data
uint32_t width = dds.width();
uint32_t height = dds.height();
auto format = dds.format(); // = vk::Format::eBc1RgbaUnormBlock;

vk::Extent3D extent(width,height,1);

vk::ImageCreateInfo imageInfo(
	[...]
	vk::ImageUsageFlagBits::eSampled | vk::ImageUsageFlagBits::eColorAttachment,
	[...]
);

vk::Image img = nullptr;
[...] // Create img from imageInfo

vk::MemoryRequirements memRequirements;
[...] // Query the memory requirements of img into memRequirements
uint32_t typeIndex = 0;
get_memory_type(memRequirements.memoryTypeBits(),vk::MemoryPropertyFlagBits::eHostVisible,typeIndex); // -> typeIndex is set to 1
auto szMem = memRequirements.size();
vk::MemoryAllocateInfo memAlloc(szMem,typeIndex);
vk::DeviceMemory mem;
device.allocateMemory(&memAlloc,&allocatorCallbacks,&mem); // Note: Using the default allocator (nullptr) doesn't change anything
[...] // Bind mem to img
uint32_t mipLevel = 0;
vk::ImageSubresource resource(
	[...] // Color aspect, mipLevel, array layer 0
);
vk::SubresourceLayout layout;
[...] // Query the subresource layout of img into layout

auto *vkImageMapPtr = device.mapMemory(mem,0,szMem,vk::MemoryMapFlagBits(0));
[...] // Copy the dds-data into the mapped memory (see code from first post)

(I’ve removed the error handling and some irrelevant parts from this snippet. I’m also using the Nvidia Vulkan C++ API; however, the structure is the same as for the C API.)
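get_memory_type is the poster's own helper and its body isn't shown; the triangle demo's memory_type_from_properties plays the same role. The usual selection logic, as a sketch: walk the bits of memoryTypeBits and pick the first allowed type whose property flags contain all required flags. Plain integers stand in for the Vulkan structs here (the flag values are those of VkMemoryPropertyFlagBits), and findMemoryType is a hypothetical name. If the chosen type isn't also HOST_COHERENT, host writes additionally need vkFlushMappedMemoryRanges before the device is guaranteed to see them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Values of the corresponding VkMemoryPropertyFlagBits.
constexpr uint32_t kDeviceLocal  = 0x1;  // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t kHostVisible  = 0x2;  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// typeFlags[i] stands in for memoryTypes[i].propertyFlags of VkPhysicalDeviceMemoryProperties.
// Returns the first index allowed by memoryTypeBits (from VkMemoryRequirements) whose
// flags include every bit in 'required', or -1 if there is none.
int findMemoryType(uint32_t memoryTypeBits, const std::vector<uint32_t> &typeFlags,
                   uint32_t required) {
    for (uint32_t i = 0; i < typeFlags.size() && i < 32; ++i) {
        const bool allowed = (memoryTypeBits >> i) & 1u;
        if (allowed && (typeFlags[i] & required) == required)
            return static_cast<int>(i);
    }
    return -1;
}
```

With memoryTypeBits = 0x3 and types {DEVICE_LOCAL, HOST_VISIBLE | HOST_COHERENT}, asking for HOST_VISIBLE selects index 1.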

As for the pipeline, it uses a combined image sampler descriptor and renders directly to the current swapchain framebuffer. I highly doubt the pipeline is the issue here, considering I can render uncompressed images perfectly fine the same way, and the code is functionally the same as the triangle demo in the SDK.
Since I’m using several abstraction layers, it would take a while to unravel the code completely. Instead, I’ll try to write a quick adaptation of the triangle demo tomorrow.

Alright, I’ve modified the triangle demo from the SDK, and the same problem occurs. I haven’t made any changes to the demo other than this function (my changes are inside the ‘// DDS’ blocks):

static void
demo_prepare_texture_image(struct demo *demo, const uint32_t *tex_colors,
                           struct texture_object *tex_obj, VkImageTiling tiling,
                           VkImageUsageFlags usage, VkFlags required_props) {
	// DDS
	struct dds ddsData = load_dds("image.dds");

    VkFormat tex_format = ddsData.format; // tex_format = VK_FORMAT_BC1_RGBA_UNORM_BLOCK (133)
    int32_t tex_width = ddsData.width; // tex_width = 512
    int32_t tex_height = ddsData.height; // tex_height = 512
	unsigned int blockSize = ddsData.blockSize; // blockSize = 8
    VkResult U_ASSERT_ONLY err;
    bool U_ASSERT_ONLY pass;

    tex_obj->tex_width = tex_width;
    tex_obj->tex_height = tex_height;

    const VkImageCreateInfo image_create_info = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
        .pNext = NULL,
        .imageType = VK_IMAGE_TYPE_2D,
        .format = tex_format,
        .extent = {tex_width, tex_height, 1},
        .mipLevels = 1,
        .arrayLayers = 1,
        .samples = VK_SAMPLE_COUNT_1_BIT,
        .tiling = tiling,
        .usage = usage,
        .flags = 0,
        .initialLayout = VK_IMAGE_LAYOUT_PREINITIALIZED,
    };
    VkMemoryAllocateInfo mem_alloc = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = NULL,
        .allocationSize = 0,
        .memoryTypeIndex = 0,
    };

    VkMemoryRequirements mem_reqs;

    err =
        vkCreateImage(demo->device, &image_create_info, NULL, &tex_obj->image);

    vkGetImageMemoryRequirements(demo->device, tex_obj->image, &mem_reqs);

    mem_alloc.allocationSize = mem_reqs.size;
    pass =
        memory_type_from_properties(demo, mem_reqs.memoryTypeBits,
                                    required_props, &mem_alloc.memoryTypeIndex);

    /* allocate memory */
    err = vkAllocateMemory(demo->device, &mem_alloc, NULL, &tex_obj->mem);

    /* bind memory */
    err = vkBindImageMemory(demo->device, tex_obj->image, tex_obj->mem, 0);

    if (required_props & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) {
        const VkImageSubresource subres = {
            .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
            .mipLevel = 0,
            .arrayLayer = 0,
        };
        VkSubresourceLayout layout;
        void *data;
        int32_t x, y;

        vkGetImageSubresourceLayout(demo->device, tex_obj->image, &subres,
                                    &layout);

        err = vkMapMemory(demo->device, tex_obj->mem, 0,
                          mem_alloc.allocationSize, 0, &data);

		// DDS
		// Same code as in my first post, except in C
		int32_t w = tex_width;
		int32_t h = tex_height;
		uint8_t *srcData = (uint8_t*)(ddsData.data);
		uint8_t *destData = (uint8_t*)(data); // Pointer to mapped memory of VkImage
		destData += layout.offset; // layout = VkSubresourceLayout of the image
		assert((w %4) == 0);
		assert((h %4) == 0);
		assert(blockSize == 8); // S3TC BC1
		uint32_t wBlocks = w /4;
		uint32_t hBlocks = h /4;
		for(uint32_t y=0;y<hBlocks;++y) {
			uint8_t *rowDest = destData +y *layout.rowPitch;
			uint8_t *rowSrc = srcData +y *(wBlocks *blockSize);
			for(uint32_t x=0;x<wBlocks;++x) {
				uint8_t *pxDest = rowDest +x *blockSize;
				uint8_t *pxSrc = rowSrc +x *blockSize; // 4x4 image block
				memcpy(pxDest,pxSrc,blockSize); // 64 bits per block
			}
		}

		// Original Triangle demo code
       /* for (y = 0; y < tex_height; y++) {
            uint32_t *row = (uint32_t *)((char *)data + layout.rowPitch * y);
            for (x = 0; x < tex_width; x++)
                row[x] = tex_colors[(x & 1) ^ (y & 1)];
        } */

        vkUnmapMemory(demo->device, tex_obj->mem);
    }

    demo_set_image_layout(demo, tex_obj->image, VK_IMAGE_ASPECT_COLOR_BIT,
                          VK_IMAGE_LAYOUT_PREINITIALIZED, tex_obj->imageLayout,
                          [...]);
    /* setting the image layout does not reference the actual memory so no need
     * to add a mem ref */
}
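Structurally, the demo's original fill loop and the DDS loop differ only in the unit being copied. The original writes one uint32_t texel per column of each texel row. The DDS version writes one 8-byte block per column of each row of 4×4 blocks. The following small sketch derives both from the format's block shape; copyGeometry is a hypothetical name. It rounds partial blocks up, which is the case the w % 4 and h % 4 asserts exclude.

```cpp
#include <cassert>
#include <cstdint>

// How many rows the copy loop steps through (each advancing rowPitch bytes in the
// destination) and how many bytes of source data each of those rows holds.
struct CopyGeometry {
    uint32_t rows;
    uint32_t bytesPerRow;
};

// blockW x blockH texels are stored in blockBytes bytes:
// 1x1 and 4 bytes for a 32-bit uncompressed texel, 4x4 and 8 bytes for BC1.
CopyGeometry copyGeometry(uint32_t width, uint32_t height, uint32_t blockW,
                          uint32_t blockH, uint32_t blockBytes) {
    const uint32_t blocksX = (width + blockW - 1) / blockW;  // round partial blocks up
    const uint32_t blocksY = (height + blockH - 1) / blockH;
    return {blocksY, blocksX * blockBytes};
}
```

For the 512×512 BC1 image loaded here, this gives 128 rows of 1024 bytes each, rather than 512 rows.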

I’ve attached the entire file to this post, minus the code for actually loading the DDS file (which has dependencies on several external libraries).

Still haven’t been able to make any progress on this. The entire test program can be downloaded here.
I’m using gli to load the DDS data; it is also included in the project. The executable is located in “x64/Debug/tri.exe”.
To build it, the Vulkan SDK include directory has to be added to the “tri” project, and the path to the DDS file has to be changed (tri.c, line 809).
This is a 1:1 copy of the triangle demo, except for the “demo_prepare_texture_image” function in tri.c (lines 803 to 903) and the “dds.cpp” and “dds.h” files. “dds.cpp” contains the code for loading the DDS file and mapping the image memory.

The image it’s supposed to load (“x64/Debug/test.dds”, DXT1) looks like this:

The result, however, is this:

I’ve checked the specification again, but can’t find anything I may have missed. I would appreciate it if anyone could take a look. :)

It glitches consistently on a Radeon, if that’s any comfort. :) I don’t really understand the way you copy the data to the GPU. DDS is a more or less opaque format optimized for GPU access. Shouldn’t a single memcpy be sufficient instead of the weird for(auto y=decltype(blockCount.y){0};y<blockCount.y;++y) loop?
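A single memcpy is only equivalent to the loop when the driver reports a rowPitch equal to the tightly packed row size (wBlocks * blockSize). Implementations are allowed to pad rows, and layout.offset has to be applied either way. The following sketch uses a hypothetical copyBlockRows helper that takes the single-memcpy fast path only in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies 'rows' tightly packed rows of 'rowBytes' bytes from src into dst, where
// consecutive rows in dst start 'rowPitch' bytes apart. dst is assumed to already
// include layout.offset.
void copyBlockRows(uint8_t *dst, const uint8_t *src, std::size_t rowPitch,
                   std::size_t rowBytes, std::size_t rows) {
    if (rowPitch == rowBytes) {
        // No padding between rows: source and destination layouts are identical.
        std::memcpy(dst, src, rowBytes * rows);
        return;
    }
    // Padded rows: copy each row to its own pitch-aligned start.
    for (std::size_t y = 0; y < rows; ++y)
        std::memcpy(dst + y * rowPitch, src + y * rowBytes, rowBytes);
}
```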

I’m not entirely sure myself. Either way, I’ve tried changing the “map_data_dds” function to:

void map_data_dds(struct dds *r,void *imgData,VkSubresourceLayout layout)
{
	auto &tex = *static_cast<gli::texture*>(r->texture);
	gli::storage storage {tex.format(),tex.extent(),tex.layers(),tex.faces(),tex.levels()};

	auto *srcData = static_cast<uint8_t*>(tex.data(0,0,0));
	auto *destData = static_cast<uint8_t*>(imgData); // Pointer to mapped memory of VkImage
	destData += layout.offset; // layout = VkSubresourceLayout of the image
	auto extents = tex.extent();
	auto w = extents.x;
	auto h = extents.y;
	auto blockSize = storage.block_size();
	auto blockCount = storage.block_count(0);
	auto blockExtent = storage.block_extent();
	[...]
}

The outcome is the same.