Strange vkDestroyImage behaviour

Hello! I am seeing some strange behavior with vkDestroyImage, and I cannot figure out why it’s happening. Calling vkDestroyImage on an image created with a depth format takes around 0.2ms, while destroying an image with a color format takes around 0.001ms. The images are allocated through a custom VMA pool. Any ideas on what’s going on? Thanks!

There’s really no rule saying it can’t be that way. Maybe depth images include some roundtrip to the GPU in that particular driver…

What platform is this?

This is Windows on an RTX 3080. I’m mostly asking to confirm that it’s the drivers causing this and not something I’ve done wrong.

Before destroying the resources, I’m calling vkWaitForFences on the fence that was submitted with the command buffer using them, so the GPU should be done with them.

With a minimal example I get ~15us for depth image destruction, and ~7us for color image destruction.

Hmm, very interesting.

Is there any way incorrect synchronization could slow the destruction of an image down? And without the validation layers saying anything?

I have six images being destroyed, two of them depth images. The depth images take around 0.2ms, and the color ones take around 0.002ms.

What system are you testing on?

I think incorrect synchronization is undefined behavior. Once you are in that state, the computer is allowed to come alive and eat all your donuts. You can try this minimal example to eliminate some possibilities.

#include <vulkan/vulkan.h>

#include <iostream>
#include <chrono>

int main() {
	VkInstanceCreateInfo ici{};
	ici.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;

	VkInstance instance;
	auto err = vkCreateInstance( &ici, nullptr, &instance );
	if( err ) throw err;

	VkPhysicalDevice phys_device;
	uint32_t num_devices = 1;
	err = vkEnumeratePhysicalDevices( instance, &num_devices, &phys_device );
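	// VK_INCOMPLETE only means there are more devices than the one we asked for
	if( err != VK_SUCCESS && err != VK_INCOMPLETE ) throw err;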

	float prio = 1.0f;
	VkDeviceQueueCreateInfo qci{};
	qci.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
	qci.queueFamilyIndex = 0;
	qci.queueCount = 1;
	qci.pQueuePriorities = &prio;

	VkDeviceCreateInfo dci{};
	dci.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
	dci.queueCreateInfoCount = 1;
	dci.pQueueCreateInfos = &qci;

	VkDevice device;
	err = vkCreateDevice( phys_device, &dci, nullptr, &device );
	if( err ) throw err;

	for( int i = 0; i < 5; ++i ){
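		VkImageCreateInfo imgci{};
		imgci.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
		// usage must not be 0; SAMPLED is an assumption that fits both formats here
		imgci.usage = VK_IMAGE_USAGE_SAMPLED_BIT;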
		imgci.imageType = VK_IMAGE_TYPE_2D;
		imgci.format = VK_FORMAT_D24_UNORM_S8_UINT;
		imgci.extent = {1920, 1080, 1};
		imgci.mipLevels = 1;
		imgci.arrayLayers = 1;
		imgci.samples = VK_SAMPLE_COUNT_1_BIT;
		imgci.tiling = VK_IMAGE_TILING_OPTIMAL;

		VkImage depth_img;
		err = vkCreateImage( device, &imgci, nullptr, &depth_img );
		if( err ) throw err;

		imgci.format = VK_FORMAT_R8G8B8A8_UNORM;

		VkImage color_img;
		err = vkCreateImage( device, &imgci, nullptr, &color_img );
		if( err ) throw err;

		auto begin = std::chrono::high_resolution_clock::now();
		vkDestroyImage( device, color_img, nullptr );
		auto end = std::chrono::high_resolution_clock::now();
		std::cout << "color destruction: " << std::chrono::duration_cast<std::chrono::microseconds>(end-begin).count() << "us" << std::endl;

		begin = std::chrono::high_resolution_clock::now();
		vkDestroyImage( device, depth_img, nullptr );
		end = std::chrono::high_resolution_clock::now();
		std::cout << "depth destruction: " << std::chrono::duration_cast<std::chrono::microseconds>(end-begin).count() << "us" << std::endl;
	}

	vkDestroyDevice( device, nullptr );
	vkDestroyInstance( instance, nullptr );
}


The test was run on Windows 10 with a GTX 1060 6GB, driver 531.41 (Game Ready Driver).

For valid measurements, don’t forget to build in Release mode and turn the validation layers off.
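A single timed call, truncated to whole microseconds, is quite noisy, so it may help to collect several samples and look at the median. Below is a rough sketch of one way to do that. The helper names `sample_us` and `median` are made up for illustration and are not part of Vulkan or VMA. `setup` stands in for the untimed creation step (for example vkCreateImage) and `timed` for the call being measured (for example vkDestroyImage).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each run, create a resource with `setup` (not timed),
// then time only the `timed` call on it. Returns the per-run durations in
// microseconds, sorted ascending.
template <typename Setup, typename Timed>
std::vector<double> sample_us( Setup&& setup, Timed&& timed, int runs ) {
	using clock = std::chrono::steady_clock; // monotonic, suited to intervals
	std::vector<double> samples;
	for( int i = 0; i < runs; ++i ){
		auto resource = setup();
		const auto begin = clock::now();
		timed( resource );
		const auto end = clock::now();
		samples.push_back( std::chrono::duration<double, std::micro>( end - begin ).count() );
	}
	std::sort( samples.begin(), samples.end() );
	return samples;
}

// Median of an already sorted, non-empty sample set.
double median( const std::vector<double>& sorted ) {
	assert( !sorted.empty() );
	const std::size_t n = sorted.size();
	return n % 2 ? sorted[n / 2] : 0.5 * ( sorted[n / 2 - 1] + sorted[n / 2] );
}
```

In the minimal example, `setup` could be a lambda that creates and returns a VkImage, and `timed` a lambda that calls vkDestroyImage on it. Comparing the median for the depth format against the median for the color format should be less sensitive to one-off spikes than a single measurement.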

So I tested the program and have an average depth destruction of 4us.

The difference to what I’m doing is that I’m using VMA to allocate images. So either it’s me doing funky stuff and missing synchronization, or it’s because of some shenanigans with VMA (unlikely). Thanks for the help so far.

I also tested calling vkDeviceWaitIdle before destroying the images. Shouldn’t that ensure the GPU has finished using the images?

Alright, so I found the issue! Destroying images is slow when they have been allocated through a custom VMA pool. Thanks a lot for the help!