Problems with vkCmdCopyImage into swapchain

markus-zzz · April 5, 2016, 10:43pm

I am trying to create a very minimal Vulkan example and have run into issues in that it works seemingly OK on desktop (Linux GeForce GT 720) but not so good on my Nvidia shield device.

The example more or less creates an image, draws a grid into it (by memory mapping and using the CPU), acquires a swapchain image and copies into that swapchain image.

Validation layers are enabled and they have no complaints (they did at first, of course, but then I fixed those remarks).

The symptom is that on the shield device the grid looks skewed/corrupted and only every third frame seem to be shown while the other two are black.

If anyone could glimpse over the code and suggest what I am doing wrong i would much appreciate it.

The code is at vulkan/minimal at master · markus-zzz/vulkan · GitHub and builds for android and/or xcb without any additional dependencies.

Thanks!

krOoze · April 6, 2016, 7:20am

Sure.

Well, if problems are on Shield, the chances always are, that it is caused by that particular gimmick.

Nice to see actual code for a change, so let’s see (!shameless nitpicking warning!):

Sequentialy as I read the code:
Init():

Do you have a (good) reason to override the official Vulkan loader with your own? And you should perhaps only load vkGetInstanceProcAddr() the evil way and the rest through it (spec ch. 3.1).
In vk_minimal_init you rebuild mostly everything of vk_minimal_context, yet you expect the VkInstance and Surface of it to be defined. That’s a little bit odd design.
If you want only the first PhysicalDevice, you can just call EnumeratePD only once. On the other hand, if you want robust future-proof snippet, you should do loop until it is VK_SUCCESS and not VK_INCOMPLETE. Similarly for all the functions with this usage paradigm.
Though the queue families are usually nicely ordered, you should not expect the first one to be GRAPHICS. Loop through them to pick the one you want.

if (vkGetPhysicalDeviceSurfaceSupportKHR)

You should assert() that instead (and in your loader). Or use the official loader.

Use a variable, not a magic constant for the queue family even if you only use 0. Less error chances that way.
You should preinitialize your structs to 0s or use C designated initializers to avoid errors.
E.g. in your dqci variable (and others), you now have random flags member.
I can’t find it or it is omitted in spec. So it would perhaps be healthy to assert that queueCount of FamilyProperties is > 0.
I would estheticaly expect highest queue priority (1.0), if it is the only queue.
In VkDeviceCreateInfo you have specified no validation layers.
In VkDeviceCreateInfo you have NULL for pEnabledFeatures. Which means disable everything. Is that intentional?
If you intent to copy into swapchain directly, you should query support of that capability from the VkSurfaceCapabilitiesKHR.
you check > 0 for the present modes. Should do the same for the formats then. Both are guaranteed by spec I think though(two present modes and some formats if display is supported).
In VkSwapchainCreateInfoKHR you should use the queried format ColorSpace, not magic constant.
In VkSwapchainCreateInfoKHR you should most likely have your preTransform set to the currentTransform of VkSurfaceCapabilitiesKHR.
In VkSwapchainCreateInfoKHR you should not demand at least 3 images and then just pick the first presentation mode you get. You probably do not want MODE_IMMEDIATE with this.
You should use VK_NULL_HANDLE and not NULL for objects (handles)
There is no vk_minimal_destroy(), which is a bit evil.

TBC

krOoze · April 6, 2016, 9:52am

Draw():

In draw_grid() it would be healthy to add the offset at start.
The vkResetCommandBuffer() is redundant. The Begin() should do that implicitely.
You should set your InheritanceInfo* to nullptr, because you declared this as primary cmd buffer.
Your barriers are extremely paranoid and general-purpose. I presume that is intentional, because you can’t get it working properly yet.
The draw() seems to be designed to be called only once. So I hope you are not violating that.
Just a tip: I think it is not good feng shui to pollute the space inbetween Begin and EndCmdBuffer with unrelated (non vkCmd) commands. I even indent that part.
In vkAcquireNextImageKHR you have timeout=0, which means everything fails if image is not immediately available. Is this intentional? If not use UINT64_MAX=wait/block until available.
pWaitDstStageMask of VkSubmitInfo may NOT be nullptr. That should result in access violation.

markus-zzz · April 6, 2016, 11:22am

Thanks! I really appreciate the effort

I have fixed most of your remarks (at least those that I feel could have anything to do with correctness) and pushed to github. Status on the shield device is the same though so still not displaying correctly…

Obviously this example is intended to be minimal so I am really trying to get away with hard-coding as much as possible and the only goal is to make it run on these two devices (desktop and shield).

As for the dynamic loading this is basically what Nvidia tells me to do for Android since we cannot link to libvulkan.so. I have seen other people do this dynamic loading except that I cannot stand repeatedly specifying all the functions so I decided to go with a .def files approach.

My example is so simple so I would not expect it to use any features that have to be enabled so I thought pEnabledFeatures == NULL should be ok.

Correct, the excessive synchronization is intentionally redundant in a fruitless effort to make it work.

When I used validation layers (for the xcb variant) they were set up with environment variables so they are not visible in the code.

Besides that I agree with all of your comments but could you elaborate a bit on ‘draw()’ only being designed to be called once? I do call it multiple times and I can see that the semaphore creation/destruction could probably be moved out it should still be functionally correct right?

Again thanks for doing this!

krOoze · April 6, 2016, 4:05pm

You are welcome sir, and well you have it under unlicence, so we are both getting screwed.

Didn’t think it would be that easy. Just gonna fetch my MK55 Tactical Nuclear Launcher…

Something tells me, the symptom of the two black frames would be the minImages=3 (and now maxImages which should only make it worse) and IMMEDIATE mode and driver mishandling that unexpected setting. Try minI=1 and IMMEDIATE (or minI=N and FIFO).
BTW maxImages can be 0, meaning unrestricted number, so should not be used for minImages.

If desperate, I would try cmdBlit in place of cmdCopy, to rule out another hunch (should be same output if driver is correct).

Loader: I guess, if it works, it works… (I don’t know how I feel about the hijacking of VkInstance object functions directly from dll. And performance-wise they should import (applicable) functions through vkGetDeviceProc.)

Yes, there seem to be none in the Features to interfere, when disabled.

As for the draw() re-callability: I stopped to watch for it, so there may be more, but:
a) the barrier of the source image changes layout from P to G. On second pass is already G. The spec seems ambiguous on allowing that, so not sure what it does exactly…
b) the buffer is rerecorded. But I suppose that is for the laziness sake and because of the swapchain image. Should work as it is…

Gonna try compile+run this in a bit on my W10 AMD desktop (I also keep updated validation layers). So there should be more…

markus-zzz · April 6, 2016, 11:16pm

Setting minImageCount = 1 makes swapchain initialization fail but setting minImageCount = 2 seem to make only every other frame blank and setting minImageCount = 5 makes 4/5 blank (using IMMEDIATE mode).

I tried vkCmdBlitImage with similar results; it works fine on desktop but looks messed up on the shield (not exactly in the same way though).

I got the impression from some presentation (forgot which) that layout image barriers were not that important to Nvidia GPUs and the desktop behavior seems to confirm that but could be that it is different for mobile devices such as the shield. Anyway it would be nice if the validation layers actually warned about this (will file a bug on them if that turned out to be the root cause).

OK, so next thing to have a look at is the image barriers then…

krOoze · April 7, 2016, 6:27am

Read the shield documentation, and they do say to hard-set minImages to 3, as it was. BTW the Vulkan support seems to be preview so trouble is to be expected.
Just try all the other presentation modes (FIFO) one by one. IMMEDIATE really seems like a bad fit.

There is room for paranoia still. You can try putting vkWaitDeviceIdle() at start, after unmap, after aquireImage, after submit and after present.

So I managed to compile+run it:
It works correctly (even though my implementation lists TRANSFER as unsupported for my swapchain images).

1)vk_minimal_context::canvas.size should be VkDeviceSize not uint32_t.

ERROR: unknown0: 11, DS, Source AccessMask 65536 [VK_ACCESS_MEMORY_WRITE_BIT] must have required access bit 16384 [VK_ACCESS_HOST_WRITE_BIT] when layout is VK_IMAGE_LAYOUT_PREINITIALIZED, unless the app has previously added a barrier for this transition.

Looks ominous.

WARNING: unknown0: 11, DS, Additional bits in Source accessMask 32768 [VK_ACCESS_MEMORY_READ_BIT] are specified when layout is VK_IMAGE_LAYOUT_UNDEFINED.

I think it expects 0 flags for undefined layout (as in no cache flush needed, only execution dependency).

PERFORMANCE: unknown0: 6, DS, Layout for input image should be TRANSFER_SRC_OPTIMAL instead of GENERAL.

And one for DST too. That’s to be expected. But I assume GENERAL layout is harder to implement for driver developer (and do wrong). Never can rule it out as a cause.

I am still suspicious of the PREINITIALIZED to GENERAL barrier for subsequent frames. Maybe just draw a single frame until you get it working, to be sure.

krOoze · April 7, 2016, 6:49am

PS: Just try to build https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers demos and https://github.com/SaschaWillems/Vulkan samples. If it works, we can search for what they do differently. If not, no point bashing our heads against the wall, till they improve driver.

krOoze · April 7, 2016, 6:57am

PS2:
Missed this one in logs:
6)

ERROR: CommandBuffer0: 6, DS, Cannot submit cmd buffer using image with layout VK_IMAGE_LAYOUT_GENERAL when first use is VK_IMAGE_LAYOUT_PREINITIALIZED.

Most likely what I mentioned in 5).

markus-zzz · April 7, 2016, 9:21am

Your validation layer output looks most useful. For some reason I am not getting any of that when I run it How did you setup/enable the validation?

markus-zzz · April 7, 2016, 9:47am

Ah, never mind my last comment; turned out that I had a typo in the validation layer script so none of them got enabled (I had them working at some point before though).

I am getting the same output as you now so I will dig into that and then we shall see

krOoze · April 7, 2016, 11:25am

Programaticaly through the Vulkan API:

Pass VkDebugReportCallbackCreateInfoEXT to pNext of InstanceCI. And enable VK_LAYER_LUNARG_standard_validation and VK_EXT_debug_report.
Load the EXT function(s). Register my debug callback.
Enable VK_LAYER_LUNARG_standard_validation in DeviceCI.
Done

Also I keep them fresh streight from the github, so they possibly catch a bit more. (Actually nevermind, seems that LunarG just released new SDK)

You do (I assume):


$ export VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_standard_validation
$ export VK_DEVICE_LAYERS=VK_LAYER_LUNARG_standard_validation

+vk_layer_settings.txt with installed LnG SDK. Should work too (possibly even on Android).

markus-zzz · April 8, 2016, 7:11am

Problem has been solved

Actually turned out to be more Android related than Vulkan I would say. Adding the following line to the Android initialization made things work perfectly.

ANativeActivity_setWindowFlags(state->activity, AWINDOW_FLAG_FULLSCREEN, 0);

Without it the swapchain created from state->window turned out to be quite broken…

Anyway thanks again for helping and pointing out improvements and validation violations.

All committed to above mentioned repository in 477460eb.