How does one do a device-to-device copy (over NVLink)?

tweakoz · May 2, 2022, 3:46pm

Specifically, how does one do so without using device groups? I have read elsewhere that copying between physical devices in the same VkInstance should be preferred over multi-GPU device groups.

I am specifically interested in copying framebuffers with VK_IMAGE_TILING_OPTIMAL (for texturing). The devices will be identical, as the NVLink bridge implies, so the tiling should be identical too. And of course I am interested in the fastest possible copy.

If it matters, I am currently targeting two NVLinked 3090s and will be happy with a non-portable solution.

One potential solution I have thought of so far is a CUDA interop P2P copy, though I would prefer a Vulkan-only solution if one exists.

Thanks.

Alfonse_Reinheart · May 2, 2022, 3:57pm

Can you cite a source for this?

As for the problem at hand, wouldn’t it make sense to just share the memory between the two devices?

tweakoz · May 2, 2022, 4:06pm

One source:

https://www.reddit.com/r/vulkan/comments/rfxihy/comment/hojh5kg/?utm_source=share&utm_medium=web2x&context=3

> As for the problem at hand, wouldn’t it make sense to just share the memory between the two devices?

Perhaps. Sharing in this case might mean GPU-1 drawing UV-warped geometry into a framebuffer in GPU-1’s memory while texturing from image data in GPU-2’s memory over NVLink (if this is even possible in Vulkan). Whether that would be ideal depends on the cache miss rate. If data were fetched from the remote GPU more than once per texel on average, then I think the copy would be better. Given the number of samples I would need to take per texel, it is probable that at least some of the texels would have to be fetched multiple times, though I may be able to mitigate this by making the warp operation more cache friendly. All of this assumes the texture cache on GPU-1 can cache data coming from GPU-2 over NVLink.
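The break-even reasoning above can be written down as a small sketch. It only counts bytes crossing the link and ignores latency, link efficiency, and whether remote texture sampling is available at all. The function names and the `avg_remote_fetches_per_texel` parameter are hypothetical, and that parameter would have to be measured or estimated for the real warp.

```cpp
#include <cassert>
#include <cstdint>

// Bytes that cross the link if GPU-1 samples GPU-2's image directly.
// avg_remote_fetches_per_texel is a hypothetical estimate of how many times,
// on average, each source texel is pulled over the link (cache hits on GPU-1
// cost no link traffic, so only misses count).
double remote_sampling_bytes(std::uint64_t texel_count,
                             std::uint32_t bytes_per_texel,
                             double avg_remote_fetches_per_texel) {
    assert(avg_remote_fetches_per_texel >= 0.0);
    return static_cast<double>(texel_count) * bytes_per_texel *
           avg_remote_fetches_per_texel;
}

// Bytes that cross the link if the whole image is copied once up front.
double one_time_copy_bytes(std::uint64_t texel_count,
                           std::uint32_t bytes_per_texel) {
    return static_cast<double>(texel_count) * bytes_per_texel;
}

// True when the up-front copy moves less data than sampling remotely.
bool copy_is_cheaper(std::uint64_t texel_count,
                     std::uint32_t bytes_per_texel,
                     double avg_remote_fetches_per_texel) {
    return one_time_copy_bytes(texel_count, bytes_per_texel) <
           remote_sampling_bytes(texel_count, bytes_per_texel,
                                 avg_remote_fetches_per_texel);
}
```

With an average above one remote fetch per texel the copy moves less data; below one, remote sampling does. The image size cancels out, so only the average fetch count decides the comparison in this simplified model.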

All that said, I am still trying to get my two NVLinked 3090s to even show up in the same device group; at the moment they do not. nvidia-smi topo -m shows NVLink as active, and the CUDA P2P code examples show the expected behavior (~50 GB/s transfer bandwidth over NVLink), so I know the NVLink is physically working correctly. The 3090s are just not showing up in the same Vulkan device group.
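For a sense of scale, the measured bandwidth can be turned into a rough per-frame copy time. This is a back-of-the-envelope sketch only: the function name is hypothetical, the framebuffer dimensions and format in the usage note are assumptions rather than values from this thread, and it treats a gigabyte as 1e9 bytes with no per-copy overhead.

```cpp
#include <cassert>
#include <cstdint>

// Estimated milliseconds to move one image across the link at a sustained
// bandwidth given in decimal gigabytes per second (1 GB = 1e9 bytes).
// Ignores submission overhead, synchronization, and layout transitions.
double estimated_copy_ms(std::uint32_t width,
                         std::uint32_t height,
                         std::uint32_t bytes_per_texel,
                         double bandwidth_gb_per_s) {
    assert(bandwidth_gb_per_s > 0.0);
    const double bytes =
        static_cast<double>(width) * height * bytes_per_texel;
    const double seconds = bytes / (bandwidth_gb_per_s * 1e9);
    return seconds * 1e3;
}
```

Under those assumptions, `estimated_copy_ms(3840, 2160, 4, 50.0)` (a 4-byte-per-texel 3840x2160 image at 50 GB/s) comes out at roughly 0.66 ms.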

It should also be noted that this is on Linux (Xorg), not Windows.

nvidia-smi topo -m

        GPU0    GPU1    mlx4_0  CPU Affinity    NUMA Affinity
GPU0     X      NV4     SYS     0-23,48-71      0
GPU1    NV4      X      NODE    24-47,72-95     1
mlx4_0  SYS     NODE     X

nvidia-smi topo -p2p rwnap

        GPU0    GPU1
GPU0     X      OK
GPU1    OK       X

vulkaninfo groups excerpt

Groups:
=======
	Device Group Properties (Group 0):
		physicalDeviceCount: count = 1
			llvmpipe (LLVM 12.0.0, 256 bits) (ID: 0)
		subsetAllocation = 0
	Device Group Present Capabilities (Group 0):
		llvmpipe (LLVM 12.0.0, 256 bits) (ID: 0)
		Can present images from the following devices:
			llvmpipe (LLVM 12.0.0, 256 bits) (ID: 0)
		Present modes:
			DEVICE_GROUP_PRESENT_MODE_LOCAL_BIT_KHR
	Device Group Properties (Group 1):
		physicalDeviceCount: count = 1
			NVIDIA GeForce RTX 3090 (ID: 0)
		subsetAllocation = 0
	Device Group Present Capabilities (Group 1):
		NVIDIA GeForce RTX 3090 (ID: 0)
		Can present images from the following devices:
			NVIDIA GeForce RTX 3090 (ID: 0)
		Present modes:
			DEVICE_GROUP_PRESENT_MODE_LOCAL_BIT_KHR
	Device Group Properties (Group 2):
		physicalDeviceCount: count = 1
			NVIDIA GeForce RTX 3090 (ID: 0)
		subsetAllocation = 0
	Device Group Present Capabilities (Group 2):
		NVIDIA GeForce RTX 3090 (ID: 0)
		Can present images from the following devices:
			NVIDIA GeForce RTX 3090 (ID: 0)
		Present modes:
			DEVICE_GROUP_PRESENT_MODE_LOCAL_BIT_KHR
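
A quick way to check captured vulkaninfo output like the excerpt above for a multi-device group is to scan it for the `physicalDeviceCount: count = N` lines. This is a minimal sketch that assumes the text format shown in the excerpt; the function name is hypothetical, and it inspects saved text rather than querying Vulkan itself.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

// Returns the largest N found on any "physicalDeviceCount: count = N" line,
// or 0 if the text contains no such line. A result of 1 means every reported
// group holds a single physical device.
int max_group_device_count(const std::string& vulkaninfo_text) {
    const std::string key = "physicalDeviceCount: count = ";
    std::istringstream lines(vulkaninfo_text);
    std::string line;
    int max_count = 0;
    while (std::getline(lines, line)) {
        const std::size_t pos = line.find(key);
        if (pos == std::string::npos) {
            continue;  // not a device-count line
        }
        const int count = std::atoi(line.c_str() + pos + key.size());
        if (count > max_count) {
            max_count = count;
        }
    }
    return max_count;
}
```

For the excerpt above the three groups each report a count of 1, so the function would return 1. A value of 2 or more would indicate that the driver exposes the GPUs as a single device group.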
