The only things I can think of are to make sure the global sizes are right and that the images are created with the right dimensions. Make sure you are not using CL_MEM_USE_HOST_PTR, since that adds the complication of having to allocate exactly the right amount of host memory yourself. But those problems should all be caught by the test you mentioned.
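For what it's worth, here is a rough sketch of the two size calculations I mean. The helper names are made up for illustration (they are not OpenCL calls), and it assumes you round the global size up to a multiple of the work-group size and that the image rows are tightly packed with no row padding.

```c
#include <stddef.h>

/* Hypothetical helper: round a global work size up to the next multiple
 * of the work-group (local) size. Assumes local > 0. */
size_t round_up_global(size_t global, size_t local)
{
    size_t rem = global % local;
    return rem == 0 ? global : global + (local - rem);
}

/* Hypothetical helper: bytes a host buffer must hold for a 2D image,
 * assuming tightly packed rows (row pitch = width * pixel size). */
size_t image_host_bytes(size_t width, size_t height,
                        size_t channels, size_t bytes_per_channel)
{
    return width * height * channels * bytes_per_channel;
}
```

If you do round the global size up, the kernel needs to check its coordinates against the real image width and height before writing, otherwise the extra work-items land outside the image.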
What size images are you looking at? What platform are you on? Are the images more than 16 bits/pixel and more than 8k wide?
An in/out copy will probably work fine since the bits are just being moved across untouched.
When you use read_image in the GPU kernel, though, the image access is done by the hardware, and if the format is not supported on that device you’ll get garbage.
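One way to rule this out is to compare the format you are creating against the list the device reports. A minimal sketch of the comparison is below. The struct and function names are hypothetical stand-ins so the snippet compiles without the OpenCL headers; in real host code the list would come from clGetSupportedImageFormats and the entries would be cl_image_format values.

```c
#include <stddef.h>

/* Hypothetical stand-in for cl_image_format (channel order + data type). */
struct image_format {
    unsigned channel_order;
    unsigned channel_type;
};

/* Return 1 if `wanted` appears in the list of supported formats, else 0.
 * The list is assumed to have been filled in by the driver query. */
int format_supported(const struct image_format *list, size_t count,
                     struct image_format wanted)
{
    for (size_t i = 0; i < count; ++i) {
        if (list[i].channel_order == wanted.channel_order &&
            list[i].channel_type == wanted.channel_type) {
            return 1;
        }
    }
    return 0;
}
```

If the format you pass at image creation is not in that list for the memory flags and image type you are using, I would not trust anything read_image returns.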
Unfortunately, testing for error conditions is really hard, so I don’t think the spec enforces them. This looks like a bug in the Nvidia OpenCL driver, so I would definitely file a report with Nvidia. I know that on Mac OS X you’ll get that error if you try to use an image type that is not supported on a given device.