Maybe it’s not relevant, but it’s actually the ATI hardware that is fast…
But is it conceivable that ATI uses some kind of optimization in its driver to speed up such transfers? Maybe compressing the data or something similar…
If so, could it really help to that extent?
On ATI hardware, a PBO does not necessarily speed up data transfer; we are able to transfer asynchronously most of the time anyway.
I noticed that the ATI numbers were better but did not want to show off; I am happy that someone still noticed.
There has been significant improvement in that area over the last year, and most of the data transfer work is fully hardware accelerated on ATI (data conversion, data alignment, …), which might be what is happening here.
Are you timing just the data upload? Is it possible that the laptop’s GPU is sharing memory with main memory? So on the laptop, the upload is just main memory to main memory, but on the desktop, the upload is main memory to video memory.
High-end ASICs tend to have more ALUs, texture units, memory bandwidth, ROPs, …
However, some performance aspects tend not to change at all:
bandwidth of the PCIe bus (related to the number of lanes and to PCIe gen 1 or 2)
primitive throughput (related to the engine clock)
So it is possible for a low-end product to be faster than a high-end one in primitive throughput, if the low-end part has a faster engine clock.
Similarly, your laptop can be faster in data throughput if its CPU and system bandwidth are faster than those of the system used with our high-end ASIC.
Data transfer performance is often limited by CPU speed and CPU memory bandwidth, rather than always by the PCIe bus.
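To put rough numbers on the lanes / gen point, here is a small sketch in Python. The per-lane figures are the usual theoretical ones (about 250 MB/s per lane per direction for gen 1 and about 500 MB/s for gen 2, after 8b/10b encoding). The host copy rates are made-up illustrative values, not measurements from either machine in this thread, and the "slowest stage wins" model is only a simplification of the point above.

```python
# Theoretical PCIe payload bandwidth per lane, per direction, in MB/s
# (gen 1: 2.5 GT/s, gen 2: 5 GT/s, both with 8b/10b encoding).
PCIE_MB_PER_LANE = {1: 250, 2: 500}


def pcie_bandwidth_mb(gen, lanes):
    """Theoretical one-way bus bandwidth in MB/s for a given gen and lane count."""
    return PCIE_MB_PER_LANE[gen] * lanes


def upload_limit_mb(gen, lanes, host_copy_mb):
    """Simplified model: an upload cannot go faster than its slowest stage.

    host_copy_mb is a hypothetical rate at which the CPU and system memory
    can feed data to the driver; it is an assumption for illustration.
    """
    return min(pcie_bandwidth_mb(gen, lanes), host_copy_mb)


if __name__ == "__main__":
    print(pcie_bandwidth_mb(1, 16))      # gen 1 x16 -> 4000
    print(pcie_bandwidth_mb(2, 16))      # gen 2 x16 -> 8000
    # Hypothetical host-side rates: a slow host caps the upload well below the bus.
    print(upload_limit_mb(2, 16, 3000))  # 3000, CPU/memory bound
    print(upload_limit_mb(1, 16, 5000))  # 4000, bus bound
```

With numbers like these, a system with a faster CPU and memory behind a gen 1 x16 slot could upload faster than a gen 2 x16 system with a slow host, whichever GPU sits at the other end.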