Performance on APU with different buffer creation strategies

I’m testing various buffer creation strategies on an APU (Acer Iconia Tab). The algorithm is SAXPY (y = a*x + y, essentially a scaled vector addition), run many times with different vector sizes. In particular, I’d like to find out whether, on an APU, vector addition can run faster on the GPU than on the CPU. On a traditional architecture (CPU and GPU not on the same chip) this is practically never worthwhile because of the PCI bus latency.
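For reference, here is a minimal sketch of the kind of CPU baseline the GPU timings are compared against. It assumes SAXPY means y = a*x + y in single precision. The names `saxpy_cpu` and `time_saxpy_cpu_ms` are illustrative only and are not taken from the linked sources.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Reference SAXPY on the CPU: y[i] = a * x[i] + y[i].
// (Hypothetical helper, not the code from the linked projects.)
void saxpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Runs saxpy_cpu `repeats` times on vectors of length n and returns
// the mean wall-clock time per run, in milliseconds.
double time_saxpy_cpu_ms(std::size_t n, int repeats) {
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        saxpy_cpu(0.5f, x, y);
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

For a fair comparison, the GPU figure would presumably need to cover the same span: the kernel plus whatever copy or map/unmap calls are required to make the result visible to the host.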

Since the RAM is shared between the CPU and the GPU, I expected that creating a buffer with CL_MEM_USE_HOST_PTR and using mapping/unmapping would give much better performance. To check, I tested both a project where data transfers between buffers and host memory are performed “manually” (i.e. enqueueReadBuffer/enqueueWriteBuffer) and a project based on mapping/unmapping. In the first case, the GPU execution time drops below the CPU execution time for vectors larger than about 1 million elements. In the second case, the GPU never “wins” over the CPU; its execution time is always higher than the CPU’s. Moreover, the GPU execution time with mapping is lower than the GPU execution time with copying only for “small” vector sizes, and becomes higher for fairly large vectors.

Any idea about this? Is my assumption wrong?

The C++ sources of the projects: … opyPtr.cpp

The following are the execution timings (GPU with copy and GPU with mapping), in the format: VECT_SIZE EXEC_TIME

CPU execution timings:

The kernel:

You are using the AMD OpenCL driver, right? Then you should check the AMD APP OpenCL Programming Guide; zero-copy buffers are thoroughly covered in that document.

I currently have the following driver installed:
Driver Packaging Version 8.881-110728a-122938C-ATI
Catalyst Version 11.8
Provider ATI Technologies Inc.
2D Driver Version
2D Driver File Path /REGISTRY/MACHINE/SYSTEM/ControlSet001/Control/Class/{4D36E968-E325-11CE-BFC1-08002BE10318}/0000
Direct3D Version
OpenGL Version
AMD VISION Engine Control Center Version 2011.0728.1756.30366
AMD Audio Driver Version

I’ve also installed the AMD APP SDK 2.5.

Are these appropriate versions of the required software?