I have a small question about allocating and mapping buffers in pinned memory: is there a maximum pinned memory size, and how can I find it out? Sometimes I keep increasing the size of my pinned memory buffers and testing my program… The program’s behavior changes after a certain (not necessarily fixed) allocation size: it either fails to allocate one of the buffers or, worse, it crashes and I receive a message from the operating system that my display driver stopped responding and has recovered!
The answer depends on the operating system, among other things. There’s no way to query it in OpenCL; however, I would expect OpenCL drivers to be smart and fall back to other methods if they run out of pinned memory, so the query would not be very useful anyway.
My general advice is this: any time you see your OpenCL driver crashing and you are pretty sure that it’s not the application’s fault, try to simplify the app as much as possible (10 lines of code is great) while still showing the bug, then go to the vendor’s customer support system and give them the source code along with the driver version and hardware you are using.
Thanks a lot, David, for your help as always! I was able to get past the crash by increasing the GPU timeout value…
But I’m really astonished by the pinned memory behavior: I measured the runtime with and without pinned memory and, ironically, the program runs faster without it!
I wrote code similar to the example in the “NVIDIA OpenCL Best Practices Guide”, but it’s slower, even though the guide says that pinned memory has greater bandwidth…
Are there any tips for using pinned memory?
I’m sorry, I’m of no use here.
I have also tried to use pinned memory on an NVIDIA GPU by following the NVIDIA OpenCL Best Practices Guide. Everything works fine, i.e. asynchronous data transfers and kernel executions, as long as the sum of the pinned memory buffer and the other global memory buffers on the GPU does not exceed the total amount of global memory available on the GPU card. If I try to enlarge the pinned memory buffer beyond that, the kernel execution crashes.
In other words, it seems that in contrast to CUDA, the pinned memory buffer is not only allocated in host memory but also in global device memory. Did anyone else also experience this behaviour?
Thanks a lot, birmat, for your reply! That’s really useful information and it sounds very reasonable… But using pinned memory in my application still doesn’t improve the performance; on the contrary, the execution time is shorter when I use pageable memory… I’m suspicious about a passage in the “OpenCL Best Practices Guide”, which says:
Pinned memory should not be overused. Excessive use can reduce overall system
performance because pinned memory is a scarce resource. How much is too much
is difficult to tell in advance, so as with all optimizations, test the applications and
the systems they run on for optimal performance parameters.
I guess my application might be overusing pinned memory, so may I ask how many pinned memory transfers you have in your application, and what the average data size of these transfers is? This might make things clearer for me, so I’d be grateful if you could help!
Sorry for bothering you again, birmat, but I also need to know which NVIDIA card you are using, please…
Just a comment on pinned memory usage: for my code (which inputs a lot of data), using pinned memory increased the read rate by a factor of nearly 3.
However, I soon noticed that if anyone on the computer uses MATLAB, I fall back to the paged memory performance!
I suspect that MATLAB uses ALL the pinned memory available on the system and that, if OpenCL cannot allocate in pinned memory, it silently allocates in paged memory.
Hence, Naroqueen, if you want to use pinned memory, I suggest you ban the use of MATLAB on the machine during your GPU testing (important: a «clear all» is not sufficient, you must quit MATLAB for the memory to be released).
(I noticed this with MATLAB, but other applications may also behave in an antisocial way with memory; try as much as possible to release pinned memory when you are finished.)
For the Linux users,
ulimit -l
gives the current per-process limit on locked memory (in bash), in kilobytes.
You can try to increase it with:
ulimit -S -l 16384
for example. That way, however, you cannot go above the hard limit, which is given by
ulimit -H -l
To increase the hard limit and the default user limit, add lines such as
mylogon hard memlock unlimited
mylogon soft memlock 16384
@mygroup hard memlock unlimited
@mygroup soft memlock 16384
to /etc/security/limits.conf and log in again (launching a new shell won’t do, because it inherits the limits of your login session; ssh -X localhost will work, however, because it is a sort of relogin). No need to reboot, of course, it is Unix…
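To check whether the new limits are actually in effect after logging in again, something like the following may help. It is only a minimal sketch, assuming a bash-like shell on Linux; the variable names `soft` and `hard` are mine, and it only reports the limits and tries to raise the soft limit as far as the hard limit allows.

```shell
#!/usr/bin/env bash
# Report the locked-memory (memlock) limits of the current shell.
# Each value is a number of kilobytes, or the word "unlimited".
soft=$(ulimit -S -l)
hard=$(ulimit -H -l)
echo "soft memlock limit: ${soft}"
echo "hard memlock limit: ${hard}"

# An unprivileged user may raise the soft limit up to the hard limit.
# This affects only this shell and the processes started from it.
if [ "${soft}" != "${hard}" ]; then
    ulimit -S -l "${hard}" && echo "soft limit raised to: $(ulimit -S -l)"
fi
```

Because the change applies only to the current shell and its children, the GPU program has to be launched from that same shell to benefit from it.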