I have a thoroughly complex kernel processing audio input data. It will run for a couple of minutes, 60 times a second, and then hang. That’s on the GPU; on the CPU it will run for hours. The input data are constantly changing, but each variable is always within proscribed ranges. I have inserted test code before uploading the inputs to the kernel each frame; in this test code, I can force these inputs to be well below their valid input range, but it still will eventually crash. (Say the valid range for a particular input is 0->400; I can force it to 0->1 and it will STILL eventually crash. I can force it to be below 0.1 and it will still ultimately bite the dust.) However, if I force the input variables to zero, the GPU will happily dance for hours. Of course, that input-free dance is not so particularly interesting.
I’m at a loss so far, though I have clues. I can make it crash much faster than 2 minutes if an input variable is high in its approved range. I can make it crash in less then 10 seconds under the right circumstances. BUT, I can’t seem to back_off_of those certain circumstances such that they go away. As said above, I can force the input vars into ridiculously small portions of their valid range, and the kernel (let’s call him Harlan Sanders) will eventually go belly-up. BUT, if they’re forced to actual zero, no problems puppy, we can run all day long.
To repeat, I’m a bit at a loss - although I have things that seem like clues, I have not yet figured out what they are hinting at, though I’ve been trying for a few days. Frankly, I do not expect to find a real solution by asking here; whenever I stumble over a problem in opencl it seems that my fate is to be the first to articulate that particular problem. I guess this is part of the fun of being in on a technology during its infancy!!! BUT, I want to do some serious, sustainable work with this “baby” (or, maybe, “toddler”).
Op details: MacBook Pro 2010, OS 10.6.8, nv 330M GPU, xcode 3.2.5, shorts, teeshirt.
bonus P.S. for those who’ve read this far, including a related question:
My laptop, soldier that it has proved to be, is not powerful enough for the next stage. I must sell some stocks/bonds and purchase a Mac Pro. I’m looking at the ATI 5870. So, PERHAPS my problem will simply go away when I compile the .cl for the ATI??? Maybe I have run into a bug in the nV implementation. Maybe my kernel is so complex that I’m running into undetected resource limits (it’s 1300 lines of code). So, SINCE I run fine on the CPU, perhaps I’ll have no bugs, or different bugs, on the ATI card???
Thanks, guys & dolls –