I’m curious what has happened to the constant cache in Fermi. The graphic for the previous generation of SM clearly shows a “C cache”:
The new SM for the Fermi architecture, by contrast, has the “Configurable L1/Shared Memory” plus a “Unified Cache”.
What’s the difference between the configurable L1 and the “Unified Cache”? And if constant caches are a thing of the past, do the new caches still have the same problem, where two work-items in a warp accessing different addresses cause the access to be serialized?
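To make the serialization question concrete, here is a rough host-side C++ sketch of how I understand the old constant cache to behave. It is only a model under my own assumptions, not something taken from NVIDIA documentation: I assume one pass per distinct address requested in the warp, with a broadcast to every thread that asked for that address. The name `serialized_passes` is a hypothetical helper, and I treat the full 32-thread warp as the unit even though the hardware may split requests more finely.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Threads per warp on NVIDIA hardware.
constexpr std::size_t kWarpSize = 32;

// Hypothetical helper (not a real CUDA or OpenCL call).
// Input: the constant-memory address each thread of one warp reads.
// Output: the number of serialized passes under the assumed model,
// i.e. one pass per distinct address, broadcast within each pass.
std::size_t serialized_passes(
    const std::array<std::uint32_t, kWarpSize>& addresses) {
    std::unordered_set<std::uint32_t> distinct(addresses.begin(),
                                               addresses.end());
    return distinct.size();
}
```

Under this model, a warp where every thread reads the same address costs one pass, while a warp where every thread reads a different address costs 32. What I’d like to know is whether the Fermi caches still behave this way.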
Then there’s the question of how one “configures” the L1 cache. Hopefully this is done based on the shared memory needs of the specific kernel being launched. But then how does that balance out with the ability to run multiple kernels on the device at once? Or is it still only one kernel per SM at a time?
I’m rambling… I should get back to coding…