Callbacks in SYCL

Can SYCL use callbacks like OpenCL?

The purpose is to:

  1. enqueue a kernel operation
  2. the runtime asynchronously calls a callback function when the kernel finishes
  3. first callback: do some management and enqueue a read operation
  4. the runtime asynchronously calls a second callback function when the read operation finishes
  5. second callback: do some management, then continue at 1 or do other work.

How can this be done with SYCL, and what would be the most performant way? I need to make these decisions and do this management as soon as each operation finishes, and then dynamically enqueue new operations.

In SYCL 1.2.1 it’s not possible to use callbacks directly, but you can do it with the underlying OpenCL objects. It should also be possible to achieve this without relying on callbacks at all, for example by dedicating a thread which blocks on memory reads (i.e. a host accessor) or simply waits for the kernel to finish.

In SYCL 2020 you can enqueue functions to run on the host using a host task. The caveat is that the function is enqueued onto the DAG, so it is subject to data dependencies and is not guaranteed to keep its submission order; the runtime may reorder it. In practice, if you enqueue the host task immediately after enqueueing the kernel, it will generally run right after the kernel completes. If you want to guarantee no reordering, you can use an in-order queue (also introduced in SYCL 2020). This does mean waiting for a SYCL implementation with host task support.
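As a rough sketch of the host task approach (assuming a SYCL 2020 implementation; q and buf_c are the objects from the example later in this thread, and the exact accessor spelling for use inside a host task may differ between implementations):

```cpp
q.submit([&](sycl::handler &h) {
  // The read dependency on buf_c makes the host task a node in the DAG
  // that runs after whatever kernel last wrote buf_c.
  auto c = buf_c.get_access<sycl::access::mode::read>(h);
  h.host_task([=] {
    // "Callback" body: runs on the host once the kernel has finished.
    // Do your management here and decide what to enqueue next.
  });
});
```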

Hi,

Thank you.

How can you block on memory reads (host accessor)?

And maybe related, how can I get the time taken by the data transfers (reading and writing) and by the kernel, independently?

Example:

auto submit_event = q.submit([&](handler &h) {
  auto a = buf_a.get_access<sycl::access::mode::read>(h);
  auto b = buf_b.get_access<sycl::access::mode::read>(h);
  auto c = buf_c.get_access<sycl::access::mode::write>(h);

  h.parallel_for<class test_event>(R, [=](item<2> item) {
      auto sum = 0;
      for (size_t k = 0; k < N; ++k) {
        const auto a_ik = a[{item[0], k}];
        const auto b_kj = b[{k, item[1]}];
        sum += a_ik * b_kj;
      }
      c[item] = sum;
    });

});

submit_event.wait(); 

I assume submit_event.wait() waits for the data write + compute kernel + data read as a whole. Then with submit_event.get_profiling_info<sycl::info::event_profiling::command_start> and so on, I can get profiling info for all the operations together, but I don’t know how to distinguish between the data transfers and the kernel.

Hi @user1900,

How can you block on memory reads (host accessor)?

A host accessor blocks access to the buffer for its lifetime: when it is constructed, it waits for any kernels accessing the buffer to complete before the constructor returns, and any other access to the buffer then waits until the host accessor is destroyed. However, there is no way to measure the time taken for the data transfer in this case.
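For example, using the buffers from your snippet (a sketch; the scope is what controls the accessor's lifetime):

```cpp
{
  // Constructing a host accessor without a handler argument blocks until
  // any kernels accessing buf_c have completed.
  auto host_c = buf_c.get_access<sycl::access::mode::read>();
  // ... inspect the results here, decide what to enqueue next ...
} // buf_c becomes available to the device again once the accessor is destroyed
```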

And maybe related, how can I get the time taken by the data transfers (reading and writing) and by the kernel, independently?

To separate the data transfers from the kernel execution you can use the asynchronous copy command, which can be invoked from within a command group just like parallel_for. To do this, you would create the buffers uninitialized and copy the data from the host pointers asynchronously, for example:

auto copy_a_event = q.submit([&](handler &h) {
  auto a = buf_a.get_access<sycl::access::mode::write>(h);
  h.copy(host_a, a);
});

auto copy_b_event = q.submit([&](handler &h) {
  auto b = buf_b.get_access<sycl::access::mode::write>(h);
  h.copy(host_b, b);
});

auto kernel_event = q.submit([&](handler &h) {
  auto a = buf_a.get_access<sycl::access::mode::read>(h);
  auto b = buf_b.get_access<sycl::access::mode::read>(h);
  auto c = buf_c.get_access<sycl::access::mode::write>(h);

  h.parallel_for<class test_event>(R, [=](item<2> item) {
      auto sum = 0;
      for (size_t k = 0; k < N; ++k) {
        const auto a_ik = a[{item[0], k}];
        const auto b_kj = b[{k, item[1]}];
        sum += a_ik * b_kj;
      }
      c[item] = sum;
    });
});

auto copy_c_event = q.submit([&](handler &h) {
  auto c = buf_c.get_access<sycl::access::mode::read>(h);
  h.copy(c, host_c);
});

q.wait();

Now you have separate events copy_a_event, copy_b_event and copy_c_event, which can be queried for profiling information independently of kernel_event.
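To actually read the timings, note that the queue must be constructed with profiling enabled, e.g. sycl::queue q{sycl::property::queue::enable_profiling{}}; a sketch of the queries (the timestamps are in nanoseconds):

```cpp
// Kernel execution time from its own event.
auto k_start = kernel_event.get_profiling_info<
    sycl::info::event_profiling::command_start>();
auto k_end = kernel_event.get_profiling_info<
    sycl::info::event_profiling::command_end>();

// Transfer time for one of the copies, queried independently.
auto c_start = copy_a_event.get_profiling_info<
    sycl::info::event_profiling::command_start>();
auto c_end = copy_a_event.get_profiling_info<
    sycl::info::event_profiling::command_end>();

double kernel_ms = (k_end - k_start) * 1e-6;
double copy_a_ms = (c_end - c_start) * 1e-6;
```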

I hope this helps.

Gordon