Make matrix_cl thread safe #2905

SteveBronder · 2023-05-17T17:16:40Z

Summary

This makes matrix_cl's event vectors thread safe by using tbb::concurrent_vector instead of a std::vector.

In Stan a matrix_cl can be in a model class and shared across multiple threads. When using multiple threads we can have a race condition where two operations attempt to push to the vector of read/write events.

I think the simplest fix is the one below, where we replace std::vector with tbb::concurrent_vector. This has some issues, though, notably that tbb::concurrent_vector is not a contiguous block of memory. Some functions, like clEnqueueReadBuffer, expect a pointer to memory and the size of that block, so what I end up doing is making a hard copy of the tbb::concurrent_vector into a std::vector. This is not optimal for performance.
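
For illustration, here is a minimal sketch of the copy-out step described above. So that it compiles with only the standard library, it uses a mutex-guarded `std::deque` as a stand-in for `tbb::concurrent_vector` (both store their elements non-contiguously). The names `event_list`, `snapshot`, `fake_event`, and `concurrent_push_count` are hypothetical and are not part of Stan Math, TBB, or OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for cl::Event; hypothetical, for illustration only.
struct fake_event {
  int id;
};

// Mutex-guarded, non-contiguous event storage (a stand-in for
// tbb::concurrent_vector<cl::Event>, not the real container).
class event_list {
 public:
  // Safe to call from several threads at once.
  void push_back(const fake_event& e) {
    std::lock_guard<std::mutex> lock(mutex_);
    events_.push_back(e);
  }

  // Hard copy into contiguous memory, so that data() and size() can be
  // handed to an API that expects a pointer plus a length.
  std::vector<fake_event> snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return std::vector<fake_event>(events_.begin(), events_.end());
  }

 private:
  mutable std::mutex mutex_;
  std::deque<fake_event> events_;
};

// Push from n_threads threads concurrently and return the snapshot size.
inline std::size_t concurrent_push_count(int n_threads, int per_thread) {
  assert(n_threads > 0 && per_thread > 0);
  event_list events;
  std::vector<std::thread> workers;
  for (int t = 0; t < n_threads; ++t) {
    workers.emplace_back([&events, t, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        events.push_back(fake_event{t * per_thread + i});
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return events.snapshot().size();
}
```

The cost mentioned above shows up in `snapshot()`: every call allocates and copies, which is the price of getting a contiguous block out of non-contiguous storage.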

If we wanted to, I could write something based on rigtorp's ring_buffer but for just a vector of data. The only odd part would be resizing: we would have to make a hard copy of the events into a new slice of memory and keep the old memory around until either the buffer falls out of scope or the last event is done. I'm fine with doing that as well.
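
A rough sketch of the resize behavior described above, under simplifying assumptions: it is single-writer with no synchronization, and old blocks are released only when the buffer itself goes out of scope (the "last event is done" condition is not modeled). `retaining_buffer` is a hypothetical name and is not based on rigtorp's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Append-only buffer that grows by copying into a fresh block while
// retaining every older block until the buffer itself is destroyed.
// Single-writer sketch: synchronization between threads is left out.
template <typename T>
class retaining_buffer {
 public:
  explicit retaining_buffer(std::size_t capacity)
      : capacity_(capacity), size_(0) {
    assert(capacity > 0);
    blocks_.push_back(std::make_unique<T[]>(capacity_));
  }

  void push_back(const T& value) {
    if (size_ == capacity_) {
      grow();
    }
    blocks_.back()[size_++] = value;
  }

  // Pointer to the current contiguous block.
  const T* data() const { return blocks_.back().get(); }
  std::size_t size() const { return size_; }
  // Number of blocks kept alive, including the current one.
  std::size_t retained_blocks() const { return blocks_.size(); }

 private:
  void grow() {
    const std::size_t new_capacity = 2 * capacity_;
    std::unique_ptr<T[]> bigger = std::make_unique<T[]>(new_capacity);
    for (std::size_t i = 0; i < size_; ++i) {
      bigger[i] = blocks_.back()[i];  // hard copy into the new block
    }
    blocks_.push_back(std::move(bigger));  // old block stays allocated
    capacity_ = new_capacity;
  }

  std::size_t capacity_;
  std::size_t size_;
  std::vector<std::unique_ptr<T[]>> blocks_;
};
```

Because the old block is never freed early, a pointer handed out before a resize stays valid for as long as the buffer lives, at the cost of holding on to the extra memory.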

Tests

I'm not sure how to write a test for this. The current tests all pass (in fact, I found a small bug in normal_lccdf where the results didn't match and fixed it).
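
The commit message for the normal_lccdf fix says LOG_HALF is added N times. A small sketch of why that count matters, assuming the standard identity log(1 - Phi(z)) = log(0.5) + log(erfc(z / sqrt(2))); this is plain C++ for illustration and does not reproduce the actual OpenCL kernel. `lccdf_direct` and `lccdf_factored` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum of log(1 - Phi(z_i)), computed term by term.
inline double lccdf_direct(const std::vector<double>& z) {
  double total = 0.0;
  for (double zi : z) {
    total += std::log(0.5 * std::erfc(zi / std::sqrt(2.0)));
  }
  return total;
}

// Same quantity with log(0.5) factored out of each term: it has to be
// added once per element, i.e. N times in total.
inline double lccdf_factored(const std::vector<double>& z) {
  const double log_half = std::log(0.5);
  double total = 0.0;
  for (double zi : z) {
    total += std::log(std::erfc(zi / std::sqrt(2.0)));
  }
  return total + static_cast<double>(z.size()) * log_half;
}
```

Adding `log_half` only once instead of `z.size()` times would make the two functions disagree for any input with more than one element.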

Side Effects

The downside is the copy made every time we need a contiguous block of memory to queue the events for read/write ops. This is not great, but it is fine for now, and I can write a faster scheme later.

Release notes

Replace the std::vectors for read/write events in matrix_cl with tbb::concurrent_vectors

Checklist

Math issue #(issue number)
Copyright holder: Steve Bronder

The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
- unit tests pass (to run, use: ./runTests.py test/unit)
- header checks pass, (make test-headers)
- dependencies checks pass, (make test-math-dependencies)
- docs build, (make doxygen)
- code passes the built in C++ standards checks (make cpplint)
the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested

SteveBronder · 2023-05-17T17:19:18Z

Another idea would be to just have a ring buffer with fixed capacity, and before we push onto the queue we use a spin lock to wait until space is available.
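
A minimal sketch of that idea, assuming a single small spin lock guards all of the buffer's state (so this is not a lock-free design). `spin_ring_buffer` and `produce_and_sum` are hypothetical names used only for this example.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Fixed-capacity ring buffer; push() waits until a slot is free.
template <typename T, std::size_t Capacity>
class spin_ring_buffer {
 public:
  // Retries until there is room, then stores value.
  void push(const T& value) {
    for (;;) {
      lock();
      if (count_ < Capacity) {
        slots_[(head_ + count_) % Capacity] = value;
        ++count_;
        unlock();
        return;
      }
      unlock();
      std::this_thread::yield();  // buffer full: let a consumer run
    }
  }

  // Returns false when the buffer is empty.
  bool try_pop(T& out) {
    lock();
    if (count_ == 0) {
      unlock();
      return false;
    }
    out = slots_[head_];
    head_ = (head_ + 1) % Capacity;
    --count_;
    unlock();
    return true;
  }

 private:
  void lock() {
    while (locked_.exchange(true, std::memory_order_acquire)) {
    }
  }
  void unlock() { locked_.store(false, std::memory_order_release); }

  std::atomic<bool> locked_{false};
  std::array<T, Capacity> slots_{};
  std::size_t head_ = 0;
  std::size_t count_ = 0;
};

// One producer pushes 1..n while one consumer sums what it pops.
inline long produce_and_sum(int n) {
  assert(n >= 0);
  spin_ring_buffer<int, 4> buffer;
  long sum = 0;
  std::thread consumer([&buffer, &sum, n] {
    int received = 0;
    int value = 0;
    while (received < n) {
      if (buffer.try_pop(value)) {
        sum += value;
        ++received;
      } else {
        std::this_thread::yield();
      }
    }
  });
  for (int i = 1; i <= n; ++i) {
    buffer.push(i);
  }
  consumer.join();
  return sum;
}
```

The fixed capacity avoids the resize problem entirely, but a producer can stall for as long as the buffer stays full, so the capacity would need to be chosen with the expected number of in-flight events in mind.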

stan/math/opencl/copy.hpp

SteveBronder · 2023-05-17T18:00:13Z

stan/math/opencl/matrix_cl.hpp

+    cl::Event cstr_event;
+    std::vector<cl::Event>* dep_events = new std::vector<cl::Event>(
+        A.write_events().begin(), A.write_events().end());
    try {
-      cl::Event cstr_event;
      opencl_context.queue().enqueueCopyBuffer(A.buffer(), this->buffer(), 0, 0,
-                                               A.size() * sizeof(T),
-                                               &A.write_events(), &cstr_event);
+                                               A.size() * sizeof(T), dep_events,
+                                               &cstr_event);
+      if (opencl_context.device()[0].getInfo<CL_DEVICE_HOST_UNIFIED_MEMORY>()) {
+        buffer_cl_.setDestructorCallback(
+            &delete_it_destructor<std::vector<cl::Event>>, dep_events);
+      } else {
+        cstr_event.setCallback(
+            CL_COMPLETE, &delete_it_event<std::vector<cl::Event>>, dep_events);
+      }
      this->add_write_event(cstr_event);
      A.add_read_event(cstr_event);
    } catch (const cl::Error& e) {
+      delete dep_events;
      check_opencl_error("copy (OpenCL)->(OpenCL)", e);
+    } catch (...) {
+      delete dep_events;
+      throw;


This is a place where things get weird with the tbb::concurrent_vector. We need a local copy of the write events to stay alive until the copy has finished executing. That means allocating it behind a raw pointer and registering a callback on the buffer or on the copy event to delete it.
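
The ownership pattern can be sketched without OpenCL as follows. Everything here is a stand-in: `fake_async_op` plays the role of the event or buffer that accepts a callback, `delete_it` is a hypothetical helper (the real `delete_it_event` and `delete_it_destructor` are not shown in this hunk and their signatures are not reproduced), and `tracked` only exists to count live objects.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Counts live objects so the example can show when deletion happens.
struct tracked {
  static int& live() {
    static int count = 0;
    return count;
  }
  tracked() { ++live(); }
  tracked(const tracked&) { ++live(); }
  ~tracked() { --live(); }
};

// C-style completion callback: user_data is the heap object to free.
template <typename T>
void delete_it(void* user_data) {
  delete static_cast<T*>(user_data);
}

// Hypothetical stand-in for an asynchronous command: it stores a callback
// and runs it once, when complete() is called.
class fake_async_op {
 public:
  void set_callback(void (*fn)(void*), void* user_data) {
    assert(fn != nullptr);
    fn_ = fn;
    user_data_ = user_data;
  }
  void complete() {
    if (fn_ != nullptr) {
      fn_(user_data_);
      fn_ = nullptr;
    }
  }

 private:
  void (*fn_)(void*) = nullptr;
  void* user_data_ = nullptr;
};

// Returns the live-object counts before and after completion.
inline std::pair<int, int> run_demo() {
  fake_async_op op;
  auto* deps = new std::vector<tracked>(3);  // local hard copy of the events
  op.set_callback(&delete_it<std::vector<tracked>>, deps);
  const int before = tracked::live();  // copy is still alive while "running"
  op.complete();                       // callback frees the copy
  const int after = tracked::live();
  return {before, after};
}
```

As in the diff, any path that fails before the callback is registered still has to delete the pointer by hand, which is why both catch blocks call `delete dep_events`.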

SteveBronder · 2023-10-25T18:10:54Z

@rok-cesnovar sorry to bother but could you look at this?

rok-cesnovar · 2023-10-26T16:19:34Z

Looking into this over the weekend.

rok-cesnovar

I think everything here makes sense, so feel free to merge. Thanks!

SteveBronder added 2 commits May 16, 2023 17:37

use tbb::concurrent_vector for multithreaded use of matrix_cl types

400c94d

Fix normal_lccdf for opencl so that LOG_HALF is added N times to the …

4e1f542

…result

SteveBronder changed the title ~~Feature/threadsafe matrixcl~~ Make matrix_cl thread safe May 17, 2023

SteveBronder requested a review from rok-cesnovar May 17, 2023 17:17

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

670ca40

new line

5069528

SteveBronder and others added 10 commits July 24, 2023 09:57

fix opencl normal_lccdf test copy / paste error

48d79e7

use reference instead of copy for several command queue calls for OpenCL

7cafa74

add newline

cab8b87

update headers for opencl

21a1c95

use unordered map instead of map

b879a6b

Merge commit '2e87ecadc075bab42fd7a51178ce33ae2f6c3ccc' into HEAD

f3585d6

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

8a78efd

set default constructor for internal multi kernel

3e0006f

revert kernel_cache_ back to a std::map

3453621

add back map for profiling

b5fbe4a

SteveBronder mentioned this pull request Aug 8, 2023

adds wrapper to hard copy write events for opencl copies stan-dev/stan#3217

Merged

3 tasks

Merge remote-tracking branch 'origin/develop' into feature/threadsafe…

a845875

…-matrixcl

SteveBronder mentioned this pull request Aug 10, 2023

update opencl assign test to reassign zero sized vector after recover_memory() stan-dev/stan#3219

Merged

3 tasks

update with develop

baffed6

rok-cesnovar approved these changes Nov 7, 2023

SteveBronder merged commit 35ac188 into develop Nov 7, 2023

WardBrian deleted the feature/threadsafe-matrixcl branch August 5, 2024 00:15
