Make matrix_cl thread safe #2905
Another idea would be to use a ring buffer with a fixed capacity: before we push onto the queue, a spin lock waits until space is available.
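For reference, a rough sketch of what that ring-buffer idea could look like. This is not code from the PR: `spin_ring_buffer` and its members are hypothetical names, and it uses a simple `std::atomic_flag` spin lock rather than anything lock-free, assuming that simplicity matters more than throughput here.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical fixed-capacity ring buffer (illustrative only, not Stan Math code).
template <typename T, std::size_t Capacity>
class spin_ring_buffer {
  static_assert(Capacity > 0, "capacity must be positive");
  std::array<T, Capacity> data_{};
  std::size_t head_ = 0;   // index of the oldest element
  std::size_t count_ = 0;  // number of stored elements
  std::atomic_flag lock_ = ATOMIC_FLAG_INIT;

  // Spin until the flag is acquired.
  void lock() {
    while (lock_.test_and_set(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
  }
  void unlock() { lock_.clear(std::memory_order_release); }

 public:
  // Spins until a slot is free, then appends the value.
  void push(const T& value) {
    for (;;) {
      lock();
      if (count_ < Capacity) {
        data_[(head_ + count_) % Capacity] = value;
        ++count_;
        unlock();
        return;
      }
      unlock();  // buffer is full: release the lock so a consumer can pop
      std::this_thread::yield();
    }
  }

  // Moves the oldest element into `out`; returns false if the buffer is empty.
  bool try_pop(T& out) {
    lock();
    if (count_ == 0) {
      unlock();
      return false;
    }
    out = data_[head_];
    head_ = (head_ + 1) % Capacity;
    --count_;
    unlock();
    return true;
  }

  // Number of elements currently stored.
  std::size_t size() {
    lock();
    std::size_t n = count_;
    unlock();
    return n;
  }
};
```

The trade-off is that a full buffer blocks producers until something is popped, so the capacity would have to be chosen with the expected number of in-flight events in mind.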
cl::Event cstr_event;
std::vector<cl::Event>* dep_events = new std::vector<cl::Event>(
    A.write_events().begin(), A.write_events().end());
try {
  cl::Event cstr_event;
  opencl_context.queue().enqueueCopyBuffer(A.buffer(), this->buffer(), 0, 0,
                                           A.size() * sizeof(T),
                                           &A.write_events(), &cstr_event);
                                           A.size() * sizeof(T), dep_events,
                                           &cstr_event);
  if (opencl_context.device()[0].getInfo<CL_DEVICE_HOST_UNIFIED_MEMORY>()) {
    buffer_cl_.setDestructorCallback(
        &delete_it_destructor<std::vector<cl::Event>>, dep_events);
  } else {
    cstr_event.setCallback(
        CL_COMPLETE, &delete_it_event<std::vector<cl::Event>>, dep_events);
  }
  this->add_write_event(cstr_event);
  A.add_read_event(cstr_event);
} catch (const cl::Error& e) {
  delete dep_events;
  check_opencl_error("copy (OpenCL)->(OpenCL)", e);
} catch (...) {
  delete dep_events;
  throw;
This is a place where things get weird with the tbb concurrent vector. We need a local copy of the events that stays alive until the copy operation has finished executing. That means allocating the copy behind a raw pointer, which we then destroy through a callback on either the buffer or the copy event.
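To make the ownership pattern concrete, here is a minimal sketch without any OpenCL in it. Everything in it is a stand-in: `fake_async_op` plays the role of the event or buffer that fires a callback, `delete_it` is a hypothetical analogue of the `delete_it_*` helpers in the diff (whose real signatures are not shown here), and `tracked` only exists to count live objects.

```cpp
#include <vector>

// Stand-in for a C-style completion callback that receives a user_data pointer.
using completion_callback = void (*)(void* user_data);

// Hypothetical analogue of the delete_it_* helpers: frees the heap-allocated copy.
template <typename T>
void delete_it(void* user_data) {
  delete static_cast<T*>(user_data);
}

// Stand-in for an asynchronous operation that fires a callback once it is done.
struct fake_async_op {
  completion_callback cb = nullptr;
  void* user_data = nullptr;
  void set_callback(completion_callback c, void* d) {
    cb = c;
    user_data = d;
  }
  void complete() {
    if (cb != nullptr) {
      cb(user_data);
      cb = nullptr;
      user_data = nullptr;
    }
  }
};

// Counts live instances so the demo can observe when the copy is destroyed.
struct tracked {
  static int live;
  tracked() { ++live; }
  tracked(const tracked&) { ++live; }
  ~tracked() { --live; }
};
int tracked::live = 0;

// Returns how many objects the completion callback released.
int demo_callback_owned_copy() {
  std::vector<tracked> shared(3);  // stand-in for the shared event list
  fake_async_op op;
  // The heap copy outlives this scope's view of `shared`.
  auto* deps = new std::vector<tracked>(shared.begin(), shared.end());
  op.set_callback(&delete_it<std::vector<tracked>>, deps);
  shared.clear();  // the original may change while the operation is in flight
  int before = tracked::live;  // only the copy's elements remain
  op.complete();               // the callback deletes the copy
  return before - tracked::live;
}
```

The important part is that exactly one path frees the copy: the callback on success, or the `delete` in the `catch` blocks if the enqueue throws before a callback was registered.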
@rok-cesnovar sorry to bother but could you look at this?
Looking into this over the weekend.
I think everything here makes sense, so feel free to merge. Thanks!
Summary
This makes matrix_cl's event vectors thread safe by using tbb::concurrent_vector instead of a std::vector.
In Stan a matrix_cl can be in a model class and shared across multiple threads. When using multiple threads we can have a race condition where two operations attempt to push to the vector of read/write events.
I think the simplest fix for this is the below, where we replace std::vector with tbb::concurrent_vector. This has some issues though, notably that tbb::concurrent_vector is not a contiguous block of memory. Some functions like clEnqueueReadBuffer expect a pointer to memory and the size of the block of memory, so what I end up doing is making a hard copy of the tbb::concurrent_vector into a std::vector. This is not optimal for performance.
If we wanted to, I could write something based on rigtorp's ring_buffer but for just a vector of data. The only odd part of it would be that when we have to resize the vector, we would have to make a hard copy of the events into a new slice of memory and keep the old memory around until either the buffer falls out of scope or the last event is done. I'm fine with doing that as well.
Tests
I'm also not sure how to write a test for this. The current tests all pass (in fact, I found a small bug in normal_lccdf where the results didn't match and fixed that).
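One possible shape for such a test, sketched under assumptions: several threads push into a shared container and the final size is checked. `locked_vector` is a hypothetical mutex-guarded stand-in so the sketch compiles without TBB or OpenCL, and `stress_push` is an illustrative helper, not an existing test utility. Any container with a thread-safe `push_back` and a `size()` should fit the template.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a thread-safe event list (not tbb::concurrent_vector).
template <typename T>
class locked_vector {
  std::vector<T> data_;
  mutable std::mutex mutex_;

 public:
  void push_back(const T& value) {
    std::lock_guard<std::mutex> guard(mutex_);
    data_.push_back(value);
  }
  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return data_.size();
  }
};

// Pushes from several threads at once and returns the final element count.
template <typename Container>
std::size_t stress_push(Container& events, int n_threads,
                        int pushes_per_thread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < n_threads; ++t) {
    workers.emplace_back([&events, pushes_per_thread] {
      for (int i = 0; i < pushes_per_thread; ++i) {
        events.push_back(i);
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return events.size();
}
```

A test like this cannot prove the absence of a race, but running it under a thread sanitizer would likely flag the unsynchronized std::vector version.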
Side Effects
The downside to note here is that we make a copy every time we need a contiguous block of memory for queuing the events for read/write ops. This is not great, but it is fine for now, and later I can write a faster scheme.
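The copy itself is small in code terms. A minimal sketch, assuming only that the source container exposes `begin()` and `end()`: `to_contiguous` is a hypothetical helper name, and the usage below substitutes `std::deque<int>` for the concurrent vector of events because it is also non-contiguous and needs no extra dependencies.

```cpp
#include <deque>
#include <vector>

// Copies any iterable container into a contiguous std::vector snapshot.
// Hypothetical helper: the name and signature are illustrative only.
template <typename T, typename Container>
std::vector<T> to_contiguous(const Container& events) {
  return std::vector<T>(events.begin(), events.end());
}

// Usage with a stand-in container; returns the last element via the raw pointer.
int demo_to_contiguous() {
  std::deque<int> pending{1, 2, 3};  // stand-in for the concurrent event list
  std::vector<int> snapshot = to_contiguous<int>(pending);
  // snapshot.data() and snapshot.size() give the pointer and length
  // that a C-style API would expect.
  return snapshot.data()[snapshot.size() - 1];
}
```

Each call allocates and copies, which is the per-operation cost described above.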
Release notes
Replace the std::vectors for read/write events in matrix_cl with tbb::concurrent_vectors
Checklist
Math issue #(issue number)
Copyright holder: Steve Bronder
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
- ./runTests.py test/unit
- make test-headers
- make test-math-dependencies
- make doxygen
- make cpplint
the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested