Make matrix_cl thread safe #2905

Merged
SteveBronder merged 16 commits into develop from feature/threadsafe-matrixcl on Nov 7, 2023


SteveBronder
Collaborator

Summary

This makes matrix_cl's event vectors thread safe by using tbb::concurrent_vector instead of a std::vector.

In Stan a matrix_cl can be in a model class and shared across multiple threads. When using multiple threads we can have a race condition where two operations attempt to push to the vector of read/write events.

I think the simplest fix is the one below, where we replace std::vector with tbb::concurrent_vector. This has some issues though, notably that tbb::concurrent_vector does not store its elements in a contiguous block of memory. Some functions like clEnqueueReadBuffer expect a pointer to memory and the size of that block, so what I end up doing is making a full copy of the tbb::concurrent_vector into a std::vector. This is not optimal for performance.
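The copy described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: std::deque stands in for tbb::concurrent_vector purely because it is also non-contiguous (it is not thread safe), int stands in for cl::Event, and sum_wait_list, contiguous_snapshot, and wait_on_all are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a C-style API that takes a pointer to
// contiguous storage plus an element count, the way OpenCL enqueue
// calls take an event wait list.
inline int sum_wait_list(const int* events, std::size_t num_events) {
  assert(events != nullptr || num_events == 0);
  return std::accumulate(events, events + num_events, 0);
}

// Copy any iterable container into a std::vector so the elements
// sit in one contiguous block. This is the extra copy the PR pays for.
template <typename Container>
std::vector<typename Container::value_type> contiguous_snapshot(
    const Container& c) {
  return std::vector<typename Container::value_type>(c.begin(), c.end());
}

// std::deque plays the role of the non-contiguous event container.
inline int wait_on_all(const std::deque<int>& events) {
  const std::vector<int> snapshot = contiguous_snapshot(events);
  return sum_wait_list(snapshot.data(), snapshot.size());
}
```

The snapshot only has to outlive the call that reads it when that call is synchronous; the asynchronous case needs the copy to live longer, which is what the callback-based cleanup in the diff addresses.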

If we wanted to, I could write something based on rigtorp's ring_buffer, but for just a vector of data. The only odd part is resizing: we would have to make a full copy of the events into a new block of memory and keep the old memory around until either the buffer goes out of scope or the last event is done. I'm fine with doing that as well.

Tests

I'm also not sure how to write a test for this. The current tests all pass (in fact, I found a small bug in normal_lccdf where the results didn't match and fixed that).

Side Effects

The downside to note here is the copy we make every time we need a contiguous block of memory to queue the events for read/write ops. This is not great, but it is fine for now, and later I can write a faster scheme.

Release notes

Replace the std::vectors for read/write events in matrix_cl with tbb::concurrent_vectors

Checklist

  • Math issue #(issue number)

  • Copyright holder: Steve Bronder

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@SteveBronder SteveBronder changed the title Feature/threadsafe matrixcl Make matrix_cl thread safe May 17, 2023
@SteveBronder SteveBronder requested a review from rok-cesnovar May 17, 2023 17:17
@SteveBronder
Collaborator Author

Another idea would be to just have a ring buffer with a fixed capacity, and before we push onto the queue we spin on a lock until space is available.
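A rough sketch of that fixed-capacity idea, assuming a simple spin lock built on std::atomic_flag. Everything here (bounded_event_ring, spin_guard, try_push, push, try_pop) is a hypothetical name for illustration, not something in Stan Math or TBB, and a plain element type stands in for cl::Event.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// RAII spin lock guard: busy-waits until the flag is acquired and
// releases it on scope exit, even if an element copy throws.
struct spin_guard {
  explicit spin_guard(std::atomic_flag& f) : flag(f) {
    while (flag.test_and_set(std::memory_order_acquire)) {
    }
  }
  ~spin_guard() { flag.clear(std::memory_order_release); }
  spin_guard(const spin_guard&) = delete;
  spin_guard& operator=(const spin_guard&) = delete;
  std::atomic_flag& flag;
};

// Fixed-capacity FIFO ring buffer. Storage is one contiguous array
// that is never reallocated, so no element is ever moved by a resize.
template <typename T, std::size_t Capacity>
class bounded_event_ring {
  static_assert(Capacity > 0, "capacity must be positive");

 public:
  // Returns false instead of blocking when the buffer is full.
  bool try_push(const T& value) {
    spin_guard guard(flag_);
    assert(size_ <= Capacity);
    if (size_ == Capacity) {
      return false;
    }
    slots_[(head_ + size_) % Capacity] = value;
    ++size_;
    return true;
  }

  // Spins until another thread pops and frees a slot.
  void push(const T& value) {
    while (!try_push(value)) {
    }
  }

  // Removes the oldest element; returns false when the buffer is empty.
  bool try_pop(T& out) {
    spin_guard guard(flag_);
    if (size_ == 0) {
      return false;
    }
    out = slots_[head_];
    head_ = (head_ + 1) % Capacity;
    --size_;
    return true;
  }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
  std::array<T, Capacity> slots_{};
  std::size_t head_ = 0;
  std::size_t size_ = 0;
};
```

Note that push spins forever if nothing ever pops, so a scheme like this only works when completed events are reliably retired from the buffer. The occupied slots can also wrap around the end of the array, so a caller that needs one contiguous wait list would still have to handle the wrapped case.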

Comment on lines +619 to +640
+    cl::Event cstr_event;
+    std::vector<cl::Event>* dep_events = new std::vector<cl::Event>(
+        A.write_events().begin(), A.write_events().end());
     try {
-      cl::Event cstr_event;
       opencl_context.queue().enqueueCopyBuffer(A.buffer(), this->buffer(), 0, 0,
-                                               A.size() * sizeof(T),
-                                               &A.write_events(), &cstr_event);
+                                               A.size() * sizeof(T), dep_events,
+                                               &cstr_event);
+      if (opencl_context.device()[0].getInfo<CL_DEVICE_HOST_UNIFIED_MEMORY>()) {
+        buffer_cl_.setDestructorCallback(
+            &delete_it_destructor<std::vector<cl::Event>>, dep_events);
+      } else {
+        cstr_event.setCallback(
+            CL_COMPLETE, &delete_it_event<std::vector<cl::Event>>, dep_events);
+      }
       this->add_write_event(cstr_event);
       A.add_read_event(cstr_event);
     } catch (const cl::Error& e) {
+      delete dep_events;
       check_opencl_error("copy (OpenCL)->(OpenCL)", e);
+    } catch (...) {
+      delete dep_events;
+      throw;
Collaborator Author

This is a place where things get weird with the tbb concurrent vector. We need a local, contiguous copy of the events that stays alive until the copy operation has finished executing. That means allocating it behind a raw pointer and then registering a callback, on either the buffer or the copy event, that deletes it.
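The ownership pattern in this comment can be sketched without OpenCL as below. This is an illustration under assumptions, not the PR's implementation: fake_event is a hypothetical stand-in for an object that fires a C-style callback on completion, delete_it mirrors the idea behind delete_it_event and delete_it_destructor but with a simplified signature, and tracked exists only to count live objects.

```cpp
#include <cassert>
#include <vector>

// Generic deleter with a C-style callback signature: user_data is
// the heap-allocated object whose ownership was handed to the callback.
template <typename T>
void delete_it(void* user_data) {
  delete static_cast<T*>(user_data);
}

// Hypothetical stand-in for an asynchronous operation that invokes
// a registered callback once when it completes.
class fake_event {
 public:
  using callback_t = void (*)(void*);
  void set_callback(callback_t fn, void* user_data) {
    assert(fn != nullptr);
    fn_ = fn;
    data_ = user_data;
  }
  void complete() {
    if (fn_ != nullptr) {
      fn_(data_);
      fn_ = nullptr;
      data_ = nullptr;
    }
  }
  bool has_pending_callback() const { return fn_ != nullptr; }

 private:
  callback_t fn_ = nullptr;
  void* data_ = nullptr;
};

// Element type that counts live instances so cleanup can be observed.
struct tracked {
  static int& live() {
    static int n = 0;
    return n;
  }
  tracked() { ++live(); }
  tracked(const tracked&) { ++live(); }
  ~tracked() { --live(); }
};

// Make a heap copy of the dependencies and transfer ownership to the
// completion callback, so the copy outlives the enclosing scope.
template <typename T>
void attach_owned_copy(fake_event& ev, const std::vector<T>& deps) {
  auto* copy = new std::vector<T>(deps.begin(), deps.end());
  ev.set_callback(&delete_it<std::vector<T>>, copy);
}
```

In this sketch registering the callback cannot fail, so there is no cleanup path before ownership is transferred. The diff above has to delete dep_events in both catch blocks because the enqueue and callback registration calls can throw before the callback takes over.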

@SteveBronder
Collaborator Author

@rok-cesnovar sorry to bother you, but could you look at this?

@rok-cesnovar
Member

Looking into this over the weekend.

Member

@rok-cesnovar rok-cesnovar left a comment

I think everything here makes sense, so feel free to merge. Thanks!

@SteveBronder SteveBronder merged commit 35ac188 into develop Nov 7, 2023
@WardBrian WardBrian deleted the feature/threadsafe-matrixcl branch August 5, 2024 00:15