Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

return used tls from @threads for easier use of task_local_storage() #48543

Closed

Conversation

IanButterworth
Copy link
Member

@IanButterworth IanButterworth commented Feb 5, 2023

In thinking about #48542 I wondered maybe @threads could return the tls of all the used tasks, which would make task_local_storage() easier to use with @threads.

Something like

julia> function sum_multi(a::Vector{T}) where T
           tlss = Threads.@threads for i in eachindex(a)
               buf = get(task_local_storage(), :buf, zero(T))::T
               task_local_storage()[:buf] = buf + a[i]
           end
           return sum(tls -> tls[:buf]::T, tlss)
       end
sum_multi (generic function with 1 method)

julia> sum_multi(1:1_000_000)
500000500000

(Maybe this toy example could be reduced further with smarter get usage?)

@IanButterworth IanButterworth added the multithreading Base.Threads and related functionality label Feb 5, 2023
@IanButterworth
Copy link
Member Author

To make this friendly and not too verbose to repl usage, a minimal show method for Vector{IdDict{Any, Any}} could be added

@vtjnash
Copy link
Member

vtjnash commented Feb 6, 2023

I think there is already a different macro for parallel reductions that can be more efficient than this (parallel reductions)? I forget what the name changed to. It used to be @parallel in v0.0.

@MasonProtter
Copy link
Contributor

@vtjnash are you thinking of what's now @distributed [reducer] for ...? We've never had something like that for multithreading in Base, only multiprocessing as far as I'm aware.

@vtjnash
Copy link
Member

vtjnash commented Feb 12, 2023

Yeah, sounds right

@vchuravy
Copy link
Member

I think the right strategy here is to implement something akin to Hyper Reducers in Cilk (http://www.fftw.org/~athena/papers/hyper.pdf)

@tkf was experimenting with similar constructs, and I think it would be worthwhile providing a richer infrastructure in Base,
but I don't think returning/exposing TLS is the right strategy.

@@ -95,6 +95,7 @@ function threading_run(fun, static)
if !isempty(failed_tasks)
throw(CompositeException(map(TaskFailedException, failed_tasks)))
end
return map(Base.get_task_tls, tasks)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about simply:

Suggested change
return map(Base.get_task_tls, tasks)
return tasks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there other useful things inside the tasks?

Copy link
Member Author

@IanButterworth IanButterworth Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that's nicer is that the show method for a vector of tasks is a lot tidier than a vector of dict. So running a threaded for loop in the repl would be tidier.

@IanButterworth
Copy link
Member Author

IanButterworth commented Jun 3, 2023

Another idea would be to expose a way to handle the buffer objects without the user having to manipulate tasks/TLSs

xs, ys = Threads.@threads x = zeros(10,10) y = zeros(2,2) for i in eachindex(a)
 	# do something with x, y, a[i] etc.
end

# optionally do something with xs & ys which are both of length threadpoolsize()

Edit: tried out here #50052

@vchuravy
Copy link
Member

vchuravy commented Jun 3, 2023

I would prefer that over exposing the TLS objects in that way. Kinda reminds me of Floops reducers.

I am not super convinced by the syntax, but the common ask seems to be that one wants some buffer shared by iterations on that tasks and then somehow returned.

@jonas-schulze
Copy link
Contributor

jonas-schulze commented Jun 3, 2023

How about having the local keyword create a task-local variable?

Edit: Or some new Julia keyword task_local akin to thread_local from C++.

https://en.cppreference.com/w/cpp/language/storage_duration

@IanButterworth
Copy link
Member Author

What would an example of that look like? I think I get idea but not sure how initialization would be indicated

@IanButterworth
Copy link
Member Author

It seems to me that using task_local_storage() with @threads (and returning tls in some form from the macro) would rarely be a good fit because task_local_storage() access has to be in the loop body, thus you introduce dict operations every iteration.

#50052 seems like a better fit for @threads use cases.

I'm tempted to close this in favor of that

@vchuravy vchuravy closed this Jun 4, 2023
@IanButterworth IanButterworth deleted the ib/threads_return_tlss branch June 4, 2023 01:17
@jonas-schulze
Copy link
Contributor

What would an example of that look like? I think I get idea but not sure how initialization would be indicated

Threads.@threads for i in eachindex(a)
    # only initialized once; not reset between different `i`
    # similar to `static` from C/C++
    local x = zeros(10, 10)
    local y = zeros(2, 2)

    # do something with x, y, a[i] etc.
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants