
Remove use of threadid #429

Open
devmotion opened this issue Oct 17, 2022 · 9 comments

@devmotion
Member

Just noticed that ThreadSafeVarInfo makes heavy use of threadid:

function acclogp!!(vi::ThreadSafeVarInfo, logp)
    vi.logps[Threads.threadid()] += logp
    return vi
end
function acclogp!!(vi::ThreadSafeVarInfoWithRef, logp)
    vi.logps[Threads.threadid()][] += logp
    return vi
end

That should be changed since it is broken in Julia >= 1.8 if users don't use @threads :static for ... (see e.g. https://discourse.julialang.org/t/behavior-of-threads-threads-for-loop/76042).

@devmotion
Member Author

Generally, I feel the use of ThreadSafeVarInfo with non-mutable varinfo objects is probably broken and should maybe be disallowed. Every time different threads call any of the !! methods (or the ones without !! that still return a new varinfo object), those threads end up working with different varinfo objects.

The more general use with mutable varinfo objects could be fixed, I assume, by restricting access with a lock. But maybe that's not the most efficient way.
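Concretely, something like this could work (a sketch only; the single shared logp field and the lock field are assumptions for illustration, not the current ThreadSafeVarInfo layout):

# Hedged sketch: assumes a hypothetical `lock::ReentrantLock` field and a
# single shared `logp::Base.RefValue` instead of the per-thread `logps` vector.
function acclogp!!(vi::ThreadSafeVarInfo, logp)
    lock(vi.lock) do
        vi.logp[] += logp
    end
    return vi
end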

@torfjelde
Member

Generally, I feel the use of ThreadSafeVarInfo with non-mutable varinfo objects is probably broken and should maybe be disallowed.

It should still be functional for evaluation because, even though the returned __varinfo__ is thread-local, the logp field that is mutated is shared across all of them.

For the immutable varinfos we actually have the opportunity to parallelize sampling itself by implementing a merge for the varinfos, though it would require some manual labour on the user's side, e.g. a mapreduce, as sketched below.
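Roughly along these lines (a sketch only: merge for varinfos does not exist yet, and evaluate_chunk/chunks are stand-ins for whatever per-task work the user runs):

# Hypothetical sketch: `evaluate_chunk` stands in for user code that returns a
# new immutable varinfo, and `merge` is the not-yet-implemented combiner.
tasks = [Threads.@spawn evaluate_chunk(chunk) for chunk in chunks]
vi = mapreduce(fetch, merge, tasks)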

The more general use with mutable varinfo objects could be fixed, I assume, by restricting access with a lock.

Seems like a good idea 👍

@torfjelde
Member

Btw, regarding this, how can :dynamic cause issues here?

It seems like :dynamic means that tasks can migrate between threads, but how is it then possible that two different tasks try to mutate the same index at the same time? For this to happen, you'd seemingly need one Task to yield on the line

vi.logps[Threads.threadid()] += logp

and then another Task, which has been moved to the same thread, to also hit this line before the original Task resumes and "completes" it.

But there's no way the above line will actually cause the Task to yield back to the scheduler, right? So even though, yes, there won't be a specific Task per thread anymore, there still doesn't seem to be a scenario where the same thread will try to perform the above accumulation using two different tasks simultaneously?

Oooor am I misunderstanding something here?

@devmotion
Member Author

True, it seems that when opening the issue I thought it was worse than it actually is. Generally though, IMO using threadid is bad style even in cases where it currently does not introduce concurrency issues. I think the explanation in https://juliafolds.github.io/FLoops.jl/stable/explanation/faq/#faq-state-threadid is quite good. Currently this specific line does not yield to the scheduler AFAIK (e.g., it does not involve any IO operation or logging), but that's not guaranteed to remain the case in future Julia versions.
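For concreteness, a minimal illustration of the failure mode (the explicit yield() is a stand-in for any yield point a future Julia version might introduce on such a line):

using Base.Threads: @threads, threadid, nthreads

acc = zeros(Int, nthreads())
@threads :dynamic for i in 1:1000
    old = acc[threadid()]      # read the slot of the current thread ...
    yield()                    # ... the task may migrate to another thread here ...
    acc[threadid()] = old + i  # ... and write a stale value into a different slot
end
sum(acc)  # can differ from sum(1:1000) == 500500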

@torfjelde
Member

torfjelde commented Feb 2, 2023

I think the explanation in https://juliafolds.github.io/FLoops.jl/stable/explanation/faq/#faq-state-threadid is quite good

Haha, this is exactly the explanation I was looking at that made me question your original worry 😅

But no, I agree with you that the usage of threadid should be removed, because it could potentially introduce issues like those mentioned 👍

@devmotion
Member Author

JuliaLang/julia#48542

@torfjelde
Member

torfjelde commented Feb 6, 2023

Ah neat!

Something like Dictionaries.jl would probably also be useful if you already know the return type and it's consistent across tasks.

EDIT: On second thought, using Dictionaries.jl is probably not a great idea, as it presumably expands the underlying Vector on demand, which would surely lead to issues when multiple threads try to write to it 😅

@torfjelde
Member

torfjelde commented Feb 6, 2023

How bad an idea is it to create a buffer of Atomics, pick a slot at random, and then use atomic_add!? I.e.

julia> Threads.nthreads()
16

julia> f(x) = log(x)
f (generic function with 1 method)

julia> g(n) = sum(f, 1:n)
g (generic function with 1 method)

julia> function g_t(n; buffer_size=Threads.nthreads())
           buffers = [Threads.Atomic{Float64}(0) for _ = 1:buffer_size]
           Threads.@threads for i = 1:n
               buffer_idx = rand(1:buffer_size)
               Threads.atomic_add!(buffers[buffer_idx], f(i))
           end
           return sum(x -> x[], buffers)  # read each Atomic's value
       end
g_t (generic function with 1 method)

julia> @btime $g($(10_000_000));
  74.448 ms (0 allocations: 0 bytes)

julia> @btime $g_t($(10_000_000); buffer_size=$(1));
  613.850 ms (101 allocations: 8.56 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(2));
  386.434 ms (101 allocations: 8.55 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(4));
  351.702 ms (102 allocations: 8.56 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(16));
  152.913 ms (114 allocations: 8.84 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(32));
  91.895 ms (130 allocations: 9.22 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(64));
  88.032 ms (162 allocations: 9.98 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(128));
  83.266 ms (227 allocations: 11.52 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(1024));
  67.109 ms (1122 allocations: 32.55 KiB)

🙃

EDIT: A probably better solution would be to have buffer_size locks that we can check for being currently acquired, and then iterate through them until we hit one that isn't locked (with some max number of tries, of course, before just blocking). The above was just a quick and dirty way to do something that would still improve efficiency quite drastically vs. a single Atomic.
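Something like this, for concreteness (all names made up for the example; buffers as Base.RefValue{Float64} and plain ReentrantLocks):

# Illustrative sketch: try stripes until one is free, blocking only as a fallback.
function striped_add!(buffers::Vector{Base.RefValue{Float64}},
                      locks::Vector{ReentrantLock}, x::Float64; max_tries=3)
    i = rand(1:length(buffers))
    for _ in 1:max_tries
        if trylock(locks[i])              # non-blocking attempt
            try
                buffers[i][] += x
            finally
                unlock(locks[i])
            end
            return nothing
        end
        i = mod1(i + 1, length(buffers))  # move on to the next stripe
    end
    lock(locks[i]) do                     # give up and block on the current one
        buffers[i][] += x
    end
    return nothing
end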

@torfjelde
Member

Oh and another thing: Atomics won't work once you put AD in there, right? Atomic seems to support only a rather limited set of types, so I'd expect even something like Dual to not be usable.
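A quick way to see that limitation (Base.Threads.Atomic's constructors are only defined for a fixed set of primitive numeric types, so this is expected to fail for AD number types):

using Base.Threads: Atomic, atomic_add!

a = Atomic{Float64}(0.0)       # Float64 is one of the supported primitive types
atomic_add!(a, 1.5)            # fine
# For an AD number type the constructor is not defined, e.g. with ForwardDiff:
#     D = ForwardDiff.Dual{Nothing,Float64,1}
#     Atomic{D}(zero(D))       # errors: Atomic only supports primitive types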
