Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First run is always very slow to start (Windows) #1

Closed
StaffanBetner opened this issue Dec 20, 2023 · 4 comments
Closed

First run is always very slow to start (Windows) #1

StaffanBetner opened this issue Dec 20, 2023 · 4 comments

Comments

@StaffanBetner
Copy link

I am using R 4.3.2, and the latest version of the package.

This is how it looks when I run e.g. stan_sample for the first time in a session:

> library(StanEstimators)
> library(tictoc)
> 
> lnorm_loglik <- function(par, x){sum(dlnorm(x, meanlog = par[1], sdlog = par[2], log = TRUE))}
> x = rlnorm(100, meanlog = 1, sdlog = 1)
> tic()
> stan_sample(lnorm_loglik, par_inits = c(1,1), additional_args = list(x = x), lower = c(-Inf, 0), num_chains = 4) -> fit
method = sample (Default)
  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 4
id = 1 (Default)
data
  file = C:\Users\staff\AppData\Local\Temp\Rtmp0WIYmd\file9fec6fe01b9b.json
init = C:\Users\staff\AppData\Local\Temp\Rtmp0WIYmd\file9fecc54156.json
random
  seed = 2412832692 (Default)
output
  file = C:\Users\staff\AppData\Local\Temp\Rtmp0WIYmd\file9fec72095f97.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
num_threads = 1 (Default)

### Main waiting time is here!
Gradient evaluation took 0.000363 seconds
1000 transitions using 10 leapfrog steps per transition would take 3.63 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 0.000271 seconds
1000 transitions using 10 leapfrog steps per transition would take 2.71 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 0.000291 seconds
1000 transitions using 10 leapfrog steps per transition would take 2.91 seconds.
Adjust your expectations accordingly!



Gradient evaluation took 0.000302 seconds
1000 transitions using 10 leapfrog steps per transition would take 3.02 seconds.
Adjust your expectations accordingly!


Chain [1] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [1] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [1] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [1] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [1] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [1] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [1] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [1] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [1] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [1] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [1] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [1] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [1] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [1] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [1] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [1] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [1] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [1] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [1] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [1] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [1] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [1] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 1.562 seconds (Warm-up)
               1.236 seconds (Sampling)
               2.798 seconds (Total)

Chain [2] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [2] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [2] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [2] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [2] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [2] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [2] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [2] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [2] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [2] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [2] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [2] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [2] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [2] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [2] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [2] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [2] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [2] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [2] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [2] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [2] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [2] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 2.233 seconds (Warm-up)
               2.02 seconds (Sampling)
               4.253 seconds (Total)

Chain [3] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [3] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [3] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [3] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [3] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [3] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [3] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [3] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [3] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [3] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [3] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [3] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [3] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [3] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [3] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [3] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [3] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [3] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [3] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [3] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [3] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [3] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 1.964 seconds (Warm-up)
               1.716 seconds (Sampling)
               3.68 seconds (Total)

Chain [4] Iteration:    1 / 2000 [  0%]  (Warmup)
Chain [4] Iteration:  100 / 2000 [  5%]  (Warmup)
Chain [4] Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain [4] Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain [4] Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain [4] Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain [4] Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain [4] Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain [4] Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain [4] Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain [4] Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain [4] Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain [4] Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain [4] Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain [4] Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain [4] Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain [4] Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain [4] Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain [4] Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain [4] Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain [4] Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain [4] Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 1.531 seconds (Warm-up)
               1.661 seconds (Sampling)
               3.192 seconds (Total)

> toc()
432.25 sec elapsed

As the log indicates, it is not the sampling that takes 7 minutes, but something else. The main waiting time is before the first "Gradient evaluation took xxx". If I run any of the functions again, it doesn't have any waiting time like this, regardless of data and loglikelihood function.

@andrjohns
Copy link
Owner

I've reworked the handling of multiple/parallel chains to avoid the use of a mutex/locking when accessing the R session, can you try with the current main branch?

Also just to flag that I've changed the num_threads argument to parallel_chains

@StaffanBetner
Copy link
Author

StaffanBetner commented Dec 24, 2023

The problem still persists, unfortunately. But now I can kill the function without killing the R process.

@andrjohns
Copy link
Owner

Should be resolved in main now

@StaffanBetner
Copy link
Author

Splendid! Can confirm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants