
Incorporate relative + absolute tolerance into grad_2F1 #2175

Merged
bbbales2 merged 22 commits into develop from feature/issue-2158-grad-2F1-precision on Nov 27, 2020

Conversation

sakrejda
Contributor

Summary

Fixes issue #2158 by introducing a combined absolute/relative tolerance to grad_2F1 and increasing the default maximum number of steps. Overall it should terminate sooner for very large gradients; the increased maximum number of steps is required to reach the stated precision in the tests. New tests are checked against a quick discrete difference computed in Mathematica.

The absolute/relative tolerance calculation is borrowed from the unit test formulation of the same problem; a sketch of the form of the check is below.
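As a rough standalone sketch (the helper name converged below is hypothetical and not part of the Math library; the real check lives inside grad_2F1's loop), the form of the test is:

    #include <algorithm>
    #include <cmath>

    // Combined tolerance: an update counts as converged once it is small
    // relative to the accumulated value, with precision acting as an
    // absolute floor so values near zero still terminate.
    bool converged(double current, double previous, double precision) {
      const double tol = std::max(std::abs(current) * precision, precision);
      return std::abs(current - previous) <= tol;
    }

With gradients around 3e7, as in the new test case, the relative branch dominates, so the series can stop once increments fall below roughly 3e-7 instead of having to grind down to 1e-14.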

Tests

The test case uses values from a user's failing model and reference values from Mathematica. The gradients are in the tens of millions, so the original purely absolute tolerance was excessive there: requiring increments below 1e-14 on accumulators of order 1e7 demands more relative precision than a double can represent.

Side Effects

Nothing noticeable. Most likely improved performance for models that hit the boundaries in neg_binomial_cdf and anything else that relies on grad_2F1. Fixes a user-reported model bug as a side-effect (linked above).

Release notes

Make gradients for negative binomial and 2F1 function more robust for boundary values.

Checklist

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

…to grad_2F1 and increasing the max number of default steps. Overall it should terminate sooner for very large gradients. New tests are referenced vs. a quick discrete diff in Mathematica.
…erence interval in the test of the neg_binomial_2_ccdf was too narrow so bumping it higher.
@sakrejda sakrejda self-assigned this Oct 29, 2020
@sakrejda
Contributor Author

This works locally for the failing test so if there's a standard way to tickle the test system I'd appreciate any help.

@rok-cesnovar
Member

I am able to replicate locally with clang 10.0 on Ubuntu. Not sure on the solution though.

@sakrejda
Contributor Author

@rok-cesnovar thanks for checking that the bug happens for you too. I'm a little burned out on reading C++ errors at the moment, but I'm sure I can figure it out. I'm on Arch Linux so it's usually one of the newest compiler versions, and going to Ubuntu is like getting in a time machine.

@bbbales2 bbbales2 left a comment
Member


I had a closer look. I'm not sure relative error makes sense here. As far as the tests go, the thing that seemed to matter is the increased number of steps in the algorithm, not the change to the error test.

If I print out the error on convergence, I'm always seeing really small numbers which makes me think log_t_new is an increment (and so just making sure it is below some value is already a relative error check).

From the line log_t_new += log(fabs(p)) + log_z; it looks like log_t_new is being incremented to some value, but I think this is actually the log of the terms in the power series of the function, so the terms are getting smaller in the sequence and the log is getting more negative (the log_t_new accumulation only ever decreases and isn't really converging to anything itself, because ideally it goes to -inf, which is zero error).

The variable p at the top of the loop looks like it incrementally computes the new factor included in each term of the series (see the definition at the bottom of the Notation section here -- I wonder if p stands for Pochhammer: https://en.wikipedia.org/wiki/Generalized_hypergeometric_function#Notation).
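For reference (standard notation for the series, not anything specific to this implementation), the function and the ratio of successive terms are

    {}_2F_1(a_1, a_2; b_1; z)
      = \sum_{k=0}^{\infty} \frac{(a_1)_k \, (a_2)_k}{(b_1)_k} \frac{z^k}{k!},
    \qquad
    \frac{t_{k+1}}{t_k} = \frac{(a_1 + k)(a_2 + k)}{(b_1 + k)(k + 1)} \, z,

where (a)_k is the Pochhammer symbol (rising factorial). Accumulating log(fabs(p)) + log_z each iteration is consistent with building up log|t_k| from that ratio.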

@sakrejda
Contributor Author

@bbbales2 just for context: I re-wrote and re-organized the hypergeometric functions and their gradients last time I worked significantly on Stan, so I'm familiar with the calculations and their challenges. It would take another significant chunk of work to improve them over the current status and that is not what I'm trying to address in the PR.

@bbbales2
Member

Yeah that's fine. I don't think we need any major work here.

I'm happy just multiplying max_steps by 10 and calling it a day.

Since the relative tolerance was new, I tried to verify that it was behaving well at the switch and I wasn't able to do that. The tests I ran (./runTests.py test/unit/math/mix/fun/grad_2F1_test.cpp and ./runTests.py test/unit/math/rev/prob/neg_binomial_2_cdf_log_test.cpp) both showed that the log_t_new thing is going negative (so exp(log_t_new) is going to zero), so I went and read about the power series and whatnot to try to piece together what was happening. My hunch is that the tolerance here is already effectively a relative tolerance.

@sakrejda
Contributor Author

Oh, this is definitely not worth your time to try to figure out the hypergeometric function notation! It is different everywhere, and I standardized our notation across the different functions last time I touched them, but the references are still weird! Your comments are the nudge I needed to look at the parts that were bothering me yesterday, so thanks!

You got it right that log_t_new is based on the increment (it's a lagged part of each new term in the sum) so I need to actually include g_a1 and g_b1 in the scaling rather than log_t_new. It should still work since each update to those is incrementally smaller (there's a separate function that checks the conditions for the infinite sum converging).

@sakrejda
Contributor Author

Thanks again for the review, it really helps especially with these obscure calculations... even if I'm the one who re-wrote them... :/

@sakrejda
Contributor Author

Looks like there were some failures from the test system that aren't code-related so I upped precision in one test and kicked it off again...

@sakrejda
Contributor Author

BTW: the upshot is that bumping max_steps to 1e6 is still required to get the boundary value right, so the PR adds relative tolerance to avoid excessive computation and also bumps max_steps to avoid a failure in Stan models. Downstream tests failed (out of memory) so I re-started those.

@sakrejda
Contributor Author

Upstream tests failed again; anybody know what to do with this? It's been too long for me.

virtual memory exhausted: Cannot allocate memory
make/tests:158: recipe for target 'test/test-models/good/function-signatures/math/matrix/dims.hpp-test' failed
make: *** [test/test-models/good/function-signatures/math/matrix/dims.hpp-test] Error 1

@rok-cesnovar
Member

That is an issue with the upstream tests that we are looking into. I will make sure to restart these tests once the problem is fixed. Sorry for the inconvenience.

@stan-buildbot
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.16 3.19 0.99 -0.95% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.97 -2.78% slower
eight_schools/eight_schools.stan 0.12 0.11 1.07 6.89% faster
gp_regr/gp_regr.stan 0.18 0.17 1.02 1.5% faster
irt_2pl/irt_2pl.stan 5.62 5.72 0.98 -1.81% slower
performance.compilation 89.8 87.91 1.02 2.11% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.46 8.45 1.0 0.16% faster
pkpd/one_comp_mm_elim_abs.stan 29.14 28.67 1.02 1.63% faster
sir/sir.stan 132.79 128.0 1.04 3.61% faster
gp_regr/gen_gp_data.stan 0.04 0.04 1.01 1.39% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.97 2.97 1.0 0.14% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.38 0.38 1.01 0.89% faster
arK/arK.stan 1.81 1.79 1.01 0.66% faster
arma/arma.stan 0.61 0.74 0.82 -22.22% slower
garch/garch.stan 0.7 0.55 1.26 20.54% faster
Mean result: 1.01466299927

Jenkins Console Log
Blue Ocean
Commit hash: 2f73f27


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@sakrejda sakrejda requested a review from bbbales2 November 2, 2020 02:28
@sakrejda
Contributor Author

sakrejda commented Nov 2, 2020

Wow those ARMA/GARCH performance results

@bbbales2
Member

bbbales2 commented Nov 2, 2020

Got it on my list for tomorrow!

@bbbales2
Member

bbbales2 commented Nov 2, 2020

I think these convergence checks should be on g_a1/g_b1 directly.

I tried out:

    // Remember the accumulated gradients before applying this iteration's term
    T g_a1_prev = g_a1;
    T g_b1_prev = g_b1;

    // Apply the signed increment exp(log_g_old) to each gradient
    g_a1 += log_g_old_sign[0] > 0 ? exp(log_g_old[0]) : -exp(log_g_old[0]);
    g_b1 += log_g_old_sign[1] > 0 ? exp(log_g_old[1]) : -exp(log_g_old[1]);

    // Combined check: the increment must be small relative to the accumulated
    // gradient, with precision as an absolute floor near zero
    if (std::abs(value_of_rec(g_a1) - value_of_rec(g_a1_prev)) <=
        std::max(std::abs(value_of_rec(g_a1)) * precision, precision) &&
        std::abs(value_of_rec(g_b1) - value_of_rec(g_b1_prev)) <=
        std::max(std::abs(value_of_rec(g_b1)) * precision, precision)) {
      // Debug output behind the convergence numbers quoted below
      std::cout << "k: " << k << std::endl;
      std::cout << "log(fabs(term)): " << log(fabs(term)) << std::endl;
      std::cout << "log_t_new: " << value_of(log_t_new) << std::endl;
      std::cout << "log_g_old[0]: " << value_of(log_g_old[0]) << std::endl;
      std::cout << "log_g_old[1]: " << value_of(log_g_old[1]) << std::endl;
      std::cout << "g_old[0]: " << exp(value_of(log_g_old[0])) << std::endl;
      std::cout << "g_old[1]: " << exp(value_of(log_g_old[1])) << std::endl;
      std::cout << "abs(g_a1) * precision: " << std::setprecision(17) << std::abs(value_of(g_a1)) * precision << std::endl;
      std::cout << "abs(g_b1) * precision: " << std::setprecision(17) << std::abs(value_of(g_b1)) * precision << std::endl;
      std::cout << "precision: " << exp(log_precision) << std::endl;
      std::cout << "g_a1: " << std::setprecision(17) << value_of(g_a1) << std::endl;
      std::cout << "g_b1: " << std::setprecision(17) << value_of(g_b1) << std::endl;
      return;
    }

Running ./runTests.py test/unit/math/prim/fun/grad_2F1_test.cpp and looking at the last test output I get:

k: 141758
log(fabs(term)): 2.4034041754828013
log_t_new: -32.236241280788448
log_g_old[0]: -29.866770317230252
log_g_old[1]: -29.832837105305646
g_old[0]: 1.0691200123013884e-13
g_old[1]: 1.1060212359754558e-13
abs(g_a1) * precision: 2.9369829991912055e-07
abs(g_b1) * precision: 3.0843032119093972e-07
precision: 9.9999999999999874e-15
g_a1: 29369829.991912056
g_b1: -30843032.119093969

The problem is that the things converging here are g_a1 and g_b1, but the check was that log_t_new gets under some threshold (t_new < 1e-14).

If I change to the checks above I get:

k: 91430
log(fabs(term)): 2.3629489643518515
log_t_new: -17.372736574697267
log_g_old[0]: -15.04514642131258
log_g_old[1]: -15.009787610345416
g_old[0]: 2.923990315021038e-07
g_old[1]: 3.0292287240870817e-07
abs(g_a1) * precision: 2.9369829990916807e-07
abs(g_b1) * precision: 3.0843032118062863e-07
precision: 9.9999999999999874e-15
g_a1: 29369829.990916807
g_b1: -30843032.118062865

In this case the gradients are off in the 11th digit, but we stop the calculation way sooner (once we're incrementing by numbers smaller than abs(g_a1) * precision and abs(g_b1) * precision).
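For scale (just arithmetic on the g_a1 values printed above):

    \frac{|29369829.991912056 - 29369829.990916807|}{2.94 \times 10^{7}}
      \approx \frac{9.95 \times 10^{-4}}{2.94 \times 10^{7}}
      \approx 3.4 \times 10^{-11},

i.e. roughly eleven matching significant digits, while the iteration count drops from 141758 to 91430.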

@sakrejda
Contributor Author

sakrejda commented Nov 3, 2020

I think I agree, I doubt I can get to this today but I'll do it tomorrow (if the world isn't on fire)

@bbbales2
Member

Just to bump this along I went ahead and made the changes. Have a look and revert them or change them if they're not good.

@stan-buildbot
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.02 2.99 1.01 1.19% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.99 -0.58% slower
eight_schools/eight_schools.stan 0.12 0.11 1.02 1.6% faster
gp_regr/gp_regr.stan 0.17 0.17 1.0 -0.42% slower
irt_2pl/irt_2pl.stan 5.73 5.68 1.01 0.77% faster
performance.compilation 88.26 85.23 1.04 3.44% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.47 8.47 1.0 0.02% faster
pkpd/one_comp_mm_elim_abs.stan 29.73 30.38 0.98 -2.18% slower
sir/sir.stan 137.57 133.11 1.03 3.24% faster
gp_regr/gen_gp_data.stan 0.04 0.04 1.01 0.71% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.96 2.97 1.0 -0.23% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.4 0.4 1.01 0.65% faster
arK/arK.stan 1.79 1.79 1.0 0.04% faster
arma/arma.stan 0.62 0.61 1.03 2.52% faster
garch/garch.stan 0.74 0.74 1.0 0.22% faster
Mean result: 1.0076057049

Jenkins Console Log
Blue Ocean
Commit hash: c3085ad


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@@ -86,13 +86,16 @@ void grad_2F1(T& g_a1, T& g_b1, const T& a1, const T& a2, const T& b1,
log_g_old[1] = log_t_new + log(fabs(term));
log_g_old_sign[1] = term >= 0.0 ? log_t_new_sign : -log_t_new_sign;

T g_a1_prev = g_a1;
Contributor Author


@bbbales2 thanks for prodding me along, shouldn't we be storing these as just doubles? Is there any performance penalty involved in storing the type?

Member


Yeah I guess since we're just looking at the values we could do value_of_rec. Will push that up in a sec.

@sakrejda
Contributor Author

So... that increment you compute for the convergence condition is exp(log_g_old) (well, with the sign, but the magnitude is what should shrink), so I think the original condition on log_g_old was fine (and involves fewer calculations).
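A minimal sketch of what the comparison could look like done directly in log space (not the actual diff; converged_log is a hypothetical name and log_precision is assumed to hold log(precision)):

    #include <algorithm>
    #include <cmath>

    // Same combined absolute/relative test, but expressed in log space so the
    // per-term magnitude (log_g_old in grad_2F1) never has to be exponentiated.
    bool converged_log(double log_increment, double accumulated, double log_precision) {
      const double log_tol =
          std::max(std::log(std::abs(accumulated)) + log_precision, log_precision);
      return log_increment <= log_tol;
    }

In the function this could be evaluated with log_g_old[0] against g_a1 and log_g_old[1] against g_b1.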

@bbbales2
Member

@sakrejda that makes sense. I switched it to logs. The convergence numbers to compare to above are:

k: 91415
log(fabs(term)): 2.3629335185457978
log_t_new: -17.368339948175109
log_g_old[0]: -15.040765796341576
log_g_old[1]: -15.005406429629311
g_old[0]: 2.9368273165263018e-07
g_old[1]: 3.0425294376974053e-07
abs(g_a1) * precision: 2.9369829990912397e-07
abs(g_b1) * precision: 3.0843032118058315e-07
precision: 9.9999999999999874e-15
g_a1: 29369829.990912396
g_b1: -30843032.118058313

@sakrejda
Contributor Author

@sakrejda that makes sense. I switched it to logs. The convergence numbers to compare to above are:

k: 91415
log(fabs(term)): 2.3629335185457978
log_t_new: -17.368339948175109
log_g_old[0]: -15.040765796341576
log_g_old[1]: -15.005406429629311
g_old[0]: 2.9368273165263018e-07
g_old[1]: 3.0425294376974053e-07
abs(g_a1) * precision: 2.9369829990912397e-07
abs(g_b1) * precision: 3.0843032118058315e-07
precision: 9.9999999999999874e-15
g_a1: 29369829.990912396
g_b1: -30843032.118058313

I think this is a third thing. I should have time later today to update this to what I think it should be, and I'll ask you to check it then, ok? I think we're getting there, it's just a weird calculation...

@bbbales2
Member

I'll ask you to check it then, ok?

Sure! Btw I don't think the red X is anything to worry about. Sounds like Jenkins got sad over the weekend. It'll probably fix itself with the next push you make.

@rok-cesnovar
Member

I would also merge recent develop in. There were a couple of fixes in the last few days for the random false-positive failures, like #2187.

@stan-buildbot
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.67 4.01 0.91 -9.37% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -2.39% slower
eight_schools/eight_schools.stan 0.11 0.12 0.98 -1.92% slower
gp_regr/gp_regr.stan 0.16 0.17 0.99 -1.25% slower
irt_2pl/irt_2pl.stan 5.62 5.72 0.98 -1.86% slower
performance.compilation 86.67 85.67 1.01 1.15% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.44 8.44 1.0 -0.01% slower
pkpd/one_comp_mm_elim_abs.stan 28.96 29.52 0.98 -1.92% slower
sir/sir.stan 137.3 138.83 0.99 -1.11% slower
gp_regr/gen_gp_data.stan 0.05 0.04 1.02 1.61% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.01 2.96 1.02 1.78% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.38 0.37 1.03 2.66% faster
arK/arK.stan 2.47 2.49 0.99 -0.53% slower
arma/arma.stan 0.59 0.6 0.99 -0.78% slower
garch/garch.stan 0.75 0.74 1.0 0.24% faster
Mean result: 0.991629454456

Jenkins Console Log
Blue Ocean
Commit hash: 703b64a


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@sakrejda
Contributor Author

@bbbales2 you have a "change request" pending on this, but I just read the current code again and I think it does look right now. The tests are passing, so I think this is good to merge. Can you take another look? I'd love to get it merged.

@bbbales2 bbbales2 merged commit 5560224 into develop Nov 27, 2020
@bbbales2
Member

@sakrejda in!

@rok-cesnovar rok-cesnovar deleted the feature/issue-2158-grad-2F1-precision branch November 27, 2020 16:44