Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make varmat functions work with ldlt (Issue #2101) #2266

Merged
merged 54 commits into from
Jan 12, 2021

Conversation

bbbales2
Copy link
Member

@bbbales2 bbbales2 commented Dec 18, 2020

Summary

Second attempt at making varmat work with ldlt

Side Effects

I removed LDLT_alloc and replaced LDLT_factor with an API incompatible replacement.

So this is a major version bump for Math right?

Release notes

  • Add varmat implementation of *_ldlt_* functions

Checklist

  • Math issue Make functions with custom autodiff var<mat> friendly #2101

  • Copyright holder: Columbia University

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@@ -64,8 +55,8 @@ TEST(mathMixMatFun, traceGenInvQuadForm) {

// exception tests
// non-square second arg
Eigen::MatrixXd a34(3, 4);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't take an LDLT of a non-square matrix, so this fails before it gets to trace_gen_inv_quad_form_ldlt (which takes an LDLT as an argument)

@bbbales2
Copy link
Member Author

I ended up replacing the ldlt autodiff that was there, but it's still not great. There new LDLT_factor is incompatible with the old one. We should still probably remove this eventually. The new version is cleaner and simpler, but it's a kinda-sometimes autodiff object that doesn't really behave like the other autodiff things we have (that is why it takes a template parameter which says whether to allocate it in the arena or not).

If we're gonna have LDLT factors as autodiff variables, they need to be vari_value<LDLT> types or something and follow the rules (and similarly the functions that consume them need to not make any extra assumptions about the lifetime of objects, which they currently do). In the interim, we have this.

Still need add a three argument expect_ad_matvar so that trace_gen_inv_quad_form can be tested with varmat types.

Will also get some benchmarks to make sure the matvar stuff isn't slower. When I coded this stuff up before with reverse_pass_callback I remeber the existing stuff was pretty fast.

@bbbales2
Copy link
Member Author

bbbales2 commented Dec 21, 2020

Here's benchmarks:

ldlt_timings

And here's the timing results of the one where performance gets bad. Not sure what happened there. I didn't spend time looking into it:

ldlt_trace_inv

Edit: Benchmark code is basically the stuff here and here running with matvars. I had to modify them a bit for the new thing (to use stan::math::make_ldlt_factor instead of the LDLT_factor constructor), but that was it.

@bbbales2
Copy link
Member Author

bbbales2 commented Jan 1, 2021

This needs #2280 in before it is ready to go (there's a three argument function in this)

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.33 3.38 0.98 -1.62% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.91 -9.64% slower
eight_schools/eight_schools.stan 0.11 0.11 0.98 -1.98% slower
gp_regr/gp_regr.stan 0.15 0.15 1.01 1.24% faster
irt_2pl/irt_2pl.stan 5.22 5.21 1.0 0.3% faster
performance.compilation 91.96 89.8 1.02 2.35% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.65 8.63 1.0 0.17% faster
pkpd/one_comp_mm_elim_abs.stan 29.16 30.16 0.97 -3.42% slower
sir/sir.stan 133.61 137.78 0.97 -3.12% slower
gp_regr/gen_gp_data.stan 0.04 0.04 1.0 0.01% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.04 3.05 1.0 -0.21% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.4 0.37 1.07 6.33% faster
arK/arK.stan 2.54 2.53 1.01 0.51% faster
arma/arma.stan 0.61 0.61 1.01 0.52% faster
garch/garch.stan 0.67 0.67 1.0 -0.38% slower
Mean result: 0.995121488597

Jenkins Console Log
Blue Ocean
Commit hash: 294b7fa


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.37 3.41 0.99 -1.32% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.97 -3.2% slower
eight_schools/eight_schools.stan 0.11 0.12 0.95 -5.35% slower
gp_regr/gp_regr.stan 0.16 0.15 1.03 2.87% faster
irt_2pl/irt_2pl.stan 5.16 5.19 1.0 -0.41% slower
performance.compilation 92.48 90.04 1.03 2.64% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.64 8.62 1.0 0.26% faster
pkpd/one_comp_mm_elim_abs.stan 30.07 28.28 1.06 5.96% faster
sir/sir.stan 135.68 135.11 1.0 0.42% faster
gp_regr/gen_gp_data.stan 0.05 0.04 1.09 8.44% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.08 3.07 1.0 0.26% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.37 0.4 0.92 -8.76% slower
arK/arK.stan 2.52 2.51 1.0 0.04% faster
arma/arma.stan 0.61 0.61 1.0 -0.48% slower
garch/garch.stan 0.67 0.68 0.98 -1.94% slower
Mean result: 1.00125016541

Jenkins Console Log
Blue Ocean
Commit hash: 294b7fa


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@bbbales2
Copy link
Member Author

bbbales2 commented Jan 5, 2021

@SteveBronder is is ready to review now that the three argument tests are in.

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.46 3.37 1.03 2.57% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.97 -3.16% slower
eight_schools/eight_schools.stan 0.11 0.11 1.01 0.81% faster
gp_regr/gp_regr.stan 0.16 0.15 1.06 5.34% faster
irt_2pl/irt_2pl.stan 5.22 5.2 1.01 0.52% faster
performance.compilation 91.14 89.74 1.02 1.54% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.64 8.66 1.0 -0.19% slower
pkpd/one_comp_mm_elim_abs.stan 27.49 27.75 0.99 -0.94% slower
sir/sir.stan 143.89 131.45 1.09 8.65% faster
gp_regr/gen_gp_data.stan 0.05 0.04 1.05 4.74% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.07 3.07 1.0 -0.25% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.43 0.38 1.14 12.35% faster
arK/arK.stan 2.53 2.54 0.99 -0.61% slower
arma/arma.stan 0.61 0.61 1.01 0.8% faster
garch/garch.stan 0.67 0.68 0.98 -1.82% slower
Mean result: 1.02247109706

Jenkins Console Log
Blue Ocean
Commit hash: dbff4ad


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.34 3.33 1.0 0.46% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.01 0.67% faster
eight_schools/eight_schools.stan 0.11 0.12 0.97 -3.48% slower
gp_regr/gp_regr.stan 0.15 0.16 0.99 -1.21% slower
irt_2pl/irt_2pl.stan 5.24 5.14 1.02 1.88% faster
performance.compilation 91.48 89.67 1.02 1.97% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.66 8.7 0.99 -0.5% slower
pkpd/one_comp_mm_elim_abs.stan 28.54 29.48 0.97 -3.3% slower
sir/sir.stan 133.37 132.02 1.01 1.02% faster
gp_regr/gen_gp_data.stan 0.04 0.04 0.98 -1.54% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.05 3.04 1.0 0.09% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.37 0.38 0.97 -3.5% slower
arK/arK.stan 2.54 2.51 1.01 1.16% faster
arma/arma.stan 0.6 0.63 0.96 -4.12% slower
garch/garch.stan 0.67 0.69 0.97 -2.75% slower
Mean result: 0.991710391681

Jenkins Console Log
Blue Ocean
Commit hash: 4c59678


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Copy link
Collaborator

@SteveBronder SteveBronder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So reviewing this I didn't like that we had to make multiple copies as we passed around the ldlt factor. Trying to sort this out I think it was just easier to make a little review branch with some changes

feature/varmat-ldlt2...review/feature/varmat-ldlt2#diff-8eb184dd0a9220fb56a0aea61b945cfea318ecfbfe8f25fd7ac6050c5f19b724R14

There's only one thing I strongly don't like about the changes I made which is the forward deceleration of make_chainable_ptr() in prim. We could flip this around and just declare the method for chainable_ldlt_ptr() in prim

// in prim's LDLT_factor
inline auto* chainable_ldlt_ptr();

and then in rev actually define chainable_ldlt_ptr() for the prim version.

Though if you don't like what I have above then I think then you can ignore that and just use the other review comments

Comment on lines +37 to 39
return A.ldlt().solve(
Eigen::Matrix<return_type_t<T, EigMat>, EigMat::RowsAtCompileTime,
EigMat::ColsAtCompileTime>(b));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] if b is an rvalue forwarding could be better here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'm gonna leave it as is though. This is some weird type promotion thing so the Eigen solves work. Like if T and EigMat don't match, it will make them match.

If we added another overload for when the scalar types are the same it would handle this. I grepped for where this function is used (it's an internal function not exposed to the Stan langage) and I don't think it's actually used anywhere (mdivide_right_ldlt uses this, but that is unused, and again that is an internal function), so I'm not gonna worry about this (though it would take about 5s to implement -- just seems like more code and complexity some future person has to decode).

@bbbales2
Copy link
Member Author

Changes look good. I'll merge tomorrow. The arena_t thing on the var LDLT is a good catch. Thanks!

@bbbales2
Copy link
Member Author

I moved make_chainable_ptr back out of LDLT_factor. It was kinda awkward to make it work with the prim stuff. The only place we do this is mdivide_left_ldlt and I grepped around and I think everywhere we do this there will only be one copy anyway.

This is the grep, and trace_inv_quad_form_ldlt and trace_gen_inv_quad_form_ldlt are only used in one place each as well:

stan/math/prim/fun/trace_inv_quad_form_ldlt.hpp:  return B.cwiseProduct(mdivide_left_ldlt(A, B)).sum();
stan/math/prim/fun/trace_gen_inv_quad_form_ldlt.hpp:  return multiply(B, D.transpose()).cwiseProduct(mdivide_left_ldlt(A, B)).sum();
stan/math/prim/prob/wishart_lpdf.hpp:        mdivide_left_ldlt(ldlt_S, W_ref));
stan/math/prim/prob/inv_wishart_lpdf.hpp:        mdivide_left_ldlt(ldlt_W, S_ref));

@bbbales2
Copy link
Member Author

@SteveBronder should be ready for review again

SteveBronder
SteveBronder previously approved these changes Jan 11, 2021
Copy link
Collaborator

@SteveBronder SteveBronder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good!

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.39 3.37 1.01 0.54% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.99 -0.58% slower
eight_schools/eight_schools.stan 0.11 0.11 1.02 2.06% faster
gp_regr/gp_regr.stan 0.15 0.15 1.0 -0.32% slower
irt_2pl/irt_2pl.stan 5.22 5.19 1.0 0.45% faster
performance.compilation 91.33 89.79 1.02 1.68% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.64 8.63 1.0 0.21% faster
pkpd/one_comp_mm_elim_abs.stan 29.21 29.17 1.0 0.13% faster
sir/sir.stan 134.85 137.58 0.98 -2.02% slower
gp_regr/gen_gp_data.stan 0.04 0.04 1.0 0.35% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.05 3.05 1.0 -0.01% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.37 0.38 0.98 -2.14% slower
arK/arK.stan 2.51 2.53 0.99 -0.98% slower
arma/arma.stan 0.6 0.6 1.0 0.15% faster
garch/garch.stan 0.57 0.57 1.0 -0.13% slower
Mean result: 0.999713098543

Jenkins Console Log
Blue Ocean
Commit hash: 8366815


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@bbbales2
Copy link
Member Author

@rok-cesnovar this was approved but then my test restart commit dismissed the review. Can you approve and merge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants