
Get sd, log_softmax, and log_sum_exp fully var<mat> compatible (Issue #2101) #2169

Merged: 23 commits into develop, Nov 29, 2020

Conversation

@bbbales2 (Member) commented Oct 27, 2020:

Summary

This makes sd, log_sum_exp, and log_softmax fully var<mat> compatible (and apply_vector_unary in the process).

Release notes

Updated sd, log_softmax, and log_sum_exp to work fully with var<mat>

Checklist

  • Math issue: Make functions with custom autodiff var<mat> friendly (#2101)

  • Copyright holder: Columbia University

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass (make test-headers)
    • dependencies checks pass (make test-math-dependencies)
    • docs build (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@bbbales2 marked this pull request as draft October 27, 2020 20:03
@bbbales2 (Member, Author) left a comment:

Questions & comments

@@ -19,7 +20,8 @@ namespace stan {
  */
 template <typename Container>
 using is_container = bool_constant<
-    math::disjunction<is_eigen<Container>, is_std_vector<Container>>::value>;
+    math::disjunction<is_eigen<Container>, is_std_vector<Container>,
+                      is_var_matrix<Container>>::value>;
@bbbales2 (Member, Author):

A var<mat> should count as a container, right? This might break template logic elsewhere, but we can fix those spots as they come up.
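
[sketch] For illustration, a compile-time check of what this diff enables, assuming the trait lives in the stan namespace as the hunk header suggests (editor's sketch, not code from the PR):

#include <stan/math/rev.hpp>
#include <Eigen/Dense>
#include <vector>

// Eigen matrices and std::vectors were already containers...
static_assert(stan::is_container<Eigen::MatrixXd>::value, "");
static_assert(stan::is_container<std::vector<double>>::value, "");
// ...and with this diff a var_value wrapping an Eigen type is one too.
static_assert(
    stan::is_container<stan::math::var_value<Eigen::MatrixXd>>::value, "");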

@SteveBronder (Collaborator):

I'll take a look at this in the morning

@bbbales2 (Member, Author) commented Nov 2, 2020:

Ping

@bbbales2 (Member, Author) left a comment:

@SteveBronder this is ready to look at. It's not 100% there, and I have a couple of questions. Once we get those ironed out, it will be easy to finish mat<var>-ing log_sum_exp and log_softmax.

@SteveBronder (Collaborator) left a comment:

Couple quick comments

alpha.adj().noalias()
+= res.adj_
- (res.adj_.sum() * res.val_.array().exp()).matrix();
});
@bbbales2 (Member, Author) commented Nov 19, 2020:

@t4c1, @SteveBronder writing this code I wanted a make_callback_var to wrap make_callback_vari. The argument passed to the functor can still be the vari, to avoid the pointer chasing.

I also want .adj() and val() on vari_value.

Does that all sound okay to add? (Edit: if so, I'll just do it here.)

@bbbales2 (Member, Author):

I want make_callback_var to avoid type algebra. In this case the input and output types will match; in other cases they won't, and I would end up writing:

using ret_type = decltype((theta.array() - log(theta.exp().sum())));
return var_value<ret_type>(...);
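
[sketch] A minimal version of what such a wrapper could look like, assuming the make_callback_vari and plain_type_t used elsewhere in this PR; this is the shape being proposed, not the committed code:

#include <stan/math/rev.hpp>
#include <utility>

namespace stan {
namespace math {

// Wrap make_callback_vari so callers get a var_value directly and never
// have to spell out the deduced inner type themselves.
template <typename T, typename F>
inline var_value<plain_type_t<T>> make_callback_var(T&& value, F&& functor) {
  return var_value<plain_type_t<T>>(
      make_callback_vari(std::forward<T>(value), std::forward<F>(functor)));
}

}  // namespace math
}  // namespace stan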

          require_std_vector_vt<is_matrix, Type>* = nullptr>
void check_return_type(const ReturnType& ret, const Type& x) {
  if (ret.size() > 0 && x.size() > 0)
    check_return_type(ret[0], x[0]);
@bbbales2 (Member, Author):

I will not be offended if you want me to take a closer look at check_return_type.

I think the logic we want is:

  1. If there are only var_value<double> s on the input, then there should only be var_value<double> s on the output
  2. If there are var_value<not double> s on the input, then there should be no var_value<double> s on the output

I think this would need an extra template metaprogram, var_value_t, to extract the var_value type from a generic input, and I am too lazy to write it today.
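
[sketch] A hypothetical version of that var_value_t metaprogram; the name and the recursion come from the comment above and are not existing Stan Math API:

#include <stan/math/rev.hpp>
#include <Eigen/Dense>
#include <vector>

// Primary template: arithmetic inputs carry no var_value.
template <typename T>
struct var_value_t {
  using type = void;
};

// A var_value<T> reports itself.
template <typename T>
struct var_value_t<stan::math::var_value<T>> {
  using type = stan::math::var_value<T>;
};

// Containers recurse into their element type.
template <typename T>
struct var_value_t<std::vector<T>> : var_value_t<T> {};

template <typename T, int R, int C, int O, int MR, int MC>
struct var_value_t<Eigen::Matrix<T, R, C, O, MR, MC>> : var_value_t<T> {};

// Rules 1 and 2 above then become checks comparing
// typename var_value_t<Input>::type against typename var_value_t<Output>::type.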

@SteveBronder (Collaborator):

I think the logic is fine here unless I'm missing something. This is a specialization for std::vector<T> that just checks that the inner type for the input/output is correct. If that's what this does then I think it's fine.

@bbbales2 marked this pull request as ready for review November 19, 2020 20:58
@bbbales2 changed the title from "Initial commit to get sd var<mat> compatible (Issue #2101)" to "Get sd, log_softmax, and log_sum_exp fully var<mat> compatible (Issue #2101)" Nov 19, 2020
@SteveBronder (Collaborator) left a comment:

Just a couple little things to change and then this looks good.

Comment on lines 107 to 112
return make_callback_vari(
(theta.array() - log(theta.exp().sum())).matrix(),
[alpha](const auto& res) mutable {
alpha.adj().noalias()
+= res.adj_ - (res.adj_.sum() * res.val_.array().exp()).matrix();
});
@SteveBronder (Collaborator):

[side note] It would be nice to have a make_callback_var so we could still use .adj() and .val() etc. Mostly a quality-of-life feature.

@bbbales2 (Member, Author):

I added this
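
[sketch] For comparison, roughly how the snippet above reads once make_callback_var and the .adj()/.val() accessors exist (editor's sketch, not necessarily the exact committed code):

return make_callback_var(
    (theta.array() - log(theta.exp().sum())).matrix(),
    [alpha](const auto& res) mutable {
      alpha.adj().noalias()
          += res.adj() - (res.adj().sum() * res.val().array().exp()).matrix();
    });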

T log_softmax_impl(const T& alpha) {
check_nonzero_size("log_softmax", "alpha", alpha);

const auto& theta = to_ref(alpha.val().array() - alpha.val().maxCoeff());
@SteveBronder (Collaborator):

Why do you need to_ref() here?

@bbbales2 (Member, Author):

I want it to evaluate to a temporary; I switched to an .eval().
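
[sketch] The distinction at play, as I understand it: for an already-plain type, stan::math::to_ref passes through as a cheap reference while .eval() always materializes a temporary; for an expression, both force evaluation, but .eval() states the intent directly. A small sketch under that assumption:

#include <stan/math/prim.hpp>
#include <Eigen/Dense>

void sketch(const Eigen::VectorXd& v) {
  // Expression input: both forms evaluate into a concrete object.
  auto a = (v.array() - v.maxCoeff()).eval();               // plain temporary
  auto&& b = stan::math::to_ref(v.array() - v.maxCoeff());  // also evaluates

  // Plain input: to_ref is a no-copy reference, .eval() copies.
  auto&& c = stan::math::to_ref(v);  // no copy
  auto d = v.eval();                 // copy
  (void)a; (void)b; (void)c; (void)d;
}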

Comment on lines 9 to 16
/**
 * Specialisation for use with var_value<T> types where T inherits from
 * EigenBase. Inputs are mapped to Eigen column vectors.
 *
 * The returned scalar type is deduced to allow for cases where the input and
 * return scalar types differ (e.g., functions implicitly promoting
 * integers).
 */
@SteveBronder (Collaborator):

Double check these docs

Comment on lines +1604 to +1618
template <typename ResultMatVar, typename ResultVarMat, typename MatVar,
typename VarMat,
require_std_vector_vt<is_var, ResultMatVar>* = nullptr,
require_std_vector_vt<is_var, ResultVarMat>* = nullptr>
inline void test_matvar_gradient(const ad_tolerances& tols,
ResultMatVar& A_mv_f, ResultVarMat& A_vm_f,
const MatVar& A_mv, const VarMat& A_vm) {
for (size_t i = 0; i < A_vm_f.size(); ++i) {
A_vm_f[i].adj() = 1;
A_mv_f[i].adj() = 1;
stan::math::grad();
expect_near_rel_var("var<Matrix> vs Matrix<var> input", A_vm, A_mv, tols);
stan::math::set_zero_all_adjoints();
}
}
@SteveBronder (Collaborator):

This is confusing me; the requires look like they're for std::vector<var>? I think that's the only way A_vm_f[i].adj() = 1; would work, because you can't assign a constant to an entire Eigen matrix like that.

https://godbolt.org/z/8KnPr1
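
[sketch] The Eigen behavior being referenced, in miniature (editor's standalone sketch):

#include <Eigen/Dense>

int main() {
  Eigen::VectorXd adj(3);
  // adj = 1.0;          // does not compile: no scalar-to-matrix assignment
  adj.setConstant(1.0);  // assigns 1 to every coefficient
  adj(0) = 1.0;          // per-coefficient assignment works, as in the test
  return 0;
}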

@SteveBronder (Collaborator):

Or is this for when a var<mat> function would return a std::vector<var>?

@bbbales2 (Member, Author):

This is for std::vector<var> outputs, not std::vector<var<mat>>.

@andrjohns (Collaborator):

Quick Q - Why the addition of _impl functions here? I thought I had the apply_vector_unary implementations working, or am I remembering something else?

@bbbales2 (Member, Author):

@andrjohns it's separate implementations for var<mat> and mat<var> types. Does that make sense?

I could make them sd overloads and then specialize apply_vector_unary for when the input is a std::vector. Now that I say that out loud I kind of like it better, though it subverts some of the apply_vector_unary pattern. Also, now that you mention the _impls, I guess some of the lambdas are defunct now.

@andrjohns (Collaborator):

Are the separate implementations because mat<var> isn't compatible with the callback_vari approach, or because of apply_vector_unary?

Found the branch I was thinking of; I thought I had both var<mat> and mat<var> passing tests on this branch: develop...andrjohns:feature/issue-2098-vec_unary_var_mat

@andrjohns (Collaborator):

Also, sorry if this is retreading obvious stuff with var<mat>, still catching up!

@bbbales2 (Member, Author):

@andrjohns yeah, this pull has those changes too (so that apply_vector_unary works with var<mat>). At least I hope they're the same; I only checked very roughly. I didn't realize you had a branch or I would have used that -- apologies for the duplication.

It's possible to do the mat<var> and var<mat> implementations in the same code, but two things have made me stop trying to do this:

  1. We got gridlocked before while trying to carefully benchmark mat<var> at the expense of ever getting var<mat> stuff in, so I'm just avoiding the mat<var> implementations now (it's easy to accidentally slow the existing code down by 10% and hard to get that benchmarked and fixed).

  2. mat<var> and var<mat>, even when we write them both with reverse_pass_callback, end up looking slightly different. (.val() isn't a problem for var<mat>, but with mat<var> you want to call it only once because it's slow.)

@andrjohns (Collaborator):

Ah, that all makes sense, thanks.

@bbbales2 (Member, Author):

@SteveBronder I made a bunch of changes:

  1. Added val(), adj() to all the varis. Let me know if you want this moved to a different pull or you want me to add tests. I just replaced all the .val_ and .adj_ calls in the current tests with .val() and .adj().

  2. Added make_callback_var

  3. I got rid of the _impls and instead lined those up as overloads to go along with apply_vector_unary. So for each of sd, log_softmax, and log_sum_exp there's one version of the function that handles mat<var>, one version that handles var<mat>, and an apply_vector_unary version that handles std::vector<T> (if the input is a std::vector<var>, apply_vector_unary changes it into an Eigen::Map); a rough sketch of this layout is below. @andrjohns feel free to comment on this if you want.
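
[sketch] The overload layout described in point 3, using sd as the example; the signatures and requires shown here are editor's illustrations of which constraints fit, not code copied from the PR:

#include <stan/math/rev.hpp>

// mat<var>: an Eigen matrix whose scalar type is var.
template <typename EigMat,
          stan::require_eigen_vt<stan::is_var, EigMat>* = nullptr>
stan::math::var sd(const EigMat& x);

// var<mat>: a var_value wrapping an Eigen type.
template <typename VarMat,
          stan::require_var_matrix_t<VarMat>* = nullptr>
stan::math::var sd(const VarMat& x);

// std::vector<T>: dispatched through apply_vector_unary, which maps a
// std::vector<var> onto an Eigen::Map before reaching the overloads above.
template <typename StdVec,
          stan::require_std_vector_t<StdVec>* = nullptr>
auto sd(const StdVec& x);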

SteveBronder previously approved these changes Nov 24, 2020
@SteveBronder (Collaborator) left a comment:

This all looks good to me! You have to fix up the docs, but then ping me and I'll approve.

Comment on lines +76 to +78
auto arena_diff = to_arena((x.val().array() - x.val().mean()).matrix());
double sum_of_squares = arena_diff.squaredNorm();
double sd = std::sqrt(sum_of_squares / (x.size() - 1));
@SteveBronder (Collaborator):

[optional] You could do the little loop thing here to make this faster, but it's fine as is.
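
[sketch] For context, one way the reverse pass could complete the snippet above, using d(sd)/dx_i = (x_i - mean) / ((n - 1) * sd); an editor's sketch under that assumption, not necessarily the committed code:

auto arena_diff = to_arena((x.val().array() - x.val().mean()).matrix());
double sum_of_squares = arena_diff.squaredNorm();
double sd = std::sqrt(sum_of_squares / (x.size() - 1));
var res = sd;
reverse_pass_callback([x, res, arena_diff]() mutable {
  // Chain rule: each input picks up adj * (x_i - mean) / ((n - 1) * sd).
  x.adj() += (res.adj() / (res.val() * (x.size() - 1))) * arena_diff;
});
return res;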

@bbbales2 (Member, Author):

> You have to fix up the docs

Do you mean the doxygen docs or the function reference docs (the second definitely needs updating, and I wouldn't doubt the first lol)?

@bbbales2 (Member, Author):

Oh doxygen my bad

@bbbales2 (Member, Author):

@SteveBronder I don't know what was going on with the docs. I changed all the variables to be named x to get them to work. That was a real hair-tugger.

@bbbales2 (Member, Author):

Woof, I had to add the .val() and .adj() (and _op) accessors to the OpenCL stuff. I'm firmly into I-don't-know-what-I'm-doing territory, so in the likely case this fails, I think I'll just revert the .val() and .adj() stuff.

@stan-buildbot (Contributor):


| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
|------|-----------:|-----------:|------:|----------------------------------|
| gp_pois_regr/gp_pois_regr.stan | 3.63 | 3.59 | 1.01 | 1.24% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.04 | 3.95% faster |
| eight_schools/eight_schools.stan | 0.12 | 0.11 | 1.04 | 3.47% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.17 | 0.98 | -1.54% slower |
| irt_2pl/irt_2pl.stan | 5.65 | 5.7 | 0.99 | -0.97% slower |
| performance.compilation | 86.93 | 85.73 | 1.01 | 1.39% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.43 | 8.54 | 0.99 | -1.21% slower |
| pkpd/one_comp_mm_elim_abs.stan | 28.8 | 29.32 | 0.98 | -1.8% slower |
| sir/sir.stan | 134.89 | 135.22 | 1.0 | -0.25% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.0 | -0.12% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.97 | 0.99 | -0.68% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.37 | 1.01 | 0.53% faster |
| arK/arK.stan | 2.48 | 2.51 | 0.99 | -1.12% slower |
| arma/arma.stan | 0.61 | 0.6 | 1.01 | 1.42% faster |
| garch/garch.stan | 0.74 | 0.74 | 1.0 | 0.04% faster |

Mean result: 1.00318155097

Commit hash: 422c29a


Machine information:
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@bbbales2 merged commit 634fc54 into develop Nov 29, 2020