Remove v_vari #2422

Merged
merged 25 commits into from
Apr 1, 2021

Conversation

SteveBronder
Collaborator

Summary

Just a small PR that removes v_vari from the rev folder and replaces it with make_callback_var().

Tests

Cleanup so no new tests

Side Effects

No side effects, but I was wondering about the callbacks that perform NaN checks, like:

inline var operator-(const var& a) {
  return make_callback_var(-a.val(), [a](const auto vi) mutable {
    if (unlikely(is_nan(a.val()))) {
      a.adj() = NOT_A_NUMBER;
    } else {
      a.adj() -= vi.adj();
    }
  });
}

Have we decided to not do NaN checks on the reverse pass? It feels like we are very inconsistent on this. Personally I'm pro removing them because of the inconsistency.
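To make the question concrete, here is a minimal sketch of what the reverse-pass NaN check changes. It uses a toy tape rather than Stan Math itself: `Node`, `Tape`, `negate`, and `adjoint_of_negated_input` are hypothetical names invented for this illustration, and the callback only mimics the shape of the `make_callback_var` lambda quoted above.

```cpp
#include <cmath>
#include <functional>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

// Toy reverse-mode tape for illustration only; this is not Stan Math's
// var/vari machinery, and every name here is hypothetical.
struct Node {
  double val;
  double adj;
  std::function<void(Node&)> backward;  // reverse-pass callback, may be empty
};

class Tape {
 public:
  // Record a node; unique_ptr keeps node addresses stable as the vector grows.
  Node* push(double val, std::function<void(Node&)> backward = nullptr) {
    nodes_.push_back(
        std::make_unique<Node>(Node{val, 0.0, std::move(backward)}));
    return nodes_.back().get();
  }

  // Seed the output adjoint with 1 and run callbacks from newest to oldest.
  void grad(Node* out) {
    out->adj = 1.0;
    for (auto it = nodes_.rbegin(); it != nodes_.rend(); ++it) {
      if ((*it)->backward) {
        (*it)->backward(**it);
      }
    }
  }

 private:
  std::vector<std::unique_ptr<Node>> nodes_;
};

// Unary minus with an optional NaN check in the reverse pass.
Node* negate(Tape& tape, Node* a, bool nan_check) {
  return tape.push(-a->val, [a, nan_check](Node& vi) {
    if (nan_check && std::isnan(a->val)) {
      a->adj = std::numeric_limits<double>::quiet_NaN();
    } else {
      a->adj -= vi.adj;
    }
  });
}

// Adjoint of x after differentiating -x with respect to x.
double adjoint_of_negated_input(double x, bool nan_check) {
  Tape tape;
  Node* a = tape.push(x);
  Node* out = negate(tape, a, nan_check);
  tape.grad(out);
  return a->adj;
}
```

In this sketch, a finite input gives an adjoint of -1 either way. For a NaN input, the checked version overwrites the adjoint with NaN, while the unchecked version still reports -1 because the upstream adjoint is finite. That difference is the behavior the question is about.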

Release notes

Checklist

  • Math issue #(issue number)

  • Copyright holder: Steve Bronder

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@t4c1
Contributor

t4c1 commented Mar 15, 2021

I kinda agree on NaN checks.

bbbales2
bbbales2 previously approved these changes Mar 15, 2021
Member

@bbbales2 bbbales2 left a comment

Good if tests pass.

If you want to remove all the NaN logic, happy to scroll back through and check.

The only place that looked weird to me was how we handle maxes/mins with NaNs, but apparently that is undefined behavior (https://stackoverflow.com/questions/55153210/do-stdmin0-0-1-0-and-stdmax0-0-1-0-yield-undefined-behavior) so I guess it probably doesn't matter so much what we do.
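As a small illustration of why max/min with NaN is awkward, the sketch below compares `std::max` with `std::fmax` from the standard library. The wrapper names `max_via_std` and `max_via_fmax` are made up for this example. The `std::max` results described in the comments are what common implementations produce; per the linked discussion, passing NaN may formally violate the comparison requirements, so they should not be treated as guaranteed.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

const double kNaN = std::numeric_limits<double>::quiet_NaN();

// std::max(a, b) is typically implemented as (a < b) ? b : a. Every
// comparison with NaN is false, so the first argument is returned and
// the result depends on argument order.
inline double max_via_std(double a, double b) { return std::max(a, b); }

// std::fmax treats NaN as missing data: if exactly one argument is NaN,
// the other argument is returned, regardless of order.
inline double max_via_fmax(double a, double b) { return std::fmax(a, b); }
```

With these wrappers, `max_via_std(kNaN, 1.0)` usually yields NaN and `max_via_std(1.0, kNaN)` usually yields 1.0, while `max_via_fmax` returns 1.0 for both orderings.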

return {new precomp_v_vari(NOT_A_NUMBER, a.vi_, NOT_A_NUMBER)};
if (unlikely(is_nan(a.val()))) {
return make_callback_var(a.val(),
[a](auto& vi) mutable { a.adj() = NOT_A_NUMBER; });
Member

Oh yeah definitely ditch this nan check

return;
}
if (is_inf(bvi_->val_)) {
bvi_->adj_ += NOT_A_NUMBER;
bvi_->adj_ = NOT_A_NUMBER;
Member

[optional] It looks like the conversion on gamma_p never got finished.

Collaborator Author

I just did a ctrl+f for += NOT_A_NUMBER and that's why these changed. I'm going to do the vv ones in a future PR
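For readers wondering whether that search-and-replace changes results: under IEEE 754 arithmetic, adding NaN to any value gives NaN, so accumulating and assigning end in the same state. A minimal sketch with hypothetical helper names:

```cpp
#include <cmath>
#include <limits>

const double kNotANumber = std::numeric_limits<double>::quiet_NaN();

// Old style: accumulate NaN into an existing adjoint.
inline double accumulate_nan(double adj) {
  adj += kNotANumber;
  return adj;
}

// New style: overwrite the adjoint with NaN directly.
inline double assign_nan(double adj) {
  adj = kNotANumber;
  return adj;
}
```

Both helpers return NaN for any starting adjoint, including zero and infinity, so the assignment form only skips a redundant addition.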

Comment on lines 23 to 24
A.adj() += elt_divide(res.adj(), A.val() * LOG_TEN);
A.adj() += elt_multiply(res.adj(),
elt_divide(1.0, A.val() - square(A.val())));
Collaborator Author

@t4c1 fyi I think this was wrong before? I wrote it to match up to the rev version

Contributor

It seems so. How did the tests pass?

Anyway you can simplify this into:
A.adj() += elt_divide(res.adj(), A.val() - square(A.val()));
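A quick scalar check of the math behind this thread, as a sketch rather than the actual OpenCL code. It assumes the function in question is logit, whose derivative is 1 / (x - x^2), and that the first quoted line (dividing by `A.val() * LOG_TEN`) is the pre-change version, which looks like the derivative of log10 instead. All helper names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// logit(x) = log(x / (1 - x)), defined on the open interval (0, 1).
inline double logit(double x) {
  assert(x > 0.0 && x < 1.0);
  return std::log(x / (1.0 - x));
}

// Analytic derivative: 1/x + 1/(1 - x) = 1 / (x - x^2).
inline double logit_grad(double x) { return 1.0 / (x - x * x); }

// What dividing by x * log(10) computes: the derivative of log10(x).
inline double log10_grad(double x) { return 1.0 / (x * std::log(10.0)); }

// Central finite difference of logit, used as an independent reference.
inline double central_diff(double x, double h = 1e-6) {
  return (logit(x + h) - logit(x - h)) / (2.0 * h);
}

// The two adjoint-update forms from the review, for a single element.
inline double update_multiply(double res_adj, double x) {
  return res_adj * (1.0 / (x - x * x));
}
inline double update_divide(double res_adj, double x) {
  return res_adj / (x - x * x);
}
```

At x = 0.3 the finite difference agrees with `logit_grad` (about 4.76) and is far from `log10_grad` (about 1.45). The multiply and divide forms agree up to floating-point rounding, which is why the shorter `elt_divide` form is a safe simplification.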

Member

I reverted this change and the tests failed for me locally. It looks like the right tests are running in CI. I don't know. This seems like a canary worth investigating, but if nobody can reproduce it then nobody can reproduce it :/.

Member

Oh also I switched to the simplified A.adj() += elt_divide(res.adj(), A.val() - square(A.val()));

@SteveBronder
Collaborator Author

Ayy @rok-cesnovar @serban-nicusor-toptal it looks like something is going on with the opencl device across a few PRs

@rok-cesnovar
Member

Yeah, see #2442

@SteveBronder
Collaborator Author

Cool thanks!

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.4 | 3.38 | 1.01 | 0.62% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.01 | 1.33% faster |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.02 | 1.69% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.97 | -3.13% slower |
| irt_2pl/irt_2pl.stan | 5.34 | 5.37 | 0.99 | -0.54% slower |
| performance.compilation | 91.69 | 88.87 | 1.03 | 3.07% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.73 | 8.68 | 1.01 | 0.51% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.34 | 29.32 | 1.0 | 0.06% faster |
| sir/sir.stan | 125.38 | 124.17 | 1.01 | 0.96% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 0.98 | -1.64% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.21 | 3.0 | 1.07 | 6.61% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.4 | 0.96 | -4.39% slower |
| arK/arK.stan | 2.01 | 1.83 | 1.09 | 8.64% faster |
| arma/arma.stan | 0.96 | 0.64 | 1.48 | 32.61% faster |
| garch/garch.stan | 0.51 | 0.61 | 0.84 | -19.6% slower |
Mean result: 1.03170910656

Jenkins Console Log
Blue Ocean
Commit hash: f36c574


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder
Collaborator Author

Besides @t4c1's comment above, this is ready for review!

Member

@bbbales2 bbbales2 left a comment

I made a couple of changes and left comments. I'll leave it to @t4c1 to approve and merge. I don't understand how the non-working logit got in. The tests seem to work for me now.

@@ -87,7 +87,7 @@ TEST(opencl_context, switch_devices_errors) {
EXPECT_THROW(stan::math::opencl_context.select_device(0, 99999),
std::system_error);
}

/*
Member

This test should probably be turned back on

Collaborator Author

Oooh yes good catch

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.39 | 3.36 | 1.01 | 0.99% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.98 | -1.56% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.12 | 0.95 | -4.93% slower |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.01 | 0.6% faster |
| irt_2pl/irt_2pl.stan | 5.41 | 5.36 | 1.01 | 0.98% faster |
| performance.compilation | 91.98 | 89.31 | 1.03 | 2.9% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.64 | 8.6 | 1.0 | 0.42% faster |
| pkpd/one_comp_mm_elim_abs.stan | 31.87 | 29.47 | 1.08 | 7.52% faster |
| sir/sir.stan | 134.49 | 121.45 | 1.11 | 9.7% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 0.99 | -0.69% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 3.1 | 0.95 | -4.77% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.42 | 0.94 | -6.46% slower |
| arK/arK.stan | 2.02 | 1.84 | 1.1 | 8.72% faster |
| arma/arma.stan | 0.64 | 0.78 | 0.81 | -23.38% slower |
| garch/garch.stan | 0.51 | 0.65 | 0.78 | -28.58% slower |
Mean result: 0.983801236376

Jenkins Console Log
Blue Ocean
Commit hash: c489500


@t4c1
Contributor

t4c1 commented Mar 31, 2021

This looks good, just uncomment the test Ben mentioned.

Member

@bbbales2 bbbales2 left a comment

Grand theft github points

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.36 | 3.34 | 1.0 | 0.46% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.98 | -2.5% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 0.97 | -2.61% slower |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.04 | 3.45% faster |
| irt_2pl/irt_2pl.stan | 5.37 | 5.37 | 1.0 | 0.11% faster |
| performance.compilation | 91.62 | 89.27 | 1.03 | 2.56% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.6 | 8.75 | 0.98 | -1.67% slower |
| pkpd/one_comp_mm_elim_abs.stan | 31.3 | 29.04 | 1.08 | 7.23% faster |
| sir/sir.stan | 130.76 | 122.24 | 1.07 | 6.51% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.03 | 2.44% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.03 | 3.06 | 0.99 | -0.98% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.38 | 1.0 | -0.21% slower |
| arK/arK.stan | 1.99 | 1.86 | 1.07 | 6.34% faster |
| arma/arma.stan | 0.64 | 0.77 | 0.83 | -19.99% slower |
| garch/garch.stan | 0.51 | 0.64 | 0.8 | -24.26% slower |
Mean result: 0.991206904323

Jenkins Console Log
Blue Ocean
Commit hash: 16e2681


@bbbales2 bbbales2 merged commit 818be4a into develop Apr 1, 2021
6 participants